How to draw a family tree from a Pandas DataFrame?
I have a table that stores information about my ancestors. As an example, I created a similar table inspired by The Godfather:
|--------+---+-------------+-----------+------+------+--------+--------+----------------+----------------|
| ID | S | First name | Last name | dob | DoD | FID | MID | Place of birth | Job |
|--------+---+-------------+-----------+------+------+--------+--------+----------------+----------------|
| AnAn | M | Antonio | Andolini | | 1901 | | | Corleone | |
| SiAn | F | Signora | Andolini | | 1901 | | | Corleone | housewife |
| PaAn87 | M | Paolo | Andolini | 1887 | 1901 | AnAn | SiAn | | |
| ViCo92 | M | Vito | Corleone | 1892 | 1954 | AnAn | SiAn | Corleone | godfather |
| CaCo97 | F | Carmella | Corleone | 1897 | 1959 | | | | |
| ToHa10 | M | Tom | Hagen | 1910 | 1970 | ViCo92 | CaCo97 | New York | Consigliere |
| SaCo16 | M | Santino | Corleone | 1916 | 1948 | ViCo92 | CaCo97 | New York | gangster |
| SaCo17 | F | Sandra | Colombo | 1917 | | | | Messina | |
| FrCo19 | M | Frederico | Corleone | 1919 | 1959 | ViCo92 | CaCo97 | New York | Casino Manager |
| MiCo20 | M | Michael | Corleone | 1920 | 1997 | ViCo92 | CaCo97 | New York | godfather |
| ThHa20 | F | Theresa | Hagen | 1920 | | | | New Jersey | Art expert |
| LuMa23 | F | Lucy | Mancini | 1923 | | | | | Hotel employee |
| KaAd24 | F | Kay | Adams | 1934 | | | | | |
| FrCo37 | F | Francessa | Corleone | 1937 | | SaCo16 | SaCo17 | | |
| KaCo37 | F | Kathryn | Corleone | 1937 | | SaCo16 | SaCo17 | | |
| FrCo40 | F | Frank | Corleone | 1940 | | SaCo16 | SaCo17 | | |
| SaCo45 | M | Santino Jr. | Corleone | 1945 | | SaCo16 | SaCo17 | | |
| FrHa | M | Frank | Hagen | 1940 | | ToHa10 | Th20 | | |
| AnHa42 | M | Andrew | Hagen | 1942 | | ToHa10 | Th20 | | Priest |
| ViMa | M | vincent | Mancini | 1948 | | SaCo16 | LuMa23 | New York | Godfather |
| GiHa58 | F | Gianna | Hagen | 1948 | | ToHa10 | Th20 | | |
| AnCo51 | M | Anthony | Corleone | 1951 | | MiCo20 | KaAd24 | New York | Singer |
| MaCo53 | F | Mary | Corleone | 1953 | 1979 | MiCo20 | KaAd24 | New York | Student |
| ChHa54 | F | Christina | Hagen | 1954 | | ToHa10 | Th20 | | |
| CoCo27 | F | Constanzia | Corleone | 1927 | | ViCo92 | CaCo97 | New York | rentier |
| CaRi20 | M | Carlo | Rizzi | 1920 | 1955 | | | Nevada | Bookmaker |
| ViRi49 | M | Victor | Rizzi | 1949 | | CaRi20 | CoCo27 | New York | |
| MiRi | M | Michael | Rizzi | 1955 | | CaRi20 | CoCo27 | | |
|--------+---+-------------+-----------+------+------+--------+--------+----------------+----------------|
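Before building the graph, it may be worth checking that every `FID`/`MID` value points at an existing `ID`. Below is a minimal standard-library sketch; the `(ID, FID, MID)` triples are copied by hand from the table above, and `rows` and `dangling_parents` are illustrative names, not part of the original code. On this data it reports `Th20`: Tom Hagen's children list `Th20` as their mother, while Theresa Hagen's row has the ID `ThHa20`, so graphviz would draw a separate bare `Th20` node instead of linking them to Theresa. Treating `ThHa20` as the intended parent is an assumption.

```python
# (ID, FID, MID) copied from the table above; "" marks an empty cell.
rows = [
    ("AnAn", "", ""), ("SiAn", "", ""),
    ("PaAn87", "AnAn", "SiAn"), ("ViCo92", "AnAn", "SiAn"),
    ("CaCo97", "", ""),
    ("ToHa10", "ViCo92", "CaCo97"), ("SaCo16", "ViCo92", "CaCo97"),
    ("SaCo17", "", ""),
    ("FrCo19", "ViCo92", "CaCo97"), ("MiCo20", "ViCo92", "CaCo97"),
    ("ThHa20", "", ""), ("LuMa23", "", ""), ("KaAd24", "", ""),
    ("FrCo37", "SaCo16", "SaCo17"), ("KaCo37", "SaCo16", "SaCo17"),
    ("FrCo40", "SaCo16", "SaCo17"), ("SaCo45", "SaCo16", "SaCo17"),
    ("FrHa", "ToHa10", "Th20"), ("AnHa42", "ToHa10", "Th20"),
    ("ViMa", "SaCo16", "LuMa23"), ("GiHa58", "ToHa10", "Th20"),
    ("AnCo51", "MiCo20", "KaAd24"), ("MaCo53", "MiCo20", "KaAd24"),
    ("ChHa54", "ToHa10", "Th20"),
    ("CoCo27", "ViCo92", "CaCo97"),
    ("CaRi20", "", ""),
    ("ViRi49", "CaRi20", "CoCo27"), ("MiRi", "CaRi20", "CoCo27"),
]


def dangling_parents(rows):
    """Return parent IDs used in FID/MID that have no row of their own."""
    ids = {person_id for person_id, _, _ in rows}
    missing = set()
    for _, fid, mid in rows:
        for parent in (fid, mid):
            if parent and parent not in ids:
                missing.add(parent)
    return missing


print(dangling_parents(rows))  # {'Th20'}
```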
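Here, the relationships between individuals can be seen as a directed acyclic graph (DAG). My goal is to visualize this table as a family tree by drawing that graph.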
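To check the DAG claim and to see which generation each person would land on, here is a small sketch using the standard-library `graphlib` module (Python 3.9+). It assumes a hand-copied subset of the table stored as a hypothetical `parents` dict (child to list of parents); `TopologicalSorter` raises `graphlib.CycleError` if the links contain a cycle.

```python
from graphlib import TopologicalSorter

# child -> recorded parents (FID/MID), a hand-copied subset of the table
parents = {
    "AnAn": [], "SiAn": [], "CaCo97": [], "KaAd24": [],
    "ViCo92": ["AnAn", "SiAn"],
    "MiCo20": ["ViCo92", "CaCo97"],
    "MaCo53": ["MiCo20", "KaAd24"],
}


def generations(parents):
    """Generation per person: 0 if no parents are recorded, otherwise
    1 + the deepest parent. static_order() yields parents before children."""
    gen = {}
    for person in TopologicalSorter(parents).static_order():
        ps = parents.get(person, [])
        gen[person] = 1 + max(gen[p] for p in ps) if ps else 0
    return gen


print(generations(parents))
```

Note that Carmella (`CaCo97`) gets generation 0 while her husband Vito gets 1, simply because her parents are not recorded. Ancestry alone does not put spouses on the same row, which is why layout hints such as `{rank=same; ...}` subgraphs can be needed.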
First, I convert the table into an edge list, where `ID` is the start vertex and `ParentID` is the end vertex:
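```python
import pandas as pd

rawdf = pd.read_csv('corleone.csv')

# One edge per (child, parent) pair: first the mothers, then the fathers
el1 = rawdf[['ID', 'MID']]
el2 = rawdf[['ID', 'FID']]
el1.columns = ['Child', 'ParentID']
el2.columns = el1.columns
el = pd.concat([el1, el2])
el = el.dropna()  # drop edges whose parent is unknown

# Attach the child's attributes. This joins on the index, which works because
# concat keeps rawdf's row index for both halves of the edge list.
df = el.merge(rawdf, left_index=True, right_index=True, how='left')
df['name'] = df[['First name', 'Last name']].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
df = df.drop(['Child', 'FID', 'MID', 'First name', 'Last name'], axis=1)
df = df[['ID', 'name', 'S', 'dob', 'DoD', 'Place of birth', 'Job', 'ParentID']]
```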
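An alternative sketch for building the same (child, parent) edge list, assuming the column names from the table: `DataFrame.melt` produces one row per (ID, parent column) pair, and a merge on the `ID` column replaces the index-based merge, so the result no longer depends on `concat` preserving the original index. A few rows are inlined so it runs without `corleone.csv`.

```python
import pandas as pd

# A few rows from the table; None stands for an empty cell.
rawdf = pd.DataFrame({
    "ID": ["AnAn", "SiAn", "ViCo92", "CaCo97", "MiCo20"],
    "S": ["M", "F", "M", "F", "M"],
    "First name": ["Antonio", "Signora", "Vito", "Carmella", "Michael"],
    "Last name": ["Andolini", "Andolini", "Corleone", "Corleone", "Corleone"],
    "FID": [None, None, "AnAn", None, "ViCo92"],
    "MID": [None, None, "SiAn", None, "CaCo97"],
})

# Long format: one row per (child, parent) pair; 'role' records FID or MID.
edges = (
    rawdf.melt(id_vars=["ID"], value_vars=["FID", "MID"],
               var_name="role", value_name="ParentID")
    .dropna(subset=["ParentID"])
)

# Attach the child's name and sex by joining on the ID column.
people = rawdf.assign(name=rawdf["First name"] + " " + rawdf["Last name"])[["ID", "name", "S"]]
df = edges.merge(people, on="ID", how="left")
print(df)
```

Keeping the `role` column also makes it easy to style father and mother edges differently later.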
This gives the following DataFrame:
|--------+----------------------+---+--------+--------+----------------+----------------+----------|
| ID | name | S | dob | DoD | Place of birth | Job | ParentID |
|--------+----------------------+---+--------+--------+----------------+----------------+----------|
| PaAn87 | Paolo Andolini | M | 1887.0 | 1901.0 | NaN | NaN | SiAn |
| PaAn87 | Paolo Andolini | M | 1887.0 | 1901.0 | NaN | NaN | AnAn |
| ViCo92 | Vito Corleone | M | 1892.0 | 1954.0 | Corleone | godfather | SiAn |
| ViCo92 | Vito Corleone | M | 1892.0 | 1954.0 | Corleone | godfather | AnAn |
| ToHa10 | Tom Hagen | M | 1910.0 | 1970.0 | New York | Consigliere | CaCo97 |
| ToHa10 | Tom Hagen | M | 1910.0 | 1970.0 | New York | Consigliere | ViCo92 |
| SaCo16 | Santino Corleone | M | 1916.0 | 1948.0 | New York | gangster | CaCo97 |
| SaCo16 | Santino Corleone | M | 1916.0 | 1948.0 | New York | gangster | ViCo92 |
| FrCo19 | Frederico Corleone | M | 1919.0 | 1959.0 | New York | Casino Manager | CaCo97 |
| FrCo19 | Frederico Corleone | M | 1919.0 | 1959.0 | New York | Casino Manager | ViCo92 |
| MiCo20 | Michael Corleone | M | 1920.0 | 1997.0 | New York | godfather | CaCo97 |
| MiCo20 | Michael Corleone | M | 1920.0 | 1997.0 | New York | godfather | ViCo92 |
| FrCo37 | Francessa Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo17 |
| FrCo37 | Francessa Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo16 |
| KaCo37 | Kathryn Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo17 |
| KaCo37 | Kathryn Corleone | F | 1937.0 | NaN | NaN | NaN | SaCo16 |
| FrCo40 | Frank Corleone | F | 1940.0 | NaN | NaN | NaN | SaCo17 |
| FrCo40 | Frank Corleone | F | 1940.0 | NaN | NaN | NaN | SaCo16 |
| SaCo45 | Santino Jr. Corleone | M | 1945.0 | NaN | NaN | NaN | SaCo17 |
| SaCo45 | Santino Jr. Corleone | M | 1945.0 | NaN | NaN | NaN | SaCo16 |
| FrHa | Frank Hagen | M | 1940.0 | NaN | NaN | NaN | Th20 |
| FrHa | Frank Hagen | M | 1940.0 | NaN | NaN | NaN | ToHa10 |
| AnHa42 | Andrew Hagen | M | 1942.0 | NaN | NaN | Priest | Th20 |
| AnHa42 | Andrew Hagen | M | 1942.0 | NaN | NaN | Priest | ToHa10 |
| ViMa | vincent Mancini | M | 1948.0 | NaN | New York | Godfather | LuMa23 |
| ViMa | vincent Mancini | M | 1948.0 | NaN | New York | Godfather | SaCo16 |
| GiHa58 | Gianna Hagen | F | 1948.0 | NaN | NaN | NaN | Th20 |
| GiHa58 | Gianna Hagen | F | 1948.0 | NaN | NaN | NaN | ToHa10 |
| AnCo51 | Anthony Corleone | M | 1951.0 | NaN | New York | Singer | KaAd24 |
| AnCo51 | Anthony Corleone | M | 1951.0 | NaN | New York | Singer | MiCo20 |
| MaCo53 | Mary Corleone | F | 1953.0 | 1979.0 | New York | Student | KaAd24 |
| MaCo53 | Mary Corleone | F | 1953.0 | 1979.0 | New York | Student | MiCo20 |
| ChHa54 | Christina Hagen | F | 1954.0 | NaN | NaN | NaN | Th20 |
| ChHa54 | Christina Hagen | F | 1954.0 | NaN | NaN | NaN | ToHa10 |
| CoCo27 | Constanzia Corleone | F | 1927.0 | NaN | New York | rentier | CaCo97 |
| CoCo27 | Constanzia Corleone | F | 1927.0 | NaN | New York | rentier | ViCo92 |
| ViRi49 | Victor Rizzi | M | 1949.0 | NaN | New York | NaN | CoCo27 |
| ViRi49 | Victor Rizzi | M | 1949.0 | NaN | New York | NaN | CaRi20 |
| MiRi | Michael Rizzi | M | 1955.0 | NaN | NaN | NaN | CoCo27 |
| MiRi | Michael Rizzi | M | 1955.0 | NaN | NaN | NaN | CaRi20 |
|--------+----------------------+---+--------+--------+----------------+----------------+----------|
Then I generate the DAG with graphviz:
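```python
from graphviz import Digraph

# 'neato' is only the graph's name here; the layout engine stays the default (dot)
f = Digraph('neato', format='pdf', encoding='utf8', filename='corleone',
            node_attr={'color': 'lightblue2', 'style': 'filled'})
f.attr('node', shape='box')
# One edge per (parent, child) pair, drawn from parent to child
for index, row in df.iterrows():
    f.edge(str(row["ParentID"]), str(row["ID"]), label='')
f.view()
```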
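The problem is that there are many aspects I would like to change, for example:
I don't know whether this can be done with graphviz (I couldn't find how in the documentation). If not, I'd be interested in ideas on how to achieve it.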
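Graphviz has no dedicated genealogy mode, but a common DOT workaround is an intermediate "union" node per couple: father and mother point to one small point-shaped node, which then points to each child, so siblings share one junction instead of each child getting two separate edges. The sketch below writes the DOT text by hand (no graphviz package needed); `rows`, `family_dot` and the `u0`, `u1`, ... node names are hypothetical. The string can be rendered with `graphviz.Source(text).render(...)` or saved to a file and passed to `dot -Tpdf family.gv -o family.pdf`.

```python
from collections import defaultdict

# (ID, FID, MID) for a hand-copied subset of the table; "" = unknown parent.
rows = [
    ("ViCo92", "AnAn", "SiAn"),
    ("CaCo97", "", ""),
    ("SaCo16", "ViCo92", "CaCo97"),
    ("MiCo20", "ViCo92", "CaCo97"),
    ("CoCo27", "ViCo92", "CaCo97"),
    ("KaAd24", "", ""),
    ("AnCo51", "MiCo20", "KaAd24"),
    ("MaCo53", "MiCo20", "KaAd24"),
]


def family_dot(rows):
    """Return DOT source with one point-shaped union node per (father, mother)
    couple: parent -> union (no arrowhead), union -> each child."""
    families = defaultdict(list)  # (fid, mid) -> children, in table order
    for child, fid, mid in rows:
        if fid or mid:
            families[(fid, mid)].append(child)

    lines = ["digraph family {", "  node [shape=box, style=filled];"]
    for i, ((fid, mid), children) in enumerate(families.items()):
        union = f"u{i}"
        parents = [p for p in (fid, mid) if p]
        lines.append(f'  {union} [shape=point, label=""];')
        # Ask dot to keep the two parents on the same row
        lines.append("  { rank=same; " + "; ".join(f'"{p}"' for p in parents) + "; }")
        for p in parents:
            lines.append(f'  "{p}" -> {union} [arrowhead=none];')
        for c in children:
            lines.append(f'  {union} -> "{c}";')
    lines.append("}")
    return "\n".join(lines)


print(family_dot(rows))
```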
Solution
Here is what I mean:
```python
from graphviz import Digraph

f = Digraph('neato', format='pdf', encoding='utf8', filename='corleone',
            node_attr={'color': 'lightblue2', 'style': 'filled'})
f.attr('node', shape='box')

# Create all the nodes first, one per row of the original table
# (el only has Child/ParentID, so iterate over rawdf); you can change `label` here
for index, row in rawdf.iterrows():
    f.node(row['ID'], label=row['First name'] + ' ' + row['Last name'],
           _attributes={'color': 'red' if row['S'] == 'M' else 'lightblue2'})

# Then add one edge per (parent, child) pair
for index, row in df.iterrows():
    f.edge(str(row["ParentID"]), str(row["ID"]), label='')
f.view()
```
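If you want each box to show more than the name, a small helper can build a multi-line label and skip blank cells so they don't print as `nan`. This is a sketch; `node_label` and `fill_color` are hypothetical helpers, the column names are taken from the table, and lines are separated with DOT's `\n` escape (a literal backslash followed by `n`), which the graphviz Python package should pass through unchanged.

```python
import math

SEP = "\\n"  # DOT line-break escape: backslash + 'n', centered lines


def _present(value):
    """False for None, an empty string or a float NaN (blank CSV cell)."""
    if value is None or value == "":
        return False
    return not (isinstance(value, float) and math.isnan(value))


def _fmt(value):
    """Print 1892.0 as 1892 (pandas reads year columns with blanks as floats)."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def node_label(row):
    """Label lines: full name, 'dob - DoD', place of birth, job; blanks skipped."""
    name = " ".join(_fmt(row.get(k)) for k in ("First name", "Last name") if _present(row.get(k)))
    years = " - ".join(_fmt(row.get(k)) for k in ("dob", "DoD") if _present(row.get(k)))
    extra = [_fmt(row.get(k)) for k in ("Place of birth", "Job") if _present(row.get(k))]
    return SEP.join(part for part in [name, years, *extra] if part)


def fill_color(sex):
    """Pink for F, blue for M, gray when the sex is unknown."""
    return {"F": "lightpink", "M": "lightblue"}.get(sex, "lightgray")


vito = {"First name": "Vito", "Last name": "Corleone", "dob": 1892.0, "DoD": 1954.0,
        "Place of birth": "Corleone", "Job": "godfather"}
print(node_label(vito), fill_color(vito.get("S")))
```

Inside the node loop this would be used as `f.node(row['ID'], label=node_label(row), fillcolor=fill_color(row['S']))`; a pandas row supports `.get` just like a dict. With `style=filled`, `fillcolor` sets only the fill and leaves the outline `color` alone.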
With this I got a basic colored tree, which you can modify further.
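Building on this answer, I improved the plot, but it is still not what I am aiming for. Here is the code, with a note on each modification:

- Leave blank cells blank instead of NaN: read the CSV with `keep_default_na=False`.
- Replace every blank `ParentID` with a placeholder string `no_entry<i>`, where `i` is the person's row index (the `el.replace(...)` and `fillna(...)` lines in the code below). Because the fill aligns on the index, both missing parents of one person map to the same placeholder node.
- Merge edges that share the same start and end nodes, and draw edges with right-angled (orthogonal) segments: `graph_attr={"concentrate": "true", "splines": "ortho"}`.
- Nodes that display the name, job, date of birth, place of birth, date of death, ...: `label=...`
- Node color based on sex: `_attributes={'color': 'lightpink' if row['S'] == 'F' else 'lightblue' if row['S'] == 'M' else 'lightgray'}`

```python
import numpy as np
import pandas as pd

# keep_default_na=False: empty cells are read as '' instead of NaN
rawdf = pd.read_csv('corleone.csv', keep_default_na=False)

el1 = rawdf[['ID', 'MID']]
el2 = rawdf[['ID', 'FID']]
el1.columns = ['Child', 'ParentID']
el2.columns = el1.columns
el = pd.concat([el1, el2])

# Turn '' back into NaN (plain equality; regex=True with '' would match every string)
el = el.replace('', np.nan)
# Fill missing parents with 'no_entry<i>'; fillna aligns on el's index (rawdf's row index)
t = pd.DataFrame({'tmp': ['no_entry' + str(i) for i in range(el.shape[0])]})
el['ParentID'] = el['ParentID'].fillna(t['tmp'])

df = el.merge(rawdf, left_index=True, right_index=True, how='left')
df['name'] = df[['First name', 'Last name']].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
df = df.drop(['Child', 'FID', 'MID', 'First name', 'Last name'], axis=1)
df = df[['ID', 'name', 'S', 'dob', 'DoD', 'Place of birth', 'Job', 'ParentID']]
```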
This is much better. Nevertheless, two major flaws remain: