
Plotly-igraph: plotting a tree with a variable number of child nodes?


I want to generate a diagram that visualizes the structure of an XML file.

I have created a list of nodes to represent the XML file.
Each node contains 3 strings: the XML tag, its attributes, and its content.
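A minimal sketch of such a node list, assuming the standard-library `xml.etree.ElementTree` parser (the question does not say which parser was used) and a hypothetical helper name `build_node_list`. Each node is a `(tag, attributes, content)` triple, produced in document order:

```python
import xml.etree.ElementTree as ET

def build_node_list(root):
    """Return one (tag, attributes, content) triple per element, in document order."""
    nodes = []
    for elem in root.iter():  # depth-first, starting with the root itself
        attributes = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
        content = (elem.text or "").strip()  # whitespace-only text becomes ""
        nodes.append((elem.tag, attributes, content))
    return nodes

# A short excerpt of the file shown below, so the example is self-contained.
sample = """<entry db="genbank">
  <accession>AC116785</accession>
  <version>
    <version_number>AC116785.3</version_number>
    <gi>21703640</gi>
  </version>
</entry>"""

nodes = build_node_list(ET.fromstring(sample))
for node in nodes:
    print(node)
```

For the real file, `ET.parse("entry.xml").getroot()` would replace `ET.fromstring(sample)` (the file name is an assumption).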

The XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<entry db="genbank">
   <data id="AC116785" length="132912" molecule="DNA" data_class="linear" division="HTG" date="08-JUL-2002" />
   <definition>
      <description>Mus musculus clone RP24-146B1,WORKING DRAFT SEQUENCE,10 ordered pieces.</description>
   </definition>
   <accession>AC116785</accession>
   <version>
      <version_number>AC116785.3</version_number>
      <gi>21703640</gi>
   </version>
   <keywords>
      <keyword>HTG</keyword>
      <keyword>HTGS_PHASE2</keyword>
      <keyword>HTGS_DRAFT</keyword>
      <keyword>HTGS_FULLTOP</keyword>
   </keywords>
   <source>
      <abbreviation>house mouse.</abbreviation>
      <organism>
         <name>Mus musculus</name>
         <taxonomy>
            <class>Eukaryota</class>
            <class>Metazoa</class>
            <class>Chordata</class>
            <class>Craniata</class>
            <class>Vertebrata</class>
            <class>Euteleostomi</class>
            <class>Mammalia</class>
            <class>Eutheria</class>
            <class>Rodentia</class>
            <class>Sciurognathi</class>
            <class>Muridae</class>
            <class>Murinae</class>
            <class>Mus</class>
         </taxonomy>
      </organism>
   </source>
   <references>
      <reference number="1" from="1" to="132912">
         <authors>
            <author>Birren,B.</author>
         </authors>
         <title>Mus musculus,clone RP24-146B1</title>
         <journal>
            <location>Unpublished</location>
         </journal>
      </reference>
      <reference number="2" from="1" to="132912">
         <authors>
            <author>Birren,B.</author>
         </authors>
         <title>Direct Submission</title>
         <journal>
            <submission>02-APR-2002</submission>
            <department>Whitehead Institute/MIT Center for Genome Research,320 Charles Street,Cambridge,MA 02141,USA</department>
         </journal>
      </reference>
      <reference number="3" from="1" to="132912">
         <authors>
            <author>Birren,B.</author>
         </authors>
         <title>Direct Submission</title>
         <journal>
            <submission>08-JUL-2002</submission>
            <department>Whitehead Institute/MIT Center for Genome Research,USA</department>
         </journal>
      </reference>
   </references>
   <comment>
      <replaced>
         <date>Jul 8,2002</date>
         <gi>21700645</gi>
      </replaced>
      <information title="All repeats were identified using RepeatMasker">Smit,A.F.A.,Green,P. (1996-1997)http://ftp.genome.washington.edu/RM/RepeatMasker.html</information>
      <information title="Center">Whitehead Institute/ MIT Center for Genome Research</information>
      <information title="Center code">WIBR</information>
      <information title="Web site">http://www-seq.wi.mit.edu</information>
      <information title="Contact">sequence_submissions@genome.wi.mit.edu</information>
      <information title="Center project name">L25104</information>
      <information title="Center clone name">146_B_1</information>
      <information title="Sequencing vector">Plasmid; n/a; 100% of reads</information>
      <information title="Chemistry">Dye-terminator Big Dye; 100% of reads</information>
      <information title="Assembly program">Phrap; version 0.960731</information>
      <information title="Consensus quality">130058 bases at least Q40</information>
      <information title="Consensus quality">131186 bases at least Q30</information>
      <information title="Consensus quality">131595 bases at least Q20</information>
      <information title="Insert size">142000; agarose-fp</information>
      <information title="Insert size">132012; sum-of-contigs</information>
      <information title="Quality coverage">6.9 in Q20 bases; agarose-fp</information>
      <information title="Quality coverage">7.5 in Q20 bases; sum-of-contigs</information>
      <information title="NOTE">This is a 'working draft' sequence. It currently consists of 10 contigs. Gaps between the contigsare represented as runs of N. The order of the piecesis believed to be correct as given,however the sizesof the gaps between them are based on estimates that haveprovided by the submittor.This sequence will be replacedby the finished sequence as soon as it is available andthe accession number will be preserved.</information>
      <information title="1     1178">contig of 1178 bp in length</information>
      <information title="1179 1278">gap of      100 bp</information>
      <information title="1279     2835">contig of 1557 bp in length</information>
      <information title="2836 2935">gap of      100 bp</information>
      <information title="2936     5385">contig of 2450 bp in length</information>
      <information title="5386 5485">gap of      100 bp</information>
      <information title="5486     8192">contig of 2707 bp in length</information>
      <information title="8193 8292">gap of      100 bp</information>
      <information title="8293    10488">contig of 2196 bp in length</information>
      <information title="10489 10588">gap of      100 bp</information>
      <information title="10589    12801">contig of 2213 bp in length</information>
      <information title="12802 12901">gap of      100 bp</information>
      <information title="12902    18716">contig of 5815 bp in length</information>
      <information title="18717 18816">gap of      100 bp</information>
      <information title="18817    34793">contig of 15977 bp in length</information>
      <information title="34794 34893">gap of      100 bp</information>
      <information title="34894    51004">contig of 16111 bp in length</information>
      <information title="51005 51104">gap of      100 bp</information>
      <information title="51105   132912">contig of 81808 bp in length.</information>
   </comment>
   <features>
      <sequence_feature type="source">
         <location>1..132912</location>
         <qualifer type="db_xref">taxon:10090</qualifer>
         <qualifer type="clone">RP24-146B1</qualifer>
         <qualifer type="clone_lib">RPCI-24 Male Mouse BAC</qualifer>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>1..1178</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>1279..2835</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>2936..5385</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>5486..8192</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>8293..10488</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>10589..12801</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>12902..18716</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>18817..34793</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>34894..51004</location>
      </sequence_feature>
      <sequence_feature type="misc_feature">
         <location>51105..132912</location>
      </sequence_feature>
   </features>
   <base_count num_a="43599" num_c="24512" num_g="23668" num_t="40195" num_others="938" />
   <sequence>mhkkiciigagaaglvsakhaikqgyqvdifeqtdqvggtwvysektgchsslykvmktn
lpkeamlfqdepfrdelpsfmshehvleylnefskdfpiqfsstvnevkrendlwkvlie
snsetitrfydvvfvcnghffeplnpyqnsyfkgklihshdyrraehytgknvvivgagp
sgiditlqiaqtanhvtliskkatypvlpesvqqmatnvksvdehgvvtdegdhvpadvi
ivctgyvfkfpfldssliqlkyndrmvsplyehlchvdypttlffiglplgtitfplfev
qvkyalsliagkgklpsddveirnfedarlqgllnpasfhviieeqweymkklakmggfe
ewnymetikklygyimterkknvigykmvnfelttdssdfklltirvdfnddvawiirfa
ypi</sequence>
</entry>

I want to generate a tree plot with the Plotly and igraph libraries by enumerating the node list.

I am using this site here as a reference.

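My XML file contains elements with a variable number of children. However, the example only shows how to build a tree with a fixed number of children (every node in the example has exactly 2 children).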

Looking at the igraph tutorial site here, I found a similar example, which also uses only 2 children per node.

How should I generate a tree with a variable number of children, like the one in my XML file?

I have been stuck on this for a long time; any help would be greatly appreciated!

Solution

You can build the graph like this:

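from lxml import etree
from igraph import Graph

root = etree.parse("entry.xml").getroot()

# Give every element (root included) its own integer vertex id, in document order.
# Passing etree.Element restricts iteration to real elements (no comments/PIs).
element_ids = {elem: i for i, elem in enumerate(root.iter(etree.Element))}

# One (parent, child) edge per child; a parent may have any number of children.
edges = []
for parent, parent_id in element_ids.items():
    for child in parent.iterchildren(etree.Element):
        edges.append((parent_id, element_ids[child]))

# n is given explicitly so that every element becomes a vertex.
G = Graph(n=len(element_ids), edges=edges)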
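Nothing in this edge list assumes a fixed number of children, and a tree layout does not need to either. As an illustration, here is a minimal pure-Python sketch of a layered layout (a simplified stand-in, not a reimplementation of igraph's Reingold-Tilford `"rt"` layout) that accepts `(parent, child)` pairs with any number of children per parent. The function name `tree_layout` is hypothetical:

```python
from collections import defaultdict

def tree_layout(n, edges, root=0):
    """Return an (x, y) pair for each of n vertices of a rooted tree.

    edges are (parent, child) pairs; a parent may have any number of children.
    Leaves get consecutive x positions from left to right, each parent is
    centered over its children, and y is the depth (root at y = 0).
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    pos = [None] * n
    next_leaf_x = 0
    # Iterative post-order traversal (children before parents), so deep
    # trees do not hit Python's recursion limit.
    stack = [(root, 0, False)]
    while stack:
        node, depth, children_done = stack.pop()
        if not children_done:
            stack.append((node, depth, True))
            # Push in reverse so the first child is visited first.
            for child in reversed(children[node]):
                stack.append((child, depth + 1, False))
        elif children[node]:
            xs = [pos[c][0] for c in children[node]]
            pos[node] = ((min(xs) + max(xs)) / 2, depth)
        else:
            pos[node] = (next_leaf_x, depth)
            next_leaf_x += 1
    return pos

# Root 0 has three children; child 1 has two children of its own.
print(tree_layout(6, [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)]))
# [(1.75, 0), (0.5, 1), (2, 1), (3, 1), (0, 2), (1, 2)]
```

Here y grows with depth, so flip it (for example `max_y - y`) when plotting if the root should appear at the top.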

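The element_ids dictionary maps every element of the XML document to its own id; the keys are the element objects themselves, not the tag names, e.g. {elem0: 0, elem1: 1, elem2: 2}. This way you can later look up the id of any element.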
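Keying by the element objects rather than by tag names matters in this file because tags such as `<keyword>` and `<class>` repeat. A small sketch with the standard-library `ElementTree` (the answer uses lxml; elements from both are hashable by identity, so the idea carries over):

```python
import xml.etree.ElementTree as ET

sample = """<entry>
  <keywords>
    <keyword>HTG</keyword>
    <keyword>HTGS_PHASE2</keyword>
    <keyword>HTGS_DRAFT</keyword>
  </keywords>
</entry>"""
root = ET.fromstring(sample)

# Keyed by tag name: repeated tags overwrite each other, so vertices are lost.
by_tag = {elem.tag: i for i, elem in enumerate(root.iter())}

# Keyed by the element object: every element keeps its own id.
by_element = {elem: i for i, elem in enumerate(root.iter())}

print(len(by_tag), len(by_element))  # 3 5
```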

I don't know how to pass the labels to Plotly, but for plotting with igraph it is useful to add the tag names as vertex labels:

# Dicts keep insertion order (the order of root.iter()), so names[i] is vertex i's tag.
names = [e.tag for e in element_ids]
G.vs['label'] = names
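For Plotly, the referenced tree example attaches one string per node through the `text` argument of the node trace (with `hoverinfo="text"`), so what is needed is a label list indexed by vertex id. A minimal sketch that combines the tag, attributes and truncated content; the function name `vertex_labels` and its `max_len` parameter are hypothetical:

```python
import xml.etree.ElementTree as ET

def vertex_labels(element_ids, max_len=40):
    """Return one hover string per vertex, indexed by vertex id.

    Each string joins the tag, the attributes and the (truncated) text content.
    """
    labels = [""] * len(element_ids)
    for elem, vid in element_ids.items():
        attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
        text = (elem.text or "").strip()
        if len(text) > max_len:
            text = text[:max_len] + "..."
        labels[vid] = " ".join(part for part in (elem.tag, attrs, text) if part)
    return labels

root = ET.fromstring('<entry db="genbank"><accession>AC116785</accession><gi>21703640</gi></entry>')
element_ids = {elem: i for i, elem in enumerate(root.iter())}
print(vertex_labels(element_ids))
# ['entry db="genbank"', 'accession AC116785', 'gi 21703640']
```

The resulting list could then be passed as `text=labels, hoverinfo="text"` to the node `go.Scatter` trace.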

I haven't tried it myself, but the visualization should work the same way as in the article.
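The Plotly tree example draws the graph from plain coordinate lists, and the layout step (e.g. `lay = G.layout("rt", root=[0])`, then `lay.coords`) does not care how many children each node has. The remaining conversion can be sketched without igraph. It mirrors the example's trick of separating edges with `None` so Plotly breaks the line between them; the function name `plotly_coordinates` is hypothetical:

```python
def plotly_coordinates(positions, edges):
    """Turn vertex positions and (parent, child) edges into Plotly-ready lists.

    Returns Xn, Yn for the node markers and Xe, Ye for the edge lines. Each
    edge contributes its two endpoints followed by None, which makes Plotly
    start a new line segment. y is flipped so the root is drawn at the top.
    """
    max_y = max(y for _, y in positions)

    Xn = [x for x, _ in positions]
    Yn = [max_y - y for _, y in positions]

    Xe, Ye = [], []
    for a, b in edges:
        Xe += [positions[a][0], positions[b][0], None]
        Ye += [max_y - positions[a][1], max_y - positions[b][1], None]
    return Xn, Yn, Xe, Ye

# A root at depth 0 with two children at depth 1.
Xn, Yn, Xe, Ye = plotly_coordinates([(1, 0), (0, 1), (2, 1)], [(0, 1), (0, 2)])
print(Xe, Ye)  # [1, 0, None, 1, 2, None] [1, 0, None, 1, 0, None]
```

These lists would then feed two traces, roughly `go.Scatter(x=Xe, y=Ye, mode="lines")` for the edges and `go.Scatter(x=Xn, y=Yn, mode="markers")` for the nodes, as in the example.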
