
Comparing two genomic ranges (R)


I have two genomic ranges (GRanges objects):

g1<-GRanges(c("chr1:0-14","chr1:15-29"),score=c(20.2,10.4));g1

GRanges object with 2 ranges and 1 metadata column:
   seqnames    ranges strand |     score
      <Rle> <IRanges>  <Rle> | <numeric>
[1]     chr1      0-14      * |      20.2
[2]     chr1     15-29      * |      10.4

g2<-GRanges(c("chr1:0-9","chr1:10-19","chr1:20-29"),state=c('E1','E2','E1'));g2

GRanges object with 3 ranges and 1 metadata column:
   seqnames    ranges strand |       state
      <Rle> <IRanges>  <Rle> | <character>
[1]     chr1       0-9      * |          E1
[2]     chr1     10-19      * |          E2
[3]     chr1     20-29      * |          E1

I want to make them comparable. First I combined them, then I used disjoin:

g3<-(c(g1,g2)); g3 

GRanges object with 5 ranges and 2 metadata columns:
    seqnames    ranges strand |     score       state
       <Rle> <IRanges>  <Rle> | <numeric> <character>
 [1]     chr1      0-14      * |      20.2        <NA>
 [2]     chr1     15-29      * |      10.4        <NA>
 [3]     chr1       0-9      * |      <NA>          E1
 [4]     chr1     10-19      * |      <NA>          E2
 [5]     chr1     20-29      * |      <NA>          E1

disjoin(g3)

GRanges object with 4 ranges and 0 metadata columns:
   seqnames    ranges strand
      <Rle> <IRanges>  <Rle>
[1]     chr1       0-9      *
[2]     chr1     10-14      *
[3]     chr1     15-19      *
[4]     chr1     20-29      *

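So disjoin does exactly the splitting I want, but unfortunately it does not keep the metadata. Is there a way to keep the metadata and get a GRanges like this?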

GRanges object with 4 ranges and 2 metadata columns:
   seqnames    ranges strand |     score       state
      <Rle> <IRanges>  <Rle> | <numeric> <character>
[1]     chr1       0-9      * |      20.2          E1
[2]     chr1     10-14      * |      20.2          E2
[3]     chr1     15-19      * |      10.4          E2
[4]     chr1     20-29      * |      10.4          E1

Thanks

Solution

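I think you will find help here: https://support.bioconductor.org/p/82551/ but note that in your case it will not be exact, because one range in the output can map to several ranges in the input.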
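To see what that last point means, here is a minimal sketch (not part of the linked thread; it assumes the Bioconductor package GenomicRanges is installed). It disjoins only the coordinates of g1 and g2 with `with.revmap = TRUE` and counts how many input ranges each disjoint piece comes from:

```r
suppressPackageStartupMessages(library(GenomicRanges))

g1 <- GRanges(c("chr1:0-14", "chr1:15-29"), score = c(20.2, 10.4))
g2 <- GRanges(c("chr1:0-9", "chr1:10-19", "chr1:20-29"),
              state = c("E1", "E2", "E1"))

# granges() drops the metadata columns, so only the coordinates are combined
d <- disjoin(c(granges(g1), granges(g2)), with.revmap = TRUE)

# revmap is an IntegerList: element i holds the indices (into the combined
# object) of every input range that covers disjoint piece i
d$revmap

# Number of input ranges behind each disjoint piece
lengths(d$revmap)
```

Every piece maps back to two input ranges (one from g1, one from g2), so the metadata has to be gathered per piece rather than copied one-to-one.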


Yes, with.revmap=TRUE is definitely the solution:

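library(GenomicRanges)
g1<-GRanges(c("chr1:0-14","chr1:15-29"),score=c(20.2,10.4));g1
g2<-GRanges(c("chr1:0-9","chr1:10-19","chr1:20-29"),state=c('E1','E2','E1'));g2
g3<-c(g1,g2); g3 # combine the GRanges; columns missing on either side are filled with NA
g4<-disjoin(g3,with.revmap=TRUE);g4 # disjoin, keeping a reverse map to the input ranges
l1<-g4$revmap;l1 # IntegerList: indices in g3 of the ranges covering each disjoint range
score<-extractList(mcols(g3)$score,l1);score # scores of those ranges (NA for the g2 rows)
state<-extractList(mcols(g3)$state,l1);state # states of those ranges (NA for the g1 rows)
# Remove the NAs from each list element. Named drop_na so base::na.omit is not masked;
# assumes exactly one non-NA value per disjoint range, as in this example
drop_na<-function(l){sapply(l,function(x){x[!is.na(x)]})}
mcols(g4)$score<-drop_na(score)
mcols(g4)$state<-drop_na(state)
g4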

GRanges object with 4 ranges and 3 metadata columns:
   seqnames    ranges strand |        revmap     score       state
      <Rle> <IRanges>  <Rle> | <IntegerList> <numeric> <character>
[1]     chr1       0-9      * |           1,3      20.2          E1
[2]     chr1     10-14      * |           1,4      20.2          E2
[3]     chr1     15-19      * |           2,4      10.4          E2
[4]     chr1     20-29      * |           2,5      10.4          E1

Now I can easily compare the states with their scores, e.g. with a boxplot. Thanks, Bastian!
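An alternative sketch, not taken from the thread: instead of combining the metadata and filtering out NAs, each disjoint piece can be looked up in g1 and g2 separately with `findOverlaps(..., select = "first")`. It assumes GenomicRanges is installed and that the ranges within g1, and within g2, do not overlap each other; otherwise `select = "first"` would silently keep only one of several matches. The scores are then grouped by state, which is the input a boxplot needs.

```r
suppressPackageStartupMessages(library(GenomicRanges))

g1 <- GRanges(c("chr1:0-14", "chr1:15-29"), score = c(20.2, 10.4))
g2 <- GRanges(c("chr1:0-9", "chr1:10-19", "chr1:20-29"),
              state = c("E1", "E2", "E1"))

# Split on the union of both sets of boundaries (coordinates only)
d <- disjoin(c(granges(g1), granges(g2)))

# Index of the overlapping range in g1 / g2 for each piece (NA if none).
# A disjoint piece never crosses an input boundary, so it lies inside at most
# one range of g1 and one of g2 as long as those ranges don't overlap.
i1 <- findOverlaps(d, g1, select = "first")
i2 <- findOverlaps(d, g2, select = "first")

# Copy the metadata across; unmatched pieces get NA
d$score <- g1$score[i1]
d$state <- g2$state[i2]
d

# Group the scores by state for comparison;
# boxplot(scores_by_state) would draw one box per state
scores_by_state <- split(d$score, d$state)
scores_by_state
```

A piece covered by only one of the two inputs simply gets NA in the other column here, whereas the sapply()-based filtering in the answer would return a list instead of a vector in that case.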
