How to add multiple pdata identities of the same type to one sample and distinguish them when plotting

I am trying to add phenotype information (pdata) to the samples in this dataset.

library(BiocManager)
library(GEOquery) 
library(plyr)
library(dplyr) 
library(Matrix) 
library(Seurat) 

# Loading Raw Data into RStudio ----------------------------------

# Download the supplementary files for GSE118389 into ./GSE118389/
filePaths <- getGEOSuppFiles("GSE118389")
# list.files() expects a regular expression, not a glob
tarF <- list.files(path = "./GSE118389/", pattern = "\\.tar$", full.names = TRUE)
untar(tarF, exdir = "./GSE118389/")
gzipF <- list.files(path = "./GSE118389/", pattern = "\\.gz$", full.names = TRUE)
# Decompress every .gz file
ldply(.data = gzipF, .fun = gunzip)

list.files(path = "./GSE118389/", full.names = TRUE)

I load the sample information as follows, for example for patient 39 (PT039):

# Load in full matrix ----------------------------------------------------------

fullmat <- read.table(file = "./GSE118389/GSE118389_counts_rsem.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)


# Load in PT039 matrix -----------------------------------------------------------
# Keep only the columns (cells) whose names start with "PT039"
PT039mat <- fullmat[, grepl("^PT039", colnames(fullmat))]
PT039mat[1:5, 1:5]

I then add the associated pdata as follows, essentially creating a separate data frame alongside PT039mat that will later be added to the Seurat object as pdata:

PT039pdat <- data.frame("samples" = colnames(PT039mat),"lymphovascular_invasion" = "no","nodal_involvement" = "no","BRCA_status" = "BRCA-")

This creates the following:

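I may be thinking about this the wrong way, but according to the following information, PT039 is present in multiple batches. I would essentially like to reproduce this exact table inside PT039pdat: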

Batch   Patient ID
B1        PT039 
B2        PT058
B3        PT039,PT081,PT089
B4        PT081
B5        PT084,PT089
B6        PT084,PT089
B7        PT126
B8        PT039,PT084
B9        PT039
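
For reference, the same table can be written in long format, with one row per batch/patient pair, which makes it easy to look up the batches for a given patient. This is only a minimal sketch in base R; the name `batch_map` is made up for illustration and is not part of the dataset.

```r
# Batch table from above in long format: one row per (batch, patient) pair.
# `batch_map` is a hypothetical name used only for this illustration.
batch_map <- data.frame(
  batch = c("B1", "B2", "B3", "B3", "B3", "B4", "B5", "B5",
            "B6", "B6", "B7", "B8", "B8", "B9"),
  patient = c("PT039", "PT058", "PT039", "PT081", "PT089", "PT081",
              "PT084", "PT089", "PT084", "PT089", "PT126",
              "PT039", "PT084", "PT039"),
  stringsAsFactors = FALSE
)

# Batches that contain PT039
pt039_batches <- batch_map$batch[batch_map$patient == "PT039"]
print(pt039_batches)
```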

I can add the batches to PT039pdat with the following code:

PT039_batch <- list(c("B1","B3","B8","B9"))
PT039pdat$batch <- PT039_batch

But when I try to split the UMAP by batch, the plot does not pick up the individual batch values:

DimPlot(sobj,reduction = "umap",split.by = "batch")

Error in `[<-.data.frame`(`*tmp*`, split.by, value = c(PT039_P11_A01_S1 = "B1", :
  replacement has 3538 rows, data has 1325

I would like to be able to create this plot, but with the cells split by batch (B1, B2, B3, and so on).

Sorry for the long question, and thanks for reading!

Solution

See https://github.com/satijalab/seurat/issues/4650. It looks like I need batch information for every individual cell if I want to plot it the way I intended.
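
A rough sketch of what "batch information for every cell" could look like, using base R on toy data. Everything about the mapping is an assumption: the cell names are made up in the style of `PT039_P11_A01_S1`, the second field is assumed to be a plate id, and `plate_to_batch` is an invented lookup. The real cell-to-batch assignment would have to come from the study's sample annotation. The Seurat calls at the end are left as comments because they need an existing object `sobj`.

```r
# Toy cell names in the style of the dataset's column names (made up).
cells <- c("PT039_P11_A01_S1", "PT039_P11_A02_S2",
           "PT039_P10_B01_S13", "PT039_P1_C05_S29")

# HYPOTHETICAL lookup from plate id to batch; the real mapping is not known
# here and would have to come from the study's sample annotation.
plate_to_batch <- c(P11 = "B1", P10 = "B3", P1 = "B8")

# Second underscore-separated field of each cell name (assumed plate id).
plate <- vapply(strsplit(cells, "_", fixed = TRUE), `[`, character(1), 2)

# One batch label per cell: an atomic character column, not a list column.
pdat <- data.frame(
  batch = unname(plate_to_batch[plate]),
  row.names = cells,
  stringsAsFactors = FALSE
)
print(pdat)

# With a real Seurat object `sobj` whose cell names match rownames(pdat):
# sobj <- AddMetaData(sobj, metadata = pdat)
# DimPlot(sobj, reduction = "umap", split.by = "batch")
```

The key difference from my original attempt is that each row holds exactly one batch value, so the metadata has one entry per cell and `split.by` has a single label to group each cell by.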