I have this matrix: one column of GO terms, one column with a gene enriched for that term, and the log2 fold change of that gene.
GO_term Gene_Name Log2FC
cell adhesion IGFBP7 1.38
cell adhesion PVRL4 -1.40
cell adhesion NCAM1 -1.35
cell-matrix adhesion ITGA7 -1.20
cell-matrix adhesion ITGA4 0.75
positive regulation of cell migration ITGA5 -1.36
positive regulation of cell migration RRAS2 -0.59
cellular oxidant detoxification FABP1 2.35
cellular oxidant detoxification LTC4S -0.59
muscle contraction ACTA2 -1.21
muscle contraction VCL -1.06
How can I convert this matrix into something like the following?
> head(chord)
cell adhesion cell-matrix adhesion positive regulation of cell migration cellular oxidant detoxification
PTK2 0 1 1
GNA13 0 0 1
LEPR 0 0 1
APOE 0 0 1
CXCR4 0 0 1
RECK 0 0 1
muscle contraction logFC
PTK2 1 -0.6527904
GNA13 1 0.3711599
LEPR 1 2.6539788
APOE 1 0.8698346
CXCR4 1 -2.5647537
RECK 1 3.6926860
>
That is, a binary matrix showing which genes belong to each GO term, with the corresponding logFC in the last column.
Answer 0 (score: 1)
Here is some sample data:
df = data.frame(
row = sample(letters), col = sample(letters),
stringsAsFactors = FALSE
)
Construct a matrix with the appropriate dimensions and dimnames:
nrow = length(unique(df$row))
ncol = length(unique(df$col))
m = matrix(0, nrow, ncol, dimnames=list(unique(df$row), unique(df$col)))
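Then use the fact that indexing a matrix with a two-column matrix treats the two columns as row and column indices, which lets you update those cells in one step: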
m[as.matrix(df)] = 1
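It is not clear what you want to do with the log FC, since there could be more than one per row and you have not described how you would like them summarized.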
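As a rough sketch, the same indexing trick can be applied to the data in the question. The name `chord` is borrowed from the question's output, the data frame is typed in by hand, and the logFC column rests on the assumption that each gene has a single Log2FC value (the first match is used); none of this is confirmed by the original poster.

```r
# Question's data in long format: one row per GO term / gene pair
gene <- data.frame(
  GO_term = c("cell adhesion", "cell adhesion", "cell adhesion",
              "cell-matrix adhesion", "cell-matrix adhesion",
              "positive regulation of cell migration",
              "positive regulation of cell migration",
              "cellular oxidant detoxification",
              "cellular oxidant detoxification",
              "muscle contraction", "muscle contraction"),
  Gene_Name = c("IGFBP7", "PVRL4", "NCAM1", "ITGA7", "ITGA4", "ITGA5",
                "RRAS2", "FABP1", "LTC4S", "ACTA2", "VCL"),
  Log2FC = c(1.38, -1.40, -1.35, -1.20, 0.75, -1.36, -0.59, 2.35, -0.59,
             -1.21, -1.06),
  stringsAsFactors = FALSE
)

genes <- unique(gene$Gene_Name)
terms <- unique(gene$GO_term)

# One row per gene, one column per GO term, all zeros to start
chord <- matrix(0, length(genes), length(terms), dimnames = list(genes, terms))

# A two-column character matrix indexes cells by (row name, column name)
chord[cbind(gene$Gene_Name, gene$GO_term)] <- 1

# Assumption: one Log2FC per gene, so the first match is taken
chord <- cbind(chord, logFC = gene$Log2FC[match(genes, gene$Gene_Name)])
```

If a gene can carry several fold changes, the last line would need to be replaced by whatever summary (mean, maximum, and so on) suits the analysis.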
Answer 1 (score: 0)
Suppose you have a data file gene.txt like this:
GO_term,Gene_Name,Log2FC
cell adhesion,IGFBP7,1.38
cell adhesion,PVRL4,-1.40
cell adhesion,NCAM1,-1.35
cell-matrix adhesion,ITGA7,-1.20
cell-matrix adhesion,ITGA4,0.75
positive regulation of cell migration,ITGA5,-1.36
positive regulation of cell migration,RRAS2,-0.59
cellular oxidant detoxification,FABP1,2.35
cellular oxidant detoxification,LTC4S,-0.59
muscle contraction,ACTA2,-1.21
muscle contraction,VCL,-1.06
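# levels() below needs factor columns; since R 4.0.0 read.csv() no longer
# creates them by default, so request them explicitly
gene = read.csv("gene.txt", stringsAsFactors = TRUE)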
golevels = levels(gene$GO_term)
genelevels = levels(gene$Gene_Name)
ndf = data.frame(Gene_Name=genelevels)
for (g in golevels){
ndf[[g]] = 0
}
ndf$Log2FC = 0
index = 1
nc = ncol(ndf)
for (gg in genelevels){
temp = as.integer(golevels %in% gene[gene$Gene_Name == gg,"GO_term"])
ndf[index, -c(1,nc)] = temp
# assumes each gene has a single Log2FC value; only the first one is used
ndf[index, "Log2FC"] = gene[gene$Gene_Name == gg, "Log2FC"][1]
index = index + 1
}
# transform to matrix
ndf$Gene_Name = NULL
m = as.matrix(ndf)
row.names(m) = genelevels
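For comparison, here is a loop-free sketch of the same idea using `table()`. It is an alternative rather than the method above; the name `m2` is made up for illustration, the file contents are inlined through `read.csv(text = ...)` so the snippet runs on its own, and it again assumes one Log2FC per gene.

```r
# Same content as gene.txt, inlined so the example is self-contained
gene <- read.csv(text = "GO_term,Gene_Name,Log2FC
cell adhesion,IGFBP7,1.38
cell adhesion,PVRL4,-1.40
cell adhesion,NCAM1,-1.35
cell-matrix adhesion,ITGA7,-1.20
cell-matrix adhesion,ITGA4,0.75
positive regulation of cell migration,ITGA5,-1.36
positive regulation of cell migration,RRAS2,-0.59
cellular oxidant detoxification,FABP1,2.35
cellular oxidant detoxification,LTC4S,-0.59
muscle contraction,ACTA2,-1.21
muscle contraction,VCL,-1.06", stringsAsFactors = FALSE)

# Cross-tabulate genes (rows) against GO terms (columns);
# any count above zero becomes 1
tab <- table(gene$Gene_Name, gene$GO_term)
m2 <- (unclass(tab) > 0) * 1

# Assumption: one Log2FC per gene, so the first occurrence is taken
m2 <- cbind(m2, Log2FC = gene$Log2FC[match(rownames(m2), gene$Gene_Name)])
```

Rows and columns come out in alphabetical order here, because `table()` sorts its levels.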