I would like to know the best way to do this.
Starting from data in this format:
gene Sample1 Sample2 Sample3 ....
A 0 2 0
B 1 1 3
C 1.32 3.21 3.33
....
I want to get this result:
gene Sample1 Sample2 Sample3 ....
A-B -1 1 -3
A-C -1.32 -1.21 -3.33
A-D
...
B-C
...
Any suggestions would be appreciated. Thanks!
The data set is very large, but I want to process it quickly.
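Answer 0 (score: 1)
expand.grid() is well suited to building a data frame that contains every combination of several vectors. In this case, you want every combination of the gene vector with itself. The code below does what you want, though there may be faster ways. For N = 1,000 it takes 7 seconds on my machine.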
set.seed(1)
N <- 5
d <- data.frame(gene = 1:N, sample.1 = sample(N), sample.2 = sample(N))
head(d)
gene sample.1 sample.2
1 1 2 5
2 2 5 4
3 3 4 2
4 4 3 3
5 5 1 1
df <- expand.grid(list(d$gene, d$gene))
df <- merge(df, d, by.x = "Var1", by.y = "gene")
df <- merge(df, d, by.x = "Var2", by.y = "gene")
df$gene.diff <- paste(df$Var1, "-", df$Var2)
df$sample.1.diff <- df$sample.1.x - df$sample.1.y
df$sample.2.diff <- df$sample.2.x - df$sample.2.y
# Only need one difference between each pair of genes:
df <- df[df$Var1 > df$Var2, ]
df <- df[, names(df) %in% c("gene.diff", grep("diff", names(df), value = TRUE))]
head(df, n = 8)
gene.diff sample.1.diff sample.2.diff
2 2 - 1 3 -1
3 3 - 1 2 -3
4 4 - 1 1 -2
5 5 - 1 -1 -4
8 3 - 2 -1 -2
9 4 - 2 -2 -1
10 5 - 2 -4 -3
14 4 - 3 -1 1
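Since the answer notes there may be faster ways, here is one possible sketch, not a benchmarked replacement. It assumes the sample columns are all numeric so they can be held in a matrix, and it uses combn() to generate each pair of row indices once instead of merging data frames. The names m, idx, diffs and res are illustrative and not part of the original answer. Note that the differences here run from the lower row index to the higher one ("1 - 2"), the opposite sign of the output above.

```r
# Sketch: index-based pairwise differences on a numeric matrix.
# `d` mirrors the example data above; `m`, `idx`, `diffs`, `res` are illustrative names.
set.seed(1)
N <- 5
d <- data.frame(gene = 1:N, sample.1 = sample(N), sample.2 = sample(N))

m <- as.matrix(d[, -1])        # numeric sample columns only (assumption: all numeric)
idx <- combn(nrow(m), 2)       # 2 x choose(N, 2) matrix of row-index pairs with i < j

# Subtract whole blocks of rows at once: row i minus row j for every pair
diffs <- m[idx[1, ], , drop = FALSE] - m[idx[2, ], , drop = FALSE]

res <- data.frame(gene.diff = paste(d$gene[idx[1, ]], "-", d$gene[idx[2, ]]),
                  diffs, row.names = NULL)
head(res)
```

The result still has choose(N, 2) rows, so for a very large number of genes memory, rather than speed, is likely to become the limit.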
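Answer 1 (score: 1)
This solution eliminates the keyed merge steps (a single Cartesian join via merge(..., by = NULL) remains) and makes better use of matrix operations.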
#reproducible example!
data <- data.frame(gene=LETTERS[1:3], Sample1=c(0,1,1.321),
Sample2 = c(2,1,3.21), Sample3=c(0,3,3.33))
# hooray for cartesian join
combos <- subset(merge(data,data,by=NULL, suffixes=c(".1",".2")), gene.1 != gene.2)
gene1_vals <- combos[,2:ncol(data)]
gene2_vals <- combos[,(ncol(data)+2):(2*ncol(data))]
gene_diff_txt <- paste(combos[,1], combos[,ncol(data)+1],sep="-")
gene_diffs <- data.frame(gene1_vals - gene2_vals)
names(gene_diffs) <- paste0("Sample",1:ncol(gene1_vals))
data.frame(gene=gene_diff_txt, gene_diffs)[order(combos$gene.1, combos$gene.2),]
# gene Sample1 Sample2 Sample3
# 4 A-B -1.000 1.00 -3.00
# 7 A-C -1.321 -1.21 -3.33
# 2 B-A 1.000 -1.00 3.00
# 8 B-C -0.321 -2.21 -0.33
# 3 C-A 1.321 1.21 3.33
# 6 C-B 0.321 2.21 0.33
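The output above lists every ordered pair, so both A-B and B-A appear, whereas the question only asks for A-B, A-C, B-C and so on. A minimal sketch of one way to keep each unordered pair once is to filter the Cartesian join before subtracting. The names keep, sample_cols and unique_pairs are illustrative, and the filter assumes that plain alphabetical ordering of the gene names is an acceptable way to pick one direction.

```r
# Sketch: keep each unordered pair once (A-B but not B-A).
data <- data.frame(gene = LETTERS[1:3], Sample1 = c(0, 1, 1.321),
                   Sample2 = c(2, 1, 3.21), Sample3 = c(0, 3, 3.33))

# Cartesian join, as in the answer above
combos <- merge(data, data, by = NULL, suffixes = c(".1", ".2"))

# Compare as character so the filter does not depend on factor handling
keep <- as.character(combos$gene.1) < as.character(combos$gene.2)
combos <- combos[keep, ]

# Select the two sets of sample columns by name rather than by position
sample_cols <- names(data)[-1]
diffs <- combos[, paste0(sample_cols, ".1")] - combos[, paste0(sample_cols, ".2")]
names(diffs) <- sample_cols

unique_pairs <- data.frame(gene = paste(combos$gene.1, combos$gene.2, sep = "-"),
                           diffs, row.names = NULL)
unique_pairs
```

Filtering before the subtraction also roughly halves the number of rows that have to be differenced.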