I have a data frame like this:
ID Exp1 Exp2 Value1
AAA 5 6 7
AAA 4 8 8
BBB 3 5 9
BBB 6 7 4
CCC 2 5 6
....
After each repeated ID, I would like to insert a new row that sums the preceding rows for that ID, like this:
ID Exp1 Exp2 Value1
AAA 5 6 7
AAA 4 8 8
AAA.1 9 14 15
BBB 3 5 9
BBB 6 7 4
BBB.1 9 12 13
CCC 2 5 6
...
My problem is that I cannot work out how to insert the new row after the rows that share an ID. I started with:
> for (i in 1:nrow(Data)) {
> temp1 <- Data[Data$ID == Data$ID[i],]
but I don't know how to continue. Any ideas?
Update: this is what the original data looks like:
GeneNames Original ID2 Com. Ratio Cyt Nuc
YWHAB CL84Contig6 1433B_HUMAN -0.2 0.6 1063.3 671.3
YWHAB CL84Contig4 1433B_HUMAN -0.3 0.5 59.0 30.5
YWHAE CL1665Contig1 1433E_HUMAN -0.3 0.5 2784.6 1490.1
YWHAE CL1665Contig4 1433E_HUMAN 0.1 1.2 2.1 4.8
YWHAH dsrrswapns 1433F_HUMAN 0.0 0.0 0.0 0.0
YWHAG CL2762Contig2 1433G_HUMAN -0.3 0.4 39.5 17.7
YWHAG CL2762Contig3 1433G_HUMAN 0.0 0.0 0.0 0.0
And this is what I would like to get:
GeneNames Original ID2 Com. Ratio Cyt Nuc
YWHAB CL84Contig6 1433B_HUMAN -0.2 0.6 1063.3 671.3
YWHAB CL84Contig4 1433B_HUMAN -0.3 0.5 59.0 30.5
YWHAB.1 CL84Contig6 1433B_HUMAN -0.2 0.6 1122.3 701.8
YWHAE CL1665Contig1 1433E_HUMAN -0.3 0.5 2784.6 1490.1
YWHAE CL1665Contig4 1433E_HUMAN 0.1 1.2 2.1 4.8
YWHAE.1 CL1665Contig1 1433E_HUMAN -0.3 0.5 2786.7 1494.9
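My data.frame has 13044 obs. of 94 variables, a mix of num and chr columns. I want to sum only the Cyt and Nuc values of rows that share the same GeneNames and write the sums to a new row whose GeneNames is "GeneName.1". The remaining columns differ between rows of the same GeneName; I would prefer to leave them blank or copy the values from the first row of that GeneName, as in the example.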
Answer 0 (score: 4)
You can do this with data.table. Convert the "data.frame" to a "data.table" (setDT), append one all-"NA" row to each "ID" group (.SD[1:(.N+1)]), and then, within each "ID", replace the "NA" in every column with that column's sum (lapply(.SD, ...)).
library(data.table)
setDT(df1)[, .SD[1:(.N+1)], ID][, lapply(.SD, function(x)
replace(x, is.na(x), sum(x, na.rm=TRUE))) , ID]
# ID Exp1 Exp2 Value1
#1: AAA 5 6 7
#2: AAA 4 8 8
#3: AAA 9 14 15
#4: BBB 3 5 9
#5: BBB 6 7 4
#6: BBB 9 12 13
#7: CCC 2 5 6
#8: CCC 2 5 6
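The padding step used above can be looked at in isolation. The following is a minimal sketch on a made-up three-row table (the names dt, x and padded are illustrative, not from the question): indexing .SD one position past the end of a group returns an extra row filled with NA, which the second step then overwrites with the sum. Because that second step replaces every NA, any NA already present in the real data would presumably be overwritten as well.

```r
library(data.table)

# Hypothetical toy data, only to show the padding step
dt <- data.table(ID = c("a", "a", "b"), x = c(1L, 2L, 5L))

# 1:(.N + 1) asks for one row more than the group has,
# so each group gains a trailing row of NA
padded <- dt[, .SD[1:(.N + 1)], by = ID]
print(padded)
```

Note that group "b" has a single row and still receives a padding row, which is why "CCC" appears twice in the output above.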
Or you can rbind a row of column sums (lapply(.SD, sum)) onto each group, grouped by "ID".
setDT(df1)[, rbind(.SD,lapply(.SD, sum)), ID]
# ID Exp1 Exp2 Value1
#1: AAA 5 6 7
#2: AAA 4 8 8
#3: AAA 9 14 15
#4: BBB 3 5 9
#5: BBB 6 7 4
#6: BBB 9 12 13
#7: CCC 2 5 6
#8: CCC 2 5 6
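In both outputs above the sum row keeps the plain ID, and "CCC", which occurs only once, also gets a sum row. If the result should match the question more closely (a ".1" suffix, no extra row for single-row IDs), one possible variation is sketched below. It is an assumption about what is wanted, not part of the original answer; the helper names num_cols, sums, key_id and out are made up for illustration.

```r
library(data.table)

df1 <- data.frame(
  ID     = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  Exp1   = c(5L, 4L, 3L, 6L, 2L),
  Exp2   = c(6L, 8L, 5L, 7L, 5L),
  Value1 = c(7L, 8L, 9L, 4L, 6L),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df1)
num_cols <- c("Exp1", "Exp2", "Value1")

# One sum row per ID, but only for IDs that occur more than once
# (groups for which j returns NULL are dropped)
sums <- dt[, if (.N > 1L) lapply(.SD, sum), by = ID, .SDcols = num_cols]

# Keep the original ID in a helper column so the rows can be ordered later
dt[, key_id := ID]
sums[, key_id := ID][, ID := paste0(ID, ".1")]

# Append the sum rows, then order by first appearance of each original ID;
# order() is stable, so each sum row stays after the rows it summarises
out <- rbind(dt, sums)
out <- out[order(match(key_id, unique(dt$key_id)))]
out[, key_id := NULL]
print(out)
```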
For the new dataset, try
DT1 <- setDT(df1)[, .SD[1:(.N+1)], GeneNames][, 6:7 := lapply(.SD,
function(x) replace(x, is.na(x), sum(x, na.rm=TRUE))),
GeneNames, .SDcols=6:7]
DT1[, 2:5 := lapply(.SD, function(x) replace(x, is.na(x),
x[1L])), GeneNames, .SDcols=2:5][]
# GeneNames Original ID2 Com. Ratio Cyt Nuc
#1: YWHAB CL84Contig6 1433B_HUMAN -0.2 0.6 1063.3 671.3
#2: YWHAB CL84Contig4 1433B_HUMAN -0.3 0.5 59.0 30.5
#3: YWHAB CL84Contig6 1433B_HUMAN -0.2 0.6 1122.3 701.8
#4: YWHAE CL1665Contig1 1433E_HUMAN -0.3 0.5 2784.6 1490.1
#5: YWHAE CL1665Contig4 1433E_HUMAN 0.1 1.2 2.1 4.8
#6: YWHAE CL1665Contig1 1433E_HUMAN -0.3 0.5 2786.7 1494.9
#7: YWHAH dsrrswapns 1433F_HUMAN 0.0 0.0 0.0 0.0
#8: YWHAH dsrrswapns 1433F_HUMAN 0.0 0.0 0.0 0.0
#9: YWHAG CL2762Contig2 1433G_HUMAN -0.3 0.4 39.5 17.7
#10: YWHAG CL2762Contig3 1433G_HUMAN 0.0 0.0 0.0 0.0
#11: YWHAG CL2762Contig2 1433G_HUMAN -0.3 0.4 39.5 17.7
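With 94 columns, positions such as 6:7 and 2:5 are easy to get wrong, so the same idea can be written with column names. The sketch below is a hedged variation, not the original answer's code: it uses a cut-down, four-column version of the data, the names sum_cols, other_cols and pad are illustrative, and it fills only the last (padding) row of each group instead of every NA, on the assumption that NAs already present in the real data should be left alone. Like the code above, it still adds a row for genes that occur only once.

```r
library(data.table)

# Cut-down version of the question's data (only four of the columns)
dt <- data.table(
  GeneNames = c("YWHAB", "YWHAB", "YWHAE", "YWHAE"),
  Original  = c("CL84Contig6", "CL84Contig4", "CL1665Contig1", "CL1665Contig4"),
  Cyt       = c(1063.3, 59.0, 2784.6, 2.1),
  Nuc       = c(671.3, 30.5, 1490.1, 4.8)
)

sum_cols   <- c("Cyt", "Nuc")                               # columns to sum
other_cols <- setdiff(names(dt), c("GeneNames", sum_cols))  # columns to copy

# Add one NA row at the end of each gene
pad <- dt[, .SD[1:(.N + 1)], by = GeneNames]

# Put the group sum into the last row of each summed column
pad[, (sum_cols) := lapply(.SD, function(x)
        replace(x, length(x), sum(x, na.rm = TRUE))),
    by = GeneNames, .SDcols = sum_cols]

# Copy the first row's value into the last row of every other column
pad[, (other_cols) := lapply(.SD, function(x)
        replace(x, length(x), x[1L])),
    by = GeneNames, .SDcols = other_cols]
print(pad)
```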
Or use the rbind approach
DT1 <- setDT(df1)[, rbind(.SD, lapply(.SD, sum)), GeneNames, .SDcols=6:7]
# df2 is not defined anywhere above; df1 (the table used in the previous line) is assumed here
setkey(df1, GeneNames, Cyt, Nuc)[DT1]
Then change the NAs in columns 2:5 to the first-row values, as before.
Data for the first example:
df1 <- structure(list(ID = c("AAA", "AAA", "BBB", "BBB", "CCC"),
Exp1 = c(5L, 4L, 3L, 6L, 2L), Exp2 = c(6L, 8L, 5L, 7L, 5L), Value1 =
c(7L, 8L, 9L, 4L, 6L)), .Names = c("ID", "Exp1", "Exp2", "Value1"),
class = "data.frame", row.names = c(NA, -5L))