我有一个大型数据集,我想用其他人使用lapply函数对1列进行求和。但我有问题,其他专栏消失了......我想保留它们。
我有一个例子:)
例如:
我有数据集:
X Y Z date columnSum
1: A a1 z1 2018.01 4
2: A a1 z1 2018.01 4
2: B a2 z3 2018.02 10
2: B a2 z5 2018.02 30
2: B a2 z5 2018.02 10
3: C a2 z3 2018.02 10
4: D a3 z4 2018.03 0
4: D a3 z6 2018.03 0
我要总结" columnSum " by" X "," Y "和" 日期"。我想保留专栏" Z "
我试过了:
DT[, lapply(.SD,sum,na.rm=TRUE), .SDcols="columnSum", by=list(X,Y,date)]
今天我有结果:
X Y date columnSum
1: A a1 2018.01 8
2: B a2 2018.02 50
3: C a2 2018.02 10
4: D a3 2018.03 0
我想要结果:
X Y Z date columnSum
1: A a1 z1 2018.01 8
2: B a2 z3 2018.02 50
3: B a2 z5 2018.02 50
4: C a2 z3 2018.02 10
5: D a3 z4 2018.03 0
6: D a3 z6 2018.03 0
答案 0 :(得分:1)
df <- read.table(text = "X Y Z date columnSum
A a1 z1 2018.01 4
A a1 z1 2018.01 4
B a2 z3 2018.02 10
B a2 z5 2018.02 30
B a2 z5 2018.02 10
C a2 z3 2018.02 10
D a3 z4 2018.03 0
D a3 z6 2018.03 0",
header = TRUE, stringsAsFactors = FALSE)
library(data.table)
setDT(df)
df[, columnSum := sum(columnSum), by = c("X", "Y", "date")] # summing columnSum by X, Y, date and retaining column Z
df <- unique(df) # filtering duplicate records
# X Y Z date columnSum
# 1: A a1 z1 2018.01 8
# 2: B a2 z3 2018.02 50
# 3: B a2 z5 2018.02 50
# 4: C a2 z3 2018.02 10
# 5: D a3 z4 2018.03 0
# 6: D a3 z6 2018.03 0
答案 1 :(得分:0)
这有效:
# recreated your example
DT <- data.table(X = c("A", "A", "B", "B", "B", "C", "D", "D"),
Y = c("a1", "a1", "a2", "a2", "a2", "a2", "a3", "a3"),
Z = c("z1", "z1", "z3", "z5", "z5", "z3", "z4", "z6"),
date = c("2018.01", "2018.01", "2018.02", "2018.02",
"2018.02", "2018.02", "2018.03", "2018.03"),
columnSum = c(4, 4, 10, 30, 10, 10, 0, 0))
sums <- DT[, sum(columnSum), .(X, Y, date)]
keep <- unique(DT[, .(X, Y, Z, date)])
merge.data.frame(keep, sums)
答案 2 :(得分:0)
library(dplyr)
practiceData <- tibble(X=c("A","A","B","B","B","C","D","D"),
Y=c('a1', 'a1', 'a2','a2', 'a2', 'a2','a3','a3'),
Z=c('z1','z1', 'z3', 'z5','z5','z3','z4','z6'),
date=c('2018.01',
'2018.01',
'2018.02',
'2018.02',
'2018.02',
'2018.02',
'2018.03',
'2018.03'),
U=c(4,4,10,30,10,10,0,0))
NEW <- practiceData %>% group_by(X,Y,date) %>% summarise(colsumnew=sum(U))
lol <- unique(practiceData[,c(1,2,3,4)]) %>% data.frame()
lol2 <- left_join(NEW,lol) %>% unique()