I have the following tables:
x = data.table(Id=c(1,1,2,3,3,4), Name=c("A", "A", "B", "C", "C", "D"), TxId=c(10, 11, 20, 30, 31, 40))
#Id Name TxId
#1: 1 A 10
#2: 1 A 11
#3: 2 B 20
#4: 3 C 30
#5: 3 C 31
#6: 4 D 40
y = data.table(Name=c("A", "B", "B", "C"), Family=c("A-alpha", "B-beta", "B-gamma", "C-delta"))
# Name Family
#1: A A-alpha
#2: B B-beta
#3: B B-gamma
#4: C C-delta
I can do the left join and the concatenation, but I only want one output row for each row in X.
# Left join X to Y on Name column
xy = y[x, on="Name"]
# Name Family Id TxId
#1: A A-alpha 1 10
#2: A A-alpha 1 11
#3: B B-beta 2 20
#4: B B-gamma 2 20
#5: C C-delta 3 30
#6: C C-delta 3 31
#7: D NA 4 40
# Concatenate Family column
xy[, Family:=paste0(Family, collapse=", "), by=c("Name", "TxId")]
# Name Family Id TxId
#1: A A-alpha 1 10
#2: A A-alpha 1 11
#3: B B-beta, B-gamma 2 20
#4: B B-beta, B-gamma 2 20
#5: C C-delta 3 30
#6: C C-delta 3 31
#7: D NA 4 40
How do I get rid of the extra row for B? I want the result to be unique on Id / TxId, i.e.:
# Name Family Id TxId
#1: A A-alpha 1 10
#2: A A-alpha 1 11
#3: B B-beta, B-gamma 2 20
#5: C C-delta 3 30
#6: C C-delta 3 31
#7: D NA 4 40
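If I do what eddi suggested in a comment:
xy[, .(Family=paste0(Family, collapse=", ")), by=c("Name", "TxId")]
I get the correct result. But if I try to add other columns, it does not work (I get the same result as with the := version):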
xy[, .(Id, Family=paste0(Family, collapse=", ")), by=c("Name", "TxId")]
Answer 0 (score: 1)
Please try
xy[, .(Family = paste0(Family, collapse = ", ")), by = c("Id", "Name", "TxId")]
An attempt at an explanation:
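If Id is part of the grouping, it appears only once for each unique value of Id (more precisely, once for each unique combination of Id, Name, TxId). If Id is instead included in the j-expression, i.e. .(Id, Family = paste0(Family, collapse = ", ")), then every row of Id within the group is included in the result set, even though Family is being aggregated.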
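To make the difference concrete, here is a minimal sketch that rebuilds the tables from the question and compares the two calls. The names in_j and in_by are only illustrative and do not appear in the question. The extra row for B seems to arise because, within the group Name = "B", TxId = 20, Id has two elements while the collapsed Family has one, so the single Family value is recycled across both rows.

```r
library(data.table)

# Tables from the question
x <- data.table(Id = c(1, 1, 2, 3, 3, 4),
                Name = c("A", "A", "B", "C", "C", "D"),
                TxId = c(10, 11, 20, 30, 31, 40))
y <- data.table(Name = c("A", "B", "B", "C"),
                Family = c("A-alpha", "B-beta", "B-gamma", "C-delta"))

# Left join x to y on Name; B matches two rows of y, so xy has 7 rows
xy <- y[x, on = "Name"]

# Id in j: Id is returned as-is for every row of each group, and the
# length-1 collapsed Family is recycled to match, so B still has 2 rows
in_j <- xy[, .(Id, Family = paste0(Family, collapse = ", ")),
           by = c("Name", "TxId")]

# Id in by: one row per unique (Id, Name, TxId) combination
in_by <- xy[, .(Family = paste0(Family, collapse = ", ")),
            by = c("Id", "Name", "TxId")]

nrow(in_j)   # 7
nrow(in_by)  # 6
```

One detail to be aware of: paste0() converts a missing value to text, so the Family value for D in the aggregated result is the string "NA" rather than a true NA.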