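I have a data.table object, and basically what I want to do is update the data table whenever a particular combination of ID_Type and BUYER/SELLER character values occurs. As an example, here is my data.table: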
ID_Type | BUYER | SELLER
------------------------------------------------
1 | | Joe
0 | Peter |
1 | Peter |
1 | Sam |
1 | Peter |
0 | | Mark
1 | Tai |
1 | Tai |
1 | | Mark
The dput output is as follows:
structure(list(ID_Type = c("1", "0", "1", "1", "1", "0", "1",
"1", "1"), BUYER = c(" ", "Peter", "Peter", "Sam", "Peter", " ",
"Tai", "Tai", " "), SELLER = c("Joe", " ", " ", " ", " ", "Mark",
" ", " ", "Mark")), .Names = c("ID_Type", "BUYER", "SELLER"), row.names = c(NA, -9L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000009c60788>)
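Now, for a row in which a particular BUYER or SELLER has an ID_Type of 0, I want to make sure that every instance of that particular BUYER or SELLER in later rows of the data table also has an ID_Type of 0. For example, BUYER Peter has ID_Type 0 in row 2, so whenever Peter appears in the BUYER column afterwards, I want to change that Peter's ID_Type to 0. The same applies to SELLER Mark.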
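Basically, the new data table I want should look like this:
ID_Type | BUYER | SELLER
------------------------------------------------
1 | | Joe
0 | Peter |
0 | Peter |
1 | Sam |
0 | Peter |
0 | | Mark
1 | Tai |
1 | Tai |
0 | | Mark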
Answer 0 (score: 4)
How about this:
library(data.table)
aaa <- structure(list(ID_Type = c("1", "0", "1", "1", "1", "0", "1", "1", "1"),
BUYER = c(" ", "Peter", "Peter", "Sam", "Peter", " ", "Tai", "Tai", " "),
SELLER = c("Joe", " ", " ", " ", " ", "Mark", " ", " ", "Mark")),
.Names = c("ID_Type", "BUYER", "SELLER"),
row.names = c(NA, -9L), class = c("data.table", "data.frame"))
aaa[BUYER != " ", ID_Type := ID_Type[1], by = BUYER]
aaa[SELLER != " ", ID_Type := ID_Type[1], by = SELLER]
aaa
#    ID_Type BUYER SELLER
# 1:       1          Joe
# 2:       0 Peter
# 3:       0 Peter
# 4:       1   Sam
# 5:       0 Peter
# 6:       0         Mark
# 7:       1   Tai
# 8:       1   Tai
# 9:       0         Mark
Answer 1 (score: 1)
I would write a small helper function. I would also replace your blank strings " " with real missing values:
dd[BUYER == " ", BUYER := NA]
dd[SELLER == " ", SELLER := NA]
foo = function(x) {
if (any(x == 0)) return(rep("0", length(x)))
return(x)
}
dd[!is.na(BUYER), ID_Type := foo(ID_Type), by = BUYER]
dd[!is.na(SELLER), ID_Type := foo(ID_Type), by = SELLER]
dd
#    ID_Type BUYER SELLER
# 1:       1    NA    Joe
# 2:       0 Peter     NA
# 3:       0 Peter     NA
# 4:       1   Sam     NA
# 5:       0 Peter     NA
# 6:       0    NA   Mark
# 7:       1   Tai     NA
# 8:       1   Tai     NA
# 9:       0    NA   Mark
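To make the behaviour of the helper easier to see, here is a minimal sketch that calls it on two made-up vectors (the inputs are illustrative assumptions, not taken from the question). Note that a "0" anywhere in the group zeroes every row of that group, including rows before the first "0".

```r
# Helper from the answer above, repeated so the snippet runs on its own
foo <- function(x) {
  if (any(x == 0)) return(rep("0", length(x)))
  return(x)
}

# Hypothetical group containing a "0": every element becomes "0"
res_with_zero <- foo(c("1", "0", "1"))
res_with_zero
# [1] "0" "0" "0"

# Hypothetical group without a "0": the input comes back unchanged
res_without_zero <- foo(c("1", "2"))
res_without_zero
# [1] "1" "2"
```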
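Answer 2 (score: 0)
Although GL_Li's answer, which the OP accepted, apparently returns the expected result for the given sample data set, I doubt that it correctly implements the OP's requirement.
The OP asked for the following:
for a particular BUYER or SELLER with an ID_Type of 0 in a row, every instance of that particular BUYER or SELLER in the data table should have an ID_Type of 0 in later rows.
If this specification is taken literally as the OP's intent, GL_Li's answer fails on three points:
1. It overwrites ID_Type for the whole group, although the OP specified that it should be changed only in later rows.
2. If the first value of ID_Type is not 0, later occurrences of 0 are ignored.
3. It implicitly assumes that ID_Type is either 0 or 1 and never any other value.
I have added a few rows to the sample data set to demonstrate the effect: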
DT2
    ID_Type BUYER SELLER
 1:       1          Joe
 2:       0 Peter
 3:       1 Peter
 4:       1   Sam
 5:       1 Peter
 6:       0         Mark
 7:       1   Tai
 8:       1   Tai
 9:       1         Mark
10:       0   Tai
11:       1   Tai
12:       2   Sam
13:       3   Tom
14:       2   Tom
Applying GL_Li's answer to DT2
DT2[BUYER != "", ID_Type := ID_Type[1], by = BUYER]
DT2[SELLER != "", ID_Type := ID_Type[1], by = SELLER]
DT2
returns
    ID_Type BUYER SELLER
 1:       1          Joe
 2:       0 Peter
 3:       0 Peter
 4:       1   Sam
 5:       0 Peter
 6:       0         Mark
 7:       1   Tai
 8:       1   Tai
 9:       0         Mark
10:       1   Tai
11:       1   Tai
12:       1   Sam
13:       3   Tom
14:       3   Tom
Rows 10, 11, 12, and 14 violate the specification, IMHO.
Alternative solution
DT2[, cnt := cumsum(ID_Type == "0"), by = .(BUYER, SELLER)][
cnt > 0L, ID_Type := "0"][, cnt := NULL]
DT2
returns
    ID_Type BUYER SELLER
 1:       1          Joe
 2:       0 Peter
 3:       0 Peter
 4:       1   Sam
 5:       0 Peter
 6:       0         Mark
 7:       1   Tai
 8:       1   Tai
 9:       0         Mark
10:       0   Tai
11:       0   Tai
12:       2   Sam
13:       3   Tom
14:       2   Tom
This works in line with the specification because it changes only the rows that follow an occurrence of 0.
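To see why the cumulative sum does the job, here is a minimal base R sketch for a single group. The vector id_type is made up for illustration; it mirrors the ID_Type values of the Tai rows (7, 8, 10, 11) in DT2.

```r
# Hypothetical ID_Type values of one name, in row order (same as Tai in DT2)
id_type <- c("1", "1", "0", "1")

# Running count of "0" values seen so far:
# it stays 0 until the first "0" and is at least 1 from then on
cnt <- cumsum(id_type == "0")
cnt
# [1] 0 0 1 1

# Overwrite only the rows at or after the first "0"
id_type[cnt > 0L] <- "0"
id_type
# [1] "1" "1" "0" "0"
```

Rows before the first "0" keep a count of 0 and are therefore left untouched, which is the "later rows only" behaviour the specification asks for.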
Note that the solution above rests on the implicit assumption that a name appears in only one of the two columns, BUYER or SELLER, and never in both.
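If that assumption does not hold, one possible variation is to group by a single combined name column instead. The following is only a sketch under stated assumptions: the table DT3 and the helper column name are hypothetical, blank fields are empty strings, and every row has exactly one of BUYER or SELLER filled in.

```r
library(data.table)

# Made-up data: Peter first appears as a BUYER with ID_Type 0, later as a SELLER
DT3 <- data.table(
  ID_Type = c("1", "0", "1", "1"),
  BUYER   = c("", "Peter", "Sam", ""),
  SELLER  = c("Joe", "", "", "Peter")
)

# Hypothetical helper column: the one name present in each row
DT3[, name := ifelse(BUYER != "", BUYER, SELLER)]

# Same cumsum idea as above, but grouped by the combined name
DT3[, cnt := cumsum(ID_Type == "0"), by = name][
  cnt > 0L, ID_Type := "0"][, c("cnt", "name") := NULL]
DT3$ID_Type
# [1] "1" "0" "1" "0"
```

Here Peter's later SELLER row is also set to "0", which would not happen when grouping by BUYER and SELLER separately.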
library(data.table)
DT2 <- fread(
"ID_Type | BUYER | SELLER
1 | | Joe
0 | Peter |
1 | Peter |
1 | Sam |
1 | Peter |
0 | | Mark
1 | Tai |
1 | Tai |
1 | | Mark
0 | Tai |
1 | Tai |
2 | Sam |
3 | Tom |
2 | Tom |",
sep = "|", colClasses = c(ID_Type = "character"))