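I would like to create a vector that flags rows with matching identifiers. For example, hldid is the household identifier, persid is the person identifier, and partner_id is the identifier of the matching person.
I want to create the vector couple, set to 1 whenever a persid has a matching partner_id.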
       hldid persid partner_id age sex child
1     243312      2         91  20   2     0
2     243312     91          2  29   1     0
3     103340      0          0   6   1     2
4     103340      2         91  39   2     2
5     103340      4          0  14   2     2
6     103340     91          2  42   1     2
7    1105347      2          0  25   2     2
8    1105347      3          3  50   2     2
9    1105347     91          0  25   1     2
10 110322323      3          0  15   2     1
11 110322323     10          0  15   2     1
This would give:
       hldid persid partner_id age sex child couple
1     243312      2         91  20   2     0      1
2     243312     91          2  29   1     0      1
3     103340      0          0   6   1     2      0
4     103340      2         91  39   2     2      1
5     103340      4          0  14   2     2      0
6     103340     91          2  42   1     2      1
7    1105347      2          0  25   2     2      0
8    1105347      3          3  50   2     2      0
9    1105347     91          0  25   1     2      0
10 110322323      3          0  15   2     1      0
11 110322323     10          0  15   2     1      0
I wrote a loop like this:
df$couple = 0
# Compare each row with the row directly below it, so stop at nrow(df) - 1
for (i in 1:(nrow(df) - 1)) {
  if (df$hldid[i] == df$hldid[i + 1] &&
      df$persid[i] == df$partner_id[i + 1]) {
    df$couple[i] = 1
    df$couple[i + 1] = 1
  }
}
However, it does not work properly when the matching identifiers are not in adjacent rows.
df = structure(list(
  hldid = c(243312L, 243312L, 103340L, 103340L, 103340L, 103340L,
            1105347L, 1105347L, 1105347L, 110322323L, 110322323L),
  persid = c(2L, 91L, 0L, 2L, 4L, 91L, 2L, 3L, 91L, 3L, 10L),
  partner_id = c(91, 2, 0, 91, 0, 2, 0, 3, 0, 0, 0),
  age = c(20L, 29L, 6L, 39L, 14L, 42L, 25L, 50L, 25L, 15L, 15L),
  sex = c(2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L),
  child = c(0, 0, 2, 2, 2, 2, 2, 2, 2, 1, 1)),
  class = "data.frame", row.names = c(NA, -11L))
Answer 0: (score: 1)
I think the problem is that you have not thought through the logic of what exactly couple means. Looking at your expected result, it seems to me that any row where partner_id is not 0 and is not equal to persid gets a 1, and all other rows get a 0. That is a simple condition, and it is easy to implement.
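A minimal sketch of that condition, assuming the data frame df from the question (only the identifier columns are rebuilt here so the snippet runs on its own). It is one way to write the rule described above, not necessarily how the answer originally coded it:

```r
# Identifier columns copied from the question's dput output
df <- data.frame(
  hldid      = c(243312L, 243312L, 103340L, 103340L, 103340L, 103340L,
                 1105347L, 1105347L, 1105347L, 110322323L, 110322323L),
  persid     = c(2L, 91L, 0L, 2L, 4L, 91L, 2L, 3L, 91L, 3L, 10L),
  partner_id = c(91, 2, 0, 91, 0, 2, 0, 3, 0, 0, 0)
)

# couple is 1 when a partner is recorded (partner_id != 0)
# and that partner is not the person themselves
df$couple <- as.integer(df$partner_id != 0 & df$partner_id != df$persid)

print(df$couple)
```

This checks each row on its own; it does not verify that the partner's row actually exists in the same household.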
Answer 1: (score: 0)
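Following @Gregor's comment, I am not entirely sure I follow the logic of partners needing to be in consecutive rows. It seems to me that this can be solved with a join: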
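library(data.table)
setDT(df)
# Self-join: for each row (columns prefixed with i.), look up the row in the
# same household whose persid equals this row's partner_id
df[df,
   .(i.hldid, i.persid, i.partner_id, i.age, i.sex, i.child,
     # child comes from the matched row and is NA when no partner row was found
     couple = ifelse(is.na(child) | i.partner_id == 0 | i.partner_id == i.persid, 0, 1)),
   on = c("hldid", "persid==partner_id")]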
      i.hldid i.persid i.partner_id i.age i.sex i.child couple
 1:    243312        2           91    20     2       0      1
 2:    243312       91            2    29     1       0      1
 3:    103340        0            0     6     1       2      0
 4:    103340        2           91    39     2       2      1
 5:    103340        4            0    14     2       2      0
 6:    103340       91            2    42     1       2      1
 7:   1105347        2            0    25     2       2      0
 8:   1105347        3            3    50     2       2      0
 9:   1105347       91            0    25     1       2      0
10: 110322323        3            0    15     2       1      0
11: 110322323       10            0    15     2       1      0
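The same self-join idea can be sketched in base R with match() on a combined household and person key. This is an illustrative variant rather than the answer's code: the names key, partner_key and idx are made up for the example, and it additionally requires the partner's row to point back at the person, which is an assumption about what should count as a couple.

```r
# Identifier columns copied from the question's dput output
df <- data.frame(
  hldid      = c(243312L, 243312L, 103340L, 103340L, 103340L, 103340L,
                 1105347L, 1105347L, 1105347L, 110322323L, 110322323L),
  persid     = c(2L, 91L, 0L, 2L, 4L, 91L, 2L, 3L, 91L, 3L, 10L),
  partner_id = c(91, 2, 0, 91, 0, 2, 0, 3, 0, 0, 0)
)

# Key identifying each person within their household
key <- paste(df$hldid, df$persid)
# Key of the row that each person's partner_id points to
partner_key <- paste(df$hldid, df$partner_id)
# Row index of the partner, NA when no such person exists in the household
idx <- match(partner_key, key)

df$couple <- as.integer(
  !is.na(idx) &                       # the partner row exists
  df$partner_id != 0 &                # a partner is actually recorded
  df$partner_id != df$persid &        # the partner is not the person themselves
  df$partner_id[idx] == df$persid     # the partner points back (assumption)
)

print(df$couple)
```

Because the lookup is by key rather than by position, the order of rows within a household does not matter.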
Answer 2: (score: 0)
In plain base R,
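# Keep only the two identifier columns (df1 is the question's data frame)
tmp <- df1[,c("persid", "partner_id")]
# Sort each pair so that (2, 91) and (91, 2) become the same row
tmp2 <- t(apply(tmp, 1, sort))
# Pairs that occur more than once are the matched couples
tmp2 <- unique( tmp2[duplicated(tmp2),] )
# Flag rows whose two identifiers both appear among the matched pairs
df1$couple <-
  as.integer(apply( tmp, 1, function(x) { all(x %in% tmp2)}))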
#       hldid persid partner_id age sex child couple
#1     243312      2         91  20   2     0      1
#2     243312     91          2  29   1     0      1
#3     103340      0          0   6   1     2      0
#4     103340      2         91  39   2     2      1
#5     103340      4          0  14   2     2      0
#6     103340     91          2  42   1     2      1
#7    1105347      2          0  25   2     2      0
#8    1105347      3          3  50   2     2      0
#9    1105347     91          0  25   1     2      0
#10 110322323      3          0  15   2     1      0
#11 110322323     10          0  15   2     1      0
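The code above compares identifier pairs across the whole data frame and never uses hldid, so the same pair of identifiers occurring in two different households would be treated as one match. A possible household-aware variant is sketched below, under the assumption that a couple means two rows of one household naming each other; lo, hi, pair_key and in_pair are hypothetical names chosen for this example.

```r
# Identifier columns copied from the question's dput output
df <- data.frame(
  hldid      = c(243312L, 243312L, 103340L, 103340L, 103340L, 103340L,
                 1105347L, 1105347L, 1105347L, 110322323L, 110322323L),
  persid     = c(2L, 91L, 0L, 2L, 4L, 91L, 2L, 3L, 91L, 3L, 10L),
  partner_id = c(91, 2, 0, 91, 0, 2, 0, 3, 0, 0, 0)
)

# Order-independent pair: smaller identifier first, larger second
lo <- pmin(df$persid, df$partner_id)
hi <- pmax(df$persid, df$partner_id)
# Include the household so pairs are only matched within one household
pair_key <- paste(df$hldid, lo, hi)

# TRUE for every row whose pair key occurs more than once
in_pair <- duplicated(pair_key) | duplicated(pair_key, fromLast = TRUE)

# Exclude rows with no partner (0) or with themselves as partner
df$couple <- as.integer(in_pair & df$partner_id != 0 & df$partner_id != df$persid)

print(df$couple)
```

On the question's data this gives the same couple column as the expected output, while keeping households separate.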