Suppose I have two data frames:
df1=data.frame(item=c(rep("a",2),rep("b",3),"c","NA",rep("d",4)),
product=paste0("prd",1:11))
df2=data.frame(item=c("b","d"), price=c(10,20))
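For df1, I need to add one column indicating whether each item appears in the item column of df2, and another giving, on every row, how many products that item has (unless the item is NA), like this: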
item product#
a 2
a 2
b 3
b 3
b 3
How can I repeat the product count on every row?
For the lookup, I use:
df1$hasDF2=ifelse(is.na(match(df1$item,df2$item)),"N","Y")
Is there a more efficient alternative?
Thanks!
Answer 0 (score: 0)
Try:
df1$productNo<- with(df1, ave(seq_along(product), item, FUN=length))
df1$productNo
#[1] 2 2 3 3 3 1 1 4 4 4 4
df1$hasDF2 <- c("N", "Y")[(!is.na(match(df1$item, df2$item))) +1]
df1$hasDF2
#[1] "N" "N" "Y" "Y" "Y" "N" "N" "Y" "Y" "Y" "Y"
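The output above gives the "NA" row a count of 1, while the question asks for a count unless the item is NA. A minimal base R sketch of one way to handle that follows. It assumes the literal string "NA" in df1 is meant as a missing value, and the helper name is_missing is made up for illustration. It also uses %in% for the lookup, which returns a logical vector directly.

```r
df1 <- data.frame(item = c(rep("a", 2), rep("b", 3), "c", "NA", rep("d", 4)),
                  product = paste0("prd", 1:11),
                  stringsAsFactors = FALSE)
df2 <- data.frame(item = c("b", "d"), price = c(10, 20),
                  stringsAsFactors = FALSE)

# Assumption: the string "NA" stands for a missing item.
# is_missing is a hypothetical helper name, not part of the original code.
is_missing <- is.na(df1$item) | df1$item == "NA"

# Count the rows per item, then blank out the count for missing items.
df1$productNo <- ave(seq_along(df1$product), df1$item, FUN = length)
df1$productNo[is_missing] <- NA

# %in% already yields TRUE/FALSE, so is.na(match(...)) is not needed.
df1$hasDF2 <- ifelse(df1$item %in% df2$item, "Y", "N")
```

If the "NA" row should instead be dropped or counted like any other item, only the line that assigns NA needs to change.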
Or using data.table:
library(data.table)
setDT(df1)[,c("productNo", "hasDF2") := list(.N, "N"),
by=item][item %in% df2$item, hasDF2:= "Y"]
To count only unique products per item, you can do:
#creating a dataset with duplicate products
df1 <- data.frame(item=c(rep("a",2),rep("b",3),"c","NA",rep("d",5)),
product=paste0("prd",c(1:11,11)))
setDT(df1)[,c("productNo", "hasDF2") := list(length(unique(product)), "N"),
by=item][item %in% df2$item, hasDF2:= "Y"]
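For readers who prefer to stay in base R, the same unique count can probably be obtained with ave as well. This is a sketch rather than part of the original answer. It passes row indices to ave instead of the product column so that the result stays integer: calling ave on a character vector would coerce the counts to strings.

```r
# Same data as above: "d" has five rows, two of which share product "prd11".
df1 <- data.frame(item = c(rep("a", 2), rep("b", 3), "c", "NA", rep("d", 5)),
                  product = paste0("prd", c(1:11, 11)),
                  stringsAsFactors = FALSE)
df2 <- data.frame(item = c("b", "d"), price = c(10, 20),
                  stringsAsFactors = FALSE)

# For each item group, look up the products by row index and count distinct values.
df1$productNo <- ave(seq_along(df1$product), df1$item,
                     FUN = function(i) length(unique(df1$product[i])))

# Flag items that also appear in df2.
df1$hasDF2 <- ifelse(df1$item %in% df2$item, "Y", "N")
```

Here item "d" gets a count of 4 rather than 5, because the duplicated product is counted once.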