I have two data frames (df1, df2). I want to create a new "score" column in df2 that follows an if/else statement.
If Else Statement:
If pn1=pn2 & sub1=sub2, then score = 2,
elseif pn1=pn2 & sub1 IS IN sub2, then score = 1,
elseif pn1=pn2, then score = 0,
else score = NA.
```
pn1 <- c('12345','12345','13579', '01289','22468')
sub1 <- c('01','x001','hi-02','bye','12')
pn2 <- c('12345','12345','13579', '01289','22468','28245')
sub2 <- c('01','x002','hi-2','b','xyz','23')
row <-c(1,2,3,4,5,6)
df1 <- data.frame(pn1,sub1)
df2 <- data.frame(row,pn2,sub2)
#Desired Output
score <- c(2,1,1,1,0,NA)  # unquoted NA keeps the column numeric
df2$score <- score
```
For further explanation on the if statement:
Row 1- Score = 2 because PN1 = PN2 and SUB1=SUB2.
Row 2,3,4- Score = 1 because PN1 = PN2 and SUB1 can be found in SUB2.
Row 5- Score = 0 because PN1=PN2.
Row 6- Score = NA because PN2 is not found in df1.
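Answer 0 (score: 0)
Because the two data frames have different dimensions, I can't fully understand your question. Also, in your example, rows 2 through 4 would not evaluate to 1, because SUB1 is not in SUB2 in those cases. This answer follows the rules as you described them, not the output you listed as desired.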
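Both observations can be checked directly on the question's sample vectors. The snippet below is only a quick sanity check; it tries two possible readings of "SUB1 is in SUB2" (set membership and literal substring), and the second reading is an assumption, since the question does not say which one is meant.

```r
# Sample vectors copied from the question
sub1 <- c("01", "x001", "hi-02", "bye", "12")
sub2 <- c("01", "x002", "hi-2", "b", "xyz", "23")

# The inputs differ in length, so an element-wise comparison needs care
n1 <- length(sub1)  # 5
n2 <- length(sub2)  # 6

# Rows 2 to 4, reading 1: is sub1 one of the values in sub2?
in_set <- sub1[2:4] %in% sub2

# Rows 2 to 4, reading 2 (assumed): is sub1 a literal substring of sub2 in the same row?
in_string <- mapply(grepl, sub1[2:4], sub2[2:4],
                    MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

print(in_set)
print(in_string)
```

Under either reading, none of rows 2 to 4 satisfies the "is in" condition.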
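```
df1 <- data.frame(pn1,sub1, stringsAsFactors = FALSE)
df2 <- data.frame(row,pn2,sub2, stringsAsFactors = FALSE)
library(dplyr)
# df1 has 5 rows and df2 has 6, so `==` would recycle df1 with a warning;
# compare combined pn/sub keys instead to find exact pairs.
key1 <- paste(df1$pn1, df1$sub1, sep = "|")
key2 <- paste(df2$pn2, df2$sub2, sep = "|")
df2$score <- case_when(key2 %in% key1 ~ 2,
df2$pn2 %in% df1$pn1 & df2$sub2 %in% df1$sub1 ~ 1,
df2$pn2 %in% df1$pn1 & !(df2$sub2 %in% df1$sub1) ~ 0,
!(df2$pn2 %in% df1$pn1) ~ NA_real_)
```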
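If the intent is instead a row-by-row comparison, a base R sketch could look like the following. It rests on two assumptions that the question does not confirm: that row i of df1 corresponds to row i of df2, and that "sub1 IS IN sub2" means sub1 occurs as a literal substring of sub2. The helper names (`idx`, `same_pn`, `same_sub`, `sub_in`) are made up for illustration.

```r
# Sample data from the question
df1 <- data.frame(pn1 = c("12345", "12345", "13579", "01289", "22468"),
                  sub1 = c("01", "x001", "hi-02", "bye", "12"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(row = 1:6,
                  pn2 = c("12345", "12345", "13579", "01289", "22468", "28245"),
                  sub2 = c("01", "x002", "hi-2", "b", "xyz", "23"),
                  stringsAsFactors = FALSE)

# Assumption: rows are aligned by position. Indexing past the end of df1
# yields NA, which pads df1 to the length of df2.
idx <- seq_len(nrow(df2))
pn1 <- df1$pn1[idx]
sub1 <- df1$sub1[idx]

same_pn <- !is.na(pn1) & pn1 == df2$pn2
same_sub <- !is.na(sub1) & sub1 == df2$sub2

# Assumption: "is in" means literal substring, tested row by row
sub_in <- !is.na(sub1) &
  mapply(grepl, sub1, df2$sub2, MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Apply the rules in priority order; rows with no pn match stay NA
df2$score <- ifelse(same_pn & same_sub, 2,
             ifelse(same_pn & sub_in, 1,
             ifelse(same_pn, 0, NA)))
print(df2)
```

On the sample data this gives scores 2, 0, 0, 0, 0, NA. Rows 2 to 4 still score 0 rather than 1, because none of those sub1 values is contained in the corresponding sub2 value. Getting the 1s from the question's desired output would need a looser matching rule than the one described.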