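Loop through 2 data frames to identify common columns

Time: 2019-01-15 02:04:54

Tags: r

I have 2 reproducible data frames here. I am trying to identify which columns of one contain values that also appear in a column of the other. I want my code to go through every row and loop over every column in df2. My code below works, but it needs fine-tuning to allow multiple matches for the same column.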


Here are my data and code, followed by the output the code currently produces:

df1 <- data.frame(fruit=c("Apple", "Orange", "Pear"), location = c("Japan", "China", "Nigeria"), price = c(32,53,12))
df2 <- data.frame(grocery = c("Durian", "Apple", "Watermelon"), 
                  place=c("Korea", "Japan", "Malaysia"), 
                  name = c("Mark", "John", "Tammy"), 
                  favourite.food = c("Apple", "Wings", "Cakes"), 
                  invoice = c("XD1", "XD2", "XD3"))

df <- sapply(names(df1), function(x) {
  temp <- sapply(names(df2), function(y) 
    if(any(match(df1[[x]], df2[[y]], nomatch = FALSE))) y else NA)
  ifelse(all(is.na(temp)), NA, temp[which.max(!is.na(temp))])
}
)

t1 <- data.frame(lapply(df, type.convert), stringsAsFactors=FALSE)
t1 <- data.frame(t(t1))
t1 <- cbind(newColName = rownames(t1), t1)
rownames(t1) <- 1:nrow(t1)
colnames(t1) <- c("Columns from df1", "Columns from df2")

df1
   fruit location price
1  Apple    Japan    32
2 Orange    China    53
3   Pear  Nigeria    12

df2
     grocery    place  name favourite.food invoice
1     Durian    Korea  Mark          Apple     XD1
2      Apple    Japan  John          Wings     XD2
3 Watermelon Malaysia Tammy          Cakes     XD3

t1 #(OUTPUT FROM CODE ABOVE)

  Columns from df1 Columns from df2
1            fruit          grocery
2         location            place
3            price             <NA>

Note that both the "grocery" and "favourite.food" columns match the "fruit" column, but my code returns only one of them.

1 Answer:

Answer 0 (score: 2)

We can change the code to return all matches and then use toString to collapse them into a single string:

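vec <- sapply(names(df1), function(x) {
  # For each df2 column, keep its name if it shares any value with df1[[x]]
  temp <- sapply(names(df2), function(y) 
    if(any(match(df1[[x]], df2[[y]], nomatch = FALSE))) y else NA)
  # Collapse every matching column name into one comma-separated string
  ifelse(all(is.na(temp)), NA, toString(temp[!is.na(temp)]))
}
)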

vec

#                    fruit                  location                     price 
#"grocery, favourite.food"                   "place"                        NA 
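
To see why this change matters, here is a minimal sketch using a hand-written `temp` vector. The vector is an assumption for illustration only: it mimics what the inner sapply would return for the fruit column. The original `which.max(!is.na(temp))` gives the position of the first non-NA entry only, while `temp[!is.na(temp)]` keeps every non-NA entry so that toString can join them.

```r
# Hypothetical inner result for the "fruit" column of df1:
# names are df2's columns, values are the matching column name or NA.
temp <- c(grocery = "grocery", place = NA, name = NA,
          favourite.food = "favourite.food", invoice = NA)

# Original approach: which.max() returns the index of the FIRST TRUE only,
# so any later match is silently dropped.
first_only <- unname(temp[which.max(!is.na(temp))])

# Revised approach: keep all non-NA entries, then join them with ", ".
all_matches <- toString(temp[!is.na(temp)])

print(first_only)
print(all_matches)
```

Here `first_only` holds just "grocery", whereas `all_matches` holds "grocery, favourite.food".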

We can then convert it into a data frame:

data.frame(columns_from_df1 = names(vec), columns_from_df2 = vec, row.names = NULL)

#  columns_from_df1        columns_from_df2
#1            fruit grocery, favourite.food
#2         location                   place
#3            price                    <NA>
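
If a comma-separated string is awkward to work with downstream, one possible alternative is a long format with one row per matching pair of columns. The sketch below is not part of the answer above: it swaps `match` for `%in%`, compares values as character, and the names `pairs`, `shared` and `long` are made up for illustration.

```r
df1 <- data.frame(fruit = c("Apple", "Orange", "Pear"),
                  location = c("Japan", "China", "Nigeria"),
                  price = c(32, 53, 12))
df2 <- data.frame(grocery = c("Durian", "Apple", "Watermelon"),
                  place = c("Korea", "Japan", "Malaysia"),
                  name = c("Mark", "John", "Tammy"),
                  favourite.food = c("Apple", "Wings", "Cakes"),
                  invoice = c("XD1", "XD2", "XD3"))

# Every (df1 column, df2 column) combination.
pairs <- expand.grid(columns_from_df1 = names(df1),
                     columns_from_df2 = names(df2),
                     stringsAsFactors = FALSE)

# TRUE where the two columns share at least one value.
# as.character() keeps the comparison safe if the columns are factors.
shared <- mapply(function(x, y) {
  any(as.character(df1[[x]]) %in% as.character(df2[[y]]))
}, pairs$columns_from_df1, pairs$columns_from_df2)

# Keep only the matching pairs, one row each.
long <- pairs[shared, ]
rownames(long) <- NULL
long
```

With the example data this keeps three rows: fruit with grocery, location with place, and fruit with favourite.food. Unlike the toString version, a column with no match (price here) does not appear at all instead of showing NA.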