Inner join of two data frames still shows all values

Time: 2017-02-09 04:51:57

Tags: r join dataframe

I have two data frames, one for stores and one for sales:

store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"))
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))  

store    
#StoreID   StoreName
#      1   McDonalds
#      2         A&W
#      3 Burger King
#      4     Wendy's 

sales  
#StoreID ItemID SalesQty  
#      1      2       10  
#      2      2       20  
#      1      3       30  
#      1      4       40  
#      2      4       50  
#      2      5       60  

I want to merge them so that I can see the StoreName for each sales transaction:

merged <- merge(sales, store, by = "StoreID")

merged
#StoreID ItemID SalesQty StoreName  
#      1      2       10 McDonalds  
#      1      3       30 McDonalds
#      1      4       40 McDonalds
#      2      2       20       A&W
#      2      4       50       A&W
#      2      5       60       A&W

Now I want to know, for each StoreName in the merged data frame, how many distinct items were sold:

tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

#A&W Burger King   McDonalds     Wendy's 
#  3          NA           3          NA 

My question is: why does the tapply result show "Burger King" and "Wendy's" even though they are not in the merged data frame?

2 answers:

Answer 0: (score: 1)

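This happens because store$StoreName is a factor. A factor keeps all of its levels even when some of them no longer occur in the data, and tapply returns one cell per level, so the unused store names appear as NA. Setting the argument stringsAsFactors to FALSE when creating the store data frame makes StoreName a character column, so store names with no matching rows in sales are dropped during the merge.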

sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))  
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"), stringsAsFactors = FALSE)
merged <- merge(sales, store, by = "StoreID")
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

  #A&W McDonalds 
  #  3         3 
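To see the cause directly, it can help to inspect the factor levels of the merged column. The following is a minimal sketch built on the question's data. It passes stringsAsFactors = TRUE explicitly, on the assumption that it may be run on R 4.0.0 or later, where the default is FALSE and the question's code would not reproduce the problem as written.

```r
# Rebuild the question's data. stringsAsFactors = TRUE is set explicitly
# (assumption: on R >= 4.0.0 the default is FALSE, which would hide the issue).
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))
merged <- merge(sales, store, by = "StoreID")

# The merged column still carries every level of the original factor ...
all_levels <- levels(merged$StoreName)

# ... although only some of them occur in the merged rows.
used_names <- unique(as.character(merged$StoreName))

# tapply builds one cell per level, so the unused levels come back as NA.
counts <- tapply(merged$ItemID, merged$StoreName,
                 FUN = function(x) length(unique(x)))
print(all_levels)
print(used_names)
print(counts)
```

Comparing all_levels with used_names shows which store names survive only as levels and not as data.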

Answer 1: (score: 1)

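You can also try this. Calling factor() on the merged column again rebuilds it with only the levels that actually occur in the data: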

merged$StoreName <- factor(merged$StoreName)
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))

#  A&W McDonalds 
#    3         3
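A related option, not mentioned in the answers above, is base R's droplevels(), which removes unused levels from every factor column of a data frame in one call. This is a sketch under the same assumptions as the question's data, with stringsAsFactors = TRUE set explicitly so that StoreName is a factor regardless of the R version.

```r
# Same data as in the question; stringsAsFactors = TRUE is an explicit
# assumption so that StoreName is a factor on any R version.
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))

# droplevels() on a data frame drops unused levels from all factor columns.
merged <- droplevels(merge(sales, store, by = "StoreID"))

# With the unused levels gone, tapply only reports stores that have sales.
counts <- tapply(merged$ItemID, merged$StoreName,
                 FUN = function(x) length(unique(x)))
print(counts)
```

This keeps StoreName as a factor, which may be preferable when later steps rely on factor behavior, while still avoiding the empty cells.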