I have two data frames, one for stores and one for sales:
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"))
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store
#StoreID StoreName
# 1 McDonalds
# 2 A&W
# 3 Burger King
# 4 Wendy's
sales
#StoreID ItemID SalesQty
# 1 2 10
# 2 2 20
# 1 3 30
# 1 4 40
# 2 4 50
# 2 5 60
I want to merge them so that I can see the StoreName for each sales transaction:
merged <- merge(sales, store, by = "StoreID")
merged
#StoreID ItemID SalesQty StoreName
# 1 2 10 McDonalds
# 1 3 30 McDonalds
# 1 4 40 McDonalds
# 2 2 20 A&W
# 2 4 50 A&W
# 2 5 60 A&W
Now I want to know, for each StoreName in the merged data frame, how many distinct items were sold:
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
#        A&W Burger King   McDonalds     Wendy's
#          3          NA           3          NA
My question is: why does the tapply result show "Burger King" and "Wendy's" even though they do not appear in the merged data frame?
Answer 0 (score: 1)
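This is because store$StoreName is a factor. merge drops the rows for stores that have no match in sales, but the factor column keeps all four of its original levels, and tapply reports a cell (here NA) for every level, used or not. Setting the stringsAsFactors argument to FALSE when creating the store data frame keeps StoreName as a character vector, so only the names that actually occur in merged are used for grouping. (Since R 4.0.0, stringsAsFactors = FALSE is the default for data.frame.)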
sales <- data.frame(StoreID=c(1,2,1,1,2,2), ItemID=c(2,2,3,4,4,5), SalesQty=c(10,20,30,40,50,60))
store <- data.frame(StoreID=c(1,2,3,4), StoreName=c("McDonalds", "A&W", "Burger King", "Wendy's"), stringsAsFactors = FALSE)
merged <- merge(sales, store, by = "StoreID")
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
#A&W McDonalds
# 3 3
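To see where the extra names come from, you can inspect the factor levels of the merged column directly. This is only a small sketch; it sets stringsAsFactors = TRUE explicitly to reproduce the behaviour from the question, and the variable names all_levels, used_names and unused are just illustrative.

```r
# Recreate the data with StoreName stored as a factor (as in the question).
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))
merged <- merge(sales, store, by = "StoreID")

# The rows for stores 3 and 4 are gone, but their levels are still attached.
all_levels <- levels(merged$StoreName)
used_names <- unique(as.character(merged$StoreName))

# Levels that no longer occur in the data; these are the NA cells in tapply.
unused <- setdiff(all_levels, used_names)
print(unused)
```

The names in unused are exactly the ones that tapply reports as NA.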
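Answer 1 (score: 1)
You can also try this, which rebuilds the factor after the merge so that only the levels still present in the data are kept: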
merged$StoreName <- factor(merged$StoreName)
tapply(merged$ItemID, merged$StoreName, FUN = function(x) length(unique(x)))
# A&W McDonalds
# 3 3
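A closely related option, sketched here under the assumption that StoreName is a factor after the merge, is base R's droplevels(), which removes unused levels from every factor column of a data frame in one call. The name counts is only illustrative.

```r
# Same data as in the question, with StoreName forced to be a factor.
store <- data.frame(StoreID = c(1, 2, 3, 4),
                    StoreName = c("McDonalds", "A&W", "Burger King", "Wendy's"),
                    stringsAsFactors = TRUE)
sales <- data.frame(StoreID = c(1, 2, 1, 1, 2, 2),
                    ItemID = c(2, 2, 3, 4, 4, 5),
                    SalesQty = c(10, 20, 30, 40, 50, 60))
merged <- merge(sales, store, by = "StoreID")

# Drop unused levels from all factor columns of the merged data frame.
merged <- droplevels(merged)

# Count distinct items per store; only stores with sales remain as groups.
counts <- tapply(merged$ItemID, merged$StoreName,
                 FUN = function(x) length(unique(x)))
print(counts)
```

This is handy when the merged data frame has several factor columns, since you do not have to re-create each one by hand.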