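All,
my dataset is shown below. I am trying to answer the following question.
Question:
Considering only the drawing paper data, do the stores sell more units (the units.sold column) of one paper type (paper.type) than of the others?
To answer this, I used the tapply function, which gives me the totals for both kinds of paper. Now I am not sure how to proceed so that I get only the drawing paper data. Any help is appreciated!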
My code
tapply(df$units.sold,list(df$paper,df$paper.type,df$store),sum)
Dataset
date year rep store paper paper.type unit.price units.sold total.sale
9991 12/30/2015 2015 Ran Dublin watercolor sheet 0.77 5 3.85
9992 12/30/2015 2015 Ran Dublin drawing pads 10.26 1 10.26
9993 12/30/2015 2015 Arijit Syracuse watercolor pad 12.15 2 24.30
9994 12/30/2015 2015 Thomas Davenport drawing roll 20.99 1 20.99
9995 12/31/2015 2015 Ruisi Dublin watercolor sheet 0.77 7 5.39
9996 12/31/2015 2015 Mohit Davenport drawing roll 20.99 1 20.99
9997 12/31/2015 2015 Aman Portland drawing pads 10.26 1 10.26
9998 12/31/2015 2015 Barakat Portland watercolor block 19.34 1 19.34
9999 12/31/2015 2015 Yunzhu Syracuse drawing journal 24.94 1 24.94
10000 12/31/2015 2015 Aman Portland watercolor block 19.34 1 19.34
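Note: I am new to R. Please provide an explanation along with your code.
Answer 0: (score: 3)
Use dplyr, which is part of the tidyverse, and start with its filter function to keep only the drawing rows. You can chain functions together with the %>% pipe operator.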
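library(dplyr)

df2 <- df %>%
  filter(paper == "drawing") %>%           # keep only the drawing paper rows
  group_by(store, paper.type) %>%          # one group per store / paper type pair
  summarise(units.sold = sum(units.sold))  # total units sold in each group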
store paper.type units.sold
<chr> <chr> <dbl>
1 Davenport roll 2
2 Dublin pads 1
3 Portland pads 1
4 Syracuse journal 1
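To compare paper types directly rather than store by store, one option is to drop store from the grouping so that each paper type gets a single total. The following is a minimal sketch, assuming the ten sample rows from the question are representative; the name totals is illustrative and not part of the original answer.

```r
library(dplyr)

# Sample rows from the question (only the columns needed here).
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical follow-up step: one total per paper type, largest first.
totals <- df %>%
  filter(paper == "drawing") %>%
  group_by(paper.type) %>%
  summarise(units.sold = sum(units.sold)) %>%
  arrange(desc(units.sold))

print(totals)
```

On these ten rows, pads and roll each total 2 units and journal totals 1, so the sample alone does not single out one best-selling type; the full dataset may differ.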
Answer 1: (score: 1)
You can use aggregate to sum the units.sold column for each combination of store and paper.type.
aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)
# store paper.type units.sold
#1 Syracuse journal 1
#2 Dublin pads 1
#3 Portland pads 1
#4 Davenport roll 2
Here, we keep only the rows where paper is "drawing". From this output, we can compare the number of units.sold for each store and paper.type.
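The same subsetting idea also works with tapply, the function used in the question: subset the rows first, then tabulate. This is a sketch under the assumption that the sample rows from the question are used; drawing, by_store and by_type are illustrative names.

```r
# Sample rows from the question (only the columns needed here).
df <- data.frame(
  store = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
            "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
            "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the drawing paper rows before tabulating.
drawing <- df[df$paper == "drawing", ]

# store x paper.type matrix; combinations with no rows come back as NA.
by_store <- tapply(drawing$units.sold,
                   list(drawing$store, drawing$paper.type), sum)

# One total per paper type across all stores.
by_type <- tapply(drawing$units.sold, drawing$paper.type, sum)

print(by_store)
print(by_type)
```

Subsetting first removes the paper dimension that the original three-way tapply call carried along, so the result is a plain store by paper.type matrix.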
Answer 2: (score: 1)
We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT, group by 'store' and 'paper.type', specify the i expression (paper == 'drawing') to subset the rows, and summarise 'units.sold' by taking the sum.
library(data.table)
setDT(df)[paper == "drawing", .(units.sold = sum(units.sold)), .(store, paper.type)]
# store paper.type units.sold
#1: Dublin pads 1
#2: Davenport roll 2
#3: Portland pads 1
#4: Syracuse journal 1
df <- structure(list(date = c("12/30/2015", "12/30/2015", "12/30/2015",
"12/30/2015", "12/31/2015", "12/31/2015", "12/31/2015", "12/31/2015",
"12/31/2015", "12/31/2015"), year = c(2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L), rep = c("Ran", "Ran",
"Arijit", "Thomas", "Ruisi", "Mohit", "Aman", "Barakat", "Yunzhu",
"Aman"), store = c("Dublin", "Dublin", "Syracuse", "Davenport",
"Dublin", "Davenport", "Portland", "Portland", "Syracuse", "Portland"
), paper = c("watercolor", "drawing", "watercolor", "drawing",
"watercolor", "drawing", "drawing", "watercolor", "drawing",
"watercolor"), paper.type = c("sheet", "pads", "pad", "roll",
"sheet", "roll", "pads", "block", "journal", "block"), unit.price = c(0.77,
10.26, 12.15, 20.99, 0.77, 20.99, 10.26, 19.34, 24.94, 19.34),
units.sold = c(5L, 1L, 2L, 1L, 7L, 1L, 1L, 1L, 1L, 1L), total.sale = c(3.85,
10.26, 24.3, 20.99, 5.39, 20.99, 10.26, 19.34, 24.94, 19.34
)), class = "data.frame", row.names = c("9991", "9992", "9993",
"9994", "9995", "9996", "9997", "9998", "9999", "10000"))