我有excel数据集如下:
AllJobs
JobID
Name
Description
我想获得一个权重与数量数据透视表,每个类别的价格总和如下:
Weight Quantity Price
72 5 460
73 8 720
75 20 830
95 2 490
91 15 680
82 14 340
88 30 250
89 6 770
78 27 820
98 24 940
99 29 825
我为各个类别创建了两个表,以便使用 0-10 10-20 20-30
70-80 1180 830 820
80-90 770 340 250
90-100 490 680 1765
包获得平均值和计数,如下所示:
dplyr
现在,如何使用table1和table2创建如上所示的交叉表?
答案 0 :(得分:5)
也许以下是你想要的。它会像您一样使用cut
,然后使用xtabs
。
Weight = cut(dataset$Weight, breaks = c(70,80,90,100))
Quantity = cut(dataset$Quantity, breaks = c(0,10,20,30))
dt2 <- data.frame(Weight, Quantity, Price = dataset$Price)
xtabs(Price ~ Weight + Quantity, dt2)
# Quantity
#Weight (0,10] (10,20] (20,30]
# (70,80] 1180 830 820
# (80,90] 770 340 250
# (90,100] 490 680 1765
答案 1 :(得分:2)
dplyr
和tidyr
解决方案:
library(dplyr)
library(tidyr)
df %>%
mutate(Weight = cut(Weight, breaks = c(70,80,90,100)),
Quantity = cut(Quantity, breaks = c(0,10,20,30))) %>%
group_by(Weight, Quantity) %>%
summarise(Price = sum(Price)) %>%
spread(Quantity, Price)
# A tibble: 3 x 4
# Groups: Weight [3]
Weight `(0,10]` `(10,20]` `(20,30]`
* <fct> <int> <int> <int>
1 (70,80] 1180 830 820
2 (80,90] 770 340 250
3 (90,100] 490 680 1765
数据:
df <- structure(list(Weight = c(72L, 73L, 75L, 95L, 91L, 82L, 88L,
89L, 78L, 98L, 99L), Quantity = c(5L, 8L, 20L, 2L, 15L, 14L,
30L, 6L, 27L, 24L, 29L), Price = c(460L, 720L, 830L, 490L, 680L,
340L, 250L, 770L, 820L, 940L, 825L)), .Names = c("Weight", "Quantity",
"Price"), class = "data.frame", row.names = c(NA, -11L))