I'm currently learning to work with data.frames and am confused about how to reshape them.
Currently, I have a data.frame which, shown visually, looks like this:
+---+-----------+-------+----------+
|   | Shop.Name | Items | Product  |
+---+-----------+-------+----------+
| 1 | Shop1     | 2     | Product1 |
| 2 | Shop1     | 4     | Product2 |
| 3 | Shop2     | 3     | Product1 |
| 4 | Shop3     | 2     | Product1 |
| 5 | Shop3     | 1     | Product4 |
+---+-----------+-------+----------+
What I want to achieve is a "shop-centric" structure, with a 0 wherever there is no row for a shop/product combination (because there were no sales):
+---+-------+-------+-------+-------+-------+-----+
|   | Shop  | Prod1 | Prod2 | Prod3 | Prod4 | ... |
+---+-------+-------+-------+-------+-------+-----+
| 1 | Shop1 | 2     | 4     | 0     | 0     | ... |
| 2 | Shop2 | 3     | 0     | 0     | 0     | ... |
| 3 | Shop3 | 2     | 0     | 0     | 1     | ... |
+---+-------+-------+-------+-------+-------+-----+
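Answer 0 (score: 12)
The answers so far work to a certain extent, but don't fully answer your question. In particular, they don't address the case where no shop sells a particular product. According to your sample input and desired output, no shop sells "Product3". In fact, "Product3" doesn't even appear in your source data.frame. They also don't address the possibility of having more than one row for a given Shop + Product combination.
Here is a modified version of your data and the two solutions given so far. I've added another row for the combination of "Shop1" and "Product1". Note that I've converted your products to a factor variable that lists every level the variable can take, even if no row actually has that level.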
mydf <- data.frame(
Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
Items = c(2, 4, 3, 2, 1, 2),
Product = factor(
c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
levels = c("Product1", "Product2", "Product3", "Product4")))
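To illustrate why the factor conversion matters, here is a minimal base R sketch (the vector name `products` is made up for this example): a factor remembers every level it was declared with, so tabulating it reports a zero for a level that never occurs.

```r
# Hypothetical example vector; "Product3" is declared as a level
# but never appears among the values.
products <- factor(c("Product1", "Product2", "Product1"),
                   levels = c("Product1", "Product2", "Product3"))

# The declared levels are kept, including the unused one.
levels(products)

# table() counts every level, so the unused level gets a count of 0.
counts <- table(products)
counts
```

This is the property the reshaping functions rely on below when they are asked not to drop missing combinations.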
dcast from "reshape2"
library(reshape2)
dcast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Using Product as value column: use value.var to override.
# Aggregation function missing: defaulting to length
# Error in .fun(.value[i], ...) :
# 2 arguments passed to 'length' which requires 1
Wha? Suddenly that doesn't work. Do this instead:
dcast(mydf, formula = Shop.Name ~ Product,
fill = 0, value.var = "Items",
fun.aggregate = sum, drop = FALSE)
# Shop.Name Product1 Product2 Product3 Product4
# 1 Shop1 4 4 0 0
# 2 Shop2 3 0 0 0
# 3 Shop3 2 0 0 1
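Two things differ between the failing call and the working one, and both can be sketched in base R. This assumes that dcast() passes arguments it does not recognise (such as `value`) on to the aggregation function, which matches the error message shown earlier; the name `agg` is only for illustration.

```r
# Assumption: the stray `value = "Items"` ends up as a second argument to
# length(), the default aggregation function. Calling length() that way
# fails in the same manner:
msg <- tryCatch(length(c(2, 2), value = "Items"),
                error = function(e) conditionMessage(e))
msg

# The other change is fun.aggregate = sum. With two rows for
# Shop1 + Product1, summing collapses them into a single value:
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

agg <- aggregate(Items ~ Shop.Name + Product, data = mydf, FUN = sum)
agg
```

Note that aggregate() only returns combinations that occur in the data, which is why the dcast() call additionally needs fill = 0 and drop = FALSE to produce the zero cells and the "Product3" column.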
Let's go old school. cast from "reshape"
library(reshape)
cast(mydf, formula = Shop.Name ~ Product, value="Items", fill=0)
# Aggregation requires fun.aggregate: length used as default
# Shop.Name Product1 Product2 Product4
# 1 Shop1 2 1 0
# 2 Shop2 1 0 0
# 3 Shop3 1 0 1
Eh. Not what you wanted... Try this instead:
cast(mydf, formula = Shop.Name ~ Product,
value = "Items", fill = 0,
add.missing = TRUE, fun.aggregate = sum)
# Shop.Name Product1 Product2 Product3 Product4
# 1 Shop1 4 4 0 0
# 2 Shop2 3 0 0 0
# 3 Shop3 2 0 0 1
Let's go back to basics. xtabs from base R
xtabs(Items ~ Shop.Name + Product, mydf)
# Product
# Shop.Name Product1 Product2 Product3 Product4
# Shop1 4 4 0 0
# Shop2 3 0 0 0
# Shop3 2 0 0 1
Alternatively, the result can be converted to a data.frame (note that your "Shop.Name" variable then becomes the row.names of the data.frame).
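The conversion call itself is not shown in the text here. A minimal sketch, assuming as.data.frame.matrix() is used for it (the names `tab`, `wide` and `wide2` are made up for this example):

```r
mydf <- data.frame(
  Shop.Name = c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3", "Shop1"),
  Items = c(2, 4, 3, 2, 1, 2),
  Product = factor(
    c("Product1", "Product2", "Product1", "Product1", "Product4", "Product1"),
    levels = c("Product1", "Product2", "Product3", "Product4")))

# Cross-tabulate, summing Items per Shop.Name + Product pair.
tab <- xtabs(Items ~ Shop.Name + Product, mydf)

# Treat the 2-d table as a matrix and turn it into a data.frame;
# the shop names end up as row names rather than as a column.
wide <- as.data.frame.matrix(tab)
rownames(wide)

# Optional: move the row names back into a regular Shop.Name column.
wide2 <- cbind(Shop.Name = rownames(wide), wide)
rownames(wide2) <- NULL
wide2
```

Calling plain as.data.frame() on the table would instead give the long format (one row per shop/product pair), which is why the matrix method is named explicitly in this sketch.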
Answer 1 (score: 1)
Using dcast from the reshape2 library:
library(reshape2)
> df <- data.frame(Shop.Name=rep(c("Shop1","Shop2","Shop3"),each=3),
+ Items=rpois(9,5),
+ Product=c(rep(c("Prod1","Prod2","Prod3","Prod4"),2),"Prod5")
+ )
> df
Shop.Name Items Product
1 Shop1 6 Prod1
2 Shop1 5 Prod2
3 Shop1 6 Prod3
4 Shop2 5 Prod4
5 Shop2 6 Prod1
6 Shop2 6 Prod2
7 Shop3 4 Prod3
8 Shop3 7 Prod4
9 Shop3 5 Prod5
> dcast(df,Shop.Name ~ Product,value.var="Items",fill=0)
Shop.Name Prod1 Prod2 Prod3 Prod4 Prod5
1 Shop1 6 5 6 0 0
2 Shop2 6 6 0 5 0
3 Shop3 0 0 4 7 5
Answer 2 (score: 0)
If for any reason you want to use the original reshape package:
library(reshape)  # provides cast()
Shop.Name <- c("Shop1", "Shop1", "Shop2", "Shop3", "Shop3")
Items <- c(2, 4, 3, 2, 1)
Product <- c("Product1", "Product2", "Product1", "Product1", "Product4")
(df <- data.frame(Shop.Name, Items, Product))
# fill = 0 puts a zero where a Shop + Product combination has no row
cast(df, formula = Shop.Name ~ Product, value = "Items", fill = 0)