A simple reshape question. I have the following data:
df<-data.frame(Product=c("A","A","A","B","B","C"), Ingredients=c("Chocolate","Vanilla","Berry","Chocolate","Berry2","Vanilla"))
df
  Product Ingredients
1       A   Chocolate
2       A     Vanilla
3       A       Berry
4       B   Chocolate
5       B      Berry2
6       C     Vanilla
I want one row per Product, with each of its "Ingredients" values in its own column, for example:
df2
Product Ingredient_1 Ingredient_2 Ingredient_3
A       Chocolate    Vanilla      Berry
B       Chocolate    Berry2       NULL
C       Vanilla      NULL         NULL
This seems trivial. I tried reshape, but I keep getting counts rather than the actual values of "Ingredients". Any ideas?
Answer 0 (score: 2)
Here is a solution using the data.table package:
library(data.table)
setDT(df)[, Ingredient := paste0("Ingredient_", seq_len(.N)), Product]
dcast(df, Product ~ Ingredient, value.var = "Ingredients")
# Product Ingredient_1 Ingredient_2 Ingredient_3
# 1: A Chocolate Vanilla Berry
# 2: B Chocolate Berry2 NA
# 3: C Vanilla NA NA
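As a variation on the data.table approach above, the helper column can be avoided by computing the within-group index inside the dcast formula with rowid(). This is a minimal sketch, assuming a data.table version that provides rowid() with a prefix argument; the names dt and wide are chosen here for illustration only.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla")
)

# rowid() numbers the rows within each Product; prefix turns 1, 2, 3
# into Ingredient_1, Ingredient_2, Ingredient_3, which become column names
wide <- dcast(dt, Product ~ rowid(Product, prefix = "Ingredient_"),
              value.var = "Ingredients")
wide
```

Because every Product/index pair now identifies exactly one cell, dcast returns the ingredient values themselves instead of aggregating them.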
Alternatively, we can use the dplyr/tidyr combination:
library(dplyr)
library(tidyr)
df %>%
group_by(Product) %>%
mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
spread(Ingredient, Ingredients)
# Source: local data frame [3 x 4]
#
# Product Ingredient_1 Ingredient_2 Ingredient_3
# 1 A Chocolate Vanilla Berry
# 2 B Chocolate Berry2 NA
# 3 C Vanilla NA NA
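In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The same pipeline can be sketched with the newer function as follows; this assumes tidyr >= 1.0.0 is installed, and the name wide is used here for illustration only.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Product) %>%
  # number the ingredients within each Product
  mutate(Ingredient = paste0("Ingredient_", row_number())) %>%
  ungroup() %>%
  # one column per index, filled with the ingredient names
  pivot_wider(names_from = Ingredient, values_from = Ingredients)
wide
```

Products with fewer ingredients than the maximum get NA in the remaining columns, as in the spread() output.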
Answer 1 (score: 2)
In the spirit of sharing alternatives, here are two more:
Option 1: split the column and use stri_list2matrix to create the wide form.
library(stringi)
x <- with(df, split(Ingredients, Product))
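# byrow = TRUE puts each Product's ingredients in one row;
# the default (byrow = FALSE) fills the matrix column by column
data.frame(Product = names(x), stri_list2matrix(x, byrow = TRUE))
# Product X1 X2 X3
# 1 A Chocolate Vanilla Berry
# 2 B Chocolate Berry2 <NA>
# 3 C Vanilla <NA> <NA>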
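If installing stringi is not an option, the same split-and-pad idea can be sketched in base R. This is not part of the original answer, just one possible equivalent: each group is padded with NA to the length of the longest group and the pieces are bound row by row. The names n, m, and wide are illustrative, and the sketch assumes Ingredients is a character column rather than a factor.

```r
# Same example data as in the question, kept as character
df <- data.frame(
  Product = c("A", "A", "A", "B", "B", "C"),
  Ingredients = c("Chocolate", "Vanilla", "Berry", "Chocolate", "Berry2", "Vanilla"),
  stringsAsFactors = FALSE
)

x <- split(df$Ingredients, df$Product)
n <- max(lengths(x))

# `length<-` pads each character vector with NA up to length n;
# rbind then stacks the padded vectors, one row per Product
m <- do.call(rbind, lapply(x, `length<-`, n))
colnames(m) <- paste0("Ingredient_", seq_len(n))

wide <- data.frame(Product = names(x), m,
                   row.names = NULL, stringsAsFactors = FALSE)
wide
```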
Option 2: use getanID from my "splitstackshape" package to generate an ".id" column, then dcast it. "splitstackshape" loads the "data.table" package, so you can call dcast.data.table directly to do the reshaping.
library(splitstackshape)
dcast.data.table(getanID(df, "Product"),
Product ~ .id, value.var = "Ingredients")
# Product 1 2 3
# 1: A Chocolate Vanilla Berry
# 2: B Chocolate Berry2 NA
# 3: C Vanilla NA NA
Answer 2 (score: 1)
Using base R reshape:
# Count numbers the rows within each Product (1, 2, 3, ...)
df$Count <- ave(rep(1, nrow(df)), df$Product, FUN = cumsum)
reshape(df, idvar = "Product", timevar = "Count", direction = "wide", sep = "_")
# Product Ingredients_1 Ingredients_2 Ingredients_3
#1 A Chocolate Vanilla Berry
#4 B Chocolate Berry2 <NA>
#6 C Vanilla <NA> <NA>