I have a large data frame of beer reviews. It contains duplicate reviews, and the duplicates differ from one another in some fields.
> head( beer_data )
brewery_id brewery_name review_time review_overall
1 10325 Vecchio Birraio 1234817823 1.5
2 10325 Vecchio Birraio 1235915097 3.0
3 10325 Vecchio Birraio 1235916604 3.0
4 10325 Vecchio Birraio 1234725145 3.0
5 1075 Caldera Brewing Company 1293735206 4.0
6 1075 Caldera Brewing Company 1325524659 3.0
review_aroma review_appearance review_profilename
1 2.0 2.5 stcules
2 2.5 3.0 stcules
3 2.5 3.0 stcules
4 3.0 3.5 stcules
5 4.5 4.0 johnmichaelsen
6 3.5 3.5 oline73
beer_style review_palate review_taste
1 Hefeweizen 1.5 1.5
2 English Strong Ale 3.0 3.0
3 Foreign / Export Stout 3.0 3.0
4 German Pilsener 2.5 3.0
5 American Double / Imperial IPA 4.0 4.5
6 Herbed / Spiced Beer 3.0 3.5
beer_name beer_abv beer_beerid
1 Sausa Weizen 5.0 47986
2 Red Moon 6.2 48213
3 Black Horse Black Beer 6.5 48215
4 Sausa Pils 5.0 47969
5 Cauldron DIPA 7.7 64883
6 Caldera Ginger Beer 4.7 52159
>
I want to use ddply to collapse the duplicated beer review rows into a new, smaller data frame for analysis. Is this possible with ddply?
Answer 0 (score: 0)
How about something like this?
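library(plyr)  # provides ddply() and .()
duplicate_data <- ddply(beer_data, .(brewery_id), function(x) {
  # Summarise only groups with more than one row; other groups return NULL and are dropped
  if (nrow(x) > 1)
    return(data.frame(brewery_id = unique(x$brewery_id), mean_ratings = mean(x$review_overall)))
  # You can fill in the rest
})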
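The same idea can also be written with plyr's `summarise` helper, which avoids building the data frame by hand. The following is only a sketch. It assumes that a "duplicate" means the same reviewer (`review_profilename`) rating the same beer (`beer_beerid`) more than once, which the question does not actually state. `beer_sample` is a small made-up data frame that reuses the column names from `beer_data`, and the output names `n_reviews`, `mean_overall` and `mean_taste` are chosen here for illustration.

```r
library(plyr)  # assumed to be installed; provides ddply(), summarise() and .()

# Made-up sample rows that reuse column names from beer_data
beer_sample <- data.frame(
  review_profilename = c("stcules", "stcules", "oline73"),
  beer_beerid        = c(47986, 47986, 52159),
  review_overall     = c(1.5, 2.5, 3.0),
  review_taste       = c(1.5, 2.0, 3.5)
)

# Assumption: rows sharing reviewer and beer id are duplicates of each other.
# Each group is collapsed to one row holding the review count and mean ratings.
collapsed <- ddply(beer_sample, .(review_profilename, beer_beerid), summarise,
  n_reviews    = length(review_overall),
  mean_overall = mean(review_overall),
  mean_taste   = mean(review_taste)
)

# Keep only the groups that really contained more than one review
duplicates_only <- collapsed[collapsed$n_reviews > 1, ]
print(duplicates_only)
```

If duplicates should be identified by different columns (for example `brewery_id` alone, as in the answer above), change the variables inside `.()` accordingly.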