I have the following data.frame:
   Engine    | MPG | Test_Distance
1. V6        | 17  | 751
2. V4        | 22  | 1850
3. V4-Hybrid | 26  | 210
4. V6-Hybrid | 24  | 85
5. Flat4     | 26  | 4560
6. V6-Hybrid | 28  | 124
7. Flat4     | 17  | 3455
8. V4        | 17  | 1642
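Engine is a factor vector, while MPG and Test_Distance are both numeric vectors.
Before moving on to more complex statistical calculations and plots, I would like to simplify the data.frame by sorting the rows into groups.
Note: the data.frame has many other columns; I kept only three here to keep the example simple.
This is the data.frame I would like to end up with: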
   Engine_Type | MPG_avg | Test_Distance_total
1. Vx          | 18.7    | 4243
2. Vx_Hybrid   | 26      | 419
3. Flatx       | 14.4    | 8015
4. TOTALS      | 19.7    | 12677
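I have tried the dplyr and plyr packages, along with the following functions: aggregate, rowSums, colSums, data.table. None of it worked. I thought about building a temporary data.frame and then merging the new values back into the original data.frame, but I am hoping there is a quicker way to do this.
Any suggestions?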
Answer 0 (score: 2)
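We replace the digits in 'Engine' with 'x' inside group_by, use summarise to get the mean of 'MPG' and the sum of 'Test_Distance', and then bind a row holding the mean and sum of the summarised output.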
library(dplyr)
df1 %>%
  group_by(Engine = sub("\\d+", "x", Engine)) %>%
  summarise(MPG = mean(MPG), Test_Distance_total = sum(Test_Distance)) %>%
  bind_rows(tibble(Engine = 'TOTALS',
                   MPG = mean(.$MPG),
                   Test_Distance_total = sum(.$Test_Distance_total)))
# A tibble: 4 x 3
# Engine MPG Test_Distance_total
# <chr> <dbl> <int>
#1 Flatx 21.5 8015
#2 Vx 18.7 4243
#3 Vx-Hybrid 26.0 419
#4 TOTALS 22.1 12677
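For comparison, here is a minimal base R sketch of the same grouping idea, in case loading dplyr is not an option. It is not the answer's method: the names engine_type, res and totals are made up for illustration, and the column names follow the layout requested in the question. One deliberate difference, labeled in the code: the TOTALS row above averages the three group means, while this sketch assumes an average over all eight rows is wanted. The two happen to round to the same 22.1 on this data but are not the same number in general.

```r
# Same values as df1 in this answer, rebuilt here so the sketch runs on its own
df1 <- data.frame(
  Engine = c("V6", "V4", "V4-Hybrid", "V6-Hybrid",
             "Flat4", "V6-Hybrid", "Flat4", "V4"),
  MPG = c(17L, 22L, 26L, 24L, 26L, 28L, 17L, 17L),
  Test_Distance = c(751L, 1850L, 210L, 85L, 4560L, 124L, 3455L, 1642L),
  stringsAsFactors = FALSE
)

# Collapse the cylinder count into "x": "V6" -> "Vx", "Flat4" -> "Flatx"
# (sub() also accepts a factor and returns a character vector)
engine_type <- sub("\\d+", "x", df1$Engine)

# Per-group mean of MPG and per-group sum of Test_Distance
mpg_avg    <- tapply(df1$MPG, engine_type, mean)
dist_total <- tapply(df1$Test_Distance, engine_type, sum)

res <- data.frame(
  Engine_Type = names(mpg_avg),
  MPG_avg = as.vector(mpg_avg),
  Test_Distance_total = as.vector(dist_total),
  stringsAsFactors = FALSE
)

# ASSUMPTION: the TOTALS row uses the mean over all rows,
# not the mean of the group means used in the dplyr version
totals <- data.frame(
  Engine_Type = "TOTALS",
  MPG_avg = mean(df1$MPG),
  Test_Distance_total = sum(df1$Test_Distance),
  stringsAsFactors = FALSE
)

res <- rbind(res, totals)
print(res)
```

If the mean of the group means is what you want instead, replace mean(df1$MPG) with mean(mpg_avg).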
Data:
df1 <- structure(list(Engine = c("V6", "V4", "V4-Hybrid", "V6-Hybrid",
"Flat4", "V6-Hybrid", "Flat4", "V4"), MPG = c(17L, 22L, 26L,
24L, 26L, 28L, 17L, 17L), Test_Distance = c(751L, 1850L, 210L,
85L, 4560L, 124L, 3455L, 1642L)), .Names = c("Engine", "MPG",
"Test_Distance"), class = "data.frame", row.names = c("1.", "2.",
"3.", "4.", "5.", "6.", "7.", "8."))