My data looks like this:
df<- structure(list(data1 = c(20171205L, 20171205L, 20171205L, 20171205L,
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L,
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L,
20171205L, 20171205L, 20171205L, 20171205L), data2 = c(0.00546273,
0.00552377, 0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428,
0.00560006, 0.0055848, 0.00561532, 0.00555428, 0.0055848, 0.00552377,
0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428, 0.00560006,
0.0055848), data3 = c(0.00546273, 0.00552377, 0.00549325, 0.00550851,
0.00556954, 0.00560006, 0.00555428, 0.00560006, 0.0055848, 0.00561532,
0.00555428, 0.0055848, 0.00552377, 0.00549325, 0.00550851, 0.00556954,
0.00560006, 0.00555428, 0.00560006, 0.0055848), mydf = structure(1:20, .Label = c("B02",
"B03", "B04", "B05", "B06", "C02", "C03", "C04", "C05", "C06",
"D02", "D03", "D04", "D05", "D06", "E02", "E03", "E04", "E05",
"E06"), class = "factor")), .Names = c("data1", "data2", "data3",
"mydf"), class = "data.frame", row.names = c(NA, -20L))
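I want to take the mean of each column over specific rows, identified by df$mydf.
1. Put all of the following rows into one data frame and take the mean of each column: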
B02
B03
C02
C03
D02
D03
E02
E03
2. Put the following rows into a second data frame and take the mean of each column there as well:
B04
B05
B06
C04
C05
C06
D04
D05
D06
E04
E05
E06
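So each of the following columns would end up with two values (the mean of the first group and the mean of the second group):
data1 data2 data3
I was thinking of pulling these values out of the mydf column and then splitting on them somehow, but I can't figure out how.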
Answer 0 (score: 3)
In base R, you can use grepl to split the rows into groups based on the suffix of mydf, then aggregate each group:
# Add a group column: 1 where mydf ends in 02 or 03, 2 where it ends in 04, 05 or 06
df$group <- grepl('(4|5|6)$', df$mydf) + 1
df
#> data1 data2 data3 mydf group
#> 1 20171205 0.00546273 0.00546273 B02 1
#> 2 20171205 0.00552377 0.00552377 B03 1
#> 3 20171205 0.00549325 0.00549325 B04 2
#> 4 20171205 0.00550851 0.00550851 B05 2
#> 5 20171205 0.00556954 0.00556954 B06 2
#> 6 20171205 0.00560006 0.00560006 C02 1
#> 7 20171205 0.00555428 0.00555428 C03 1
#> 8 20171205 0.00560006 0.00560006 C04 2
#> 9 20171205 0.00558480 0.00558480 C05 2
#> 10 20171205 0.00561532 0.00561532 C06 2
#> 11 20171205 0.00555428 0.00555428 D02 1
#> 12 20171205 0.00558480 0.00558480 D03 1
#> 13 20171205 0.00552377 0.00552377 D04 2
#> 14 20171205 0.00549325 0.00549325 D05 2
#> 15 20171205 0.00550851 0.00550851 D06 2
#> 16 20171205 0.00556954 0.00556954 E02 1
#> 17 20171205 0.00560006 0.00560006 E03 1
#> 18 20171205 0.00555428 0.00555428 E04 2
#> 19 20171205 0.00560006 0.00560006 E05 2
#> 20 20171205 0.00558480 0.00558480 E06 2
# Take column averages by group
aggregate(x = df[, 1:3], by = list(group = df$group), FUN = mean)
#> group data1 data2 data3
#> 1 1 20171205 0.005556190 0.005556190
#> 2 2 20171205 0.005553013 0.005553013
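If you would rather list the IDs explicitly, as in the question, instead of relying on a suffix pattern, something along these lines should also work. This is only a sketch: the names vals, group1_ids, in_group1, value_cols and result are made up for illustration, the data frame is rebuilt in a compact form so the snippet runs on its own, and it assumes every row not in the first list belongs to the second group.

```r
# Rebuild the question's data compactly (same values as the dput output)
vals <- c(0.00546273, 0.00552377, 0.00549325, 0.00550851, 0.00556954,
          0.00560006, 0.00555428, 0.00560006, 0.0055848, 0.00561532,
          0.00555428, 0.0055848, 0.00552377, 0.00549325, 0.00550851,
          0.00556954, 0.00560006, 0.00555428, 0.00560006, 0.0055848)
df <- data.frame(
  data1 = rep(20171205L, 20),
  data2 = vals,
  data3 = vals,
  mydf  = factor(paste0(rep(c("B", "C", "D", "E"), each = 5),
                        sprintf("%02d", 2:6)))
)

# IDs of the first group, copied from the question (hypothetical variable name)
group1_ids <- c("B02", "B03", "C02", "C03", "D02", "D03", "E02", "E03")

# TRUE for rows in the first group; everything else is treated as the second group
in_group1 <- df$mydf %in% group1_ids

# Column means per group, stacked into a 2 x 3 matrix
value_cols <- c("data1", "data2", "data3")
result <- rbind(
  group1 = colMeans(df[in_group1, value_cols]),
  group2 = colMeans(df[!in_group1, value_cols])
)
result
```

The explicit list is more typing than the regular expression, but it does not silently misclassify an ID that happens to end in an unexpected digit.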
Answer 1 (score: 1)
A solution using dplyr. group_by defines the grouping variable, and summarise_at computes the mean of every column except mydf; vars(-mydf) is what excludes that column.
library(dplyr)
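df2 <- df %>%
  # Group 1: IDs ending in 02 or 03; group 2: IDs ending in 04, 05 or 06
  group_by(Group = case_when(
    grepl("02$|03$", mydf) ~ 1L,
    grepl("04$|05$|06$", mydf) ~ 2L,
    TRUE ~ NA_integer_
  )) %>%
  # Pass mean directly; funs() has been deprecated since dplyr 0.8.0
  summarise_at(vars(-mydf), mean)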
df2
# # A tibble: 2 x 4
# Group data1 data2 data3
# <int> <dbl> <dbl> <dbl>
# 1 1 20171205 0.005556190 0.005556190
# 2 2 20171205 0.005553013 0.005553013
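With dplyr 1.0.0 or later, the same idea can be written with across(), which supersedes the summarise_at family. The following is a sketch rather than a drop-in replacement: it assumes dplyr >= 1.0.0 is installed, rebuilds the data compactly so it runs on its own, and uses if_else on the assumption that every ID not ending in 02 or 03 belongs to the second group.

```r
library(dplyr)

# Rebuild the question's data compactly (same values as the dput output)
vals <- c(0.00546273, 0.00552377, 0.00549325, 0.00550851, 0.00556954,
          0.00560006, 0.00555428, 0.00560006, 0.0055848, 0.00561532,
          0.00555428, 0.0055848, 0.00552377, 0.00549325, 0.00550851,
          0.00556954, 0.00560006, 0.00555428, 0.00560006, 0.0055848)
df <- data.frame(
  data1 = rep(20171205L, 20),
  data2 = vals,
  data3 = vals,
  mydf  = factor(paste0(rep(c("B", "C", "D", "E"), each = 5),
                        sprintf("%02d", 2:6)))
)

df2 <- df %>%
  # Group 1: IDs ending in 02 or 03; all remaining IDs fall into group 2
  mutate(Group = if_else(grepl("0[23]$", mydf), 1L, 2L)) %>%
  group_by(Group) %>%
  # Mean of every non-grouping column except mydf
  summarise(across(-mydf, mean))
df2
```

The result should have one row per group and the columns Group, data1, data2 and data3, like the output above.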