How to average groups of rows based on a specific number

Time: 2017-12-12 00:25:39

Tags: r dataframe

My data looks like this:

df <- structure(list(data1 = c(20171205L, 20171205L, 20171205L, 20171205L, 
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 
20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 20171205L, 
20171205L, 20171205L, 20171205L, 20171205L), data2 = c(0.00546273, 
0.00552377, 0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428, 
0.00560006, 0.0055848, 0.00561532, 0.00555428, 0.0055848, 0.00552377, 
0.00549325, 0.00550851, 0.00556954, 0.00560006, 0.00555428, 0.00560006, 
0.0055848), data3 = c(0.00546273, 0.00552377, 0.00549325, 0.00550851, 
0.00556954, 0.00560006, 0.00555428, 0.00560006, 0.0055848, 0.00561532, 
0.00555428, 0.0055848, 0.00552377, 0.00549325, 0.00550851, 0.00556954, 
0.00560006, 0.00555428, 0.00560006, 0.0055848), mydf = structure(1:20, .Label = c("B02", 
"B03", "B04", "B05", "B06", "C02", "C03", "C04", "C05", "C06", 
"D02", "D03", "D04", "D05", "D06", "E02", "E03", "E04", "E05", 
"E06"), class = "factor")), .Names = c("data1", "data2", "data3", 
"mydf"), class = "data.frame", row.names = c(NA, -20L))

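I want to take the mean of each column over specific rows, identified by df$mydf.

1 - Put the following rows in a data frame and take the mean of each column: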

B02
B03
C02
C03
D02
D03
E02
E03

2 - Put the following rows in a second data frame and take the mean of each column as well:

B04
B05
B06
C04
C05
C06
D04
D05
D06
E04
E05
E06

So each column would end up with two values (the mean of the first group and the mean of the second group):

data1    data2    data3

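I was thinking of extracting those values from the mydf column and then splitting on them somehow, but I could not figure it out.

2 answers:

Answer 0: (score: 3)

In base R, you can use grepl to split the rows into groups according to the suffix of mydf, then aggregate each group: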

# Add group column: 1 if mydf ends in 02 or 03, 2 if it ends in 04, 05 or 06
df$group <- grepl('(4|5|6)$', df$mydf) + 1
df
#>       data1      data2      data3 mydf group
#> 1  20171205 0.00546273 0.00546273  B02     1
#> 2  20171205 0.00552377 0.00552377  B03     1
#> 3  20171205 0.00549325 0.00549325  B04     2
#> 4  20171205 0.00550851 0.00550851  B05     2
#> 5  20171205 0.00556954 0.00556954  B06     2
#> 6  20171205 0.00560006 0.00560006  C02     1
#> 7  20171205 0.00555428 0.00555428  C03     1
#> 8  20171205 0.00560006 0.00560006  C04     2
#> 9  20171205 0.00558480 0.00558480  C05     2
#> 10 20171205 0.00561532 0.00561532  C06     2
#> 11 20171205 0.00555428 0.00555428  D02     1
#> 12 20171205 0.00558480 0.00558480  D03     1
#> 13 20171205 0.00552377 0.00552377  D04     2
#> 14 20171205 0.00549325 0.00549325  D05     2
#> 15 20171205 0.00550851 0.00550851  D06     2
#> 16 20171205 0.00556954 0.00556954  E02     1
#> 17 20171205 0.00560006 0.00560006  E03     1
#> 18 20171205 0.00555428 0.00555428  E04     2
#> 19 20171205 0.00560006 0.00560006  E05     2
#> 20 20171205 0.00558480 0.00558480  E06     2

# Take column averages by group
aggregate(x = df[, 1:3], by = list(group = df$group), FUN = mean)
#>   group    data1       data2       data3
#> 1     1 20171205 0.005556190 0.005556190
#> 2     2 20171205 0.005553013 0.005553013
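
The grouping line above relies on R coercing logical values to numbers. A minimal sketch of just that step, using a short made-up vector of labels (the name `labels` and its values are assumptions for illustration, not the question's data):

```r
# Hypothetical labels, chosen only to illustrate the grouping trick
labels <- c("B02", "B03", "B04", "C05", "E06")

# grepl() returns TRUE when a label ends in 4, 5 or 6
is_second <- grepl("(4|5|6)$", labels)

# Adding 1 coerces FALSE/TRUE to 0/1, which gives group ids 1 and 2
group <- is_second + 1
group
#> [1] 1 1 2 2 2
```

Any label that does not end in 4, 5 or 6 lands in group 1, so this shortcut is only safe when the two suffix sets cover every row, as they do in the question.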

Answer 1: (score: 1)

A solution using dplyr. group_by defines the grouping variable, and summarise_at computes the mean of every column except mydf; vars(-mydf) excludes that column.

library(dplyr)

df2 <- df %>%
  group_by(Group = case_when(
    grepl("02$|03$", mydf)       ~ 1L,
    grepl("04$|05$|06$", mydf)   ~ 2L,
    TRUE                       ~ NA_integer_
  )) %>%
  summarise_at(vars(-mydf), mean)  # pass the function directly; funs() is deprecated since dplyr 0.8.0
df2
# # A tibble: 2 x 4
#   Group    data1       data2       data3
#   <int>    <dbl>       <dbl>       <dbl>
# 1     1 20171205 0.005556190 0.005556190
# 2     2 20171205 0.005553013 0.005553013
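
For readers on dplyr 1.0.0 or later, the same idea can probably be written with across() instead of summarise_at. This is only a sketch on a small made-up data frame (`toy` and its values are assumptions, not the question's data), and it uses if_else rather than case_when because every label here falls into one of two groups:

```r
library(dplyr)

# Small hypothetical data frame with the same layout as the question's df
toy <- data.frame(
  data2 = c(1, 3, 10, 20, 30),
  data3 = c(2, 4, 10, 20, 60),
  mydf  = c("B02", "B03", "B04", "B05", "B06")
)

toy_means <- toy %>%
  # Group 1: labels ending in 02 or 03; group 2: everything else
  group_by(Group = if_else(grepl("0[23]$", mydf), 1L, 2L)) %>%
  # across() applies mean to each listed column within each group
  summarise(across(c(data2, data3), mean))

toy_means
```

Here group 1 averages the first two rows and group 2 the remaining three. Listing the columns explicitly inside across() avoids having to exclude mydf by hand.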