How to aggregate a column of an R data frame based on the values of another column

Time: 2019-09-08 06:30:51

Tags: r dataframe

My data frame looks like this (simplified; the real one has many more rows and columns):

      Gender Energetic   Weekly_Apple   Weekly_Banana
1   Female        3           No           Yes
2   Female        3           No           Yes
3   Male          5           No           Yes
4   Male          2           No            No
5   Female        1           No            No

I would like short code that tallies the "Yes" responses and produces the following output:

        Male        Female
Apples    0           0                
Bananas   1           2

Nobody of either gender eats apples (0 for each), while 1 male and 2 females eat bananas.

I tried the following:

count(original_data, c("Gender","Weekly_Apple"))
count(original_data, c("Gender","Weekly_Banana"))
count(original_data, c("Gender","Weekly_Grape"))
count(original_data, c("Gender","Weekly_PineApple"))

aggregate(x = original_data[c("Weekly_Apple",
                              "Weekly_Banana",
                              "Weekly_Grape")],
          by = original_data[c("Gender")],
          FUN = n())
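The aggregate() attempt fails because n() is a dplyr helper that only works inside dplyr verbs such as summarise(), and because FUN expects a function rather than the value of a call. A minimal base-R sketch that passes a counting function instead, assuming the sample data from the question (the real data's Weekly_Grape and Weekly_PineApple columns are left out here, and the variable name wide is just for illustration):

```r
# Sample data from the question (the real data has more rows and columns)
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No")
)

# FUN must be a function: this one counts the "Yes" answers in each group
res <- aggregate(
  x = df[c("Weekly_Apple", "Weekly_Banana")],
  by = df["Gender"],
  FUN = function(x) sum(x == "Yes")
)

# res has one row per gender; transpose the count columns so fruits
# become rows and genders become columns, as in the desired output
wide <- t(res[-1])
colnames(wide) <- res$Gender
print(wide)
```

Adding more fruit columns only means extending the column vector passed as x.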

5 answers:

Answer 0 (score: 2)

As NelsonGon suggested, I replaced df1 <- t(df1) with tidyr::crossing(df1).

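library(dplyr)

df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No"))

# Count the "Yes" answers per gender: one row per gender, one column per fruit
df1 <- df %>%
  group_by(Gender) %>%
  summarise(
    Apples = sum(Weekly_Apple == "Yes"),
    Bananas = sum(Weekly_Banana == "Yes")
  )

# crossing() de-duplicates and sorts the rows; it does not transpose,
# so genders remain the rows of df1
df1 <- tidyr::crossing(df1)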
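Calling t(df1) directly turns the result into a character matrix, because t() first converts the whole data frame (including the text column Gender) with as.matrix(). If you want the exact layout from the question, with fruits as rows, one base-R sketch is to transpose only the count columns and reuse Gender as the column names. This assumes the df1 produced above; the name wide is illustrative:

```r
library(dplyr)

df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No")
)

# One row per gender, one column per fruit (same as in the answer)
df1 <- df %>%
  group_by(Gender) %>%
  summarise(
    Apples = sum(Weekly_Apple == "Yes"),
    Bananas = sum(Weekly_Banana == "Yes")
  )

# Transpose only the numeric count columns so the matrix stays numeric,
# then label the columns with the genders
wide <- t(as.matrix(df1[c("Apples", "Bananas")]))
colnames(wide) <- df1$Gender
wide
```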

Answer 1 (score: 1)

One possible data.table approach:

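library(data.table)

# df[-2] drops the Energetic column; melt() stacks the fruit columns into
# variable/value pairs, and dcast() counts the "Yes" values per fruit and gender.
# df is a plain data.frame here; newer data.table versions may warn or fail on
# that, in which case use melt(as.data.table(df[-2]), id.vars = "Gender")
dcast(variable ~ Gender,
      value.var = "value",
      fun = function(x) sum(x == "Yes"),
      data = melt(df[-2], id.vars = "Gender"))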

       variable Female Male
1  Weekly_Apple      0    0
2 Weekly_Banana      2    1
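If the per-gender totals are enough (genders as rows rather than columns), data.table can also count the "Yes" answers directly with lapply(.SD, ...) grouped by Gender, without melting first. A minimal sketch, assuming the sample data and assuming that every fruit column in the real data starts with "Weekly_" (so Weekly_Grape and Weekly_PineApple would be picked up automatically):

```r
library(data.table)

dt <- data.table(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No")
)

# Assumption: every fruit column shares the "Weekly_" prefix
fruit_cols <- grep("^Weekly_", names(dt), value = TRUE)

# Count "Yes" in each selected column, separately for each gender
counts <- dt[, lapply(.SD, function(x) sum(x == "Yes")),
             by = Gender,
             .SDcols = fruit_cols]
counts
```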

Answer 2 (score: 1)

You can use base R:

table(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_")[-(2:3)])[,,'Yes']
        time
Gender   Apple Banana
  Female     0      2
  Male       0      1

or even

xtabs(Weekly~time+Gender,transform(reshape(cbind(df,id=1:nrow(df)),3:4,idvar = "id",dir="long",sep="_"),Weekly=Weekly=="Yes"))

        Gender
time     Female Male
  Apple       0    0
  Banana      2    1
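The key step in both one-liners is reshape(..., direction = "long", sep = "_"), which splits names such as Weekly_Apple at the underscore into a stem (Weekly) and a time value (Apple). A spelled-out sketch of the same idea, selecting the fruit columns by a name pattern instead of by position 3:4 (an assumption that every fruit column starts with "Weekly_"); the helper column yes is introduced here for illustration:

```r
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No")
)
df$id <- seq_len(nrow(df))

# Select every Weekly_* column, however many the real data has
fruit_cols <- grep("^Weekly_", names(df), value = TRUE)

# Wide to long: one row per (person, fruit); "time" holds Apple/Banana
# and "Weekly" holds the Yes/No answer
long <- reshape(df,
                varying = fruit_cols,
                idvar = "id",
                direction = "long",
                sep = "_")

# Sum a 0/1 indicator over every fruit x gender cell
long$yes <- as.integer(long$Weekly == "Yes")
counts <- xtabs(yes ~ time + Gender, data = long)
counts
```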

Answer 3 (score: 1)

A dplyr/tidyr alternative:

library(dplyr)

df %>%
  group_by(Gender) %>%
  summarise_at(vars(contains("Weekly")), function(x) sum(x == "Yes")) %>%
  tidyr::gather(key, val, -Gender) %>%
  tidyr::spread(Gender, val)
# A tibble: 2 x 3
  key           Female  Male
  <chr>          <int> <int>
1 Weekly_Apple       0     0
2 Weekly_Banana      2     1
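summarise_at(), gather() and spread() still work but have since been superseded. A sketch of the same pipeline with their replacements, assuming dplyr 1.0.0 or later (for across()) and tidyr 1.0.0 or later (for pivot_longer() and pivot_wider()):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No")
)

res <- df %>%
  group_by(Gender) %>%
  # Count the "Yes" answers in every Weekly_* column per gender
  summarise(across(starts_with("Weekly"), ~ sum(.x == "Yes"))) %>%
  # Long format: one row per gender x fruit column
  pivot_longer(-Gender, names_to = "key", values_to = "val") %>%
  # Wide again, with one column per gender
  pivot_wider(names_from = Gender, values_from = val)
res
```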

Answer 4 (score: 0)

Another base R version, using tapply:

t(sapply(names(df)[3:4], function(nm) with(df, tapply(df[[nm]]=="Yes", Gender,sum))))
#               Female Male
#Weekly_Apple       0    0
#Weekly_Banana      2    1

Or with split:

sapply(split(df[3:4], df$Gender), function(x) colSums(x == "Yes"))

Or a variant of it:

sapply(split(as.data.frame(df[3:4] == "Yes"), df$Gender), colSums)
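Since the real data has more fruit columns than the sample (for example Weekly_Grape and Weekly_PineApple), the hard-coded df[3:4] can be replaced by a name pattern. A sketch of the split()/colSums() variant, assuming every fruit column starts with "Weekly_" and stripping that prefix so the row labels read like the desired output:

```r
df <- data.frame(
  Gender = c("Female", "Female", "Male", "Male", "Female"),
  Energetic = c(3, 3, 5, 2, 1),
  Weekly_Apple = c("No", "No", "No", "No", "No"),
  Weekly_Banana = c("Yes", "Yes", "Yes", "No", "No")
)

# Assumption: every fruit column shares the "Weekly_" prefix
fruit_cols <- grep("^Weekly_", names(df), value = TRUE)

# One column of "Yes" counts per gender; rows are the fruit columns
res <- sapply(split(df[fruit_cols], df$Gender),
              function(x) colSums(x == "Yes"))

# Shorten row labels: "Weekly_Apple" -> "Apple", etc.
rownames(res) <- sub("^Weekly_", "", rownames(res))
res
```

With a single fruit column, sapply() would simplify to a named vector instead of a matrix, so this sketch assumes at least two matching columns.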