Counting occurrences in a data frame column in R

Time: 2020-07-16 18:57:19

Tags: r dataframe

I'm having trouble counting occurrences in a data frame. My data looks like this:

     animal    food
1    horse     carrot
2    bird      seeds
3    monkey    banana
4.   horse     hay
5    bird      berries
6.   horse     seeds

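I'm trying to work out the breakdown by animal for each food. For example, I'd like to find that horses eat 60% of the hay, while the other 40% is eaten by birds and monkeys. How can I compute this and put the result in a separate data frame?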

The new data frame should look like this:

     food       horse    bird    monkey
1    carrot     60%      0%      40%
2    seeds      20%      60%     20%
3    banana     0%       0%      100%
4.   berries    30%      50%     20%
5.   hay        100%     0%      0%

The percentages are obviously not correct; this is just an example.

3 Answers:

Answer 0: (score: 5)

Like this?

# Example data from the question
df <- data.frame(
  animal = c("horse", "bird", "monkey", "horse", "bird", "horse"),
  food   = c("carrot", "seeds", "banana", "hay", "berries", "seeds"),
  stringsAsFactors = FALSE
)

# margin = 1 normalizes by row, so each food's percentages sum to 100
with(df, prop.table(table(food, animal), margin = 1)) * 100

         animal
food      bird horse monkey
  banana     0     0    100
  berries  100     0      0
  carrot     0   100      0
  hay        0   100      0
  seeds     50    50      0
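
Since the question asks for the result as a separate data frame, one possible follow-up is to convert the table with as.data.frame.matrix(). This is only a sketch and not part of the original answer; the names pct and pct_df are made up for illustration.

```r
# Same example data as in the question
df <- data.frame(
  animal = c("horse", "bird", "monkey", "horse", "bird", "horse"),
  food   = c("carrot", "seeds", "banana", "hay", "berries", "seeds"),
  stringsAsFactors = FALSE
)

# Row-wise percentages: each food's row sums to 100
pct <- with(df, prop.table(table(food, animal), margin = 1)) * 100

# as.data.frame.matrix() keeps the wide layout (one column per animal);
# plain as.data.frame() on a table returns a long food/animal/Freq layout
pct_df <- as.data.frame.matrix(pct)

# Move the food names from the row names into a regular column
pct_df <- cbind(food = rownames(pct_df), pct_df, row.names = NULL)
pct_df
```

The values stay numeric this way; formatting them with a percent sign (for example with paste0()) would turn the columns into character strings, so that is probably best left to the display step.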

Answer 1: (score: 2)

You can start by computing the counts:

xtabs(~ food + animal, data = dat)
#          animal
# food      bird horse monkey
#   banana     0     0      1
#   berries    1     0      0
#   carrot     0     1      0
#   hay        0     1      0
#   seeds      1     1      0

From here, the next step depends on what you need. For example, if you want proportions within each food, then

xt <- xtabs(~ food + animal, data = dat)
rowSums(xt)
#  banana berries  carrot     hay   seeds 
#       1       1       1       1       2 
xt / rowSums(xt)
#          animal
# food      bird horse monkey
#   banana   0.0   0.0    1.0
#   berries  1.0   0.0    0.0
#   carrot   0.0   1.0    0.0
#   hay      0.0   1.0    0.0
#   seeds    0.5   0.5    0.0

(Multiply by 100 if needed.)

(In hindsight, I think Dominik's use of prop.table is the better fit here.)
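
To make the connection between the two approaches concrete, the sketch below checks that dividing by rowSums() gives the same proportions as prop.table() with margin = 1. The names by_hand and by_prop are illustrative only.

```r
dat <- data.frame(
  animal = c("horse", "bird", "monkey", "horse", "bird", "horse"),
  food   = c("carrot", "seeds", "banana", "hay", "berries", "seeds")
)
xt <- xtabs(~ food + animal, data = dat)

# Manual row normalization versus prop.table() over rows
by_hand <- xt / rowSums(xt)
by_prop <- prop.table(xt, margin = 1)

# Compare the raw values only, ignoring class and other attributes
same <- isTRUE(all.equal(as.vector(by_hand), as.vector(by_prop)))
same

# margin = 2 normalizes by column instead: each animal's diet split across foods
by_animal <- prop.table(xt, margin = 2)
by_animal
```

Which margin to use depends on the question being asked: margin = 1 answers "who eats this food", while margin = 2 answers "what does this animal eat".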


Data:

dat <- structure(list(animal = c("horse", "bird", "monkey", "horse", 
"bird", "horse"), food = c("carrot", "seeds", "banana", "hay", 
"berries", "seeds")), class = "data.frame", row.names = c("1", 
"2", "3", "4.", "5", "6."))

Answer 2: (score: 0)

Try this; you have to create a num column to hold the counts:

library(tidyr)

df <- structure(list(animal = structure(c(2L, 1L, 3L, 2L, 1L, 2L), .Label = c("bird", 
"horse", "monkey"), class = "factor"), food = structure(c(3L, 
5L, 1L, 4L, 2L, 5L), .Label = c("banana", "berries", "carrot", 
"hay", "seeds"), class = "factor"), num = c(1, 1, 1, 1, 1, 1)), row.names = c("1", 
"2", "3", "4.", "5", "6."), class = "data.frame")

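# Code
df$num <- 1  # one count per observed animal/food pair
# One row per food, one column per animal; missing pairs become NA
df2 <- pivot_wider(df, names_from = animal, values_from = num)
# Row total across the animal columns, ignoring the NAs
df2$Total <- rowSums(df2[, -1], na.rm = TRUE)

# Divide each animal column (dropping food and Total) by the row total
df3 <- cbind(df2[, 1, drop = FALSE],
             as.data.frame(lapply(df2[, -c(1, 5)], function(x) x / df2$Total)))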

     food horse bird monkey
1  carrot   1.0   NA     NA
2   seeds   0.5  0.5     NA
3  banana    NA   NA      1
4     hay   1.0   NA     NA
5 berries    NA  1.0     NA
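
The NA entries above mark food/animal pairs that never occur in the data. If zeros are preferred, one option is the values_fill argument of pivot_wider(). This is a sketch rather than part of the original answer, assuming a tidyr version that provides pivot_wider(); the names wide and animals are illustrative.

```r
library(tidyr)

df <- data.frame(
  animal = c("horse", "bird", "monkey", "horse", "bird", "horse"),
  food   = c("carrot", "seeds", "banana", "hay", "berries", "seeds"),
  num    = 1
)

# values_fill replaces the NA for missing food/animal pairs with 0
wide <- pivot_wider(df, names_from = animal, values_from = num,
                    values_fill = list(num = 0))
wide <- as.data.frame(wide)  # plain data frame for the base R steps below

# Divide each animal column by the row total to get proportions
animals <- setdiff(names(wide), "food")
wide[animals] <- wide[animals] / rowSums(wide[animals])
wide
```

With the zeros filled in, the separate Total column and the na.rm = TRUE handling are no longer needed.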