Getting the count of consecutive occurrences of strings in a data frame column based on another column

Time: 2015-03-27 09:35:30

Tags: r aggregate

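I need to find out how many times a value occurs in a column of a data frame.

The main logic is to get the number of occurrences of a particular string grouped by another column.

For example: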

df<- data.frame(fruits = c("apples", "apples", "orange", "pears", "apples", "pears", "pears", "papaya", "papaya"), 
                veggies = c("beans", "carrots", "carrots", "carrots", "brinjal","carrots", "brinjal", "brinjal", "beans"),
                branches=c( "Area1", "Area1", "Area1", "Area2","Area2","Area2", "Area2", "Area3", "Area3" ))

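This is my data frame. I need to know the count of each fruit or veggie per branch, based on the branches column.

When I use table(df$fruits), the output is: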

apples-3 orange-1 papaya-2  pears-3

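This output shows the total count of apples and of the other fruits across all branches. I need the counts computed separately for each branch.

My desired output should be based on the column df$branches: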

for Area1
   apples-2 orange-1,
for Area2 
   pears-3 apples-1
for Area3 
   papaya-2

3 Answers:

Answer 0 (score: 1)

Try this:

library(data.table)
setDT(df)[,list(count=.N),list(branches, fruits)]

#   branches fruits count
#1:    Area1 apples     2
#2:    Area1 orange     1
#3:    Area2  pears     3
#4:    Area2 apples     1
#5:    Area3 papaya     2
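
To make the one-liner above easier to read, here is a minimal sketch of the same idea with the grouping spelled out through by=. In data.table, .N is the number of rows in the current group. The names dt, fruit_counts and veggie_counts are illustrative and not part of the original answer, and as.data.table() is used instead of setDT() only so that df itself is left unmodified.

```r
library(data.table)

df <- data.frame(
  fruits   = c("apples", "apples", "orange", "pears", "apples",
               "pears", "pears", "papaya", "papaya"),
  veggies  = c("beans", "carrots", "carrots", "carrots", "brinjal",
               "carrots", "brinjal", "brinjal", "beans"),
  branches = c("Area1", "Area1", "Area1", "Area2", "Area2",
               "Area2", "Area2", "Area3", "Area3")
)

# as.data.table() makes a copy, so df stays a plain data.frame
dt <- as.data.table(df)

# .N counts the rows in each (branches, fruits) group
fruit_counts <- dt[, .(count = .N), by = .(branches, fruits)]

# Same pattern for the veggies column
veggie_counts <- dt[, .(count = .N), by = .(branches, veggies)]

print(fruit_counts)
print(veggie_counts)
```

Only combinations that actually occur in the data appear in the result, so there are no zero-count rows.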

Answer 1 (score: 1)

Maybe just use ftable:

> ftable(fruits ~ branches, data = df)
         fruits apples orange papaya pears
branches                                  
Area1                2      1      0     0
Area2                1      0      0     3
Area3                0      0      2     0
> ftable(veggies ~ branches, data = df)
         veggies beans brinjal carrots
branches                              
Area1                1       0       2
Area2                0       2       2
Area3                1       1       0
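
ftable gives a wide table that includes zero counts. If a separate listing per branch is preferred, closer to the layout asked for in the question, one possible base R sketch is to split the column by branch and tabulate each piece. The name per_branch is illustrative, and this is only one of several ways to do it.

```r
df <- data.frame(
  fruits   = c("apples", "apples", "orange", "pears", "apples",
               "pears", "pears", "papaya", "papaya"),
  branches = c("Area1", "Area1", "Area1", "Area2", "Area2",
               "Area2", "Area2", "Area3", "Area3")
)

# split() groups the fruit values by branch; table() then counts within
# each group. as.character() drops any factor levels, so fruits that do
# not occur in a branch are not listed with a zero count.
per_branch <- lapply(split(as.character(df$fruits), df$branches), table)

print(per_branch)
```

The result is a named list with one small table per branch, for example per_branch$Area2.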

Answer 2 (score: 0)

I don't know what output format you expect, but you can get the counts with the dplyr package:

For example:

library(dplyr)
df %>% count(fruits, branches)
# OR
count(df, fruits, branches)

Output:

Source: local data frame [5 x 3]
Groups: fruits

  fruits branches n
1 apples    Area1 2
2 apples    Area2 1
3 orange    Area1 1
4 papaya    Area3 2
5  pears    Area2 3
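
The output above is ordered by fruit. As a small sketch, assuming the goal is to read the counts branch by branch as in the question, the grouping columns can be swapped and the rows sorted with arrange(). The name counts is illustrative, and the printed header may differ from the one shown above depending on the dplyr version.

```r
library(dplyr)

df <- data.frame(
  fruits   = c("apples", "apples", "orange", "pears", "apples",
               "pears", "pears", "papaya", "papaya"),
  branches = c("Area1", "Area1", "Area1", "Area2", "Area2",
               "Area2", "Area2", "Area3", "Area3")
)

# Count each (branch, fruit) pair, then list the most frequent
# fruit first within every branch
counts <- df %>%
  count(branches, fruits) %>%
  arrange(branches, desc(n))

print(counts)
```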