Count the number of elements in a character column by levels of a factor column in a data frame

Time: 2017-05-19 16:24:11

Tags: r

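I am a beginner in R. I have a data frame with two factor columns: one is the company column and the other is the product column. The product column has several missing values, so I want to count the number of non-missing values in the product column for each company (that is, for each level of the company variable). I tried table, and the count function in the plyr package, but they seem to work only with numeric variables. Please help! Let's say the data frame looks like this: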

df <- data.frame(company= c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D"), product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA))

So the output I am looking for is:

A 2 B 2 C 2 D 1

Thanks in advance!!

3 Answers:

Answer 0: (score: 1)

A dplyr solution.

library(dplyr)

df %>%
    filter(!is.na(product)) %>%
    group_by(company) %>%
    count()

# A tibble: 4 × 2
  company     n
   <fctr> <int>
1       A     2
2       B     2
3       C     2
4       D     1
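One thing to keep in mind with the filter-then-count approach: a company whose products are all missing disappears from the result entirely. Here is a minimal sketch of a variant that keeps such companies with a count of 0. It uses the data frame from the question plus a hypothetical company "E", which is not in the original post and is added only for illustration.

```r
library(dplyr)

# Data from the question, plus a hypothetical company "E" (an assumption,
# not part of the original post) whose only product is missing.
df <- data.frame(
  company = c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D", "E"),
  product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA, NA)
)

# Count non-missing products per company without dropping any rows first,
# so a company with only missing products still appears (with n = 0).
counts <- df %>%
  group_by(company) %>%
  summarise(n = sum(!is.na(product)))
```

For the four companies in the question this should give the same counts as the filter-based pipeline; the hypothetical "E" shows up with n = 0 instead of vanishing.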

Answer 1: (score: 1)

Suppose your df is as follows.

Case 1) As in the question

Data for df:

options(stringsAsFactors = FALSE)
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c(1, 1, 2, 3, 4, 3, 3, 1, NA, NA)
df <- data.frame(comp = comp, prod = prod)

Procedure:

# Flag the rows where prod is present, then sum the flags per company
df$prodflag <- !is.na(df$prod)
tapply(df$prodflag, df$comp, sum)

Output:

> tapply(df$prodflag, df$comp, sum)
A B C D 
2 2 3 1 
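The sum works because R treats TRUE as 1 and FALSE as 0. Since the question mentions that table did not seem to work, here is a small sketch on the same Case 1 data suggesting that table() can give the same counts once the rows with missing prod are dropped. It does not depend on prod being numeric.

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c(1, 1, 2, 3, 4, 3, 3, 1, NA, NA)
df <- data.frame(comp = comp, prod = prod)

# Keep only the company entries on rows where prod is present, then tabulate.
counts <- table(df$comp[!is.na(df$prod)])
```

Note that, like filtering, this drops a company that has no non-missing prod at all unless comp is a factor.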

#########################################################################

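Case 2) If stringsAsFactors is enabled and prod is a character column, so that the NAs are quoted strings ("NA") stored as a factor level, you can do this:

Data:

comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c("a", "a", "b", "c", "d", "c", "c", "a", "NA", "NA")
df <- data.frame(comp = comp, prod = prod, stringsAsFactors = TRUE)

Solution:

# "NA" is an ordinary string here, so compare against it instead of using is.na()
df$prodflag <- as.numeric(as.character(df$prod) != "NA")
tapply(df$prodflag, df$comp, sum)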
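An alternative for this case, sketched under the same Case 2 data, is to turn the literal "NA" strings into real missing values first, so that the is.na() logic from the other cases applies unchanged. The helper variable prod_chr is introduced here only for illustration.

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c("a", "a", "b", "c", "d", "c", "c", "a", "NA", "NA")
df <- data.frame(comp = comp, prod = prod, stringsAsFactors = TRUE)

# Convert the factor to character and replace the string "NA" with a real NA.
prod_chr <- as.character(df$prod)
prod_chr[prod_chr == "NA"] <- NA
df$prod <- factor(prod_chr)

# Now is.na() detects the missing entries directly.
counts <- tapply(!is.na(df$prod), df$comp, sum)
```

Cleaning the column once like this keeps later steps from having to remember that "NA" was a string.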

#########################################################################

Case 3) If prod is a character column and stringsAsFactors is enabled, but the NAs are not quoted (they are real missing values), you can do the following:

Data:

comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c("a", "a", "b", "c", "d", "c", "c", "a", NA, NA)
df <- data.frame(comp = comp, prod = prod, stringsAsFactors = TRUE)

Solution:

df$prodflag <- as.numeric(!is.na(df$prod))
tapply(df$prodflag, df$comp, sum)

Moral of the story: we should understand our data first, and then we can find the logic that best fits our needs.

Answer 2: (score: 1)

We can use rowsum from base R
with(df, rowsum(+!is.na(prod), comp))
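To unpack that one-liner a little: !is.na(prod) is a logical vector, the unary + coerces it to 0/1 integers, and rowsum() adds those up within each value of comp. A self-contained sketch, assuming the Case 1 data from the previous answer (column names comp and prod):

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c(1, 1, 2, 3, 4, 3, 3, 1, NA, NA)
df <- data.frame(comp = comp, prod = prod)

# Unary + turns the logical flags into integers so rowsum() can add them up.
res <- with(df, rowsum(+!is.na(prod), comp))

# rowsum() returns a one-column matrix with the groups as row names;
# taking the first column gives a plain named vector.
counts <- res[, 1]
```

The matrix form is handy when several columns are summed at once; for a single column the named vector is usually easier to work with.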