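I'm a beginner in R. I have a data frame with two factor columns: one is a company column and the other is a product column. The product column has several missing values, so I want to count the number of non-missing values in the product column for each company (that is, for each level of the company variable). I tried table() and the count() function in the plyr package, but they only seem to work with numeric variables. Please help!
Let's say the data frame looks like this: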
df <- data.frame(company= c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D"), product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA))
So the output I'm looking for is:
A 2 B 2 C 3 D 2
Thanks in advance!!
Answer 0 (score: 1)
A dplyr solution:
library(dplyr)

df %>%
  filter(!is.na(product)) %>%   # drop rows with a missing product
  group_by(company) %>%
  count()                       # rows remaining per company
# A tibble: 4 × 2
comp n
<fctr> <int>
1 A 2
2 B 2
3 C 3
4 D 1
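One thing to keep in mind when filtering first: a company whose products are all missing drops out of the result entirely. A minimal sketch of an alternative that keeps such companies with a count of 0, assuming the column names from the question. The extra company "E" row is hypothetical and is there only to illustrate the all-missing case:

```r
library(dplyr)

# Data frame from the question, plus a hypothetical company "E"
# whose only product value is missing (added for illustration).
df <- data.frame(
  company = c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D", "E"),
  product = c(1, 1, 2, 3, 4, 3, 3, NA, NA, NA, NA)
)

# Count non-missing products per company without removing rows first,
# so a company with only NA values still appears with n = 0.
counts <- df %>%
  group_by(company) %>%
  summarise(n = sum(!is.na(product)))

counts
```

Whether the zero row is wanted depends on the report; if it is not, the filter() version is simpler.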
Answer 1 (score: 1)
Suppose your df is:
Case 1) As in the question
Data for df:
options(stringsAsFactors = FALSE)
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D","D" )
prod <- c(1,1,2,3,4,3,3,1,NA,NA)
df <- data.frame(comp=comp,prod=prod)
Procedure:
df$prodflag <- !is.na(df$prod)
tapply(df$prodflag , df$comp,sum)
Output:
> tapply(df$prodflag , df$comp,sum)
A B C D
2 2 3 1
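For comparison, the same count can probably be obtained without a helper column. This is a sketch using the Case 1 data above; table() and aggregate() are base R, and the names tab and agg are only illustrative:

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c(1, 1, 2, 3, 4, 3, 3, 1, NA, NA)
df <- data.frame(comp = comp, prod = prod)

# table() counts occurrences of each company; subsetting first
# removes the rows where prod is missing.
tab <- table(df$comp[!is.na(df$prod)])

# aggregate() with a formula drops NA rows by default (na.action = na.omit),
# so length() here counts only the non-missing values per company.
agg <- aggregate(prod ~ comp, data = df, FUN = length)

tab
agg
```

table() gives a named table object, while aggregate() gives a data frame, which may be more convenient for merging with other results.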
#########################################################################
Case 2) If stringsAsFactors is enabled and prod is character, so that even the NAs are quoted strings ("NA") and end up as a factor level, you can do this:
Data:
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D","D" )
prod <- c("a","a","b","c","d","c","c","a","NA","NA")
df <- data.frame(comp=comp,prod=prod,stringsAsFactors = TRUE)
Solution:
df$prodflag <- as.numeric(as.character(df$prod) != "NA")  # "NA" is a literal string here, not a missing value
tapply(df$prodflag , df$comp,sum)
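An alternative for this case is to turn the quoted "NA" strings into real missing values once, after which is.na() works as usual. A minimal sketch with the Case 2 data; the variable name res is only illustrative:

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c("a", "a", "b", "c", "d", "c", "c", "a", "NA", "NA")
df <- data.frame(comp = comp, prod = prod, stringsAsFactors = TRUE)

# Replace the literal "NA" level with a genuine missing value,
# then drop the level that is no longer used.
df$prod[df$prod == "NA"] <- NA
df$prod <- droplevels(df$prod)

# From here on the data behaves like Case 3.
res <- tapply(!is.na(df$prod), df$comp, sum)
res
```

If the data comes from a file, setting na.strings when reading it (for example in read.csv) may avoid the problem at the source.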
#########################################################################
Case 3) If prod is character and stringsAsFactors is enabled, but the NAs are not quoted (so they are real missing values), you can do the following:
Data:
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D","D" )
prod <- c("a","a","b","c","d","c","c","a",NA,NA)
df <- data.frame(comp=comp,prod=prod,stringsAsFactors = TRUE)
Solution:
df$prodflag <- as.numeric(!is.na(df$prod))
tapply(df$prodflag , df$comp,sum)
Moral of the story: we should understand our data first, and then we can find the logic that best fits our needs.
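A quick way to find out which of the three cases applies is to inspect the column before counting. A sketch using the Case 2 data; the names col_class, real_na and quoted_na are only illustrative:

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c("a", "a", "b", "c", "d", "c", "c", "a", "NA", "NA")
df <- data.frame(comp = comp, prod = prod, stringsAsFactors = TRUE)

str(df)                       # column types and factor levels at a glance

col_class <- class(df$prod)   # is the column numeric, character or factor?
real_na   <- sum(is.na(df$prod))                               # genuine missing values
quoted_na <- sum(as.character(df$prod) == "NA", na.rm = TRUE)  # literal "NA" strings

col_class
real_na
quoted_na
```

Here real_na is 0 and quoted_na is 2, which indicates Case 2: the missing values are stored as the string "NA" rather than as real NA.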
Answer 2 (score: 1)
We can use rowsum from base R:
with(df, rowsum(+!is.na(prod), comp))
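To unpack the one-liner: the unary + turns the logical vector from !is.na() into integers, and rowsum() adds them up within each group. A sketch using the numeric Case 1 data from the previous answer (columns comp and prod); the names flag, res and counts are only illustrative:

```r
comp <- c("A", "B", "C", "D", "A", "B", "C", "C", "D", "D")
prod <- c(1, 1, 2, 3, 4, 3, 3, 1, NA, NA)
df <- data.frame(comp = comp, prod = prod)

# TRUE/FALSE -> 1/0: unary plus coerces the logical vector to integer
flag <- +!is.na(df$prod)

# rowsum() sums flag within each level of comp and returns a
# one-column matrix whose row names are the group labels.
res <- rowsum(flag, df$comp)

# Take the single column to get a plain named vector.
counts <- res[, 1]
counts
```

Because the result of rowsum() is a matrix, extracting the column is useful when a named vector like the tapply() output is preferred.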