Summarising gender counts by country: not sure how to tame ddply here
I have this data frame:
df <- data.frame(country = c("Italy", "Germany", "Italy", "USA","Poland"),
gender = c("male", "female", "male", "female", "female"))
I want a data frame in which each row gives the number of males and females for one country. However, this is what I get:
ddply(df,~country,table)
country female male
1 Germany 1 0
2 Germany 0 0
3 Germany 0 0
4 Germany 0 0
5 Italy 0 0
6 Italy 0 2
7 Italy 0 0
8 Italy 0 0
9 Poland 0 0
10 Poland 0 0
11 Poland 1 0
12 Poland 0 0
13 USA 0 0
14 USA 0 0
15 USA 0 0
16 USA 1 0
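Although it produces the expected counts, it also adds three extra rows for each group. Why?
Answer 0 (score: 0)
I found this solution. I'm not sure it is the most elegant one.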
df <- data.frame(country = c("Italy", "Germany", "Italy", "USA","Poland"),
gender = c("male", "female", "male", "female", NA))
ddply(df, .(country), summarise,
female=sum(gender=="female",na.rm = TRUE),
male=sum(gender=="male", na.rm = TRUE),
na=sum(is.na(gender)))
Answer 1 (score: 0)
It looks like you just want
as.data.frame.matrix(table(df))
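Credit: How to convert a table to a data frame
But to answer your question about why you got that output: table tabulates by factor levels, not by the values actually present in the vector. So if you run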
df[df$country=="Germany",]$country
[1] Germany
Levels: Germany Italy Poland USA
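you can see that after subsetting, the country vector still has all four levels but only one value. When you then run table on the subset, it tabulates every level, including those that no longer occur in the vector: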
table(df[df$country=="Germany",])
gender
country female male
Germany 1 0
Italy 0 0
Poland 0 0
USA 0 0
When debugging ddply, always try your function on one of the subsets it creates from the data.
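As a minimal sketch of that debugging step, the snippet below runs table on the Germany subset before and after dropping the unused country levels. It assumes the columns are factors, so stringsAsFactors = TRUE is set explicitly (from R 4.0.0 onward data.frame() no longer converts strings to factors by default); the names germany, tab_all and tab_fixed are only illustrative.

```r
# Assumption: columns are factors, as in the output shown in the question.
df <- data.frame(country = c("Italy", "Germany", "Italy", "USA", "Poland"),
                 gender = c("male", "female", "male", "female", "female"),
                 stringsAsFactors = TRUE)

# One of the pieces ddply would pass to the function.
germany <- df[df$country == "Germany", ]

# The subset keeps all four country levels, so table() returns four rows.
tab_all <- table(germany)
print(dim(tab_all))

# Dropping the unused levels of the grouping column leaves a single row.
germany$country <- droplevels(germany$country)
tab_fixed <- table(germany)
print(tab_fixed)
```

Dropping levels only on the country column keeps both gender levels, so the male column still appears with a zero count.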
Answer 2 (score: 0)
Since you are already using plyr, why not use its count function?
> library(plyr)
> count(df)
# country gender freq
# 1 Germany female 1
# 2 Italy male 2
# 3 Poland female 1
# 4 USA female 1
Or, in base R, use table:
> ( tb <- table(df) )
# gender
# country female male
# Germany 1 0
# Italy 0 2
# Poland 1 0
# USA 1 0
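ADDED: Following the OP's comment below, to convert the table above into a data frame you can manipulate it with unclass and cbind and adjust its attributes.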
> as.data.frame(cbind(country = rownames(tb), unclass(tb)),
row.names = "NULL")
# country female male
# 1 Germany 1 0
# 2 Italy 0 2
# 3 Poland 1 0
# 4 USA 1 0
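One thing to be aware of: cbind on a character vector and a numeric matrix yields a character matrix, so the counts in the result above are stored as text. A possible alternative, sketched here in base R only, keeps them as integers by going through as.data.frame.matrix first; the names counts and res are just illustrative.

```r
df <- data.frame(country = c("Italy", "Germany", "Italy", "USA", "Poland"),
                 gender = c("male", "female", "male", "female", "female"))

tb <- table(df)

# One integer column per gender; the countries end up as row names.
counts <- as.data.frame.matrix(tb)

# Move the row names into a regular column and reset the row names to 1..n.
res <- cbind(country = rownames(counts), counts)
rownames(res) <- NULL

print(res)
str(res)
```

Because table converts its inputs to factors itself, this version should behave the same whether or not the columns of df are factors.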