Question

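My question stems from this and this previously asked questions, but I think it differs considerably from them. Imagine I have a minimal dataset (bird) in which each row represents an observation of a bird at a given time and place, as follows: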

id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null

Ideally, I would like to transform this dataset into the following form, with the null values removed:

id,tot_obs,country_tot
A,12,2
B,3,1
C,5,2

I know that I can get counts for a factor using:

table(bird$country)

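However, is there a smarter way, perhaps a single method, to remove the null values, add up the total observations, count the countries, and then reshape the result into this form? If there is a package that does this, I am open to that suggestion as well. Thank you!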

Answer 1

Load the data with stringsAsFactors=FALSE:

df <- read.csv(header=TRUE, text="id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors=FALSE)

# check to see if columns are factors
sapply(df, class)
#          id         obs     country 
# "character"   "integer" "character"

Remove all rows where country is "null":

df <- df[df$country != "null", ]
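As an alternative sketch, assuming the literal string "null" is the only missing-value marker in the file, read.csv can be told to read it as NA through its na.strings argument, and the incomplete rows can then be dropped with is.na. The name df2 is used here only for illustration and is not part of the original answer.

```r
# Read the sample data, treating the literal string "null" as NA
df2 <- read.csv(header = TRUE, text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors = FALSE, na.strings = "null")

# Keep only the rows whose country is known
df2 <- df2[!is.na(df2$country), ]
```

This keeps the missing-value handling at the point where the data is read, which may be convenient if other columns use the same marker.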

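You can then use the plyr package with summarise to get the desired result, as follows:

library(plyr)
# per id: sum the observations and count the distinct countries
ddply(df, .(id), summarise, tot_obs=sum(obs), tot_country=length(unique(country)))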
#   id tot_obs tot_country
# 1  A      12           2
# 2  B       3           1
# 3  C       5           2
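If adding a package is not desirable, the same summary can be sketched in base R with tapply. This is an illustrative alternative rather than the answer's method; the names bird, tot_obs, tot_country and res are chosen here for the example.

```r
# Rebuild the sample data so the example is self-contained
bird <- read.csv(header = TRUE, text = "id,obs,country
A,4,USA
B,3,CAN
A,5,USA
C,4,MEX
C,1,USA
A,3,CAN
D,1,null", stringsAsFactors = FALSE)

# Drop the rows whose country is the literal string "null"
bird <- bird[bird$country != "null", ]

# Per id: total observations and number of distinct countries
tot_obs <- tapply(bird$obs, bird$id, sum)
tot_country <- tapply(bird$country, bird$id, function(x) length(unique(x)))

# Both results are ordered by id, so they can be combined directly
res <- data.frame(id = names(tot_obs),
                  tot_obs = as.vector(tot_obs),
                  tot_country = as.vector(tot_country),
                  stringsAsFactors = FALSE)
res
```

Both tapply calls group by the same id vector, so their results line up row for row without a merge.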
