When I summarise my data with the plyr package, it seems impossible to use the nlevels() function.
My dataset is structured as follows:
> aer <- read.xlsx("XXXX.xlsx", sheetIndex=1)
> aer$ID <- as.factor(aer$ID)
> aer$description <- as.factor(aer$description)
> head(aer)
  ID SOC      start        end days count severity relation
1  1 410 2015-04-21 2015-04-28    7     1        1        3
2  1 500 2015-01-30 2015-05-04   94     1        1        3
3  1 600 2014-11-25 2014-11-29    4     1        1        3
4  1 600 2015-01-02 2015-01-07    5     1        1        3
5  1 600 2015-01-26 2015-03-02   35     1        1        3
6  1 600 2015-04-14 2015-04-17    3     1        1        3
> dput(head(aer,4))
structure(list(ID = structure(c(1L, 1L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "13", "14",
"15"), class = "factor"), SOC = c(410, 500, 600, 600),
start = structure(c(16546, 16465, 16399, 16437), class = "Date"),
end = structure(c(16553, 16559, 16403, 16442), class = "Date"),
days = c(7, 94, 4, 5), count = c(1, 1, 1, 1), severity = c(1,
1, 1, 1), relation = c(3, 3, 3, 3)), .Names = c("ID", "SOC",
"description", "start", "end", "days", "count", "severity", "relation"
), row.names = c(NA, 4L), class = "data.frame")
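What I want to know is how many levels of the variable "ID" are present in each piece of the data when the dataset is split by the variable "SOC". I want to summarise this information, together with some other variables, in a new dataset. So I tried to use the plyr package like this: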
summaer2 <- ddply(aer, c("SOC"), summarise,
                  participants = nlevels(ID),
                  events = sum(count),
                  min_duration = min(days),
                  max_duration = max(days),
                  max_severity = max(severity))
This returns the following error:
Error in Summary.factor(c(4L, 5L, 11L, 11L, 14L, 14L), na.rm = FALSE) :
‘max’ not meaningful for factors
Can anyone advise me on how to achieve this, or tell me what I am doing wrong?
Many thanks in advance!
Answer 0 (score: 0)
UPDATE
Replacing nlevels(ID) with length(unique(ID)) seems to give me the output I wanted:
> head(summaer2)
   SOC participants events min_duration max_duration max_severity
1  100            4      7            1           62            2
2  410            9     16            1           41            2
3  431            2      2          109          132            1
4  500            5      9           23          125            2
5  600            8     19            1           35            1
6 1040            1      1           98           98            2
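For anyone wondering why the two calls differ: a subset of a factor keeps every level of the original factor, so nlevels() on one SOC group counts all IDs defined in the whole dataset, while length(unique()) counts only the IDs that actually occur in that group. Here is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration and are not the dataset from the question.

```r
# Toy data (hypothetical values, not the dataset from the question)
toy <- data.frame(
  ID  = factor(c("1", "1", "2", "3", "3", "3")),
  SOC = c(410, 410, 410, 500, 500, 600)
)

# Subsetting a factor keeps all of its original levels
id_410 <- toy$ID[toy$SOC == 410]

n_all_levels <- nlevels(id_410)              # every level defined on the factor
n_present    <- length(unique(id_410))       # only the IDs present in this group
n_dropped    <- nlevels(droplevels(id_410))  # same count after dropping unused levels

# Distinct participants per SOC group, without plyr
participants <- tapply(toy$ID, toy$SOC, function(x) length(unique(x)))
print(participants)
```

Inside ddply(), either length(unique(ID)) or nlevels(droplevels(ID)) should therefore give a per-group count of participants.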