I have the following data frame in R, named test:
c1 c2
1 10 a
2 20 a
3 30 b
4 40 b
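The post does not show how test was built. A minimal sketch that reproduces the printed data frame, assuming c1 is numeric and c2 is a factor (which is what the str() output at the end of this thread reports):

```r
# Assumed construction of the example data; not shown in the original post.
test <- data.frame(
  c1 = c(10, 20, 30, 40),
  c2 = factor(c("a", "a", "b", "b"))
)
test
```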
I then split c1 by c2 and cut each group into two intervals:
z = lapply(split(test$c1, test$c2), function(x) {cut(x,2)})
This gives the following z:
$a
[1] (9.99,15] (15,20]
Levels: (9.99,15] (15,20]
$b
[1] (30,35] (35,40]
Levels: (30,35] (35,40]
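I would like to merge these factors back together by unsplitting the list with unsplit(z, test$c2). Because the two factors have different levels, this produces a warning and NAs: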
[1] (9.99,15] (15,20] <NA> <NA>
Levels: (9.99,15] (15,20]
Warning message:
In `[<-.factor`(`*tmp*`, i, value = 1:2) :
invalid factor level, NAs generated
I would like to take the union of all factor levels and then unsplit, so that this warning does not occur:
z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
unsplit(z, test$c2)
[1] (9.99,15] (15,20] (30,35] (35,40]
Levels: (9.99,15] (15,20] (30,35] (35,40]
In my real data frame the list is very large, so I need to iterate over all list elements (not just two). What is the best way to do this?
Answer 0: (score: 4)
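Can't you just unlist() z? When every element of the list is a factor, unlist() returns a factor whose levels are the union of the individual level sets: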
> unlist(z)
a1 a2 b1 b2
(9.99,15] (15,20] (30,35] (35,40]
Levels: (9.99,15] (15,20] (30,35] (35,40]
Or, without names on the resulting factor:
> unlist(z, use.names=FALSE)
[1] (9.99,15] (15,20] (30,35] (35,40]
Levels: (9.99,15] (15,20] (30,35] (35,40]
You can combine everything into a simple one-liner that needs no additional packages:
> (test2 <- within(test, newvar <- unlist(lapply(split(c1, c2), cut, 2))))
c1 c2 newvar
1 10 a (9.99,15]
2 20 a (15,20]
3 30 b (30,35]
4 40 b (35,40]
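One caveat: unlist() returns the values group by group, so the one-liner lines up with the rows only when the data frame is already ordered by c2, as it is here. A sketch of an order-independent alternative in base R, which gives every element of the list the same level set and then calls unsplit() as in the question. The names all_levels and z_common and the interleaved example data are made up for illustration:

```r
# Example data with the groups interleaved (hypothetical row order).
test <- data.frame(
  c1 = c(10, 30, 20, 40),
  c2 = factor(c("a", "b", "a", "b"))
)

z <- lapply(split(test$c1, test$c2), cut, 2)

# Union of the levels across all list elements, in order of first appearance.
all_levels <- unique(unlist(lapply(z, levels)))

# Re-create every factor with the shared level set.
z_common <- lapply(z, factor, levels = all_levels)

# unsplit() puts each value back in its original row; since all elements
# now share the same levels, no "invalid factor level" warning is raised.
test$newvar <- unsplit(z_common, test$c2)
test
```

This works for any number of list elements, because the level union is computed over the whole list rather than by naming z$a and z$b explicitly.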
Answer 1: (score: 4)
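If I understand your question, I think you are making this a bit more complicated than it needs to be. Here is a solution using plyr. We group by the c2 variable and apply cut() within each group: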
require(plyr)
ddply(test, "c2", transform, newvar = cut(c1, 2))
This returns:
c1 c2 newvar
1 10 a (9.99,15]
2 20 a (15,20]
3 30 b (30,35]
4 40 b (35,40]
The result has the following structure:
'data.frame': 4 obs. of 3 variables:
$ c1 : num 10 20 30 40
$ c2 : Factor w/ 2 levels "a","b": 1 1 2 2
$ newvar: Factor w/ 4 levels "(9.99,15]","(15,20]",..: 1 2 3 4