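I have a list of matrices with duplicated values in the column id. How can I split off the duplicates in every list element?
With data.frames I do this using lapply + split + duplicated, but that does not work for matrices, because split treats a matrix as a plain numeric vector and returns vectors. I want to keep the matrix structure.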
## Data.frame - all good
df <- data.frame(
id = rep(1:10, each = 2),
val = rep(10, each = 20)
)
df_list <- rep(list(df), 2)
lapply(df_list, function(x){split(x, duplicated(x[,'id']))$'FALSE'})
## Matrix - Here's my problem
mt <- as.matrix(data.frame(
id = rep(seq(1,10,1), each = 2),
val = rep(10, each = 20)
))
mt_list <- rep(list(mt), 2)
lapply(mt_list, function(x){split(x, duplicated(x[,'id']))$'FALSE'})
Answer 0 (score: 1)
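While writing the question and fiddling with the code, I came up with a solution. Since I did not find anything about this specific setup, I am posting it anyway.
The function subset / subset.matrix does the job: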
lapply(mt_list, function(x){subset.matrix(x, !duplicated(x[,'id']))})
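As a possible alternative to subset, plain logical row indexing with drop = FALSE should also keep the matrix structure. This is only a minimal sketch on a small matrix shaped like the one in the question; the names first_rows and dup_rows are introduced here for illustration and are not part of the original post.

```r
# Small example matrix: each id appears twice (same shape as in the question)
mt <- cbind(id = rep(1:10, each = 2), val = rep(10, 20))
mt_list <- rep(list(mt), 2)

# Keep the first occurrence of each id; drop = FALSE keeps a matrix
# even if only a single row remains
first_rows <- lapply(mt_list, function(x) x[!duplicated(x[, "id"]), , drop = FALSE])

# The remaining (duplicated) rows, in case both parts are needed
dup_rows <- lapply(mt_list, function(x) x[duplicated(x[, "id"]), , drop = FALSE])
```

Both results are lists of two-column matrices with the original column names.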
I benchmarked the different options; subset.matrix seems to be slightly faster than subset.
mt <- as.matrix(data.frame(
id = rep(seq(1,1000,1), each = 2),
val = rep(1000, each = 20)
))
mt_list <- rep(list(mt), 50)
mc <- microbenchmark::microbenchmark(
subset = lapply(mt_list, function(x){subset(x, !duplicated(x[,'id']))}),
subset.matrix = lapply(mt_list, function(x){subset.matrix(x, !duplicated(x[,'id']))}),
split = lapply(mt_list, function(x){matrix(split(x, duplicated(x[,'id']))$'FALSE', ncol = 2)}),
unique = lapply( mt_list, unique )
)
mc
Unit: milliseconds
          expr        min         lq       mean     median         uq        max neval cld
        subset   3.758708   3.862849   4.256363   3.900580   3.981629   9.713416   100 a
 subset.matrix   3.583632   3.700450   4.174137   3.729881   3.821947   9.611992   100 a
         split  32.630604  33.061503  34.535531  33.262841  33.726039  77.531039   100 b
        unique 144.832487 148.408874 155.099591 150.326865 155.456601 430.992916   100 c
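One caveat about the unique variant: on a matrix, unique removes duplicated rows, so it compares whole rows rather than only the id column. It matches the other variants in the benchmark because val is constant there. A small sketch with made-up values (the matrix m is hypothetical, not from the post) shows where the two approaches would differ:

```r
# Hypothetical data: id 1 appears twice with different val
m <- cbind(id = c(1, 1, 2, 2), val = c(10, 11, 20, 20))

by_row <- unique(m)                                  # compares whole rows
by_id  <- m[!duplicated(m[, "id"]), , drop = FALSE]  # compares only the id column

nrow(by_row)  # 3 rows: (1,10), (1,11), (2,20)
nrow(by_id)   # 2 rows: (1,10), (2,20)
```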
Answer 1 (score: 1)
Maybe try
split(df,ave(df$id, df$id, FUN= function(x) seq_along(x)))
$`1`
   id val
1   1  10
3   2  10
5   3  10
7   4  10
9   5  10
11  6  10
13  7  10
15  8  10
17  9  10
19 10  10

$`2`
   id val
2   1  10
4   2  10
6   3  10
8   4  10
10  5  10
12  6  10
14  7  10
16  8  10
18  9  10
20 10  10
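The call above works on a data.frame. For a matrix, one way to adapt the same idea might be to split the row indices instead of the matrix itself and then index by rows, so each piece stays a matrix. This is a sketch under that assumption; occ and parts are illustrative names, not from the answer.

```r
mt <- cbind(id = rep(1:10, each = 2), val = rep(10, 20))

# Occurrence counter within each id: 1 for the first occurrence, 2 for the second, ...
occ <- ave(mt[, "id"], mt[, "id"], FUN = seq_along)

# Split the row indices by occurrence, then pull those rows out of the matrix
parts <- lapply(split(seq_len(nrow(mt)), occ), function(i) mt[i, , drop = FALSE])
```

For a list of matrices, the last two steps can be wrapped in a function and passed to lapply.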