How to unlist the list in each row, keeping the unique elements and the original size

Posted: 2019-09-23 06:06:22

Tags: r dplyr data.table

I have a list stored in each row, and I want to unlist all the elements in that row and keep only the unique ones.

library(data.table)
library(stringr)
Data <- data.table(
  X = sample(1:10),
  Y = list(c("between", "between", "before", "pm"), c("am", "in", "at", "am"), c("at", "pm"),
           c("after", "after", "on"), c("on", "am", "on"), c("at", "between", "at"),
           c("at", "between"), c("at", "at", "on"), c("pm", "pm", "am"),
           c("between", "between", "pm", "between", "pm", "between", "pm"))
)

Now I want to get the unique elements, as well as the number of elements in each list.

For example, the first row's list has 4 elements, and "between", "before" and "pm" are its unique elements.
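For the first row on its own, base R's unique() and length() return exactly the two values described. A tiny sketch using that row's vector:

```r
# The first row of Y from the question
y1 <- c("between", "between", "before", "pm")

unique_y1 <- unique(y1)  # distinct values, in order of first appearance
count_y1 <- length(y1)   # original number of elements, duplicates included

print(unique_y1)  # "between" "before" "pm"
print(count_y1)   # 4
```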

So I tried:

Data[,unique_elements:=unique(Y),by=list(X)]
Data[,count:=length(Y),by=list(X)]

But neither line does what I expected, and I can't tell where I went wrong. Any help is appreciated.
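One likely reason, sketched below on a small hypothetical table DT (not the question's data): with by = X every group holds a single row, so inside j the list column Y is a list of length 1 rather than the character vector itself. length(Y) is therefore 1 in every group, and unique(Y) works on that one-element list instead of the strings inside it. Pulling the vector out with Y[[1]] gives the per-row values; this grouped variant is an illustration, not one of the answers below.

```r
library(data.table)

# Hypothetical table: one list element per row, X is unique
DT <- data.table(
  X = 1:3,
  Y = list(c("a", "a", "b"), "c", c("d", "e", "d"))
)

# Inside each by-group, Y is a one-element list, so this is 1 everywhere
wrong <- DT[, .(count = length(Y)), by = X]

# Y[[1]] extracts the group's character vector; list() keeps the
# unique values as a list column with one entry per group
right <- DT[, .(unique_elements = list(unique(Y[[1]])),
                count = length(Y[[1]])), by = X]
print(right)
```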

2 Answers:

Answer 0 (score: 2)

We can use lapply to get the unique values of each element of Y, and lengths to get the length of each element of Y.

library(data.table)
Data[, c("unique_vals", "count") := list(lapply(Y, unique), lengths(Y))]
Data
#     X Y unique_vals count
# 1: 10 between,between,before,pm between,before,pm 4
# 2:  4 am,in,at,am am,in,at 4
# 3:  3 at,pm at,pm 2
# 4:  6 after,after,on after,on 3
# 5:  5 on,am,on on,am 3
# 6:  1 at,between,at at,between 3
# 7:  8 at,between at,between 2
# 8:  7 at,at,on at,on 3
# 9:  9 pm,pm,am pm,am 3
#10:  2 between,between,pm,between,pm,between,... between,pm 7

However, this solution is not specific to data.table. We can also use dplyr:

library(dplyr)
Data %>%
  mutate(unique_vals = purrr::map(Y, unique),
         count = lengths(Y))

or base R:
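A minimal base R sketch, assuming a plain data.frame named df built from the first three rows of the question's Y (df and its fixed X values are illustrative): lapply(..., unique) gives the unique values as a list column and lengths() gives the original sizes.

```r
# Plain data.frame; the list column is assigned after creation
df <- data.frame(X = 1:3)
df$Y <- list(c("between", "between", "before", "pm"),
             c("am", "in", "at", "am"),
             c("at", "pm"))

# Unique values per row (a list column) and original length per row
df$unique_vals <- lapply(df$Y, unique)
df$count <- lengths(df$Y)

str(df)
```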

Answer 1 (score: 1)

For the data.table in the question,

lapply(Data$Y, unique)

gets the unique strings, and

lapply(Data$Y, length)

gets the number of elements in each list.
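Note that lapply(Data$Y, length) returns a list of counts. Base R's lengths() computes the same numbers as a plain integer vector, which is usually more convenient to store as a column. A small check on three of the question's rows:

```r
y <- list(c("at", "pm"), c("after", "after", "on"), c("pm", "pm", "am"))

as_list <- lapply(y, length)  # a list holding 2, 3, 3
as_vec <- lengths(y)          # an integer vector: 2 3 3

# Same numbers; only the container differs
stopifnot(identical(unlist(as_list), as_vec))
print(as_vec)
```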