My data looks like this:
id book_id numberofbook_id
1 ["19167120","237494310","195166798"] 3
2 ["19167120","237494310"] 2
3 [] 0
What I want to do first is create a separate data frame that has each distinct book_id as its own row:
book_id
"19167120"
"237494310"
"195166798"
Then, based on that, group the ids by book_id:
book_id id numberofid
"19167120" [1,2] 2
"237494310" [1,2] 2
"195166798" [1] 1
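PS: I thought of first merging all the cells into one with unlist, then using the unique function to get the distinct values, and then putting them into a column. But unlist is not the answer here.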
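One likely reason unlist does not help directly: if book_id is stored as text (as in the sample data "df1" in the answer below), each cell is a single string such as "[\"19167120\",\"237494310\"]", not an R vector, so there is nothing to unlist yet. A minimal base-R sketch, assuming that string format: extract the runs of digits first, after which unlist and unique work as the question intended. The names ids_per_row and distinct_books are illustrative only.

```r
# Assumption: book_id is stored as JSON-like text, as in the answer's df1
book_id <- c("[\"19167120\",\"237494310\",\"195166798\"]",
             "[\"19167120\",\"237494310\"]",
             "[]")

# Pull every run of digits out of each string; "[]" yields character(0)
ids_per_row <- regmatches(book_id, gregexpr("[0-9]+", book_id))

# Now unlist + unique behaves as intended: one row per distinct book_id
distinct_books <- data.frame(book_id = unique(unlist(ids_per_row)),
                             stringsAsFactors = FALSE)
distinct_books
```

If book_id is already a list column rather than text, unique(unlist(df$book_id)) should work without the regmatches step.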
Answer (score: 3)
Converting my comment to an answer: starting with "df1" as defined below, you can try the following:
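library(data.table)        # provides as.data.table() and :=
library(splitstackshape)   # provides cSplit()
# Strip [ and ], then split the comma-separated IDs into long format
Temp <- cSplit(as.data.table(df1)[, book_id := gsub("[][]", "", book_id)],
               "book_id", ",", "long")
# Drop rows whose split value is NA (id 3 had an empty list)
Temp <- na.omit(Temp, cols = "book_id_new")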
# id numberofbook_id book_id_new
# 1: 1 3 "19167120"
# 2: 1 3 "237494310"
# 3: 1 3 "195166798"
# 4: 2 2 "19167120"
# 5: 2 2 "237494310"
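In the steps above, the gsub step simply removes the [ and ] characters from the "book_id" column, cSplit splits the data into long format, and na.omit drops the leftover NA rows (here, the row for id 3, whose list was empty). With the data in that form, you can easily aggregate it however you need. Since "Temp" is a data.table, you can keep using the "data.table" package: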
Temp[, list(ID = paste(id, collapse = ","),
numofid = length(id)), by = "book_id_new"]
# book_id_new ID numofid
# 1: "19167120" 1,2 2
# 2: "237494310" 1,2 2
# 3: "195166798" 1 1
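The pattern "[][]" in the gsub call can look odd. It is a bracket expression in which a ] placed first is taken literally, so the class matches either ] or [. A quick base-R check on one value from the sample data shows that only the brackets are removed and the double quotes stay, which is consistent with book_id_new printing with quotes in the output above:

```r
x <- "[\"19167120\",\"237494310\"]"
# Remove every [ and ] character; the double quotes are left untouched
stripped <- gsub("[][]", "", x)
cat(stripped, "\n")
```

If you would rather drop the quotes as well, adding " to the class (for example '[]["]') should do it, though that would change how book_id_new prints.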
Sample data:
df1 <- structure(list(id = 1:3,
book_id = c("[\"19167120\", \"237494310\",\"195166798\"]",
"[\"19167120\",\"237494310\"]", "[]"),
numberofbook_id = c(3L, 2L, 0L)),
.Names = c("id", "book_id", "numberofbook_id"),
class = "data.frame",
row.names = c(NA, -3L))
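If you prefer not to depend on splitstackshape or data.table, the same reshape-then-aggregate idea can be sketched in base R. This is an alternative to the approach above, not a drop-in equivalent: it assumes book_id holds JSON-like text as in the sample data, and it keeps the grouped ids as a list column to mirror the [1,2] layout in the question. The object names ids, long, groups and result are illustrative only.

```r
# Sample data as above
df1 <- data.frame(id = 1:3,
                  book_id = c("[\"19167120\", \"237494310\",\"195166798\"]",
                              "[\"19167120\",\"237494310\"]", "[]"),
                  numberofbook_id = c(3L, 2L, 0L),
                  stringsAsFactors = FALSE)

# 1. Parse each string into a character vector of IDs ("[]" -> character(0))
ids <- regmatches(df1$book_id, gregexpr("[0-9]+", df1$book_id))

# 2. Long format: repeat each row id once per book it contains;
#    rows with no books contribute nothing, so no NA clean-up is needed
long <- data.frame(id = rep(df1$id, lengths(ids)),
                   book_id = unlist(ids),
                   stringsAsFactors = FALSE)

# 3. Group ids by book_id, keeping book_ids in order of first appearance
groups <- split(long$id, factor(long$book_id, levels = unique(long$book_id)))
result <- data.frame(book_id = names(groups), stringsAsFactors = FALSE)
result$id <- unname(groups)                  # list column, e.g. c(1, 2)
result$numberofid <- unname(lengths(groups)) # number of ids per book
result
```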