I have a data frame with three columns:
SentenceID = c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
Tokens = c("I","went","to","school","nobody","can","find","some","people","know","what","they","are","doing","now")
WordIndex = c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)
df = data.frame(SentenceID, Tokens, WordIndex)
Desired result:
I need to go through each SentenceID and build a list X of vectors, as shown below:
X
[[1]] 3 4 7 8
[[2]] 9 10 12
[[3]] 54 34 66 33 89 87 23 22
Then I need to pad each vector with 0s to a length of 10:
X
[[1]] 3 4 7 8 0 0 0 0 0 0
[[2]] 9 10 12 0 0 0 0 0 0 0
[[3]] 54 34 66 33 89 87 23 22 0 0
How can I do this?
Answer 0 (score: 1)
Here is one way:
> lapply(split(df$WordIndex, df$SentenceID), function(x) c(x, rep(0, pmax(10 - length(x), 0))))
$`1`
[1] 3 4 7 8 0 0 0 0 0 0
$`2`
[1] 9 10 12 0 0 0 0 0 0 0
$`3`
[1] 54 34 66 33 89 87 23 22 0 0
Answer 1 (score: 1)
A base R solution with aggregate:
lapply(aggregate(WordIndex, list(SentenceID), c)$x,
function(X) head(c(X, rep(0,10)), 10))
$`1`
[1] 3 4 7 8 0 0 0 0 0 0
$`2`
[1] 9 10 12 0 0 0 0 0 0 0
$`3`
[1] 54 34 66 33 89 87 23 22 0 0
Answer 2 (score: 0)
You can try a tidyverse approach using the map function from purrr.
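A minimal sketch of how this might look with purrr's map, applied to the data from the question. This is an assumption about the approach rather than the original answer's code, and the name pad_to is introduced here purely for illustration.

```r
library(purrr)

# Data from the question (only the columns needed here)
SentenceID <- c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
WordIndex  <- c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)
df <- data.frame(SentenceID, WordIndex)

pad_to <- 10  # hypothetical name for the target length from the question

# Split WordIndex by SentenceID, then right-pad each vector with zeros.
# max(..., 0) avoids a negative count if a sentence has more than pad_to tokens;
# such vectors are left unpadded and are not truncated.
X <- map(split(df$WordIndex, df$SentenceID),
         function(x) c(x, rep(0, max(pad_to - length(x), 0))))
```

X is a list named by SentenceID, so X[["2"]] gives the padded vector for the second sentence.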