The following list `ls` contains three data frames:
unigrams = data.frame(freq = c(3, 3, 5, 4, 3, 41),
term = c("a-list", "a-p", "aaa", "aam", "aamir", "aaron"))
bigrams = data.frame(freq = c(13, 1, 1, 2, 1, 4),
term = c("a a", "a abode", "a about", "a absolutely", "a accessory", "a acre"))
trigrams = data.frame(freq = c(1, 1, 1, 1, 1, 1),
term = c("a a card", "a a divorce", "a a dreamer", "a a great", "a a guy", "a a hand"))
ls = list(unigrams, bigrams, trigrams)
This gives us:
[[1]]
freq term
1 3 a-list
2 3 a-p
3 5 aaa
4 4 aam
5 3 aamir
6 41 aaron
[[2]]
freq term
1 13 a a
2 1 a abode
3 1 a about
4 2 a absolutely
5 1 a accessory
6 4 a acre
[[3]]
freq term
1 1 a a card
2 1 a a divorce
3 1 a a dreamer
4 1 a a great
5 1 a a guy
6 1 a a hand
I want to split the "term" column of each data frame into one column per word, creating the columns "word1", "word2", "word3". Like this:
freq word1
1 3 a-list
2 3 a-p
3 5 aaa
4 4 aam
5 3 aamir
6 41 aaron
freq word1 word2
1 13 a a
2 1 a abode
3 1 a about
4 2 a absolutely
5 1 a accessory
6 4 a acre
freq word1 word2 word3
1 1 a a card
2 1 a a divorce
3 1 a a dreamer
4 1 a a great
5 1 a a guy
6 1 a a hand
My attempt:
new_ls = list()
for (i in length(ls)) {
x = ls[[i]]
# Split each word in column "term":
x[,paste("word", 1:i, sep = "")] = as.character(lapply(strsplit(as.character(x$term), split=" "), "[", i))
x = subset(x, select = -term)
new_ls[[i]] = x
}
Unfortunately, this last snippet stores a result only in the last element, and that result is wrong:
[[1]]
NULL
[[2]]
NULL
[[3]]
freq word1 word2 word3
1 1 card card card
2 1 divorce divorce divorce
3 1 dreamer dreamer dreamer
4 1 great great great
5 1 guy guy guy
6 1 hand hand hand
What am I doing wrong?
Answer 0 (score: 1)
The splitstackshape library makes this task simple:
library(splitstackshape)
lapply(ls, function(i) cSplit(i, 'term', sep = ' ', direction = 'wide'))
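As for what went wrong in the original loop: `for (i in length(ls))` runs only once, with `i = 3`, so elements 1 and 2 of `new_ls` stay `NULL`; and `"[", i` picks only the i-th word of every split, which then gets recycled into all three new columns. If you would rather stay in base R and get the exact `word1`, `word2`, ... names (as far as I recall, `cSplit` names its columns `term_1`, `term_2`, ... and returns a data.table), a minimal sketch could look like the following. The helper name `split_terms` is made up for illustration, and it assumes words are separated by single spaces.

```r
# Example data from the question (kept as character, not factor).
unigrams <- data.frame(freq = c(3, 3, 5, 4, 3, 41),
                       term = c("a-list", "a-p", "aaa", "aam", "aamir", "aaron"),
                       stringsAsFactors = FALSE)
bigrams <- data.frame(freq = c(13, 1, 1, 2, 1, 4),
                      term = c("a a", "a abode", "a about", "a absolutely",
                               "a accessory", "a acre"),
                      stringsAsFactors = FALSE)
trigrams <- data.frame(freq = c(1, 1, 1, 1, 1, 1),
                       term = c("a a card", "a a divorce", "a a dreamer",
                                "a a great", "a a guy", "a a hand"),
                       stringsAsFactors = FALSE)
ls <- list(unigrams, bigrams, trigrams)

# Hypothetical helper (not from the original post):
# split `term` on single spaces into columns word1..wordN.
split_terms <- function(df) {
  parts <- strsplit(as.character(df$term), split = " ", fixed = TRUE)
  n <- max(lengths(parts))
  # Index each split with 1..n so shorter rows are padded with NA,
  # then stack the rows into a character matrix with n columns.
  words <- do.call(rbind, lapply(parts, function(p) p[seq_len(n)]))
  colnames(words) <- paste0("word", seq_len(n))
  # Keep every column except `term`, then append the word columns.
  cbind(df[setdiff(names(df), "term")],
        as.data.frame(words, stringsAsFactors = FALSE))
}

# lapply visits every element, unlike `for (i in length(ls))`.
new_ls <- lapply(ls, split_terms)
```

If you prefer to keep the explicit loop, the same idea applies: iterate with `for (i in seq_along(ls))` and extract all words per row rather than only the i-th one.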