I have a large data.table where one column contains text. Here is a simple example:
library(data.table)
x = data.table(text = c("This is the first text", "Second text"))
x
                     text
1: This is the first text
2:            Second text
I would like to get a data.table with one column containing all the words of all the texts. Here is my attempt:
x[, strsplit(text, " ")]
Which results in:
      V1     V2
1:  This Second
2:    is   text
3:   the Second
4: first   text
5:  text Second
The result I would like to get is:
text
1: This
2: is
3: the
4: first
5: text
6: Second
7: text
Answer 0 (score: 3)
You are close. What you are looking for is:
data.table(text=unlist(strsplit(x$text, " ")))
# text
#1: This
#2: is
#3: the
#4: first
#5: text
#6: Second
#7: text
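As the output in the question shows, the original attempt produced two columns because strsplit() returns a list with one character vector per row, and each list element became its own column, with the shorter one recycled. The same flattening can also be done inside the data.table call. This is only a minimal sketch, assuming the x from the question; the names parts and words are introduced here for illustration:

```r
library(data.table)

x <- data.table(text = c("This is the first text", "Second text"))

# strsplit() returns a list with one character vector per input row
parts <- strsplit(x$text, " ")

# unlist() flattens that list into a single character vector,
# which .() then returns as one column named "text"
words <- x[, .(text = unlist(strsplit(text, " ")))]
words
```

Keeping the call inside x[...] makes it easier to add a by = clause later if the words need to be grouped.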
Answer 1 (score: 2)
As @Henrik mentioned in the comments, you can use cSplit from the splitstackshape package for this task:
library(splitstackshape)
cSplit(x, "text", sep = " ", direction = "long")
This gives:
# text
#1: This
#2: is
#3: the
#4: first
#5: text
#6: Second
#7: text
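You can also create a column that identifies the original sentence in the result:
library(dplyr)  # provides %>%, mutate() and n()
x %>% dplyr::mutate(n = 1:n()) %>% cSplit(., "text", " ", "long")
This gives: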
#     text n
#1:   This 1
#2:     is 1
#3:    the 1
#4:  first 1
#5:   text 1
#6: Second 2
#7:   text 2
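If you would rather not depend on dplyr, the sentence index can probably be kept with data.table alone by grouping on the row number. This is a sketch under that assumption, not part of the answer above; the column name n simply mirrors the one used there, and the object name words is illustrative:

```r
library(data.table)

x <- data.table(text = c("This is the first text", "Second text"))

# Group by row number so every word keeps the index of its source sentence.
# Within each group, text is a single string, so strsplit() yields its words.
words <- x[, .(text = unlist(strsplit(text, " "))),
           by = .(n = seq_len(nrow(x)))]
words
```

Note that grouping columns come first in the result, so the column order here is n, text rather than text, n.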