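I have a dataset with 2 columns: a unique ID and a comment. Right now I can only build a word cloud from the comments, but I'd like to keep each text's unique ID so that I can join the results back when I visualize them in Tableau.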
Example:
ID | Text
a1 | This is a test comment.
a2 | Another test comment.
a3 | This is very good
a4 | I like this.
My desired output is:
ID | Words
a1 | This
a1 | is
a1 | a
a1 | test
a1 | comment
a2 | Another
a2 | test
a2 | comment
a3 | This
a3 | is
a3 | very
a3 | good
I hope my sample makes it clear what I'm after. Thank you!
Answer 0 (score: 2)
> df <- read.table(text='ID Text
+ a1 "This is a test comment"
+ a2 "Another test comment"
+ a3 "This is very good"
+ a4 "I like this"', header=TRUE, as.is=TRUE)
> library(data.table)
> dt = data.table(df)
> dt[,c(Words=strsplit(Text, " ", fixed = TRUE)), by = ID]
ID Words
1: a1 This
2: a1 is
3: a1 a
4: a1 test
5: a1 comment
6: a2 Another
7: a2 test
8: a2 comment
9: a3 This
10: a3 is
11: a3 very
12: a3 good
13: a4 I
14: a4 like
15: a4 this
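For comparison, the same reshape can be done in base R without data.table. This is a minimal sketch, assuming (as in this answer's data) that words are separated by single spaces and the trailing periods have already been removed. `lengths()` returns the number of words in each comment, so `rep()` repeats each ID exactly that many times, which lines it up row for row with the flattened word vector from `unlist()`.

```r
# Sample data from the question (trailing periods removed, as in this answer)
df <- data.frame(
  ID   = c("a1", "a2", "a3", "a4"),
  Text = c("This is a test comment", "Another test comment",
           "This is very good", "I like this"),
  stringsAsFactors = FALSE
)

# One character vector of words per comment
words <- strsplit(df$Text, " ", fixed = TRUE)

# Repeat each ID once per word in its comment, then flatten the word list
long <- data.frame(
  ID    = rep(df$ID, lengths(words)),
  Words = unlist(words),
  stringsAsFactors = FALSE
)

print(long)
```

Because IDs are repeated by position rather than grouped, this also works if the same ID appears on more than one row.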
Answer 1 (score: 1)
You can do something like this:
library(tidyverse)
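
df <- tribble(
  ~ID,  ~Text,
  "a1", "This is a test comment.",
  "a2", "Another test comment.",
  "a3", "This is very good",
  "a4", "I like this."
)

# Split each comment on spaces: one character vector per row of df
split_data <- strsplit(df$Text, " ")

# For each row, pair its ID (repeated once per word) with its words,
# then stack the per-row results into one two-column matrix
result <- do.call(rbind,
  lapply(seq_along(split_data), function(x) {
    cbind(ID = rep(df$ID[x], length(split_data[[x]])), Words = split_data[[x]])
  })
)
result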
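The code above splits the raw text, so `comment.` keeps its period and `This` and `this` count as different words in a cloud. Below is a sketch of one way to normalise the words before exporting them, written in base R so it needs no packages. It assumes that lower-casing and dropping all punctuation are acceptable for your word cloud (note this also turns a word like `don't` into `dont`), and the file name `comment_words.csv` is hypothetical. The final `merge()` shows that the ID still works as a join key back to the original comments, which is the same join you would set up in Tableau.

```r
df <- data.frame(
  ID   = c("a1", "a2", "a3", "a4"),
  Text = c("This is a test comment.", "Another test comment.",
           "This is very good", "I like this."),
  stringsAsFactors = FALSE
)

# Normalise for a word cloud: lower-case and strip punctuation (assumed choices)
clean <- tolower(gsub("[[:punct:]]", "", df$Text))

# Split on runs of whitespace so repeated interior spaces do not yield empty strings
words <- strsplit(clean, "[[:space:]]+")

# One row per word, each carrying the ID of the comment it came from
long <- data.frame(
  ID    = rep(df$ID, lengths(words)),
  Words = unlist(words),
  stringsAsFactors = FALSE
)

# Hypothetical output file for Tableau, written to a temporary directory here
out_file <- file.path(tempdir(), "comment_words.csv")
write.csv(long, out_file, row.names = FALSE)

# Re-join: every word row picks up its original comment through the shared ID
joined <- merge(long, df, by = "ID")
```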