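I have a data frame in which one column contains lists of strings. I am trying to use unnest_tokens on that column
so that each row holds one token, but this fails when the strings are inside a list.
The data frame looks like this: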
> dat
department instructor_gender comments
1 BME F is amazing and you will love her!, Prof. is so nice
I tried unlisting the column inside the call:
dat.word <- dat %>%
unnest_tokens(word, unlist(comments))
but got:
Error in check_input(x) :
Input must be a character vector of any length or a list of character
vectors, each of which has a length of 1.
How can I unnest this list of strings so that each row has a single word?
Edit:
> dput(dat)
structure(list(department = "BME", instructor_gender = "F", comments = list(
c("is amazing and you will love her!", "Prof. is so nice"
))), class = "data.frame", row.names = c(NA, -1L))
Edit 2: desired output
> output
word department instructor_gender
1 is BME F
2 amazing BME F
3 and BME F
4 you BME F
Answer 0 (score: 2)
Just use tidyr::unnest first:
df <- structure(list(department = "BME", instructor_gender = "F", comments = list(
c("is amazing and you will love her!", "Prof. is so nice"
))), class = "data.frame", row.names = c(NA, -1L))
library(tidytext)
library(tidyverse)
# Name the list column explicitly; tidyr >= 1.0 warns when `cols` is left out
df %>% unnest(comments) %>% unnest_tokens(word, comments)
# department instructor_gender word
# 1 BME F is
# 1.1 BME F amazing
# 1.2 BME F and
# 1.3 BME F you
# 1.4 BME F will
# 1.5 BME F love
# 1.6 BME F her
# 2 BME F prof
# 2.1 BME F is
# 2.2 BME F so
# 2.3 BME F nice
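Your error says:
Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.
You fed it a list containing a character vector of length 2.
Basically, you can give it a character vector of any length, or a list in which every element is a single string.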
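To see the length problem directly, here is a minimal base R sketch that checks the element lengths of the list column and then flattens it by hand. It only illustrates what the unnest step does; the names `n` and `flat` are made up for this example and are not part of the answer above.

```r
# Same data as in the question's dput() output
dat <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice"))),
  class = "data.frame", row.names = c(NA, -1L))

# Number of strings in each list element; anything other than 1
# is what the check_input() error is complaining about
n <- lengths(dat$comments)
print(n)

# Flatten by hand: repeat the other columns once per comment
# and unlist the comments into a plain character vector
flat <- data.frame(
  department        = rep(dat$department, n),
  instructor_gender = rep(dat$instructor_gender, n),
  comments          = unlist(dat$comments),
  stringsAsFactors  = FALSE
)
str(flat)
```

Since `flat$comments` is now a plain character vector with one comment per row, it should be accepted by `unnest_tokens(word, comments)` in the same way as the result of `tidyr::unnest`.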