Question

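I have a data frame in which one column contains lists of strings. I am trying to use unnest_tokens on that column so that each row holds one token, but it fails when the strings are inside a list.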

The data frame looks like this:

> dat
 department instructor_gender                                            comments
1        BME                 F is amazing and you will love her!, Prof. is so nice

I tried unlisting the column inside the call:

dat.word <- dat %>%
  unnest_tokens(word, unlist(comments))

but got:

Error in check_input(x) : 
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

How can I unnest this list of strings so that each row holds a single word?

Edit:

> dput(dat)
structure(list(department = "BME", instructor_gender = "F", comments = list(
    c("is amazing and you will love her!", "Prof. is so nice"
    ))), class = "data.frame", row.names = c(NA, -1L))

Edit 2: desired output

> output
     word department instructor_gender
1      is        BME                 F
2 amazing        BME                 F
3     and        BME                 F
4     you        BME                 F

Answer 1

Just use tidyr::unnest first:

df <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice"
  ))), class = "data.frame", row.names = c(NA, -1L))

library(tidytext)
library(tidyverse)
df %>% unnest(comments) %>% unnest_tokens(word, comments)
#     department instructor_gender    word
# 1          BME                 F      is
# 1.1        BME                 F amazing
# 1.2        BME                 F     and
# 1.3        BME                 F     you
# 1.4        BME                 F    will
# 1.5        BME                 F    love
# 1.6        BME                 F     her
# 2          BME                 F    prof
# 2.1        BME                 F      is
# 2.2        BME                 F      so
# 2.3        BME                 F    nice
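
To see what the unnest step does to the list column before tokenizing, here is a rough base-R sketch. It is an illustration rather than tidyr's actual implementation, and the name `flat` is made up for this example. Each row is repeated once per string in its list element, and the list is flattened into a plain character column:

```r
# Rebuild the data frame from the question's dput() output.
df <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice")
)), class = "data.frame", row.names = c(NA, -1L))

# Number of strings stored in each row's list element.
n_per_row <- lengths(df$comments)

# Repeat the other columns to match, and flatten the list column.
# `flat` is a hypothetical name used only in this sketch.
flat <- data.frame(
  department = rep(df$department, n_per_row),
  instructor_gender = rep(df$instructor_gender, n_per_row),
  comments = unlist(df$comments),
  stringsAsFactors = FALSE
)
```

After this step `comments` is an ordinary character vector with one string per row, which is the shape of input the error message asks for.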

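The error you got says:

Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

You fed it a list containing a character vector of length 2.

Basically, you can feed it a single string, a character vector, or a list in which every element is a single string.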
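
A quick way to check a column against that rule is base R's `lengths()`, which reports the length of each list element. The sketch below assumes the rule is exactly what the error message states. The names `element_lengths` and `all_length_one` are made up for this example:

```r
# Rebuild the data frame from the question's dput() output.
dat <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice")
)), class = "data.frame", row.names = c(NA, -1L))

# Length of each element of the list column; anything other than 1
# breaks the rule quoted in the error message.
element_lengths <- lengths(dat$comments)

# TRUE if the column is a plain character vector, or a list whose
# elements all have length 1 (the condition the error message describes).
all_length_one <- is.character(dat$comments) || all(element_lengths == 1L)
```

For the data in the question, the single list element holds two strings, so the check fails and the column needs to be unnested first.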
