Unnest a list of strings into individual strings

Time: 2018-12-02 01:38:11

Tags: r tidyverse

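I have a data frame in which one column contains lists of strings. I am trying to use unnest_tokens on that column so that each row holds a single token, but it fails when the strings are inside a list.

The data frame looks like this: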

> dat
 department instructor_gender                                            comments
1        BME                 F is amazing and you will love her!, Prof. is so nice

I tried calling unlist on the column inside unnest_tokens:

dat.word <- dat %>%
  unnest_tokens(word, unlist(comments))

But I get:

Error in check_input(x) : 
  Input must be a character vector of any length or a list of character
  vectors, each of which has a length of 1.

How can I unnest this list of strings so that each row holds exactly one word?

Edit:

> dput(dat)
structure(list(department = "BME", instructor_gender = "F", comments = list(
    c("is amazing and you will love her!", "Prof. is so nice"
    ))), class = "data.frame", row.names = c(NA, -1L))

Edit 2: desired output

> output
     word department instructor_gender
1      is        BME                 F
2 amazing        BME                 F
3     and        BME                 F
4     you        BME                 F

1 Answer:

Answer 0: (score: 2)

Just call tidyr::unnest on the list column first, then tokenize:

df <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice"
  ))), class = "data.frame", row.names = c(NA, -1L))

library(tidytext)
library(tidyverse)
# Flatten the list column to one string per row, then split into words.
df %>% unnest(comments) %>% unnest_tokens(word, comments)
#     department instructor_gender    word
# 1          BME                 F      is
# 1.1        BME                 F amazing
# 1.2        BME                 F     and
# 1.3        BME                 F     you
# 1.4        BME                 F    will
# 1.5        BME                 F    love
# 1.6        BME                 F     her
# 2          BME                 F    prof
# 2.1        BME                 F      is
# 2.2        BME                 F      so
# 2.3        BME                 F    nice

The error you got says:

Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

You gave it a list whose single element is a character vector of length 2.
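
One quick way to confirm this is to look at the length of each element of the list column with base R's lengths(). This is only a minimal sketch: it assumes the data frame is named dat, as in the question's dput output, and the helper names elem_lengths and all_single are made up for illustration.

```r
# Rebuild the example data frame from the question's dput() output.
dat <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice"))),
  class = "data.frame", row.names = c(NA, -1L))

# Length of each element of the list column; the error message asks for 1 everywhere.
elem_lengths <- lengths(dat$comments)
print(elem_lengths)
# [1] 2

# TRUE only if every list element is a single string.
all_single <- all(elem_lengths == 1L)
print(all_single)
# [1] FALSE
```

Any value other than 1 here means the column has to be flattened or collapsed before tokenizing.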

Basically, you can give it a character vector, or a list in which every element is a single string.
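
If you would rather not reshape the data frame first, another way to satisfy that rule might be to collapse each list element into one string before tokenizing. The sketch below is an alternative to the unnest approach, not the method from the answer. It assumes that joining the comments of one row with a space is acceptable, and the names dat2 and words are hypothetical.

```r
library(tidytext)

# Example data frame from the question's dput() output.
dat <- structure(list(department = "BME", instructor_gender = "F", comments = list(
  c("is amazing and you will love her!", "Prof. is so nice"))),
  class = "data.frame", row.names = c(NA, -1L))

# Collapse every list element to a single string (assumption: a space is a
# fine separator), turning the list column into a plain character column.
dat2 <- dat
dat2$comments <- vapply(dat$comments, paste, character(1), collapse = " ")

# The column is now a character vector, so unnest_tokens accepts it.
words <- unnest_tokens(dat2, word, comments)
print(words)
```

This keeps one row per original record until the tokenizing step, whereas unnest first produces one row per comment.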