Question

I am trying to use separate() (tidyr) to split line_text into individual words, one word per column:

Data:

structure(list(ID = c(140L, 233L, 233L), 
pdf_name = structure(c(1L, 
2L, 2L), .
Label = c("GBD2016_2_1255_Venezuela_MoH_Epi_2012_9.pdf", 
"GBD2016_2_1351_Venezuela_MoH_Epi_2014_44.pdf"), 
class = "factor"), 
keyword = c("SEGÚN GRUPOS", "SEGÚN GRUPOS", "SEGÚN GRUPOS"
), line_text = list("2000 Gráfico 2 . CASOS DE MALARIA SEGÚN GRUPOS DE EDAD Y SEXO,                                                                                                                         EPIDEMIOLÓGICA 9 Año 2012", 
    "GRÁFICO 2. CASOS DE MALARIA SEGÚN GRUPOS DE EDAD Y SEXO, HASTA", 
    "GRÁFICO 2. CASOS DE            SEGÚN GRUPOS"), 
.Names = c("ID", "pdf_name", "keyword", 
"page_num", "line_num", "line_text", "token_text"), row.names = c(NA, 
-3L), class = "data.frame")

Code used:

numcols<- make.unique(c(rep("word",10, sep  = " ")) )

df<- reportdiagn%>%
 (separate(reportdiagn$line_text,
        into = numcols, 
        sep = ("")))

I get the following error and cannot work out how to fix it.

Error in UseMethod("separate_") : 
 no applicable method for 'separate_' applied to an object of class "factor"

Answer 1

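The data as pasted is not quite right. It might be worth pasting it again, but in any case I have tried to reproduce your data; it may not be exactly the same. I have made line_text a character vector, but I think the code below works with either character or factor.

In separate(), you don't need to reference the data frame: %>% already passes it in, so you only need the unquoted column name. Also, your sep needs to be a space, or \\b for a word boundary.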
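As a minimal sketch of the point about the pipe, the snippet below uses a hypothetical toy data frame (`toy`, `txt`, `w1` and `w2` are illustrative names, not part of the question). It shows that the piped and direct calls are equivalent, and that handing separate() a bare column vector fails because separate() only has a method for data frames.

```r
library(tidyr)  # provides separate() and re-exports %>%

# Hypothetical toy data, not the asker's data frame.
toy <- data.frame(id = 1:2, txt = c("a b", "c d"), stringsAsFactors = FALSE)

# The pipe passes `toy` as the first argument, so these two calls are equivalent.
piped  <- toy %>% separate(txt, into = c("w1", "w2"), sep = " ")
direct <- separate(toy, txt, into = c("w1", "w2"), sep = " ")
identical(piped, direct)

# toy$txt is a bare vector, not a data frame, so method dispatch fails.
res <- tryCatch(
  separate(toy$txt, into = c("w1", "w2"), sep = " "),
  error = function(e) "error"
)
```

This is the same kind of failure as in the question, where reportdiagn$line_text (a factor) was passed instead of the data frame.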

library(tidyr)  # provides separate() and re-exports %>%

ID <- c(140, 233, 233)
pdf_name <- factor(c(1, 2, 2),
    labels = c(
        "GBD2016_2_1255_Venezuela_MoH_Epi_2012_9.pdf", 
        "GBD2016_2_1351_Venezuela_MoH_Epi_2014_44.pdf") 
)
keyword <- c("SEGÚN GRUPOS", "SEGÚN GRUPOS", "SEGÚN GRUPOS")
line_text <- c("2000 Gráfico 2 . CASOS DE MALARIA SEGÚN GRUPOS DE EDAD Y SEXO, EPIDEMIOLÓGICA 9 Año 2012", 
               "GRÁFICO 2. CASOS DE MALARIA SEGÚN GRUPOS DE EDAD Y SEXO, HASTA", 
               "GRÁFICO 2. CASOS DE SEGÚN GRUPOS.")
reportdiagn <- data.frame(ID, pdf_name, keyword, line_text)

numcols <- make.unique(rep("word", 10))

df <- reportdiagn %>%
    separate(line_text,
              into = numcols, 
              sep = " ")

This produces NA values for rows with fewer than 10 words and truncates rows with more than 10. I assume that is what you were expecting.
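If the padding and truncation are intended, separate() can be told so explicitly through its `extra` and `fill` arguments, which also avoids the warnings it would otherwise raise. The sketch below is one possible variation, not part of the original answer: it assumes five output columns and a hypothetical two-row data frame (`demo`), and it uses the regular expression "\\s+" so that runs of spaces, like those in the question's data, count as a single separator.

```r
library(tidyr)

# Hypothetical two-row example; line_text mirrors the question's column name.
demo <- data.frame(
  ID = c(1, 2),
  line_text = c("GRÁFICO 2. CASOS DE            SEGÚN GRUPOS",
                "CASOS DE MALARIA"),
  stringsAsFactors = FALSE
)

cols <- paste0("word", 1:5)  # assumed column count, chosen for illustration

out <- separate(
  demo, line_text,
  into  = cols,
  sep   = "\\s+",   # one or more whitespace characters act as one separator
  extra = "drop",   # discard words beyond the last column without a warning
  fill  = "right"   # pad short rows with NA on the right without a warning
)
```

Here the first row has six words, so the sixth is dropped, and the second row has three, so word4 and word5 are NA.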
