Question

I am trying to split a single line of text in R and store the pieces in a data frame.

For example, the following text:

hello-world;1|(good)night world;2|...

is expected to become:

                 V1 V2
1       hello-world  1
2 (good)night world  2

To achieve this, I first split the initial text on '|', using separate from tidyr:

library(tidyr)
as.data.frame(str) %>% separate(str, into=c("V1"), sep='\\|')
# 1 hello-world;1
# Warning message:
# Too many values at 1 locations: 1

I suspect the problem arises in this first split. How can I fix it?

Answer 1

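How about this?

library(tidyverse)

text <- c("hello-world;1|(good)night world;2")

# Split on the literal "|" (fixed = TRUE turns off regex matching), one record per row
df_text <- data.frame(a = unlist(strsplit(text, "|", fixed = TRUE)))

# Split each record on ";" into the two columns V1 and V2
df_split_text <- separate(df_text, a, c("V1", "V2"), sep = ";")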
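For comparison, the same two-step idea (split into records first, then into fields) can be sketched without tidyr. This is only an illustration and assumes every record contains exactly one ";"; the names records, fields and df are made up for the example.

```r
text <- "hello-world;1|(good)night world;2"

# Step 1: split on the literal pipe; fixed = TRUE avoids regex interpretation.
records <- unlist(strsplit(text, "|", fixed = TRUE))

# Step 2: split each record on ";" and stack the pieces as rows of a matrix.
# Assumption: every record has exactly two fields, so rbind() lines up cleanly.
fields <- do.call(rbind, strsplit(records, ";", fixed = TRUE))

df <- data.frame(
  V1 = fields[, 1],
  V2 = as.integer(fields[, 2]),
  stringsAsFactors = FALSE
)
```

Converting V2 with as.integer() is optional; separate() leaves both columns as character unless asked to convert.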

Answer 2

If you want to do this with the tidyverse, you need to use unnest to make the data long and then separate the values.

library(tidyverse)

data.frame(v1 = 'hello-world;1|(good)night world;2|') %>% 
       mutate(v1 = strsplit(as.character(v1), '\\|')) %>% 
       unnest(v1) %>% 
       separate(v1, into = c('v1', 'v2'), sep = ';')

# A tibble: 2 x 2
#                 v1    v2
#*             <chr> <chr>
#1       hello-world     1
#2 (good)night world     2
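The answers here write the pipe separator in three different ways ('\\|', "[|]", and "|" with fixed = TRUE). A small base R sketch of why some form of escaping is needed, using a made-up input string x:

```r
x <- "a;1|b;2"

# Three equivalent ways to split on a literal pipe character.
escaped   <- strsplit(x, "\\|")[[1]]              # backslash-escaped regex
bracketed <- strsplit(x, "[|]")[[1]]              # character class
literal   <- strsplit(x, "|", fixed = TRUE)[[1]]  # no regex at all

# Unescaped, "|" is the regex alternation operator, so this call does
# not split the string into the two records.
unescaped <- strsplit(x, "|")[[1]]
```

The same applies to the sep argument of separate(), which is treated as a regular expression when it is a character string.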

Answer 3

I know @udden2903 has already given the best answer using the tidyverse, but this base R approach should work too. Replace | with \n, then read the result in with read.table:

read.table(text=gsub("[|]", "\n", text), header = FALSE, sep=";", stringsAsFactors= FALSE)
#                 V1 V2
#1       hello-world  1
#2 (good)night world  2
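One caveat with the read.table approach: by default read.table treats ' and " as quote characters and # as the start of a comment, so text containing those characters may be misread. A hedged variant that turns both off, using a made-up input string (messy) that is not from the question:

```r
# Hypothetical input containing an apostrophe and a hash sign.
messy <- "it's fine;1|#tag;2"

df <- read.table(
  text = gsub("|", "\n", messy, fixed = TRUE),  # one record per line
  header = FALSE,
  sep = ";",
  quote = "",          # do not treat ' or " as quoting characters
  comment.char = "",   # do not treat # as a comment marker
  stringsAsFactors = FALSE
)
```

For input like the question's, which has no quotes or hash signs, the original call behaves the same.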

Split a string into a data frame

3 answers: