Question

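Once again I'm struggling with strsplit. I'm converting some strings into a data frame, but a forward slash (/) and some whitespace in my strings keep tripping me up. I can work around it, but I'd like to know whether I can use some kind of fancy "or" in strsplit. My working example below should illustrate the problem.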

The strsplit-based function I'm using:

str_to_df <- function(string){
  # split each string on runs of whitespace, then transpose to one row per string
  t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]]))
}

One kind of string I get:

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#>      [,1]    [,2]  
#> [1,] "One"   "58/2"
#> [2,] "Two"   "22/3"
#> [3,] "Three" "15/5"

The other kind I get in the same place:

string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#>      [,1]    [,2] [,3] [,4]
#> [1,] "One"   "58" "/"  "2" 
#> [2,] "Two"   "22" "/"  "3" 
#> [3,] "Three" "15" "/"  "5"

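These obviously produce different outputs, and I can't figure out how to write one solution that works for both. Below is my desired result. Thanks in advance!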

desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
                               "15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#>      [,1]    [,2] [,3]
#> [1,] "One"   "58" "2" 
#> [2,] "Two"   "22" "3" 
#> [3,] "Three" "15" "5"

Answer 1

This works:

str_to_df <- function(string){
  # split on any run of forward slashes and/or whitespace characters
  t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]]))
}

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

str_to_df(string1)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2" 
# [2,] "Two"   "22" "3" 
# [3,] "Three" "15" "5"

str_to_df(string2)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2" 
# [2,] "Two"   "22" "3" 
# [3,] "Three" "15" "5"
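The "fancy or" from the question can also be written literally as regex alternation (`|`) instead of being folded into one character class. A minimal base-R sketch; the function name `split_alt` is hypothetical. The slash alternative is listed first so that " / " is consumed as a single separator rather than as a space followed by a slash.

```r
# Sketch: the "or" expressed as regex alternation.
#   "\\s*/\\s*"  a slash with optional whitespace on either side, OR
#   "\\s+"       a run of whitespace (spaces, tabs)
split_alt <- function(x) do.call(rbind, strsplit(x, "\\s*/\\s*|\\s+"))

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

split_alt(string1)
split_alt(string2)
```

Both calls should give the same 3 x 3 character matrix as the character-class version above.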

Another approach with tidyr could be:

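library(tidyr)   # separate(), %>%
library(tibble)  # tibble(); as_tibble() on a bare vector is discouraged
tibble(value = string1) %>%
  separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")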

# A tibble: 3 x 3
#   Col1  Col2  Col3 
#   <chr> <chr> <chr>
# 1 One   58    2    
# 2 Two   22    3    
# 3 Three 15    5

Answer 2

We can split on runs of one or more spaces, tabs, or forward slashes, wrapped in a function:

f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
f1(string1)
#    [,1]    [,2] [,3]
#[1,] "One"   "58" "2" 
#[2,] "Two"   "22" "3" 
#[3,] "Three" "15" "5" 

f1(string2)
#     [,1]    [,2] [,3]
#[1,] "One"   "58" "2" 
#[2,] "Two"   "22" "3" 
#[3,] "Three" "15" "5"
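One caveat with `do.call(rbind, ...)`: if some string splits into fewer fields than the others, `rbind()` recycles the short row, sometimes without any warning. A hedged sketch of a stricter variant (the name `split_to_matrix` and its `sep` default are hypothetical, not from either answer) that trims surrounding whitespace and stops when the field counts disagree:

```r
# Hypothetical helper: split on runs of "/" and/or whitespace, but refuse to
# build a matrix when the rows produce different numbers of fields.
split_to_matrix <- function(x, sep = "[/[:space:]]+") {
  # trimws() drops leading/trailing whitespace, which would otherwise
  # produce an empty first (or last) field
  parts <- strsplit(trimws(x), sep)
  n <- lengths(parts)
  if (length(unique(n)) > 1L) {
    stop("rows split into different numbers of fields: ",
         paste(n, collapse = ", "))
  }
  do.call(rbind, parts)
}

# Mixed formats in one vector work fine
split_to_matrix(c("One 58 / 2", "Two 22/3"))

# A malformed row raises an error instead of being silently recycled
try(split_to_matrix(c("One 58/2", "Two 22")))
```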

Alternatively, we can replace the spaces (along with tabs and slashes) with a common delimiter and then use read.csv:

read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
#     V1 V2 V3
#1   One 58  2
#2   Two 22  3
#3 Three 15  5
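Unlike the strsplit answers, read.csv returns a data frame and converts the number-like columns to integers instead of leaving them as strings. A short sketch applying the same gsub to both inputs; the column names `label`, `a`, `b` and the wrapper name `to_df` are placeholders, not from the question:

```r
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

# Replace every run of tabs, slashes and spaces with a comma, then parse as CSV.
# col.names labels the columns; stringsAsFactors = FALSE keeps the text column
# as character on R versions older than 4.0 too.
to_df <- function(x) {
  read.csv(text = gsub("[\t/ ]+", ",", x), header = FALSE,
           col.names = c("label", "a", "b"), stringsAsFactors = FALSE)
}

df1 <- to_df(string1)
df2 <- to_df(string2)
str(df1)
```

Note that this relies on the input containing no commas of its own.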
