strsplit拆分或依赖于

时间:2018-04-23 15:28:28

标签: r dataframe strsplit

我再一次与挣扎。我正在将一些字符串转换为数据帧,但是有一个正斜杠,/和我的字符串中的一些空白区域一直在困扰着我。我可以解决它,但我渴望了解我是否可以使用某些幻想或。我下面的工作示例应说明问题

函数我正在使用

str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }

我得到的一种字符串,

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#>      [,1]    [,2]  
#> [1,] "One"   "58/2"
#> [2,] "Two"   "22/3"
#> [3,] "Three" "15/5"

我在同一地点的另一种类型,

string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#>      [,1]    [,2] [,3] [,4]
#> [1,] "One"   "58" "/"  "2" 
#> [2,] "Two"   "22" "/"  "3" 
#> [3,] "Three" "15" "/"  "5" 

他们显然会创建不同的输出,我无法弄清楚如何编写适用于两者的解决方案。以下是我的期望结果。先感谢您!

desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
                               "15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#>      [,1]    [,2] [,3]
#> [1,] "One"   "58" "2" 
#> [2,] "Two"   "22" "3" 
#> [3,] "Three" "15" "5"

2 个答案:

答案 0 :(得分:6)

这有效:

str_to_df <- function(string){
  t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]])) }

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

str_to_df(string1)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2" 
# [2,] "Two"   "22" "3" 
# [3,] "Three" "15" "5"

str_to_df(string2)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2" 
# [2,] "Two"   "22" "3" 
# [3,] "Three" "15" "5"

tidyr的另一种方法可能是:

string1 %>% 
  as_tibble() %>% 
  separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")

# A tibble: 3 x 3
#   Col1  Col2  Col3 
#   <chr> <chr> <chr>
# 1 One   58    2    
# 2 Two   22    3    
# 3 Three 15    5 

答案 1 :(得分:5)

我们可以在一个或多个空格或制表符或正斜杠

创建split的函数
f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
f1(string1)
#    [,1]    [,2] [,3]
#[1,] "One"   "58" "2" 
#[2,] "Two"   "22" "3" 
#[3,] "Three" "15" "5" 

f1(string2)
#     [,1]    [,2] [,3]
#[1,] "One"   "58" "2" 
#[2,] "Two"   "22" "3" 
#[3,] "Three" "15" "5" 

或者,在使用公共分隔符

替换空格后,我们可以使用read.csv
read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
#     V1 V2 V3
#1   One 58  2
#2   Two 22  3
#3 Three 15  5