Splitting a string on two character markers

Date: 2015-07-02 15:36:31

Tags: regex r string tidyr

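Hi all, I've searched around and couldn't find an answer on how to do this. I'm fairly new to R and haven't used regular expressions before, but basically I have some data in a field like this: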

"#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "

I want to split the string into its separate elements, so that each category after a "#" gets its own column.

I tried the tidyr package and initially tried:

string %>% separate(Description, into  =  c("Route","Details","Result","License No",
                        "Vehicle Desciption"),
                sep = "\n#", remove =F, extra =  "drop")

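But then I realised I only want the data after the " - ". I tried adding " - " to the code, but it didn't work. Does anyone know how to split the string, ideally between " - " and "#"?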

Many thanks

3 answers:

Answer 0 (score: 5)

In one line:

> gsub("^\\s+|\\s+$","",gsub(".*?[-]","",unlist(strsplit(str,"#"))))
[1] ""               "6"              "PARKING"        "Parking issues" "MOVED ON"       "Mercedes"  

Or, split into steps so it is easier to follow. Break the string at "#":

a = unlist(strsplit(str,"#"))

Remove everything up to and including the "-":

b = gsub(".*?[-]","",a)

Remove leading and trailing whitespace:

gsub("^\\s+|\\s+$","",b)
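A minimal sketch of the same three steps, using base R's trimws() (available since R 3.2.0) for the trimming step. It also drops the empty first element that strsplit() returns because the string starts with "#", and it uses sub() with perl = TRUE instead of gsub(), so only the text up to the first "-" is removed and a hyphen inside a value (e.g. a hypothetical "Mercedes-Benz") would survive:

```r
str <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "

# 1. Break the string at every "#"; the leading "#" produces an empty first element
a <- unlist(strsplit(str, "#", fixed = TRUE))
a <- a[a != ""]

# 2. Remove everything up to and including the first "-" (lazy match, first occurrence only)
b <- sub(".*?-", "", a, perl = TRUE)

# 3. Strip leading and trailing whitespace
values <- trimws(b)
values
```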

Answer 1 (score: 4)

You can do the following:

strsplit(x, ' *#[^-]+- *')[[1]][2:6]
# [1] "6"              "PARKING"        "Parking issues" "MOVED ON"       "Mercedes" 

To get the column names you want, I guess you could do:

mat <- matrix(strsplit(x, ' *#[^-]+- *')[[1]][2:6], ncol=5, byrow=T)
colnames(mat) <- c('Route', 'CAT', 'Details', 'Result', 'Vehicle Description')

#      Route CAT       Details          Result     Vehicle Description
# [1,] "6"   "PARKING" "Parking issues" "MOVED ON" "Mercedes" 
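If the strings live in a data frame column (called Description in the question), the labels between "#" and " - " can serve as column names instead of being typed by hand. The sketch below is one possible approach, not taken from the answers above: parse_description() and fields are illustrative names, it assumes the categories of interest are the five listed in fields, and the second row of df is made up for illustration (it has no Vehicle Type entry):

```r
# Hypothetical example data; the second row deliberately lacks "Vehicle Type"
df <- data.frame(
  Description = c(
    "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes ",
    "#Route - 6 #Category - PARKING #Details - Parking issues are present#Result - MOVED ON "
  ),
  stringsAsFactors = FALSE
)

# Turn one description into a named vector: label (before " - ") -> value (after it)
parse_description <- function(s) {
  pieces <- trimws(strsplit(s, "#", fixed = TRUE)[[1]])
  pieces <- pieces[pieces != ""]
  setNames(sub("^.*? - ", "", pieces, perl = TRUE),
           sub(" - .*$", "", pieces))
}

fields <- c("Route", "Category", "Details", "Result", "Vehicle Type")

# One row per description, one column per field; a missing category becomes NA
parsed <- t(vapply(df$Description,
                   function(s) unname(parse_description(s)[fields]),
                   setNames(character(length(fields)), fields),
                   USE.NAMES = FALSE))

result <- cbind(df, as.data.frame(parsed, stringsAsFactors = FALSE))
```

Indexing each parsed vector by name, rather than by position as in [2:6], is what keeps a missing category from shifting the remaining values into the wrong columns.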

Answer 2 (score: 2)

Using str_extract_all from stringr:

library(stringr)
str_extract_all(str1, '(?<=-\\s)\\w+(?:\\s*\\w+){0,}')[[1]]
#[1] "6"              "PARKING"        "Parking issues" "MOVED ON"      
#[5] "Mercedes"      

str_extract_all(str2, '(?<=-\\s)\\w+(?:\\s*\\w+){0,}')[[1]]
#[1] "6"                          "PARKING"                   
#[3] "Parking issues"             "MOVED ON"                  
#[5] "Mercedes"                   "Parking issues are present"
#[7] "MOVED ON"                   "Mercedes"                  

Data

str1 <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "

str2 <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes #Details - Parking issues are present#Result - MOVED ON #Vehicle Type - Mercedes "
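Without stringr, the same lookbehind pattern should also work in base R via regmatches() and gregexpr() with perl = TRUE, since PCRE supports fixed-length lookbehind. A minimal sketch on str1; {0,} is written as the equivalent *:

```r
str1 <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "

# Match runs of words (separated by optional whitespace) that directly follow "- ";
# the lookbehind (?<=-\\s) is checked but not included in the match
values <- regmatches(str1, gregexpr("(?<=-\\s)\\w+(?:\\s*\\w+)*", str1, perl = TRUE))[[1]]
values
```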