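Hi all, I've searched around and can't find an answer on how to do this. I'm fairly new to R and haven't used regular expressions before. Basically, I have some data in a field that looks like this: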
"#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "
I basically want to split the string into separate elements, so that each category after a # gets its own column.
I tried the tidyr package, starting with:
string %>% separate(Description, into = c("Route","Details","Result","License No",
"Vehicle Desciption"),
sep = "\n#", remove =F, extra = "drop")
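But then I realised that I only want the data after each " - ". I tried adding " - " to the separator, but it didn't work. Does anyone know how to split the string so that I get the text between each " - " and the next "#"?
Many thanks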
Answer 0 (score: 5)
In one line:
> gsub("^\\s+|\\s+$","",gsub(".*?[-]","",unlist(strsplit(str,"#"))))
[1] "" "6" "PARKING" "Parking issues" "MOVED ON" "Mercedes"
Or, broken into steps for easier understanding. Split the string on "#":
a = unlist(strsplit(str,"#"))
Remove everything up to and including the "-":
b = gsub(".*?[-]","",a)
Remove leading and trailing whitespace:
gsub("^\\s+|\\s+$","",b)
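The three steps above can also be wrapped in a small function that keeps the labels as names. This is only a sketch in base R: `parse_fields` is a hypothetical helper name (not part of the answer), and it assumes `trimws()` is available (R 3.2.0 or later) and that no label contains a "-".

```r
# Hypothetical helper (not from the original answer): split on "#",
# then cut each piece into a label and a value at the first "-".
parse_fields <- function(s) {
  parts <- unlist(strsplit(s, "#", fixed = TRUE))
  parts <- parts[nzchar(trimws(parts))]        # drop the empty piece before the first "#"
  labels <- trimws(sub("-.*$", "", parts))     # text before the first "-"
  values <- trimws(sub("^[^-]*-", "", parts))  # text after the first "-"
  setNames(values, labels)
}

str <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "
res <- parse_fields(str)
print(res)
```

Because the result is a named character vector, a single field can be looked up with, for example, `res[["Vehicle Type"]]`.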
Answer 1 (score: 4)
You can do the following:
strsplit(x, ' *#[^-]+- *')[[1]][2:6]
# [1] "6" "PARKING" "Parking issues" "MOVED ON" "Mercedes"
To get the column names you want, I suppose you could do something like:
mat <- matrix(strsplit(x, ' *#[^-]+- *')[[1]][2:6], ncol=5, byrow=T)
colnames(mat) <- c('Route', 'CAT', 'Details', 'Result', 'Vehicle Description')
# Route CAT Details Result Vehicle Description
# [1,] "6" "PARKING" "Parking issues" "MOVED ON" "Mercedes"
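If the column holds more than one record, the same split can probably be applied to each string and the pieces bound into rows. The sketch below assumes every record has exactly five fields in the same order. The second record in `x` is made up for illustration, and the column names are taken from the labels in the question's string.

```r
# Hypothetical input: the question's example plus one invented record
# in the same format.
x <- c(
  "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes ",
  "#Route - 12 #Category - SPEEDING #Details - Over limit#Result - WARNED #Vehicle Type - Ford "
)

pieces <- strsplit(x, " *#[^-]+- *")
# Each element starts with "" (the text before the first "#"), so drop it,
# and trim the trailing space left on the last field.
rows <- lapply(pieces, function(p) trimws(p[-1]))

mat <- do.call(rbind, rows)   # one row per record; assumes equal field counts
colnames(mat) <- c("Route", "Category", "Details", "Result", "Vehicle Type")
df <- as.data.frame(mat, stringsAsFactors = FALSE)
print(df)
```

Records with missing or reordered fields would break the `rbind` step or misalign columns, so a label-aware approach is safer for messy data.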
Answer 2 (score: 2)
Using str_extract_all from stringr:
library(stringr)
str_extract_all(str1, '(?<=-\\s)\\w+(?:\\s*\\w+){0,}')[[1]]
#[1] "6" "PARKING" "Parking issues" "MOVED ON"
#[5] "Mercedes"
str_extract_all(str2, '(?<=-\\s)\\w+(?:\\s*\\w+){0,}')[[1]]
#[1] "6" "PARKING"
#[3] "Parking issues" "MOVED ON"
#[5] "Mercedes" "Parking issues are present"
#[7] "MOVED ON" "Mercedes"
Data:
str1 <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes "
str2 <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes #Details - Parking issues are present#Result - MOVED ON #Vehicle Type - Mercedes "
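When labels repeat, as in str2, it may help to keep each value paired with its label. The following is a sketch in base R rather than stringr: it reuses the lookbehind pattern above through `gregexpr(..., perl = TRUE)` and adds a second, assumed pattern for the labels. Like the pattern above, it only captures values made of word characters and whitespace.

```r
str2 <- "#Route - 6 #Category - PARKING #Details - Parking issues#Result - MOVED ON #Vehicle Type - Mercedes #Details - Parking issues are present#Result - MOVED ON #Vehicle Type - Mercedes "

# Labels: the text after each "#" up to (not including) the next "-" or "#".
labels <- trimws(regmatches(str2, gregexpr("(?<=#)[^-#]+", str2, perl = TRUE))[[1]])

# Values: same idea as the stringr pattern above ({0,} written as *).
values <- regmatches(str2, gregexpr("(?<=-\\s)\\w+(?:\\s*\\w+)*", str2, perl = TRUE))[[1]]

stopifnot(length(labels) == length(values))  # every label needs a value

# Group the values by label, keeping labels in order of first appearance.
grouped <- split(values, factor(labels, levels = unique(labels)))
print(grouped)
```

Here `grouped$Details` holds both "Parking issues" and "Parking issues are present", so repeated fields are kept rather than overwritten.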