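I have a list of addresses, each containing (1) a door number and (2) a building name. I would like to split each string into two columns. The tricky part is that some door numbers contain letters, e.g. 221B Baker Street.
Example below: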
add <- c("5 Ark Royal House" ,
"22A Blington Garden Lincoln Street",
"Flat 19 PICTON HOUSE" ,
"2-3 Royal Albert Court" ,
"Room 1 Grand Hall",
"No 17 The Dell Alpha House")
The desired result looks like this:
aim <- data.frame("No"=as.character(c("5", "22A", "Flat 19", "2-3", "Room 1", "No 17")),
"Building" = as.character(c("Ark Royal House",
"Blington Garden Lincoln Street" ,
"PICTON HOUSE",
"Royal Albert Court" ,
"Grand Hall" ,
"The Dell Alpha House")))
Answer 0 (score: 3)
Using stringr:
library(stringr)
lst <- str_match_all(add, "^(\\D*\\d[-\\w]*)\\s+(.+)")
(aim <- setNames(as.data.frame(do.call(rbind, lst)),
c("all", "No", "Building")))
Or in base R:
pattern <- "^(\\D*\\d[-\\w]*)\\s+(.+)"
lst <- regmatches(add, regexec(pattern, add, perl = TRUE))
(aim <- setNames(as.data.frame(do.call(rbind, lst)),
c("all", "No", "Building")))
                                 all      No                       Building
1                  5 Ark Royal House       5                Ark Royal House
2 22A Blington Garden Lincoln Street     22A Blington Garden Lincoln Street
3               Flat 19 PICTON HOUSE Flat 19                   PICTON HOUSE
4             2-3 Royal Albert Court     2-3             Royal Albert Court
5                  Room 1 Grand Hall  Room 1                     Grand Hall
6         No 17 The Dell Alpha House   No 17           The Dell Alpha House
See a demo for the expression on regex101.com.
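To make the pattern easier to read, here is a minimal base R sketch that annotates each piece of the expression and also handles an address the pattern cannot match. The vector add2 and its "Rose Cottage" entry are hypothetical and not part of the question; the NA padding is just one possible way to keep one row per input.

```r
# Hypothetical input: the last entry has no digit, so the pattern cannot match it.
add2 <- c("22A Blington Garden Lincoln Street",
          "Flat 19 PICTON HOUSE",
          "Rose Cottage")

# ^(\\D*\\d[-\\w]*)  group 1: optional non-digit prefix, one digit,
#                    then any run of word characters or hyphens
# \\s+               whitespace separating the two parts
# (.+)               group 2: the rest of the string (the building name)
pattern <- "^(\\D*\\d[-\\w]*)\\s+(.+)"

m <- regmatches(add2, regexec(pattern, add2, perl = TRUE))

# A non-matching string yields character(0); pad it with NA so that
# rbind() still produces one row per address.
m <- lapply(m, function(x) if (length(x) == 0) rep(NA_character_, 3) else x)

res <- setNames(as.data.frame(do.call(rbind, m), stringsAsFactors = FALSE),
                c("all", "No", "Building"))
res
```

Without the padding step, do.call(rbind, ...) would silently drop the non-matching address, so the result would no longer line up with the input vector.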
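Answer 1 (score: 1)
A basic approach: find the gap between the number and the name, replace it with a neutral character (here "_", but it could be any character you know does not occur in the addresses), and then split on that character.
It assumes that the last "word" containing a digit is the end of the "No" part. If that does not hold for all of your addresses (it does hold for all of your test cases), this will not work.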
add <- c("5 Ark Royal House" ,
"22A Blington Garden Lincoln Street",
"Flat 19 PICTON HOUSE" ,
"2-3 Royal Albert Court" ,
"Room 1 Grand Hall",
"No 17 The Dell Alpha House")
# Note: use A-Za-z rather than A-z; the A-z range also covers punctuation such as "_"
split_add <- strsplit(gsub('([0-9-]+[0-9A-Za-z]*) ', '\\1_', add), split='_')
aim <- setNames(as.data.frame(do.call(rbind, split_add)),
c('No', 'Building'))
aim
#>        No                       Building
#> 1       5                Ark Royal House
#> 2     22A Blington Garden Lincoln Street
#> 3 Flat 19                   PICTON HOUSE
#> 4     2-3             Royal Albert Court
#> 5  Room 1                     Grand Hall
#> 6   No 17           The Dell Alpha House
Created on 2019-02-19 by the reprex package (v0.2.1)
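The assumption behind the split-on-a-neutral-character approach can be checked before building the data frame. The following is a rough sketch, not part of the original answer: add3 and its second address are hypothetical, and the letter class is written as A-Za-z. It flags any address that does not split into exactly two pieces.

```r
# Hypothetical addresses; the second has a digit inside the building name.
add3 <- c("Flat 19 PICTON HOUSE",
          "5 Block 2 Ark Royal House")

# Same idea as above: mark the gap after every number-like word with "_",
# then split on that marker.
parts <- strsplit(gsub("([0-9-]+[0-9A-Za-z]*) ", "\\1_", add3), split = "_")

# Each address should split into exactly two pieces: No and Building.
n_parts <- lengths(parts)
ok <- n_parts == 2

# Addresses that violate the assumption and need a different rule.
add3[!ok]
```

Because gsub() replaces every match, an address with a second number-like word produces three pieces, and rbind() would then recycle values into a misleading row. Filtering on ok first avoids that.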