I have a vector of addresses, like this:
address <- c("890 layton drive, wilmington de 19805",
             "227 weehawken place suite 145, comstock ny 78956",
             "13 airport highway, new castle de 19720",
             "3640 New Hampshire Avenue NW Apt 207, Washington DC 20011")
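As you can see, each address contains words such as "drive", "place", and "suite". I want to replace these words with their abbreviations, using a pair of dictionary vectors. I used the mapvalues function from the plyr package to write my own function, like this: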
library(plyr)  # mapvalues() is provided by plyr

sweet <- function(x) mapvalues(x,
  c("plaza", "street", "suite", "drive", "boulevard", "place",
    "south", "north", "west", "east", "square", "avenue", "road",
    "floor", "parkway", "circle", "highway"),
  c("plz", "st", "ste", "dr", "blvd", "pl",
    "s", "n", "w", "e", "sq", "ave", "rd",
    "flr", "pkwy", "cir", "hwy"))
The output I want is:
address <- c("890 layton dr, wilmington de 19805",
             "227 weehawken pl ste 145, comstock ny 78956",
             "13 airport hwy, new castle de 19720",
             "3640 New Hampshire Ave NW, Washington DC 20011")
But as soon as I apply the function with
address <- sapply(address, sweet)
I get the following message:
The following `from` values were not present in `x`: plaza, street, suite, drive, boulevard, place, south, north, west, east, square, avenue, road, floor, parkway, circle, highway
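I think the problem is that mapvalues only looks for exact matches of whole elements: replacing "a" with "A" works, for example, but replacing the "a" inside "a is the first letter" does not. Is there a way around this? The solution does not have to use plyr or dplyr; any reasonably efficient approach is fine. Any suggestions are appreciated. Thanks.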
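The difference between whole-element lookup and substring replacement can be seen in a small base R sketch. It uses match() to imitate the whole-element behaviour of mapvalues (an approximation for illustration, not plyr's actual code) and gsub() for substring replacement; the from/to vectors are a shortened, illustrative subset of the dictionary above:

```r
# Illustrative two-entry dictionary (a subset of the one in the question)
from <- c("drive", "place")
to   <- c("dr", "pl")

x <- c("drive", "890 layton drive")

# Whole-element lookup: an element is replaced only if it equals a `from` value,
# so "890 layton drive" has no match and is returned unchanged
idx   <- match(x, from)
whole <- ifelse(is.na(idx), x, to[idx])

# Substring replacement: "drive" is rewritten wherever it occurs inside a string
substr_replaced <- gsub("drive", "dr", x, fixed = TRUE)

print(whole)            # "dr" and "890 layton drive"
print(substr_replaced)  # "dr" and "890 layton dr"
```

This is why the message lists every dictionary word: none of the full address strings is exactly equal to "drive", "place", and so on.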
Answer (score: 1)
Take a look at stringr::str_replace_all, which accepts a named vector so that several replacements are made in one call:
patterns = c("plaza", "street", "suite", "drive", "boulevard", "place", "south", "north",
"west", "east", "square", "avenue", "road", "floor", "parkway", "circle",
"highway")
replacement = c("plz", "st", "ste", "dr", "blvd", "pl", "s", "n", "w", "e", "sq", "ave",
"rd", "flr", "pkwy", "cir", "hwy")
stringr::str_replace_all(address, setNames(replacement, patterns))
#[1] "890 layton dr, wilmington de 19805"
#[2] "227 weehawken pl ste 145, comstock ny 78956"
#[3] "13 airport hwy, new castle de 19720"
#[4] "3640 New Hampshire Avenue NW Apt 207, Washington DC 20011"
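If you would rather avoid the stringr dependency, the same sequential replacement can be sketched in base R by looping over the pattern/replacement pairs with gsub(). This is a swapped-in alternative, not how str_replace_all works internally, and replace_all_base is a hypothetical helper name. Like the call above, it is case-sensitive and does not check word boundaries, so "Avenue" is left alone:

```r
address <- c("890 layton drive, wilmington de 19805",
             "227 weehawken place suite 145, comstock ny 78956",
             "13 airport highway, new castle de 19720",
             "3640 New Hampshire Avenue NW Apt 207, Washington DC 20011")

patterns <- c("plaza", "street", "suite", "drive", "boulevard", "place", "south",
              "north", "west", "east", "square", "avenue", "road", "floor",
              "parkway", "circle", "highway")
replacement <- c("plz", "st", "ste", "dr", "blvd", "pl", "s", "n", "w", "e", "sq",
                 "ave", "rd", "flr", "pkwy", "cir", "hwy")

# Hypothetical helper: apply each pattern -> replacement pair in order.
# fixed = TRUE treats each pattern as literal text rather than a regex.
replace_all_base <- function(x, patterns, replacement) {
  stopifnot(length(patterns) == length(replacement))
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacement[i], x, fixed = TRUE)
  }
  x
}

abbreviated <- replace_all_base(address, patterns, replacement)
print(abbreviated)
```

Because the pairs are applied one after another, a later pattern can match text produced by an earlier replacement; with this particular dictionary that does not happen.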
To also ignore case and match only whole words, add the (?i) modifier and a word boundary (\\b) on each side of every pattern:
stringr::str_replace_all(address, setNames(replacement, paste0('(?i)\\b', patterns, '\\b')))
#[1] "890 layton dr, wilmington de 19805"
#[2] "227 weehawken pl ste 145, comstock ny 78956"
#[3] "13 airport hwy, new castle de 19720"
#[4] "3640 New Hampshire ave NW Apt 207, Washington DC 20011"
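A base R sketch of the whole-word, case-insensitive variant wraps each pattern in \\b and passes ignore.case = TRUE together with perl = TRUE to gsub(). The helper name abbreviate_words and the named vector dict are illustrative assumptions; dict is built with setNames() in the same pattern-to-replacement direction as the answer above:

```r
address <- c("890 layton drive, wilmington de 19805",
             "227 weehawken place suite 145, comstock ny 78956",
             "13 airport highway, new castle de 19720",
             "3640 New Hampshire Avenue NW Apt 207, Washington DC 20011")

# Names are the words to find, values are their abbreviations
dict <- setNames(
  c("plz", "st", "ste", "dr", "blvd", "pl", "s", "n", "w", "e", "sq",
    "ave", "rd", "flr", "pkwy", "cir", "hwy"),
  c("plaza", "street", "suite", "drive", "boulevard", "place", "south",
    "north", "west", "east", "square", "avenue", "road", "floor",
    "parkway", "circle", "highway"))

# Hypothetical helper: replace whole words only, ignoring case.
abbreviate_words <- function(x, dict) {
  for (word in names(dict)) {
    # \\b stops "east" from matching inside a longer word such as "eastern";
    # ignore.case = TRUE lets "avenue" match "Avenue"
    x <- gsub(paste0("\\b", word, "\\b"), dict[[word]], x,
              ignore.case = TRUE, perl = TRUE)
  }
  x
}

result <- abbreviate_words(address, dict)
print(result)
```

The abbreviation is inserted exactly as written, so "Avenue" becomes lowercase "ave"; keeping the original capitalisation would need extra logic.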