I need to extract part of a string from an address variable. My data looks like this:
[45] "Matara Road, Habaraduwa | Talpe, Unawatuna, Galle GL 80630, Sri Lanka "
[46] "Jungle Beach Road, Buonavista | Rumassala, Unawatuna, Galle 80600, Sri Lanka "
[47] "10 Church Street | inside the Fort, Galle, Sri Lanka "
[48] "78 Mile Post Matara Road Mihiripenna, Unawatuna, Galle 80615, Sri Lanka "
[49] "No: 288 Galle Road | Dadella, Galle 80000, Sri Lanka "
[50] "Matara Road, Koggala, Galle, Sri Lanka "
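I want to extract the city from each string, which in this case should be "Galle". The only pattern I can think of is that the city appears before "Sri Lanka", or more precisely between a "," and "Sri Lanka". This is the code I used: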
gsub("\\.s*|(, Sri Lanka).*", "", a)
But with this code I get the following result:
[45] "Matara Road, Habaraduwa | Talpe, Unawatuna, Galle GL 80630"
[46] "Jungle Beach Road, Buonavista | Rumassala, Unawatuna, Galle 80600"
[47] "10 Church Street | inside the Fort, Galle"
[48] "78 Mile Post Matara Road Mihiripenna, Unawatuna, Galle 80615"
[49] "No: 288 Galle Road | Dadella, Galle 80000"
[50] "Matara Road, Koggala, Galle"
Is there any way to keep only the city?
Answer 0 (score: 1)
n <- c(
"Matara Road, Habaraduwa | Talpe, Unawatuna, Galle GL 80630, Sri Lanka " ,
"Jungle Beach Road, Buonavista | Rumassala, Unawatuna, Galle 80600, Sri Lanka ",
"10 Church Street | inside the Fort, Galle, Sri Lanka " ,
"78 Mile Post Matara Road Mihiripenna, Unawatuna, Galle 80615, Sri Lanka " ,
"No: 288 Galle Road | Dadella, Galle 80000, Sri Lanka " ,
"Matara Road, Koggala, Galle, Sri Lanka " )
First, extract the city name together with a possible state abbreviation and a possible postal code:
m <- sub('.*, (.*), Sri Lanka *$', '\\1', n)
m
m is now:
[1] "Galle GL 80630" "Galle 80600" "Galle" "Galle 80615" "Galle 80000" "Galle"
Next, strip the postal code:
l <- sub(' \\d{5} *$', '', m )
l
l is:
[1] "Galle GL" "Galle" "Galle" "Galle" "Galle" "Galle"
Finally, remove the state abbreviation:
sub('( \\w{2})$', '', l)
[1] "Galle" "Galle" "Galle" "Galle" "Galle" "Galle"
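The three steps above can also be chained in one helper. This is only a minimal sketch: the name extract_city is hypothetical, it assumes every address ends in ", Sri Lanka", and it swaps \\w{2} for [A-Z]{2} on the assumption that the state abbreviation is always two uppercase letters.

```r
# Hypothetical helper: chains the three sub() calls from this answer.
extract_city <- function(addr) {
  # keep the comma-separated field just before ", Sri Lanka"
  field <- sub(".*, (.*), Sri Lanka *$", "\\1", addr)
  # drop a trailing five-digit postal code, if present
  field <- sub(" \\d{5} *$", "", field)
  # drop a trailing two-letter uppercase abbreviation, if present (assumption)
  sub(" [A-Z]{2}$", "", field)
}

n <- c(
  "Matara Road, Habaraduwa | Talpe, Unawatuna, Galle GL 80630, Sri Lanka ",
  "10 Church Street | inside the Fort, Galle, Sri Lanka "
)
extract_city(n)
```

Because sub() is vectorized, the helper can be applied to the whole address vector directly, without a loop.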
Answer 1 (score: 0)
I would use strsplit instead:
line <- "Matara Road, Habaraduwa | Talpe, Unawatuna, Galle GL 80630, Sri Lanka "
parts <- strsplit(line, ",")[[1]]
# the city is in the second-to-last comma-separated field
city <- parts[length(parts) - 1]
Give it a try!
If you only want the city, strip the remaining digits with gsub. Hope it helps!
Answer 2 (score: 0)
You can write a function that splits the string on commas and takes the second-to-last element, which is usually the city name.
myfunction <- function(x) {
  parts <- strsplit(x, ",")[[1]]
  # take the second-to-last field and remove all digits from it
  gsub("[[:digit:]]", "", parts[length(parts) - 1])
}
This function does the job, and it also removes any digits.
Now pass it to lapply, with x being the vector of addresses, to get the desired output:
lapply(x,myfunction)
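As a sketch of the same split-on-commas idea, vapply can be used in place of lapply so that the result is a character vector rather than a list. The name city_from_split and the extra cleanup (trimming whitespace and dropping a trailing two-letter uppercase abbreviation) are assumptions added here for illustration, not part of the answer above.

```r
# Hypothetical variant of the split-based approach with extra cleanup.
city_from_split <- function(addr) {
  parts <- strsplit(addr, ",")[[1]]
  # second-to-last field, which is assumed to hold the city
  field <- parts[length(parts) - 1]
  # remove digits, then surrounding whitespace
  field <- trimws(gsub("[[:digit:]]", "", field))
  # drop a trailing two-letter uppercase abbreviation, if present (assumption)
  sub(" [A-Z]{2}$", "", field)
}

addresses <- c(
  "No: 288 Galle Road | Dadella, Galle 80000, Sri Lanka ",
  "Matara Road, Habaraduwa | Talpe, Unawatuna, Galle GL 80630, Sri Lanka "
)
cities <- vapply(addresses, city_from_split, character(1), USE.NAMES = FALSE)
cities
```

This still relies on the city being the second-to-last comma-separated field, so an address with a different layout would need separate handling.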