Extract the city name from an address string

Time: 2020-03-05 13:49:46

Tags: r excel

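Is there a way to extract the city name from address strings that are sometimes inconsistent? In most cases the city name is the last word of the string, but not always. For example: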

Streetname 8, 1234 AA, Amsterdam
Streetname 10, 1234 BB, Rotterdam ZH

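So I simply want to check whether an address string contains one of about 10 possible city names. If it does, that particular city name should be printed in a new column. Can anyone help me achieve this in either R or Excel?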

Thanks in advance!

3 answers:

Answer 0 (score: 3)

In R:

df = data.frame(Adres = c('Streetname 8, 1234 AA, Amsterdam','Streetname 10, 1234 BB, Rotterdam ZH'))
df$Stad <- stringr::str_extract(df$Adres, "(?<=, )[A-Za-z]+")
print(df)

This prints:

                                     Adres          Stad
1         Streetname 8, 1234 AA, Amsterdam     Amsterdam
2     Streetname 10, 1234 BB, Rotterdam ZH     Rotterdam


This works as long as your city names are single words. If you have cities such as "Den Bosch" or "'s-Hertogenbosch", you can use a different pattern:

(?<=, )\D+?(?=( [A-Z]*)?$)

For example, this could result in:

                                         Adres             Stad
1             Streetname 8, 1234 AA, Amsterdam        Amsterdam
2         Streetname 10, 1234 BB, Rotterdam ZH        Rotterdam
3 Streetname 10, 1234 BB, 's-Hertogenbosch BRA 's-Hertogenbosch
4        Streetname 10, 1234 BB, Den Bosch BRA        Den Bosch
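
For reference, here is a minimal sketch of applying this second pattern in R. It uses base R's regexpr() and regmatches() with perl = TRUE rather than stringr (my own choice, to avoid a package dependency), and the variable names are only illustrative. Note that the backslash in \D has to be doubled inside an R string literal.

```r
# Sample addresses, taken from the example output above
adres <- c(
  "Streetname 8, 1234 AA, Amsterdam",
  "Streetname 10, 1234 BB, Rotterdam ZH",
  "Streetname 10, 1234 BB, 's-Hertogenbosch BRA",
  "Streetname 10, 1234 BB, Den Bosch BRA"
)

# Same pattern as above; "\\D" in the R string is the regex \D
pattern <- "(?<=, )\\D+?(?=( [A-Z]*)?$)"

# regexpr() returns -1 where nothing matched, so start from NA
# and fill in only the rows that did match
m <- regexpr(pattern, adres, perl = TRUE)
stad <- rep(NA_character_, length(adres))
stad[m > 0] <- regmatches(adres, m)

df <- data.frame(Adres = adres, Stad = stad, stringsAsFactors = FALSE)
print(df)
```

The pattern relies on the postcode starting with a digit (so the first ", " is skipped) and on the trailing province code being all capitals.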

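If you want to exclude certain cities, you can build a pattern that lists the cities to include as alternatives (an OR statement), for example: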

(?<=, )(Rotterdam|Amsterdam|Den Bosch|'s-Hertogenbosch)(?=.*$)

This could result in:

                                         Adres             Stad
1             Streetname 8, 1234 AA, Amsterdam        Amsterdam
2         Streetname 10, 1234 BB, Rotterdam ZH        Rotterdam
3            Streetname 19, 1234 CC, Almere FL             <NA>
4 Streetname 10, 1234 BB, 's-Hertogenbosch BRA 's-Hertogenbosch
5        Streetname 10, 1234 BB, Den Bosch BRA        Den Bosch
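
With about 10 cities you probably do not want to type the alternation by hand. Below is a rough sketch that builds the same kind of pattern from a character vector. The function name extract_city and the use of base R are my own assumptions, not part of the answer above. Each name is wrapped in \Q...\E so that PCRE treats it literally.

```r
# Hypothetical helper: returns the first listed city found after ", ",
# or NA when none of the cities occurs in the address
extract_city <- function(adres, cities) {
  # \Q...\E quotes each name literally (PCRE, hence perl = TRUE below)
  alternatives <- paste0("\\Q", cities, "\\E", collapse = "|")
  pattern <- paste0("(?<=, )(", alternatives, ")(?=.*$)")

  m <- regexpr(pattern, adres, perl = TRUE)
  out <- rep(NA_character_, length(adres))
  out[m > 0] <- regmatches(adres, m)
  out
}

cities <- c("Rotterdam", "Amsterdam", "Den Bosch", "'s-Hertogenbosch")

adres <- c(
  "Streetname 8, 1234 AA, Amsterdam",
  "Streetname 10, 1234 BB, Rotterdam ZH",
  "Streetname 19, 1234 CC, Almere FL",
  "Streetname 10, 1234 BB, 's-Hertogenbosch BRA",
  "Streetname 10, 1234 BB, Den Bosch BRA"
)

df <- data.frame(Adres = adres, stringsAsFactors = FALSE)
df$Stad <- extract_city(df$Adres, cities)
print(df)
```

If one city name is a prefix of another, list the longer name first, because the alternation takes the first alternative that matches.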

Answer 1 (score: 0)

You can use the VBA Split() function, as in this simple example:

Sub test()
Dim Temp As String
Dim LArray() As String
Dim LLArray() As String
Dim Result As String

  Temp = Range("B2").Value
  LArray = Split(Trim(Temp), ",")         ' Split the address; the city name seems to be the third entry.
                                          ' Be cautious: arrays are zero-based, so the first item is A(0), the second A(1) and the third A(2).
  LLArray = Split(Trim(LArray(2)), " ")   ' The city field may consist of just the name,
                                          ' or of the name followed by something else; in that case you only need the first entry, A(0).
  Result = LLArray(0)

End Sub
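
Since the question also allows R, the same split-based idea can be sketched there too. This is only an illustration; the function name first_word_of_third_field is made up. Like the VBA version, it keeps only the first word of the third comma-separated field, so a multi-word city such as "Den Bosch" comes back as "Den".

```r
# Hypothetical helper mirroring the Split() logic above
first_word_of_third_field <- function(adres) {
  fields <- strsplit(adres, ",", fixed = TRUE)
  vapply(fields, function(f) {
    # Fewer than three comma-separated parts: no city field to read
    if (length(f) < 3) return(NA_character_)
    words <- strsplit(trimws(f[3]), " ", fixed = TRUE)[[1]]
    if (length(words) == 0) NA_character_ else words[1]
  }, character(1))
}

adres <- c(
  "Streetname 8, 1234 AA, Amsterdam",
  "Streetname 10, 1234 BB, Rotterdam ZH"
)
print(first_word_of_third_field(adres))
```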

Answer 2 (score: 0)

Put this VBA code in a regular module:

Public Function ExtractCity(Target As Range) As String

    Dim Cities As Variant
    Dim City As Variant

    Cities = ThisWorkbook.Worksheets("Sheet1").Range("A1:A10")

    For Each City In Cities
        If InStr(Target, City) > 0 Then
            ExtractCity = City
            Exit For
        End If
    Next City

End Function 
  • Put your 10 cities in range A1:A10 on the worksheet named Sheet1.
  • Add your addresses to cells D1:D2.
  • Enter the following formula in cell E1: =ExtractCity(D1)