R: Data quality check: matching postal codes to cities

Asked: 2018-12-12 09:50:58

Tags: r data-quality

Can someone help me implement an idea in R?

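What I want to achieve is this: when R receives an input file, for example a list of companies and their addresses, it should check whether each company's postal code fits its city. I have a list of all cities and postal codes for a given country. How can I implement that list as an if statement?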

Has anyone programmed something similar before?

Thanks for your help! Sandra

1 answer:

Answer 0 (score: 0)

Here is a simple example of what can be done. However, it would be better to use fuzzy matching for the city names.

# City codes (all city codes can be found at https://www.allareacodes.com/)
my_city_codes <- data.frame(code = 201:206,
                            cities = c("Jersey City, NJ", "District of Columbia",
                                       "Bridgeport, CT", "Manitoba",
                                       "Birmingham, AL", "Seattle, WA"),
                            stringsAsFactors = FALSE)

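# Function for checking whether a code/city pair matches the registry
adress_checker <- function(adress, citycodes) {
  # Look up the registered city in the table passed in as `citycodes`
  # (the original read the global my_city_codes and ignored this argument)
  real_city <- citycodes$cities[citycodes$code == adress$code]

  # No entry for this code: report it instead of failing inside if()
  if (length(real_city) == 0) {
    return("Unknown code")
  }

  # Compare the registered city with the one given in the address
  if (real_city == adress$city) {
    return("Correct city")
  } else {
    return("Incorrect city")
  }
}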

# Addresses to check
right_city <- data.frame(code = 205, city = c("Birmingham, AL"), stringsAsFactors = FALSE)
wrong_city <- data.frame(code = 205, city = c("Las Vegas"), stringsAsFactors = FALSE)

# Testing function
adress_checker(right_city, my_city_codes)
[1] "Correct city"
adress_checker(wrong_city, my_city_codes)
[1] "Incorrect city"
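
The function above checks one address at a time with an if statement. For a whole input file, a vectorized lookup is usually simpler, and the fuzzy matching mentioned at the top can be added with base R's `adist()`. The following is only a rough sketch under some assumptions: the `companies` data frame, its column names, and the 0.2 tolerance are all made up for illustration, and the reference table simply repeats the codes used above.

```r
# Reference table of valid code/city pairs (same content as my_city_codes)
reference <- data.frame(
  code   = 201:206,
  cities = c("Jersey City, NJ", "District of Columbia", "Bridgeport, CT",
             "Manitoba", "Birmingham, AL", "Seattle, WA"),
  stringsAsFactors = FALSE
)

# Hypothetical input file content: one row per company
companies <- data.frame(
  company = c("A Corp", "B Ltd", "C Inc", "D GmbH"),
  code    = c(205, 205, 206, 999),
  city    = c("Birmingham, AL", "Las Vegas", "seattle, wa", "Nowhere"),
  stringsAsFactors = FALSE
)

# Look up the expected city for every row at once; unknown codes give NA
companies$expected <- reference$cities[match(companies$code, reference$code)]

# Exact comparison (NA when the code is not in the reference table)
companies$exact_match <- companies$city == companies$expected

# Fuzzy comparison: case-insensitive edit distance for each row
edit_dist <- mapply(
  function(given, expected) {
    if (is.na(expected)) return(NA_integer_)
    as.integer(adist(tolower(given), tolower(expected))[1, 1])
  },
  companies$city, companies$expected, USE.NAMES = FALSE
)

# Accept a city if the distance is at most 20% of the longer name
# (0.2 is an arbitrary tolerance chosen for this sketch)
max_len <- pmax(nchar(companies$city), nchar(companies$expected))
companies$fuzzy_match <- edit_dist / max_len <= 0.2

print(companies[, c("company", "exact_match", "fuzzy_match")])
```

Rows whose code is missing from the reference table end up as NA in both columns, so they can be reviewed separately from genuine mismatches. A stricter or looser tolerance may suit real data better.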