How to determine the region from a postal code

Time: 2019-06-28 09:28:56

Tags: r dataframe data-manipulation

I have two data frames: one contains data with postal codes, and the other contains regions, each with a set of postal codes.

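I want to add a "region" column to data frame 1 based on the postal code. How can I do this? (Note: a region in data frame 2 can contain multiple postal codes.)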

Thanks for the help.

1 answer:

Answer 0: (score: 0)

This can be solved with dplyr and tidyr. I'm sure there are other solutions.

# create the data
df1 <- data.frame(pcodes = c(1001, 1002, 1003))
df2 <- data.frame(regions = c(1, 2), 
                  pcodes = c("1001, 1002, 1003", "1004, 1005"),
                  stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

# separate postcodes column and reshape long
# (from https://stackoverflow.com/a/33288868/2633645)
df2 <- df2 %>% 
  mutate(to = strsplit(pcodes, split = ",")) %>% 
  unnest(to) %>% 
  mutate(to = as.numeric(to)) %>% 
  select(-pcodes) %>% 
  rename(pcodes = to) # rename `to` to `pcodes` for join purpose

# join the data sets by the common variable pcodes
df_both <- left_join(df1, df2, by = "pcodes")
df_both

  pcodes regions
1   1001       1
2   1002       1
3   1003       1
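
For comparison, here is a minimal base R sketch of the same idea that avoids the dplyr/tidyr dependency. It is not part of the original answer. It assumes the region data frame stores its postal codes as a comma-separated string, as in the example data above. The extra postal codes 1004 and 9999 in `df1` and the name `lookup` are made up for illustration, to show a second region and an unmatched code.

```r
# Hypothetical example data, modeled on the data frames in the answer above
df1 <- data.frame(pcodes = c(1001, 1002, 1003, 1004, 9999))
df2 <- data.frame(regions = c(1, 2),
                  pcodes = c("1001, 1002, 1003", "1004, 1005"),
                  stringsAsFactors = FALSE)

# Split each region's string into individual codes
# (the pattern matches a comma followed by optional whitespace)
code_list <- strsplit(df2$pcodes, split = ",\\s*")

# Build a long lookup table: one row per (postal code, region) pair
lookup <- data.frame(
  pcodes  = as.numeric(unlist(code_list)),
  regions = rep(df2$regions, times = lengths(code_list))
)

# match() gives the row of each postal code in the lookup table, or NA if absent
df1$regions <- lookup$regions[match(df1$pcodes, lookup$pcodes)]
df1
```

With this data, the first three codes map to region 1, 1004 maps to region 2, and 9999 gets `NA` because it belongs to no region. Note that `match()` returns only the first hit, so this sketch assumes each postal code belongs to at most one region.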