Joining two data frames to convert full state names to state abbreviations in R and dplyr

Date: 2018-02-10 04:32:53

Tags: r dplyr

My first data frame contains a column named state, but some entries are abbreviations (LA, CA, OH) while others are full state names (Louisiana, California, Ohio).

My second data frame has four columns with the following headers:

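  1. allCaps (e.g. ALABAMA)
  2. full (e.g. Alabama)
  3. twoLetter (e.g. AL)
  4. threeLetter (e.g. Ala.)

Is there a way to join the two data frames so that the state column of the first data frame contains only abbreviations, with each full name replaced by its abbreviation?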

Edit:

I'm including pictures, even though I was criticized for doing so before.

This is table one. Each row is an individual tweet sent from a state. I created it with this code (drawing on a separate table called tweets):

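library(dplyr)
library(stringr)  # for str_sub()
tweets_per_state <- tweets %>%
  # US tweets tagged at city or state ("admin") level; the parentheses make | apply before &
  filter(country_code == "US" & (place_type == "city" | place_type == "admin")) %>%
  select(place_type, full_name) %>%
  # "admin" names are assumed to end in ", USA"; city names end in a two-letter state code
  mutate(state = ifelse(place_type == "admin", str_sub(full_name, start = 1, end = -6), str_sub(full_name, -2)))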
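The mixed state column comes directly from the two ifelse() branches. A minimal sketch on two hypothetical rows (the sample values are assumptions, shaped like the tweets described in the question) shows that city tweets yield a two-letter code while "admin" tweets yield a full state name:

```r
library(dplyr)
library(stringr)

# Two hypothetical rows shaped like the tweets described in the question
sample_tweets <- data.frame(
  place_type = c("city", "admin"),
  full_name  = c("Los Angeles, CA", "Virginia, USA"),
  stringsAsFactors = FALSE
)

mixed <- sample_tweets %>%
  mutate(state = ifelse(place_type == "admin",
                        str_sub(full_name, start = 1, end = -6),  # drops the trailing ", USA"
                        str_sub(full_name, -2)))                  # keeps the last two characters

print(mixed$state)
```

Here mixed$state is c("CA", "Virginia"), which is exactly the mix of abbreviations and full names that the join needs to normalize.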
    


This is table two, which I'm trying to join to table one so that table one shows "VA" instead of "Virginia".


1 Answer:

Answer 0 (score: 1)

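A dplyr-based solution is to join the two tables on a constant dummy column, which pairs every tweet with every state (in effect a cross join), and then use grepl to replace the state column with the twoLetter value wherever the full state name appears in full_name.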

I created small data frames with a few rows to demonstrate the solution.

library(dplyr)

tweets <- data.frame(place_type = rep("city", 4),
                     full_name = c("Los Angeles, CA", "Maitland, FL", "Indianapolis, IN", "Virginia, USA"),
                     state = c("CA", "FL", "IN", "Virginia"),
                     stringsAsFactors = FALSE)

#  place_type        full_name    state
#1       city  Los Angeles, CA       CA
#2       city     Maitland, FL       FL
#3       city Indianapolis, IN       IN
#4       city    Virginia, USA Virginia

state <- data.frame(allCaps = c("CALIFORNIA", "FLORIDA", "INDIANA", "VIRGINIA"),
                    full = c("California", "Florida", "Indiana", "Virginia"),
                    twoLetter = c("CA", "FL", "IN", "VA"),
                    threeLetter = c("Calif.", "Fla.", "Ind.", "Va."),
                    stringsAsFactors = FALSE)

# constant key shared by both tables
state <- state %>% mutate(dummy = 1)

tweets %>%
  mutate(dummy = 1) %>%
  filter(place_type == "city" | place_type == "admin") %>%
  # joining on the constant key pairs every tweet with every state
  inner_join(state, by = "dummy") %>%
  # grepl() takes a single pattern, so evaluate row by row
  rowwise() %>%
  # keep state if it already equals this pairing's abbreviation;
  # otherwise use twoLetter when the full state name appears in full_name
  mutate(state = ifelse(state == twoLetter, state,
                        ifelse(grepl(full, full_name), twoLetter, NA))) %>%
  # drop the pairings that matched neither test
  filter(!is.na(state)) %>%
  select(place_type, full_name, state)

# Result
#  place_type full_name        state
#  <chr>      <chr>            <chr>
# 1 city       Los Angeles, CA  CA
# 2 city       Maitland, FL     FL
# 3 city       Indianapolis, IN IN
# 4 city       Virginia, USA    VA
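One caveat with grepl(): it matches substrings, so "Virginia" also matches "West Virginia, USA" and "Kansas" matches "Kansas City, MO", and in both cases the cross join keeps an extra, wrong row. Because the state column already holds either an abbreviation or an exact full name, a plain left join on that column avoids both the cross join and the substring problem. The sketch below is an alternative to the approach above, not part of the original answer; the tweet rows are hypothetical, and base R's built-in state.name / state.abb vectors (50 states, no District of Columbia) stand in for the second table's full and twoLetter columns.

```r
library(dplyr)

# Hypothetical tweets: abbreviations from city tweets, full names from "admin" tweets
tweets <- data.frame(
  place_type = c("city", "city", "admin", "admin"),
  full_name  = c("Los Angeles, CA", "Kansas City, MO", "Virginia, USA", "West Virginia, USA"),
  state      = c("CA", "MO", "Virginia", "West Virginia"),
  stringsAsFactors = FALSE
)

# Lookup table from base R's state.name / state.abb, standing in for the
# second table's `full` and `twoLetter` columns (assumption: same pairs)
states <- data.frame(full = state.name, twoLetter = state.abb,
                     stringsAsFactors = FALSE)

result <- tweets %>%
  # exact match of `state` against full names; abbreviations get NA in twoLetter
  left_join(states, by = c("state" = "full")) %>%
  # use the abbreviation where one was found, otherwise keep the original value
  mutate(state = coalesce(twoLetter, state)) %>%
  select(place_type, full_name, state)

print(result)
```

Rows whose state is already an abbreviation find no match in full, so coalesce() leaves them unchanged. Here result$state comes out as "CA", "MO", "VA", "WV", with one row per tweet and no duplicates.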