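My first data frame has a column named state, but some entries are abbreviations (LA, CA, OH) while others are full state names (Louisiana, California, Ohio).
My second data frame has four columns with the following headers:
Is there a way to join the two data frames so that the state column in the first data frame shows only state abbreviations, with the full names replaced by their abbreviations?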
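Edit:
I'm including pictures, even though I have been criticized for doing that before.
This is table one. Each row is a single tweet sent from a given state. I created it with this code (drawing the data from a separate table named tweets):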
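tweets_per_state <- tweets %>%
  # & binds tighter than |, so the place_type test is parenthesized;
  # without the parentheses, "admin" rows from any country would pass
  filter(country_code == "US" & (place_type == "city" | place_type == "admin")) %>%
  select(place_type, full_name) %>%
  # admin rows: drop the last five characters; city rows: keep the last two
  mutate(state = ifelse(place_type == "admin",
                        str_sub(full_name, start = 1, end = -6),
                        str_sub(full_name, -2)))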
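To make the two str_sub() calls above concrete, here is a minimal sketch of what each one extracts. The two sample strings are assumptions chosen to match the shapes described in the question; they are not taken from the real tweets table.

```r
library(stringr)

# Hypothetical sample values, one per place_type
admin_name <- "Virginia, USA"    # shape of an "admin" row
city_name  <- "Los Angeles, CA"  # shape of a "city" row

# end = -6 stops at the sixth character from the end, dropping ", USA"
admin_state <- str_sub(admin_name, start = 1, end = -6)

# start = -2 keeps only the last two characters
city_state <- str_sub(city_name, start = -2)

print(admin_state)
# [1] "Virginia"
print(city_state)
# [1] "CA"
```

This is why the state column ends up mixing full names (from admin rows) with two-letter abbreviations (from city rows).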
This is table two, which I am trying to join to table one so that table one shows "VA" instead of "Virginia".
Answer 0 (score: 1)
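A dplyr-based solution would involve joining the two tables on a dummy column, then using grepl to replace the state column with the twoLetter value.
I created small data.frames with a few rows to demonstrate the solution.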
library(dplyr)

# Sample tweets: three city rows with abbreviations, one row with a full state name
tweets <- data.frame(place_type = rep("city", 4),
                     full_name = c("Los Angeles, CA", "Maitland, FL", "Indianapolis, IN", "Virginia, USA"),
                     state = c("CA", "FL", "IN", "Virginia"),
                     stringsAsFactors = FALSE)
#   place_type        full_name    state
# 1       city  Los Angeles, CA       CA
# 2       city     Maitland, FL       FL
# 3       city Indianapolis, IN       IN
# 4       city    Virginia, USA Virginia
# Sample lookup table with four spellings of each state
state <- data.frame(allCaps = c("CALIFORNIA", "FLORIDA", "INDIANA", "VIRGINIA"),
                    full = c("California", "Florida", "Indiana", "Virginia"),
                    twoLetter = c("CA", "FL", "IN", "VA"),
                    threeLetter = c("Calif.", "Fla.", "Ind.", "Vir."),
                    stringsAsFactors = FALSE)

# A constant dummy column lets the join pair every tweet with every state
state <- state %>% mutate(dummy = 1)

tweets %>%
  mutate(dummy = 1) %>%
  filter(place_type == "city" | place_type == "admin") %>%
  inner_join(state, by = "dummy") %>%
  # rowwise() so that grepl() sees one pattern and one string at a time
  rowwise() %>%
  # Keep an existing abbreviation; otherwise use the abbreviation of the
  # state whose full name appears in full_name; non-matching pairs become NA
  mutate(state = ifelse(state == twoLetter, state,
                        ifelse(grepl(full, full_name), twoLetter, NA))) %>%
  filter(!is.na(state)) %>%
  select(place_type, full_name, state)

# Result
#   place_type full_name        state
#   <chr>      <chr>            <chr>
# 1 city       Los Angeles, CA  CA
# 2 city       Maitland, FL     FL
# 3 city       Indianapolis, IN IN
# 4 city       Virginia, USA    VA
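As an alternative sketch (not the method used in the answer above), the replacement can also be done with an exact lookup through base R's match(). It assumes that every value in the state column is already either a valid abbreviation or an exact full name, which is what the mutate() in the question appears to produce. The names tweets_demo and state_lookup and their contents are hypothetical.

```r
# Small stand-in tables; names and values are made up for illustration
tweets_demo <- data.frame(
  full_name = c("Los Angeles, CA", "Virginia, USA", "Ohio, USA"),
  state     = c("CA", "Virginia", "Ohio"),
  stringsAsFactors = FALSE
)
state_lookup <- data.frame(
  full      = c("California", "Virginia", "Ohio"),
  twoLetter = c("CA", "VA", "OH"),
  stringsAsFactors = FALSE
)

# Row of the lookup table whose full name equals each state value
# (NA when the value is not a full name, e.g. it is already "CA")
idx <- match(tweets_demo$state, state_lookup$full)

# Use the abbreviation where a full name matched; otherwise keep the value
tweets_demo$state <- ifelse(is.na(idx),
                            tweets_demo$state,
                            state_lookup$twoLetter[idx])

print(tweets_demo$state)
# [1] "CA" "VA" "OH"
```

Compared with the dummy-column join, this avoids building one row per tweet and state pair, and exact matching avoids partial hits from grepl, such as the pattern "Virginia" also matching "West Virginia, USA". The trade-off is that values that are neither a listed full name nor an abbreviation pass through unchanged instead of being filtered out.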