I have the following sample data:
df <- tibble(
"City1" = c("New York", "Boston", "Chicago"),
"City2" = c("Chicago", "Cleveland", "Atlanta"))
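Assume City1 is the origin and City2 is the destination; that is, in the first row a person travels from New York to Chicago.
I want to add one column for the origin latitude and one for the origin longitude, and do the same for the destination city, for four new columns in total. I already have the coordinates.
How do I assign the coordinates? I tried case_when, but I am not sure how to pass the coordinates into multiple columns. A single column is easy: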
library(tidyverse)
# The numbers after the cities are the latitudes
df <- df %>%
mutate(
City1_lat = case_when(
City1 == 'New York' ~ 40.7128,
City1 == 'Boston' ~ 42.3601,
City1 == 'Chicago' ~ 41.8781
)
)
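How can I extend this to also fill a City1_lon column? I have thousands of origin/destination rows, so I am trying to keep this as simple as possible. Either a dplyr or a base R solution would work. I would then extend it to the destination city, City2. For reference (latitude, longitude):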
New York: 40.7128, 74.0060
Boston: 42.3601, 71.0589
Chicago: 41.8781, 87.6298
Cleveland: 41.4993, 81.6944
Atlanta: 33.7490, 84.3880
Answer 0 (score: 2)
Put your city data in a data frame like this:
> city
City lat long
1 New York 40.7128 74.0060
2 Boston 42.3601 71.0589
3 Chicago 41.8781 87.6298
4 Cleveland 41.4993 81.6944
5 Atlanta 33.7490 84.3880
Use match to look up the city names in the table, extract the latitude and longitude, and rename the columns:
> setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long"))
City1lat City1long
1 40.7128 74.0060
2 42.3601 71.0589
3 41.8781 87.6298
> setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long"))
City2lat City2long
3 41.8781 87.6298
4 41.4993 81.6944
5 33.7490 84.3880
You can then cbind the results onto the original data:
> df = cbind(df, setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long")), setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long")))
> df
City1 City2 City1lat City1long City2lat City2long
1 New York Chicago 40.7128 74.0060 41.8781 87.6298
2 Boston Cleveland 42.3601 71.0589 41.4993 81.6944
3 Chicago Atlanta 41.8781 87.6298 33.7490 84.3880
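With more than two city columns, the repeated match/setNames call could be wrapped in a small helper. The following is only a sketch under assumptions: `lookup_coords` and `city_cols` are hypothetical names that do not appear in the answer, and `df` is built here as a plain data frame rather than a tibble.

```r
# Lookup table of coordinates, using the values given in the question.
city <- data.frame(
  City = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
  lat  = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
  long = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880)
)

df <- data.frame(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta")
)

# Hypothetical helper: look up lat/long for one city column and
# prefix the resulting columns with that column's name.
lookup_coords <- function(data, col, table = city) {
  idx <- match(data[[col]], table$City)  # NA where a city is missing from the table
  out <- table[idx, c("lat", "long")]
  names(out) <- paste0(col, c("lat", "long"))
  rownames(out) <- NULL
  out
}

# Apply the helper to every city column and bind the results to the data.
city_cols <- c("City1", "City2")
result <- cbind(df, do.call(cbind, lapply(city_cols, lookup_coords, data = df)))
result
```

A city that is missing from the lookup table gets NA coordinates instead of raising an error, so it may be worth checking the result with anyNA() afterwards.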
Answer 1 (score: 2)
One option is to do a left_join after creating a key/value dataset (keyval, shown at the end of this answer):
library(tidyverse)
map_dfc(names(df), ~ df %>%
select(.x) %>%
left_join(keyval, by = setNames('City', .x))) %>%
select(names(df), everything())
# A tibble: 3 x 6
# City1 City2 lat lon lat1 lon1
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 New York Chicago 40.7 74.0 41.9 87.6
#2 Boston Cleveland 42.4 71.1 41.5 81.7
#3 Chicago Atlanta 41.9 87.6 33.7 84.4
If the original data has more columns and we are only interested in the city columns, loop over just those columns:
df$journeys <- c(100, 200, 300)
nm1 <- grep("City", names(df), value = TRUE)
map_dfc(nm1, ~ df %>%
select(.x) %>%
left_join(keyval, by = setNames('City', .x))) %>%
bind_cols(df %>%
select(-one_of(nm1)))
Data:
keyval <- structure(list(City = c("New York", "Boston", "Chicago", "Cleveland",
"Atlanta"), lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.749
), lon = c(74.006, 71.0589, 87.6298, 81.6944, 84.388)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
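When there are only two city columns, the same key/value idea can be written as two explicit joins, which avoids the automatically generated names lat1 and lon1. This is a minimal sketch rather than the answer's own code; the column names City1_lat, City1_lon, City2_lat, and City2_lon are assumptions borrowed from the question.

```r
library(dplyr)

# Key/value table of coordinates, using the values given in the question.
keyval <- tibble(
  City = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
  lat  = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
  lon  = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880)
)

df <- tibble(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta")
)

# Join once per city column, renaming the coordinate columns beforehand
# so that each join contributes clearly labelled columns.
result <- df %>%
  left_join(rename(keyval, City1_lat = lat, City1_lon = lon),
            by = c("City1" = "City")) %>%
  left_join(rename(keyval, City2_lat = lat, City2_lon = lon),
            by = c("City2" = "City"))
result
```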
Answer 2 (score: 1)
Here is a tidyverse solution:
library(dplyr)
library(purrr)
df <- tibble(
"City1" = c("New York", "Boston", "Chicago"),
"City2" = c("Chicago", "Cleveland", "Atlanta"))
df <- df %>%
mutate(
City1_coords = case_when(
City1 == 'New York' ~ list(c(40.7128,74.0060)),
City1 == 'Boston' ~ list(c(42.3601,71.0589)),
City1 == 'Chicago' ~ list(c(41.8781,87.6298))
)
) %>%
mutate(City1_lat = City1_coords %>% map_dbl(~ .x[1] ),
City1_lon = City1_coords %>% map_dbl(~ .x[2] ))
Answer 3 (score: 1)
Here is an approach using mutate_all and unnest, with an extra trick for naming the columns:
df %>%
mutate_all(funs(l = case_when(
. == 'New York' ~ list(tibble(at=40.7128, on=74.0060)),
. == 'Boston' ~ list(tibble(at=42.3601, on=71.0589)),
. == 'Chicago' ~ list(tibble(at=41.8781, on=87.6298)),
. == 'Cleveland' ~ list(tibble(at=41.4993, on=81.6944)),
. == 'Atlanta' ~ list(tibble(at=33.7490, on=84.3880))
)
)) %>%
unnest(.sep = "")
# # A tibble: 3 x 6
# City1 City2 City1_lat City1_lon City2_lat City2_lon
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 New York Chicago 40.7128 74.0060 41.8781 87.6298
# 2 Boston Cleveland 42.3601 71.0589 41.4993 81.6944
# 3 Chicago Atlanta 41.8781 87.6298 33.7490 84.3880
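This answers the narrower question of using case_when() to assign two new columns.
For the general problem, I would recommend a solution based on a left join, because keeping the keys and values in a tidy, separate table is more flexible.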
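One way such a left-join solution might look for any number of city columns is to reshape to long form, join once, and reshape back. This is a sketch under assumptions, not code from the answer: the lookup table `coords` and the helper columns `trip` and `leg` are hypothetical names, and `names_glue` assumes a reasonably recent tidyr (1.1.0 or later, as far as I know).

```r
library(dplyr)
library(tidyr)

# Separate key/value table of coordinates (values from the question).
coords <- tibble(
  City = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
  lat  = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
  lon  = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880)
)

df <- tibble(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta")
)

coords_wide <- df %>%
  mutate(trip = row_number()) %>%                 # remember which row each city came from
  pivot_longer(c(City1, City2), names_to = "leg", values_to = "City") %>%
  left_join(coords, by = "City") %>%              # a single join covers every city column
  select(-City) %>%
  pivot_wider(names_from = leg, values_from = c(lat, lon),
              names_glue = "{leg}_{.value}")      # e.g. City1_lat, City2_lon

# Put the coordinate columns back next to the original data.
result <- bind_cols(df, select(coords_wide, -trip))
result
```

Adding a third city column would only require listing it in pivot_longer(); the lookup table and the join stay unchanged.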
Answer 4 (score: 0)
You should load the city, lat, and lon information from an external file (called data_xy in my example), and then you can use left_join. Try the following code:
library(dplyr)
library(purrr)
data_xy <- tibble(city = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
lon = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880))
df <- tibble("City1" = c("New York", "Boston", "Chicago"),
"City2" = c("Chicago", "Cleveland", "Atlanta"))
df_latlon <- map(names(df), ~ left_join(df %>% select(.x), data_xy,
by= structure(names = .x, .Data = "city")) )
df_latlon
Output:
> df_latlon
[[1]]
# A tibble: 3 x 3
City1 lat lon
<chr> <dbl> <dbl>
1 New York 40.7 74.0
2 Boston 42.4 71.1
3 Chicago 41.9 87.6
[[2]]
# A tibble: 3 x 3
City2 lat lon
<chr> <dbl> <dbl>
1 Chicago 41.9 87.6
2 Cleveland 41.5 81.7
3 Atlanta 33.7 84.4
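The output above is a list with one tibble per city column, whereas the question asks for a single table with four new columns. One possible way to get there is to prefix the coordinate columns inside the loop and bind the pieces together. This is a sketch and not part of the answer; it assumes dplyr 1.0.0 or later for rename_with() and all_of().

```r
library(dplyr)
library(purrr)

data_xy <- tibble(
  city = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
  lat  = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
  lon  = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880)
)

df <- tibble(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta")
)

# For each city column: join the coordinates, then prefix lat/lon with the
# column name so the pieces can be combined without name clashes.
df_latlon <- map(names(df), function(col) {
  df %>%
    select(all_of(col)) %>%
    left_join(data_xy, by = setNames("city", col)) %>%
    rename_with(~ paste(col, .x, sep = "_"), .cols = c(lat, lon))
})

# Combine the per-column results side by side into one tibble.
result <- bind_cols(df_latlon)
result
```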