Using case_when() to assign two new columns instead of one

Time: 2019-01-28 17:38:45

Tags: r dplyr

I have the following sample data:

df <- tibble(
  "City1" = c("New York", "Boston", "Chicago"),
  "City2" = c("Chicago", "Cleveland", "Atlanta"))

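Assume City1 is the origin and City2 is the destination; that is, a person travels from New York to Chicago.

I want to add one column for the origin latitude and one for the origin longitude, and do the same for the destination city. In total, I want four new columns. I already have the coordinates.

How do I assign the coordinates? I tried using case_when, but I am not sure how to pass coordinates to multiple columns. A single column is easy: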

library(tidyverse)

# The numbers after the cities are the latitudes
df <- df %>% 
  mutate(
    City1_lat = case_when(
      City1 == 'New York' ~ 40.7128,
      City1 == 'Boston' ~ 42.3601,
      City1 == 'Chicago' ~ 41.8781
    )
  )

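How can I extend this to also add a City1_lon column? I have thousands of origin/destination rows, so I am trying to keep this as streamlined as possible. Either a dplyr or a base solution works. I would then extend this to the destination city, City2. For reference: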

New York: 40.7128, 74.0060
Boston: 42.3601, 71.0589
Chicago: 41.8781, 87.6298
Cleveland: 41.4993, 81.6944
Atlanta: 33.7490, 84.3880

5 answers:

Answer 0: (score: 2)

Put your city data in a data frame like this:

> city
       City     lat    long
1  New York 40.7128 74.0060
2    Boston 42.3601 71.0589
3   Chicago 41.8781 87.6298
4 Cleveland 41.4993 81.6944
5   Atlanta 33.7490 84.3880

Use match to look up the city names in the table, extract the latitude and longitude, and rename the columns:

> setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long"))
  City1lat City1long
1  40.7128   74.0060
2  42.3601   71.0589
3  41.8781   87.6298

> setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long"))
  City2lat City2long
3  41.8781   87.6298
4  41.4993   81.6944
5  33.7490   84.3880

You can cbind these onto the original data:

> df = cbind(df, setNames(city[match(df$City1, city$City), c("lat","long")],c("City1lat","City1long")), setNames(city[match(df$City2, city$City), c("lat","long")],c("City2lat","City2long")))
> df
     City1     City2 City1lat City1long City2lat City2long
1 New York   Chicago  40.7128   74.0060  41.8781   87.6298
2   Boston Cleveland  42.3601   71.0589  41.4993   81.6944
3  Chicago   Atlanta  41.8781   87.6298  33.7490   84.3880
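The repeated match/setNames calls above can be folded into a small helper. The following is only a sketch in base R: the helper name lookup_coords and the data frame name trips are made up for illustration, and the lookup table is rebuilt from the coordinates given in the question.

```r
# Lookup table built from the coordinates listed in the question
city <- data.frame(
  City = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
  lat  = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
  long = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880),
  stringsAsFactors = FALSE
)

# Same sample data as the question ('trips' is an illustrative name)
trips <- data.frame(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up one vector of city names and
# prefix the resulting lat/long columns with the column name
lookup_coords <- function(cities, prefix) {
  out <- city[match(cities, city$City), c("lat", "long")]
  rownames(out) <- NULL
  setNames(out, paste0(prefix, c("lat", "long")))
}

# Apply the helper to every column whose name starts with "City"
city_cols <- grep("^City", names(trips), value = TRUE)
coords <- Map(lookup_coords, trips[city_cols], city_cols)

# unname() keeps cbind from prepending the list names to the column names
result <- cbind(trips, do.call(cbind, unname(coords)))
result
```

A city that is missing from the lookup table gets NA for both coordinates, because match returns NA for it.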

Answer 1: (score: 2)

One option is to do a left_join after creating a key/value dataset:

library(tidyverse)
map_dfc(names(df), ~  df %>% 
                        select(.x) %>% 
                        left_join(keyval, by = setNames('City', .x))) %>%
    select(names(df), everything())  
# A tibble: 3 x 6
#  City1    City2       lat   lon  lat1  lon1
#  <chr>    <chr>     <dbl> <dbl> <dbl> <dbl>
#1 New York Chicago    40.7  74.0  41.9  87.6
#2 Boston   Cleveland  42.4  71.1  41.5  81.7
#3 Chicago  Atlanta    41.9  87.6  33.7  84.4

If the original data has more columns and we are only interested in the 'City' columns, loop over only the 'City' columns:

df$journeys <- c(100, 200, 300)
nm1 <- grep("City", names(df), value = TRUE)
map_dfc(nm1, ~  df %>% 
                     select(.x) %>% 
                     left_join(keyval, by = setNames('City', .x))) %>%  
      bind_cols(df %>% 
                  select(-one_of(nm1)))

Data

keyval <- structure(list(City = c("New York", "Boston", "Chicago", "Cleveland", 
 "Atlanta"), lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.749
 ), lon = c(74.0060, 71.0589, 87.6298, 81.6944, 84.388)), row.names = c(NA, 
  -5L), class = c("tbl_df", "tbl", "data.frame"))
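When there are only two city columns, the same key/value idea can also be written as two explicit joins, which gives direct control over the new column names. This is a sketch rather than a replacement for the map_dfc loop: the names joined, City1_lat, and so on are chosen here for illustration, and keyval is rebuilt from the coordinates in the question.

```r
library(dplyr)

# Key/value table of city coordinates (values from the question)
keyval <- tibble(
  City = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
  lat  = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
  lon  = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880)
)

df <- tibble(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta")
)

# Rename the value columns before each join so that the origin and
# destination coordinates get distinct, descriptive names
joined <- df %>%
  left_join(rename(keyval, City1_lat = lat, City1_lon = lon),
            by = c("City1" = "City")) %>%
  left_join(rename(keyval, City2_lat = lat, City2_lon = lon),
            by = c("City2" = "City"))

joined
```

Renaming before joining avoids the automatically generated lat/lat1 style names shown in the output above.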

Answer 2: (score: 1)

Here is a tidy solution:

library(dplyr)
library(purrr)

df <- tibble(
  "City1" = c("New York", "Boston", "Chicago"),
  "City2" = c("Chicago", "Cleveland", "Atlanta"))


df <- df %>% 
  mutate(
    City1_coords = case_when(
      City1 == 'New York' ~ list(c(40.7128,74.0060)),
      City1 == 'Boston' ~ list(c(42.3601,71.0589)),
      City1 == 'Chicago' ~ list(c(41.8781,87.6298))
    )
  ) %>% 
  mutate(City1_lat = City1_coords %>% map_dbl(~ .x[1] ),
         City1_lon = City1_coords %>% map_dbl(~ .x[2] ))
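The code above covers City1 only, and every city has to be spelled out as a case_when branch. As a point of comparison, here is a sketch of a different technique, named-vector lookup, that fills all four columns inside a single mutate. It is not the list-column approach from this answer; the names lat_lookup and lon_lookup are made up for illustration.

```r
library(dplyr)

# Named vectors act as simple lookup tables (values from the question)
lat_lookup <- c("New York" = 40.7128, "Boston" = 42.3601, "Chicago" = 41.8781,
                "Cleveland" = 41.4993, "Atlanta" = 33.7490)
lon_lookup <- c("New York" = 74.0060, "Boston" = 71.0589, "Chicago" = 87.6298,
                "Cleveland" = 81.6944, "Atlanta" = 84.3880)

df <- tibble(
  City1 = c("New York", "Boston", "Chicago"),
  City2 = c("Chicago", "Cleveland", "Atlanta")
)

# Index each lookup vector by city name; unname() drops the city
# names that would otherwise be carried along on the result
df <- df %>%
  mutate(
    City1_lat = unname(lat_lookup[City1]),
    City1_lon = unname(lon_lookup[City1]),
    City2_lat = unname(lat_lookup[City2]),
    City2_lon = unname(lon_lookup[City2])
  )

df
```

A city name that is absent from the lookup vectors yields NA, which is the same result case_when gives when no branch matches.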

Answer 3: (score: 1)

Here is an approach using mutate_all and unnest, with an extra trick for naming the columns:

df %>% 
  mutate_all(funs(l = case_when(
      . == 'New York'  ~ list(tibble(at=40.7128, on=74.0060)),
      . == 'Boston'    ~ list(tibble(at=42.3601, on=71.0589)),
      . == 'Chicago'   ~ list(tibble(at=41.8781, on=87.6298)),
      . == 'Cleveland' ~ list(tibble(at=41.4993, on=81.6944)),
      . == 'Atlanta'   ~ list(tibble(at=33.7490, on=84.3880))
    )
  )) %>%
  unnest(.sep = "")

# # A tibble: 3 x 6
#      City1     City2 City1_lat City1_lon City2_lat City2_lon
#      <chr>     <chr>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 New York   Chicago   40.7128   74.0060   41.8781   87.6298
# 2   Boston Cleveland   42.3601   71.0589   41.4993   81.6944
# 3  Chicago   Atlanta   41.8781   87.6298   33.7490   84.3880

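This solves "using case_when() to assign two new columns".

For the general problem, I would recommend a left-join-based solution, because keeping the keys and values in a tidy separate table is more flexible.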

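Answer 4: (score: 0)

You should load a file containing the city, lat and long information from an external source (called data_xy in my example), and then you can use left_join. Try the following code: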

library(dplyr)
library(purrr)
data_xy <- tibble(city = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
                  lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
                  lon = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880))


df <- tibble("City1" = c("New York", "Boston", "Chicago"),
             "City2" = c("Chicago", "Cleveland", "Atlanta"))

df_latlon <- map(names(df), ~ left_join(df %>% select(.x),  data_xy, 
                                        by= structure(names = .x, .Data = "city")) )
df_latlon

Output:

> df_latlon
[[1]]
# A tibble: 3 x 3
  City1      lat   lon
  <chr>    <dbl> <dbl>
1 New York  40.7  74.0
2 Boston    42.4  71.1
3 Chicago   41.9  87.6

[[2]]
# A tibble: 3 x 3
  City2       lat   lon
  <chr>     <dbl> <dbl>
1 Chicago    41.9  87.6
2 Cleveland  41.5  81.7
3 Atlanta    33.7  84.4
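The output above is a list of two tibbles whose coordinate columns share the names lat and lon. To get the single six-column table the question asks for, one possible follow-up is to prefix the coordinate columns inside the loop and bind the pieces side by side. This is a sketch that assumes dplyr 1.0 or later (for rename_with); the name df_wide is illustrative.

```r
library(dplyr)
library(purrr)

data_xy <- tibble(city = c("New York", "Boston", "Chicago", "Cleveland", "Atlanta"),
                  lat = c(40.7128, 42.3601, 41.8781, 41.4993, 33.7490),
                  lon = c(74.0060, 71.0589, 87.6298, 81.6944, 84.3880))

df <- tibble("City1" = c("New York", "Boston", "Chicago"),
             "City2" = c("Chicago", "Cleveland", "Atlanta"))

# For each city column: join the coordinates, then prefix lat/lon
# with the column name so the two results do not clash
df_wide <- map(names(df), function(col) {
  df %>%
    select(all_of(col)) %>%
    left_join(data_xy, by = setNames("city", col)) %>%
    rename_with(~ paste0(col, "_", .x), .cols = c(lat, lon))
}) %>%
  bind_cols()

df_wide
```

Binding by position is safe here because left_join keeps the rows of the left table in their original order and each city matches at most one row of data_xy.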