Insert and fill two columns based on a column-wise condition

Time: 2019-12-28 14:07:31

Tags: r

I have this df:

data <- structure(list(
  location = c("bern", "bern", "zurich", "zurich", "basel", "basel", "basel"),
  location_latitude = c(4.1, 4.1, 6.2, 6.2, 7.3, 7.3, 7.3),
  location_longitude = c(2.1, 2.1, 3.2, 3.2, 5.6, 5.6, 5.6),
  location_population = c(38, 38, 72, 72, 46, 46, 46),
  origin = c("zurich", "basel", "bern", "basel", "bern", "zurich", "locarno"),
  origin_temperature = c(12, 20, 21, 20, 21, 12, 27)
), row.names = c(NA, 7L), class = "data.frame")

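I have the latitude and longitude for each location, but not for each origin.

I want to insert two columns and fill them with the latitude and longitude of each origin, taken from the coordinates of the matching value in the location column, like this: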

data_needed <- structure(list(
  location = c("bern", "bern", "zurich", "zurich", "basel", "basel", "basel"),
  location_latitude = c(4.1, 4.1, 6.2, 6.2, 7.3, 7.3, 7.3),
  location_longitude = c(2.1, 2.1, 3.2, 3.2, 5.6, 5.6, 5.6),
  location_population = c(38, 38, 72, 72, 46, 46, 46),
  origin = c("zurich", "basel", "bern", "basel", "bern", "zurich", "locarno"),
  origin_latitude = c("6.2", "7.3", "4.1", "7.3", "4.1", "6.2", "NA"),
  origin_longitude = c("3.2", "5.6", "2.1", "5.6", "2.1", "3.2", "NA"),
  origin_temperature = c(12, 20, 21, 20, 21, 12, 27)
), row.names = c(NA, 7L), class = "data.frame")

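I think this needs to be done column-wise, but I can't figure out how.

I also don't want to write conditions that name specific locations (e.g. if "zurich"), because the dataset has thousands of locations and origins. I need this to happen "automatically".

Note also that an origin with no matching coordinates in location (e.g. locarno) should return NA.

Please help!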

2 answers:

Answer 0 (score: 3)

Using base R:

data <- within(data, origin_latitude <- location_latitude[match(origin, location)])
data <- within(data, origin_longitude <- location_longitude[match(origin, location)])
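
Both lines rely on `match(origin, location)`, which returns, for each origin, the index of the first row whose location has the same name, or NA when no such row exists. Here is a minimal sketch of that mechanism on a shortened copy of the question's data; the names `toy` and `idx` are only for illustration and are not part of the original answer.

```r
# Shortened copy of the question's data; `toy` is an illustrative name.
toy <- data.frame(
  location = c("bern", "zurich", "basel"),
  location_latitude = c(4.1, 6.2, 7.3),
  location_longitude = c(2.1, 3.2, 5.6),
  origin = c("zurich", "basel", "locarno"),
  stringsAsFactors = FALSE
)

# Position of each origin within `location`; NA when the origin never appears.
idx <- match(toy$origin, toy$location)

# Indexing a vector with NA yields NA, so unmatched origins get NA coordinates.
toy$origin_latitude <- toy$location_latitude[idx]
toy$origin_longitude <- toy$location_longitude[idx]
```

Because indexing with NA gives NA, an origin such as locarno needs no special handling. The resulting columns are numeric with a real NA, not the character "NA" shown in the question's desired output.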

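Using data.table:

library(data.table)

# Convert data to a data.table by reference
setDT(data)
# Look up both coordinate columns at once; unmatched origins become NA
data[, 
     c("origin_latitude", "origin_longitude") := .SD[match(origin, location)], 
     .SDcols = c("location_latitude", "location_longitude")]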

Answer 1 (score: 2)

Here is one way using dplyr:

library(dplyr)

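# Build a lookup table of unique locations, renamed to the origin columns
data %>%
    select(origin = "location", origin_latitude = "location_latitude", origin_longitude = "location_longitude") %>%
    distinct() %>%
    # Left join keeps every row of data; unmatched origins get NA
    left_join(data, ., by = "origin") %>%
    # Move origin_temperature to the last column
    select(-origin_temperature, origin_temperature)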

  location location_latitude location_longitude location_population  origin origin_latitude origin_longitude origin_temperature
1     bern               4.1                2.1                  38  zurich             6.2              3.2                 12
2     bern               4.1                2.1                  38   basel             7.3              5.6                 20
3   zurich               6.2                3.2                  72    bern             4.1              2.1                 21
4   zurich               6.2                3.2                  72   basel             7.3              5.6                 20
5    basel               7.3                5.6                  46    bern             4.1              2.1                 21
6    basel               7.3                5.6                  46  zurich             6.2              3.2                 12
7    basel               7.3                5.6                  46 locarno              NA               NA                 27
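
The lookup-table-plus-left-join idea can also be sketched in base R with `merge()`. This is only an illustrative alternative, not part of the original answer; `toy`, `lookup`, and `joined` are made-up names, and the data is a shortened copy of the question's.

```r
# Shortened copy of the question's data; `toy` is an illustrative name.
toy <- data.frame(
  location = c("bern", "bern", "zurich", "basel"),
  location_latitude = c(4.1, 4.1, 6.2, 7.3),
  location_longitude = c(2.1, 2.1, 3.2, 5.6),
  origin = c("zurich", "basel", "bern", "locarno"),
  stringsAsFactors = FALSE
)

# One row per location, renamed so that the join key is `origin`.
lookup <- unique(toy[, c("location", "location_latitude", "location_longitude")])
names(lookup) <- c("origin", "origin_latitude", "origin_longitude")

# all.x = TRUE makes this a left join: origins without a match are kept
# and their coordinates are filled with NA.
joined <- merge(toy, lookup, by = "origin", all.x = TRUE)
```

Note that `merge()` may reorder rows and moves the key column to the front, whereas the `match()` and `left_join()` approaches keep the original row order.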