R: Filter a data frame by neighbor pairs?

Time: 2021-05-05 11:55:40

Tags: r filter dplyr

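I have a data frame of neighbors, in which one cell (say, a pixel) a neighbors cells b, c, d, and so on. It works like a moving window, so I have a central_id and then its neighbors, where each central cell has its own unique set of neighbors. I also have a second data frame containing each cell's value at a given time. I need to compare how the values differ between each central cell and its neighboring cells, and how that difference changes over time.

Here is an example: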

set.seed(3)
nbrs <- data.frame(central_id = c("a", "a", "a",
                                  "b", "b", "b", 
                                  "c", "c", "d", 
                                  "e"),
                   nbrs_id    = c("b", "c", "d",
                                  "a", "c", "e",
                                  "a", "b", "e", "d"))


# Generate data with values
df <- data.frame(year = rep(c(1, 2, 3), each = 5),
                 id = c("a", "b", "c", "d", "e"),
                 vals = 10+ rnorm(15))

The data frame I want looks like this, with the neighbors made explicit:

  year central_id central_val nbrs_id nbrs_val
1    1          a   10.074955       b 8.354045
2    1          a   10.074955       c 11.774009
3    1          a   10.074955       d 10.765968
4    1 ...............

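How can I efficiently filter the values data set, look up the values by id, and then assemble them into a table? I have about 10 million rows, so I am looking for something efficient. So far I have only used simple filtering to pull out specific values, e.g. df %>% filter(year == 1 & id == 'a') to get my vals, but this takes a very long time. Surely there is a more efficient way?

1 Answer:

Answer 0: (score: 1)

Is this what you want?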

set.seed(3)
nbrs <- data.frame(central_id = c("a", "a", "a",
                                  "b", "b", "b", 
                                  "c", "c", "d", 
                                  "e"),
                   nbrs_id    = c("b", "c", "d",
                                  "a", "c", "e",
                                  "a", "b", "e", "d"))


# Generate data with values
df <- data.frame(year = rep(c(1, 2, 3), each = 5),
                 id = c("a", "b", "c", "d", "e"),
                 vals = 10+ rnorm(15))
library(dplyr)

df %>% left_join(nbrs, by = c('id' = 'central_id')) %>%
  left_join(df, by = c('year' = 'year', 'nbrs_id' = 'id'),
            suffix = c('', '_nbrs'))
#>    year id      vals nbrs_id vals_nbrs
#> 1     1  a  9.038067       b  9.707474
#> 2     1  a  9.038067       c 10.258788
#> 3     1  a  9.038067       d  8.847868
#> 4     1  b  9.707474       a  9.038067
#> 5     1  b  9.707474       c 10.258788
#> 6     1  b  9.707474       e 10.195783
#> 7     1  c 10.258788       a  9.038067
#> 8     1  c 10.258788       b  9.707474
#> 9     1  d  8.847868       e 10.195783
#> 10    1  e 10.195783       d  8.847868
#> 11    2  a 10.030124       b 10.085418
#> 12    2  a 10.030124       c 11.116610
#> 13    2  a 10.030124       d  8.781143
#> 14    2  b 10.085418       a 10.030124
#> 15    2  b 10.085418       c 11.116610
#> 16    2  b 10.085418       e 11.267369
#> 17    2  c 11.116610       a 10.030124
#> 18    2  c 11.116610       b 10.085418
#> 19    2  d  8.781143       e 11.267369
#> 20    2  e 11.267369       d  8.781143
#> 21    3  a  9.255218       b  8.868781
#> 22    3  a  9.255218       c  9.283642
#> 23    3  a  9.255218       d 10.252652
#> 24    3  b  8.868781       a  9.255218
#> 25    3  b  8.868781       c  9.283642
#> 26    3  b  8.868781       e 10.152046
#> 27    3  c  9.283642       a  9.255218
#> 28    3  c  9.283642       b  8.868781
#> 29    3  d 10.252652       e 10.152046
#> 30    3  e 10.152046       d 10.252652
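
The question also asks how the central and neighbor values differ and how that changes over time. As a minimal sketch (not part of the original answer), the same two joins can be followed by a `mutate()` and a grouped summary. The column names `central_val`, `nbrs_val`, `diff`, and `mean_abs_diff` are hypothetical, chosen to match the layout requested in the question. Recent dplyr versions may warn about a many-to-many relationship on the first join; that is expected here, because every cell appears once per year and has several neighbors.

```r
library(dplyr)

# Same toy data as in the question
set.seed(3)
nbrs <- data.frame(central_id = c("a", "a", "a", "b", "b", "b", "c", "c", "d", "e"),
                   nbrs_id    = c("b", "c", "d", "a", "c", "e", "a", "b", "e", "d"))
df <- data.frame(year = rep(c(1, 2, 3), each = 5),
                 id   = c("a", "b", "c", "d", "e"),
                 vals = 10 + rnorm(15))

# Attach the neighbor list to each central cell, then look up each
# neighbor's value for the same year and compute the difference.
pairs <- df %>%
  rename(central_id = id, central_val = vals) %>%
  inner_join(nbrs, by = "central_id") %>%
  inner_join(rename(df, nbrs_id = id, nbrs_val = vals),
             by = c("year", "nbrs_id")) %>%
  mutate(diff = central_val - nbrs_val)

# One row per (year, central cell): mean absolute difference to its neighbors
summary_by_cell <- pairs %>%
  group_by(year, central_id) %>%
  summarise(mean_abs_diff = mean(abs(diff)), .groups = "drop")
```

`summary_by_cell` can then be inspected or plotted per `central_id` across `year` to see how the contrast with the neighbors evolves.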

Created on 2021-05-05 by the reprex package (v2.0.0)
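
For a table of about 10 million rows, the data.table package is a commonly used alternative for joins of this kind. The following is only a sketch under that assumption and has not been benchmarked against the dplyr version; the object names `nbrs_dt`, `vals_dt`, and `out` and the renamed columns are hypothetical.

```r
library(data.table)

# Same toy data as in the question, built as data.tables
set.seed(3)
nbrs_dt <- data.table(central_id = c("a", "a", "a", "b", "b", "b", "c", "c", "d", "e"),
                      nbrs_id    = c("b", "c", "d", "a", "c", "e", "a", "b", "e", "d"))
vals_dt <- data.table(year = rep(c(1, 2, 3), each = 5),
                      id   = rep(c("a", "b", "c", "d", "e"), times = 3),
                      vals = 10 + rnorm(15))

# One row per (year, central cell, neighbor). allow.cartesian is needed
# because each cell matches several neighbors in every year.
out <- merge(vals_dt, nbrs_dt, by.x = "id", by.y = "central_id",
             allow.cartesian = TRUE)
setnames(out, c("id", "vals"), c("central_id", "central_val"))

# Update join: add each neighbor's value for the same year, by reference
out[vals_dt, nbrs_val := i.vals, on = .(year, nbrs_id = id)]
```

The second step modifies `out` in place rather than building a new table, which avoids an extra copy of the joined data.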
