Question

I'm an R newbie and I hope some of you can help me.

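I have two datasets:
- store (store data, including the location coordinates (x, y); the location is an integer value corresponding to the grid IDs)
- grid (all grid IDs (x, y), plus the population variable TOT_P for each grid point)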

What I want to achieve: for each store, loop over the grid data and sum the population of the grid IDs that are close to the store's grid ID.

In other words, it is basically a SUMIF over the grid population variable, with the condition

grid(x) <= store(x) + 1 &
grid(x) >= store(x) - 1 &
grid(y) <= store(y) + 1 &
grid(y) >= store(y) - 1
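The per-store loop described above can be written in base R without any join: for each store, build a logical mask of the grid cells whose x and y each differ from the store's by at most 1, then sum TOT_P over that mask. This is a minimal sketch; the data frame names `store` and `grid` and the helper `sum_near` are assumptions for illustration, and the grid is rebuilt from the sample table posted in the question.

```r
# Sample stores from the question
store <- data.frame(StoreName = c("Store1", "Store2"),
                    StoreX = c(3, 5), StoreY = c(6, 2))

# Sample grid from the question; expand.grid varies GridX fastest,
# which matches the row order of the posted table (X = 1..7 for each Y)
grid <- expand.grid(GridX = 1:7, GridY = 1:7)
grid$TOT_P <- c(8, 7, 3, 3, 22, 20, 9,
                28, 8, 3, 12, 12, 15, 7,
                3, 3, 3, 4, 13, 18, 3,
                61, 25, 5, 20, 23, 72, 14,
                178, 407, 26, 167, 58, 113, 73,
                76, 3, 3, 3, 4, 13, 18,
                3, 61, 25, 26, 167, 58, 113)

# Hypothetical helper: SUMIF for one store over its 3x3 neighbourhood
sum_near <- function(sx, sy) {
  near <- abs(grid$GridX - sx) <= 1 & abs(grid$GridY - sy) <= 1
  sum(grid$TOT_P[near])
}

# Apply the helper to every store (one call per row of `store`)
store$SUM_P <- mapply(sum_near, store$StoreX, store$StoreY)
print(store)
```

Using `abs(...) <= 1` expresses the four inequalities in one comparison per axis, which keeps the window symmetric around each store.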

How can I do this? I have been trying various things myself, such as merge and so on, but my lack of experience keeps me from getting it right.

Thanks in advance!

Edit: sample data:

StoreName   StoreX  StoreY
Store1  3   6
Store2  5   2

TOT_P   GridX   GridY
8   1   1
7   2   1
3   3   1
3   4   1
22  5   1
20  6   1
9   7   1
28  1   2
8   2   2
3   3   2
12  4   2
12  5   2
15  6   2
7   7   2
3   1   3
3   2   3
3   3   3
4   4   3
13  5   3
18  6   3
3   7   3
61  1   4
25  2   4
5   3   4
20  4   4
23  5   4
72  6   4
14  7   4
178 1   5
407 2   5
26  3   5
167 4   5
58  5   5
113 6   5
73  7   5
76  1   6
3   2   6
3   3   6
3   4   6
4   5   6
13  6   6
18  7   6
3   1   7
61  2   7
25  3   7
26  4   7
167 5   7
58  6   7
113 7   7

The output I'm looking for is

StoreName   StoreX  StoreY  SUM_P
Store1  3   6   721
Store2  5   2   119

i.e. for Store1 it is the sum of TOT_P over the grid cells with X = [2-4] and Y = [5-7].
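As a worked check against the sample grid, the nine cells in that window for Store1 (columns X = 2, 3, 4 in rows Y = 5, 6, 7) add up as follows:

$$
\text{SUM\_P}_{\text{Store1}} = \underbrace{(407 + 26 + 167)}_{Y=5} + \underbrace{(3 + 3 + 3)}_{Y=6} + \underbrace{(61 + 25 + 26)}_{Y=7} = 600 + 9 + 112 = 721
$$

The same window for Store2 (X = [4-6], Y = [1-3]) gives 45 + 39 + 35 = 119.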

Answer 1

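One approach is to use dplyr to cross join every store with every grid point, compute the absolute x and y differences, keep the rows within 1 unit on both axes, and then group by store and sum.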

#import library
library(dplyr)

#create example store table
StoreName<-paste0("Store",1:2)
StoreX<-c(3,5)
StoreY<-c(6,2)
df.store<-data.frame(StoreName,StoreX,StoreY)

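#create example population data (the example table from the OP);
#expand.grid varies GridX fastest, matching the row order of that table
df.pop<-expand.grid(GridX=1:7, GridY=1:7)
df.pop$TOT_P<-c(8,7,3,3,22,20,9,
                28,8,3,12,12,15,7,
                3,3,3,4,13,18,3,
                61,25,5,20,23,72,14,
                178,407,26,167,58,113,73,
                76,3,3,3,4,13,18,
                3,61,25,26,167,58,113)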

#add dummy column to each table to enable cross join
df.store$k=1
df.pop$k=1

#dplyr to join, calculate absolute distance, filter and sum    
df.store %>%
  inner_join(df.pop, by='k') %>%
  mutate(x.diff = abs(StoreX-GridX), y.diff=abs(StoreY-GridY)) %>%    
  filter(x.diff<=1, y.diff<=1) %>%
  group_by(StoreName) %>%
  summarise(StoreX=max(StoreX), StoreY=max(StoreY), tot.pop = sum(TOT_P) ) 

#output:
  StoreName StoreX StoreY tot.pop
     <fctr>  <dbl>  <dbl>   <int>
1    Store1      3      6     721
2    Store2      5      2     119
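Since the question mentions trying merge, the same cross-join, filter and sum can also be sketched in base R. This assumes base R's documented behaviour that `merge(x, y, by = NULL)` returns the Cartesian product of the two data frames; the names `store`, `grid`, `pairs`, `near` and `result` are illustrative, and the data is rebuilt from the question's sample tables.

```r
# Sample stores and grid from the question
store <- data.frame(StoreName = c("Store1", "Store2"),
                    StoreX = c(3, 5), StoreY = c(6, 2))
grid <- expand.grid(GridX = 1:7, GridY = 1:7)  # GridX varies fastest
grid$TOT_P <- c(8, 7, 3, 3, 22, 20, 9,
                28, 8, 3, 12, 12, 15, 7,
                3, 3, 3, 4, 13, 18, 3,
                61, 25, 5, 20, 23, 72, 14,
                178, 407, 26, 167, 58, 113, 73,
                76, 3, 3, 3, 4, 13, 18,
                3, 61, 25, 26, 167, 58, 113)

# by = NULL: pair every store with every grid cell (Cartesian product)
pairs <- merge(store, grid, by = NULL)

# Keep only the cells within 1 unit of the store on both axes
near <- subset(pairs, abs(StoreX - GridX) <= 1 & abs(StoreY - GridY) <= 1)

# Sum population per store, keeping the store coordinates as grouping columns
result <- aggregate(TOT_P ~ StoreName + StoreX + StoreY, data = near, FUN = sum)
result <- result[order(result$StoreName), ]
print(result)
```

Grouping by all three store columns plays the same role as the `max(StoreX)` / `max(StoreY)` trick in the dplyr version: it carries the coordinates through to the output without aggregating them.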

R: Sum a column in table 2 based on the values in table 1 and store the result in table 1
