Question

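I have df1 with latitude and longitude data for each observation, and df2, which contains latitude and longitude buckets that together cover every possible combination. df2 also includes a column with a unique "bucket ID" for each bucket. I want to create a column in df1 that holds the correct bucket ID for each observation.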

df1

      | Latitude | Longitude
0     | 36.9003  | 98.2183
1     | 33.1701  | 98.2988
...   | ...      | ...
2999  | 39.8944  | 98.2018
3000  | 34.9582  | 100.0900

df2

      | Lat_Start | Lat_End | Long_Start | Long_End | Bucket_ID
0     | 33.10     | 33.15   | 98.20      | 98.25    | 0
1     | 33.16     | 33.20   | 98.26      | 98.30    | 1
...   | ...       | ...     | ...        | ...      | ...
76699 | 39.96     | 40.00   | 100.01     | 100.05   | 76699
76700 | 40.01     | 40.05   | 100.06     | 100.10   | 76700

Expected output df1

      | Latitude | Longitude | Bucket_ID
0     | 36.9003  | 98.2183   | 34053
1     | 33.1701  | 98.2988   | 1
...   | ...      | ...       | ...
2999  | 39.8944  | 98.2018   | 65382
3000  | 34.9582  | 100.0900  | 3244

Answer 1

Since your dataset doesn't seem to be very large, this simple (but inefficient) code should solve your problem:

def find_bucket(latitude, longitude):
    # Linear scan over df2: return the ID of the first bucket whose
    # latitude and longitude bounds (inclusive) both contain the point.
    for row in df2.itertuples(index=False):
        if (row.Lat_Start <= latitude <= row.Lat_End
                and row.Long_Start <= longitude <= row.Long_End):
            return row.Bucket_ID
    return -1  # no bucket contains this point

# Look up the bucket for every observation, row by row.
df1['Bucket_ID'] = df1.apply(lambda x: find_bucket(x['Latitude'], x['Longitude']), axis=1)
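If the row-by-row loop above turns out to be too slow, the same inclusive bounds check can be done with NumPy broadcasting. The following is only a sketch under a few assumptions: `assign_buckets` and `chunk_size` are hypothetical names introduced here, `Bucket_ID` is assumed to be an integer column, and the column names are taken from the tables in the question. Points are processed in chunks so the boolean comparison matrix stays small. As with the loop, a point that falls into no bucket (for example in the gap between 33.15 and 33.16) gets -1.

```python
import numpy as np
import pandas as pd


def assign_buckets(points, buckets, chunk_size=500):
    """Return one bucket ID per row of `points`, or -1 where no bucket matches.

    Hypothetical helper: compares a chunk of points against all buckets at once.
    """
    lat_lo = buckets['Lat_Start'].to_numpy()
    lat_hi = buckets['Lat_End'].to_numpy()
    lon_lo = buckets['Long_Start'].to_numpy()
    lon_hi = buckets['Long_End'].to_numpy()
    ids = buckets['Bucket_ID'].to_numpy()  # assumed to be integers

    lats = points['Latitude'].to_numpy()
    lons = points['Longitude'].to_numpy()
    out = np.full(len(points), -1, dtype=np.int64)

    for start in range(0, len(points), chunk_size):
        stop = start + chunk_size
        # Column vectors broadcast against the 1-D bucket bounds,
        # giving a (points in chunk) x (buckets) boolean matrix.
        lat = lats[start:stop, None]
        lon = lons[start:stop, None]
        inside = (lat_lo <= lat) & (lat <= lat_hi) & (lon_lo <= lon) & (lon <= lon_hi)

        has_match = inside.any(axis=1)
        first = inside.argmax(axis=1)  # index of the first matching bucket
        out[start:stop] = np.where(has_match, ids[first], -1)

    return out


# Small made-up demo using the two buckets shown in the question.
demo_buckets = pd.DataFrame({
    'Lat_Start': [33.10, 33.16], 'Lat_End': [33.15, 33.20],
    'Long_Start': [98.20, 98.26], 'Long_End': [98.25, 98.30],
    'Bucket_ID': [0, 1],
})
demo_points = pd.DataFrame({
    'Latitude': [33.1701, 33.12, 36.9003],
    'Longitude': [98.2988, 98.21, 98.2183],
})
print(assign_buckets(demo_points, demo_buckets).tolist())  # [1, 0, -1]
```

On the real data this would be called as df1['Bucket_ID'] = assign_buckets(df1, df2). A smaller chunk_size uses less memory at the cost of more loop iterations.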
