Question

I'm new to Python, and I'm trying to combine functionality that I built in two separate programs.

The goal is to group values by various descriptors and then average the dataset by date within each group. I have done this successfully with Pandas groupby.

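One descriptor I would like to evaluate is an average over all points within a given distance of each point in the dataset. Up to now I have been using the zip code as the location descriptor. Separately, I have been able to use Geopy with GPS coordinates to find all other points in the dataset within the desired distance. This gives me, for each ID in the dataset, a list of the IDs within that distance.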
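To make the shape of those per-ID lists concrete, here is a minimal sketch of such a neighbor search. It is illustrative only: it uses a plain haversine formula as a stand-in for Geopy's distance calculation, and the coordinates, the `radius_km` threshold, and the names `haversine_km`, `points`, and `neighbor_lists` are all made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Made-up GPS coordinates per ID (assumption, not from the real dataset).
points = {1: (41.00, -88.00), 2: (41.01, -88.00), 3: (41.50, -88.00)}
radius_km = 5.0  # hypothetical distance threshold

# For each ID, collect every ID (including itself) within the radius.
neighbor_lists = {
    center: [other for other, q in points.items() if haversine_km(p, q) <= radius_km]
    for center, p in points.items()
}
print(neighbor_lists)
```

Each ID maps to its own list, and the lists can overlap, which is what makes them awkward to use directly as a groupby key.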

Here is a sample dataset:

ID  Date    Value   Color  Location
1    1      1234    Red    60941
1    2      51461   Red    60941
1    3      6512    Red    60941
1    4      5123    Red    60941
1    5      48413   Red    60941
2    1      5416    Blue   60941
2    2      32      Blue   60941
2    3      18941   Blue   60941
2    4      5135    Blue   60941
2    5      1238    Blue   60941
3    1      651651  Blue   60450
3    2      1777    Blue   60450
3    3      1651    Blue   60450
3    4      1968    Blue   60450
3    5      846     Blue   60450
4    1      1689    Red    60941
4    2      1651    Red    60941
4    3      184     Red    60941
4    4      19813   Red    60941
4    5      132     Red    60941
5    1      354     Yellow 60450
5    2      684     Yellow 60450
5    3      489     Yellow 60450
5    4      354     Yellow 60450
5    5      846     Yellow 60450

Here is the Pandas code I currently have working with the zip code as the location descriptor:

average_df = data_df['Value'].groupby([data_df['Location'],data_df['Color'],data_df['Date']]).mean()

Is there a way to pass the lists obtained from Geopy to groupby in place of the ['Location'] group I currently have? For example, group by list of IDs [List 1: (1,2,3), List 2: (3,1,5), List 3: (2,3,4)], then by Color and Date.

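I have gone through the Pandas documentation and searched this site, and I have not found anyone using lists in a Pandas groupby, so I am not sure whether it is possible. Perhaps I need to do this with a NumPy array instead? Any feedback would be greatly appreciated.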

Answer 1

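Pandas makes it easy to group by a boolean list. So all you need to do is build a list that says, for each row, whether it is nearby. The easiest way is a list comprehension: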

import pandas

df = pandas.DataFrame({'value': [3,2,3,6,4,1], 'location': ['a', 'a', 'b', 'c', 'c', 'c']})
nearby_locations = ['a','b']
is_nearby = [(loc in nearby_locations) for loc in df['location']]
# is_nearby = [True, True, True, False, False, False]
# Select the numeric column explicitly; recent pandas versions raise on mean() of the string column
df.groupby(is_nearby)[['value']].mean()

This will output:

          value
False  3.666667
True   2.666667
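A single boolean key only separates "nearby" from "not nearby" for one reference point. With the per-ID lists from the question, an ID can belong to several lists at once, so one possible extension is to expand the lists into a long (Center, ID) table, merge it with the data, and group on the center. The following is a rough sketch under those assumptions, not a definitive solution: `neighbor_lists`, `membership`, and the `Center` column are hypothetical names, the lists themselves are invented, and only a small subset of the sample data is used.

```python
import pandas as pd

# Small subset of the question's sample data (IDs 1-3, Dates 1-2).
data_df = pd.DataFrame({
    "ID":    [1, 1, 2, 2, 3, 3],
    "Date":  [1, 2, 1, 2, 1, 2],
    "Value": [1234, 51461, 5416, 32, 651651, 1777],
    "Color": ["Red", "Red", "Blue", "Blue", "Blue", "Blue"],
})

# Hypothetical Geopy result: for each center ID, the IDs within the distance.
neighbor_lists = {1: [1, 2], 2: [1, 2, 3], 3: [2, 3]}

# One row per (Center, ID) pair, so an ID can appear in several groups.
membership = pd.DataFrame(
    [(center, member) for center, members in neighbor_lists.items() for member in members],
    columns=["Center", "ID"],
)

# Attach every data row to each group its ID belongs to, then group as before.
merged = membership.merge(data_df, on="ID")
average_by_center = merged.groupby(["Center", "Color", "Date"])["Value"].mean()
print(average_by_center)
```

Rows are duplicated once per list they fall into, so the merged frame grows with the amount of overlap between lists; for large datasets that cost is worth checking before relying on this approach.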

Python Pandas Groupby List of Lists
