Question

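So I created a Pandas DataFrame from a CSV file. The csv file comes from http://stat-computing.org/dataexpo/2009/the-data.html. Specifically, I am interested in finding the most popular airport in a month, and for each airport I want to collect and manipulate the Origin, DayofMonth, UniqueCarrier, Taxi time = TaxiOut + TaxiIn (of the destination), WeatherDelay, and Dest. I am trying to output a csv that contains the information for the most popular airport.

To do this, I am creating dictionaries (in Python in a Jupyter Notebook, using Apache Spark). In my code below, I am trying to collect each Origin (airport name) and create a dictionary containing an array of delays per day, a list of carriers for that airport, the total taxi time, both taxiing out (on departure) and taxiing in on arrival, an array of weather delays per day, and a list of destinations for that airport.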

for x in range(len(df2.index)):
    if df2["Origin"] not in locals():
        df2["Origin"] = {'Days': [0]*31, 'Carriers': [], 'TaxiSum': 0, 'TaxiNum': 0, 'Weather': []*31, 'Dest': []}

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-93-8b0de59b6cd2> in <module>()
1 for x in range(len(df2.index)):
----> 2     if df2["Origin"] not in locals():
3         df2["Origin"] = {'Days': [0]*31, 'Carriers': [], 'TaxiSum': 0, 'TaxiNum': 0, 'Weather': []*31, 'Dest': []}

~/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __hash__(self)
    875     def __hash__(self):
    876         raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 877                         ' hashed'.format(self.__class__.__name__))
    878 
    879     def __iter__(self):

TypeError: 'Series' objects are mutable, thus they cannot be hashed

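So I want to create a variable with the same name as the airport (in this case the "Origin" airport), but Python seems to interpret what I am doing differently.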

Answer 1

It looks like you are trying to loop over each row, but in every iteration you are accessing the entire Origin column.
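To make the difference concrete, here is a minimal sketch using a tiny made-up stand-in for df2 (the rows are hypothetical, not taken from the real dataset). It shows that df2["Origin"] is a whole Series, that a single cell is a plain string, and that a membership test against a dict such as locals() has to hash its key, which a Series does not allow.

```python
import pandas as pd

# Hypothetical stand-in for df2; the rows are invented for illustration
df2 = pd.DataFrame({"Origin": ["ATL", "ORD", "ATL"], "DayofMonth": [1, 1, 2]})

column = df2["Origin"]          # the whole column: a pandas Series
value = df2.iloc[0]["Origin"]   # one cell from the first row: a plain str

print(type(column).__name__)
print(type(value).__name__)

# "key in some_dict" hashes the key; a Series is not hashable,
# so the test itself raises TypeError before any comparison happens.
try:
    column in locals()
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# A str is hashable, so the same test runs without error
print(value in locals())
```

The exact wording of the TypeError message depends on the pandas version, so it may differ from the traceback in the question.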

Could you try the following and see whether it works?

for x in range(len(df2.index)):
    if df2.iloc[x]["Origin"] not in locals():
        df2.iloc[x]["Origin"] = {'Days': [0]*31, 'Carriers': [], 'TaxiSum': 0, 'TaxiNum': 0, 'Weather': []*31, 'Dest': []}
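As a possible alternative to creating one variable per airport through locals(), the per-airport records could live in an ordinary dict keyed by the airport code. The sketch below is only an illustration under some assumptions: the sample rows are made up, the name `airports` is hypothetical, 'Days' is taken to mean the number of departures per day of the month, and the taxi and weather columns are assumed to have no missing values. Note also that `[]*31` evaluates to an empty list, so the sketch uses `[0]*31` for 'Weather'.

```python
import pandas as pd

# Hypothetical sample standing in for df2; column names follow the question
df2 = pd.DataFrame({
    "Origin": ["ATL", "ORD", "ATL"],
    "Dest": ["ORD", "ATL", "JFK"],
    "DayofMonth": [1, 1, 2],
    "UniqueCarrier": ["DL", "UA", "DL"],
    "TaxiIn": [5, 7, 6],
    "TaxiOut": [10, 12, 11],
    "WeatherDelay": [0, 3, 0],
})

airports = {}  # hypothetical name: one record per origin airport code

for row in df2.itertuples(index=False):
    if row.Origin not in airports:  # row.Origin is a str, so it is hashable
        airports[row.Origin] = {
            "Days": [0] * 31, "Carriers": [], "TaxiSum": 0,
            "TaxiNum": 0, "Weather": [0] * 31, "Dest": [],
        }
    stats = airports[row.Origin]
    day = int(row.DayofMonth) - 1               # DayofMonth is 1-based
    stats["Days"][day] += 1                     # assumption: departures per day
    stats["Weather"][day] += int(row.WeatherDelay)
    stats["TaxiSum"] += int(row.TaxiOut) + int(row.TaxiIn)
    stats["TaxiNum"] += 1
    if row.UniqueCarrier not in stats["Carriers"]:
        stats["Carriers"].append(row.UniqueCarrier)
    stats["Dest"].append(row.Dest)

# The airport with the most departures in the sample
busiest = max(airports, key=lambda code: sum(airports[code]["Days"]))
print(busiest, airports[busiest])
```

Keeping the records in one dict avoids name clashes with other variables in the notebook and makes it straightforward to look up the busiest airport afterwards.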
