Pandas DataFrame "ValueError: cannot reindex from a duplicate axis": brute-force solution for duplicate indexes?

Asked: 2019-06-15 04:57:07

Tags: python pandas valueerror

import pandas as pd

df_avocado = pd.read_csv("avocado.csv")
df_avocado.set_index("Date", inplace=True)

The problem is here:

'''
determines all unique regions (ex: "Alabama", "Alaska", "Arkansas") in dataframe "df_avocado"
finds all data-points belonging to that unique region
dumps those data-points into a temporary dataframe "df_region"
calculates the 25sma of every df_region
dumps the 25sma to "df_avocado_region_25ma" so I can compare 25sma of every region
'''

df_avocado_region_25ma = pd.DataFrame()
for region in df_avocado["region"].unique():
    df_region = df_avocado.copy()[df_avocado["region"] == region]
    df_avocado_region_25ma[f"{region}_25ma"] = df_region["AveragePrice"].rolling(25).mean()

When each df_region is added to df_avocado_region_25ma, Jupyter raises "ValueError: cannot reindex from a duplicate axis".

I looked up what the ValueError means. Quoting "What does `ValueError: cannot reindex from a duplicate axis` mean?": "This error usually rises when you join / assign to a column when the index has duplicate values."
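
The quoted explanation can be reproduced on a tiny made-up example. This is only a sketch: the names `prices` and `frame` and the values are illustrative assumptions, not taken from the avocado data. The exact wording of the message varies between pandas versions, but the exception type is ValueError.

```python
import pandas as pd

# Two rows share the label 2018-01-07, so this Series index is not unique.
prices = pd.Series(
    [1.0, 3.0, 2.0],
    index=pd.to_datetime(["2018-01-07", "2018-01-07", "2018-01-14"]),
)

# The frame's index differs from the Series index, so pandas has to
# reindex the Series onto the frame's index before storing the column.
frame = pd.DataFrame(
    index=pd.to_datetime(["2018-01-07", "2018-01-14", "2018-01-21"])
)

try:
    frame["price"] = prices  # reindexing a duplicated index is ambiguous
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__, exc)
```

In the loop from the question, the same situation can arise as soon as a region's "Date" index contains duplicates and is not identical to the index already stored in df_avocado_region_25ma.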

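That makes sense, because the "Date" column (which I set as the index) has many overlapping values. However, since I do not mind having duplicate indexes (they give highs/lows for the 20sma) and I do not want to overwrite earlier entries (ideally every data point is included), is there a way to brute-force add all the points?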
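
One possible way to keep every data point is to avoid aligning on the duplicated "Date" index at all: leave "Date" as an ordinary column and compute each region's rolling mean with `groupby(...).transform(...)`, which returns one value per original row. The sketch below uses a small made-up frame and a window of 2 instead of 25. The names `long`, `window` and the `ma` column are illustrative assumptions, not part of the question's code.

```python
import pandas as pd

# Made-up sample shaped like the avocado data (assumption): some regions
# have more than one row per date.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-01-07", "2018-01-07", "2018-01-14",
        "2018-01-14", "2018-01-07", "2018-01-14",
    ]),
    "AveragePrice": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
    "region": ["Albany", "Albany", "Albany", "Albany", "Boston", "Boston"],
})

window = 2  # the question uses 25

# Stable sort by date, then reset to a unique 0..n-1 index so that no
# alignment on duplicate labels is ever needed.
long = df.sort_values("Date", kind="mergesort").reset_index(drop=True)

# transform() returns a result aligned row by row with `long`,
# so every data point keeps its own moving-average value.
long["ma"] = long.groupby("region")["AveragePrice"].transform(
    lambda s: s.rolling(window).mean()
)
print(long)
```

The result stays in long format (one row per observation), so no points are dropped or overwritten. The first `window - 1` rows of each region are NaN, as with any rolling mean.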


Data: www.kaggle.com/neuromusic/avocado-prices

Full code:

import pandas as pd

df_avocado = pd.read_csv("avocado.csv")
wanted_columns = ["Date", "AveragePrice", "region"]
df_avocado = df_avocado[wanted_columns]
df_avocado["Date"] = pd.to_datetime(df_avocado["Date"])
df_avocado.set_index("Date", inplace=True)
df_avocado.sort_index(inplace=True)

df_avocado_region_25ma = pd.DataFrame()
for region in df_avocado["region"].unique():
    df_region = df_avocado.copy()[df_avocado["region"] == region]
    df_avocado_region_25ma[f"{region}_25ma"] = df_region["AveragePrice"].rolling(25).mean()
df_avocado_region_25ma.plot()
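
If the goal is one plotted line per region, a possible alternative is to make the date index unique before taking the rolling mean, for example by averaging duplicate (date, region) rows with `pivot_table`. This is a sketch under that assumption: averaging duplicates changes the numbers compared with rolling over every individual row, and the names `wide`, `region_ma` and `window` and the sample data are made up for illustration.

```python
import pandas as pd

# Made-up sample shaped like the avocado data (assumption).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2018-01-07", "2018-01-07", "2018-01-14",
        "2018-01-14", "2018-01-07", "2018-01-14",
    ]),
    "AveragePrice": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
    "region": ["Albany", "Albany", "Albany", "Albany", "Boston", "Boston"],
})

window = 2  # the question uses 25

# One row per date, one column per region; duplicate (date, region)
# entries are collapsed to their mean, so the index is unique.
wide = df.pivot_table(
    index="Date", columns="region", values="AveragePrice", aggfunc="mean"
)

# Rolling mean per region column, with a suffix to mark the derived columns.
region_ma = wide.rolling(window).mean().add_suffix("_ma")
print(region_ma)
```

Because `region_ma` has a unique date index and one column per region, calling `region_ma.plot()` (with matplotlib installed) should draw one line per region without any reindexing.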

0 answers:

No answers yet