I am using train_sample.csv from here.
Consider the following:
import pandas as pd
df = pd.read_csv('train_sample.csv')
df = df.drop(['attributed_time'], axis=1)
df['click_time'] = pd.to_datetime(df['click_time'])
df = df.set_index('click_time')
df = df.sort_index()
df['clicks_last_hour'] = df.groupby(['ip']).rolling('1H').count()
where I am trying to create a new column that counts the number of times a given ip has clicked in the past hour.
I get:
Traceback (most recent call last):
File "train_sample.py", line 11, in <module>
df['clicks_last_hour'] = df.groupby(['ip']).rolling('1H').count()
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3119, in __setitem__
self._set_item(key, value)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3194, in _set_item
value = self._sanitize_column(key, value)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3378, in _sanitize_column
value = reindexer(value).T
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3358, in reindexer
raise e
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3353, in reindexer
value = value.reindex(self.index)._values
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\util\_decorators.py", line 187, in wrapper
return func(*args, **kwargs)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3566, in reindex
return super(DataFrame, self).reindex(**kwargs)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\generic.py", line 3689, in reindex
fill_value, copy).__finalize__(self)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3501, in _reindex_axes
fill_value, limit, tolerance)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\frame.py", line 3509, in _reindex_index
tolerance=tolerance)
File "C:\Users\galah\Miniconda3\envs\venv\lib\site-packages\pandas\core\indexes\multi.py", line 2068, in reindex
raise Exception("cannot handle a non-unique multi-index!")
Exception: cannot handle a non-unique multi-index!
even though, from what I have checked, there are no duplicates with the same ip and click_time.
What am I doing wrong?
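For illustration, here is a minimal sketch of the result being asked for, built on hypothetical toy data (the values are made up and do not come from train_sample.csv). It is one possible approach under stated assumptions, not a confirmed fix for the error above. It assumes the frame keeps a unique integer index, so that each per-ip count can be aligned back to its row, and it loops over the ip groups for clarity rather than speed.

```python
import pandas as pd

# Hypothetical toy data standing in for train_sample.csv; only the two
# columns relevant to the question are included.
df = pd.DataFrame({
    "ip": [1, 1, 2, 1, 2],
    "click_time": pd.to_datetime([
        "2017-11-07 09:00:00",
        "2017-11-07 09:30:00",
        "2017-11-07 09:45:00",
        "2017-11-07 10:15:00",
        "2017-11-07 11:30:00",
    ]),
})

# Sort by time and keep a unique integer index, so every row has its own label.
df = df.sort_values("click_time").reset_index(drop=True)

parts = []
for _, group in df.groupby("ip"):
    # One "1" per click, indexed by time, so a time-based window can count them.
    ones = pd.Series(1, index=pd.DatetimeIndex(group["click_time"]))
    in_window = ones.rolling("1h").count()
    # Re-attach the original row labels so the counts align with df.
    parts.append(pd.Series(in_window.to_numpy(), index=group.index))

# Assignment aligns on the unique integer index, not on (ip, click_time).
df["clicks_last_hour"] = pd.concat(parts).astype(int)
print(df)
```

On this toy data the new column comes out as 1, 2, 1, 2, 1. Each count includes the current click, because the default rolling window covers the hour up to and including the row's own timestamp.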