How do I group DataFrame rows by date?

Time: 2019-01-17 15:48:24

Tags: python python-3.x datetime dataframe

I have a DataFrame, news_df, that holds article titles indexed by date, and I want to group the articles published on the same day into a single row.

    name
date    
2019-01-17 14:41:00 Forte hausse de l'indice Philly Fed en janvier
2019-01-17 14:36:00 Baisse des inscriptions hebdomadaires au chômage
2019-01-16 22:30:00 Wall Street finit en hausse, Goldman Sachs et ...
2019-01-16 16:14:00 Wall Street, soutenue par les résultats de ban...
2019-01-16 14:36:00 Baisse de 1% des prix à l'import en décembre
...

I tried:

news_df.resample('D', on='name')

But it gives me a TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-275-bdfd57eadc21> in <module>
----> 1 news_df.resample('D', on='name')

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base, on, level)
   7108                      axis=axis, kind=kind, loffset=loffset,
   7109                      convention=convention,
-> 7110                      base=base, key=on, level=level)
   7111         return _maybe_process_deprecations(r,
   7112                                            how=how,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\resample.py in resample(obj, kind, **kwds)
   1146     """ create a TimeGrouper and return our resampler """
   1147     tg = TimeGrouper(**kwds)
-> 1148     return tg._get_resampler(obj, kind=kind)
   1149 
   1150 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\resample.py in _get_resampler(self, obj, kind)
   1274         raise TypeError("Only valid with DatetimeIndex, "
   1275                         "TimedeltaIndex or PeriodIndex, "
-> 1276                         "but got an instance of %r" % type(ax).__name__)
   1277 
   1278     def _get_grouper(self, obj, validate=True):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
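The traceback comes from `on='name'`: `on` tells `resample` which column to use as the time axis, and `name` holds the article titles, not timestamps. A minimal sketch of the resample route, assuming the dates live in the index (possibly as strings, so they are converted with `pd.to_datetime` first) and using a hypothetical three-row sample taken from the data above:

```python
import pandas as pd

# Hypothetical sample shaped like news_df: titles in "name", dates in the index
news_df = pd.DataFrame(
    {"name": [
        "Forte hausse de l'indice Philly Fed en janvier",
        "Baisse des inscriptions hebdomadaires au chômage",
        "Baisse de 1% des prix à l'import en décembre",
    ]},
    index=pd.Index(
        ["2019-01-17 14:41:00", "2019-01-17 14:36:00", "2019-01-16 14:36:00"],
        name="date",
    ),
)

# resample() needs a DatetimeIndex (or an `on=` column holding datetimes),
# so convert the index; this is harmless if it is already datetime
news_df.index = pd.to_datetime(news_df.index)

# Resample on the index (no `on=`), one bin per calendar day,
# and collect each day's titles into a list
daily = news_df.resample("D")["name"].agg(list)
print(daily)
```

Unlike `groupby`, `resample('D')` creates a bin for every calendar day between the first and last timestamp, including days with no articles, so expect extra rows for any such gaps.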

1 answer:

Answer 0 (score: 0)

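import pandas as pd

# move the date out of the index into a regular column
df = news_df.reset_index()
# optional if 'date' is already datetime; needed if it holds strings
df['date'] = pd.to_datetime(df['date'])
# group by calendar day (not the full timestamp) and collect each day's titles into a list
df = df.groupby(df['date'].dt.date)['name'].apply(list)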
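A quick end-to-end check of the groupby approach, on a hypothetical three-row sample built from the titles in the question. This variant groups on `dt.normalize()` (which resets each timestamp to midnight) instead of `dt.date`, so the group keys stay pandas timestamps rather than Python `date` objects; either works for grouping.

```python
import pandas as pd

# Hypothetical sample shaped like news_df: titles in "name", string dates in the index
news_df = pd.DataFrame(
    {"name": [
        "Forte hausse de l'indice Philly Fed en janvier",
        "Baisse des inscriptions hebdomadaires au chômage",
        "Baisse de 1% des prix à l'import en décembre",
    ]},
    index=pd.Index(
        ["2019-01-17 14:41:00", "2019-01-17 14:36:00", "2019-01-16 14:36:00"],
        name="date",
    ),
)

df = news_df.reset_index()
df["date"] = pd.to_datetime(df["date"])

# Same-day rows share the same midnight timestamp, so they land in one group;
# within a group, titles keep their original row order
by_day = df.groupby(df["date"].dt.normalize())["name"].apply(list)

# Turn the result back into a two-column DataFrame: one row per day
by_day_df = by_day.reset_index()
print(by_day_df)
```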

Edit:

You need to set the strptime format to match the format of your input dates.

import datetime as dt
df['date'] = df['date'].apply(lambda x: dt.datetime.strptime(x, "%Y-%m-%d %H:%M:%S").strftime('%d/%m/%Y'))  # adjust the strptime format to your input
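The same parsing can be done without a Python-level lambda: `pd.to_datetime` accepts a `format=` string using the same `strptime` codes, and `.dt.strftime` formats the whole column at once. A minimal sketch, assuming the dates are strings in a regular `date` column like those shown in the question (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample with string dates in a regular "date" column
df = pd.DataFrame({
    "date": ["2019-01-17 14:41:00", "2019-01-17 14:36:00", "2019-01-16 14:36:00"],
    "name": [
        "Forte hausse de l'indice Philly Fed en janvier",
        "Baisse des inscriptions hebdomadaires au chômage",
        "Baisse de 1% des prix à l'import en décembre",
    ],
})

# Vectorized equivalent of the strptime/strftime lambda: parse with an explicit
# format, then render each timestamp as a day string such as "17/01/2019"
df["day"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S").dt.strftime("%d/%m/%Y")

# Group on the day string and collect each day's titles into a list
titles_per_day = df.groupby("day")["name"].apply(list)
print(titles_per_day)
```

Keys in `%d/%m/%Y` form sort as text, so `01/02/2019` would come before `17/01/2019`; grouping on a `%Y-%m-%d` string or on a datetime key keeps chronological order.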