Question

I have a very large DataFrame with a date index that spans several years. Each day contains multiple values.

      Date (DT_index)   Description   Value1
  1      2015-01-12     stringvalue    10
  2      2015-01-12     stringvalue    12
  3      2015-01-12     stringvalue    14
  4      2015-02-12     stringvalue    16
  5      2015-02-12     stringvalue   348
  6      2015-09-12     stringvalue     1
  7      2015-09-12     stringvalue     9
                  (.....)
8456     2017-11-03     stringvalue    10
8457     2017-11-03     stringvalue   111
8458     2017-11-04     stringvalue    29

What I want is to split this csv into separate files by month/year (so files like 12-2015.csv, 01-2016.csv, 02-2016.csv).

I have loaded the large csv into a pandas df and grouped it by month:

dfgp = df.groupby(pd.TimeGrouper(freq='M'))

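But the only operations I seem to be able to use on it are 'sum' or 'avg'. That is not what I want: I want to split the large DF by month, not run an .apply operation that changes or aggregates the data.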

I also tried this code:

result = [group[1] for group in df.groupby(df.index.date)]

for x in result:
    name = str(x.index.date.month.year)
    x.to_csv(name, sep=';')

This approach comes very close, but I have 2 problems. 1. My naming method doesn't work:

'numpy.ndarray' object has no attribute 'month'

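2. When I remove my naming method, it does iterate over the files. But the groups are made per day (e.g. 2015-12-13, with 6 entries, rather than the whole of 2015-12, with 238 entries).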

I tried to fix the second problem with this code:

result = [group[1] for group in df.groupby(df.index.date.month)]

But this just raises the same error:

'numpy.ndarray' object has no attribute 'month'

Does anyone know what I am doing wrong?

Answer 1

Try this:

for n,g in df.groupby(pd.Grouper(freq='M')):
    name = n.strftime('%Y%m') + '.csv'
    g.to_csv(name, sep=';')
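
On the error in the question: `df.index.date` returns a plain NumPy array of `datetime.date` objects, and that array has no `.month` attribute. The `DatetimeIndex` itself exposes `.year` and `.month` directly. Below is a minimal sketch of grouping on those attributes while using the `MM-YYYY.csv` naming the question asks for. It assumes `df` has a `DatetimeIndex`; the helper name `split_by_month`, the sample frame and the temporary output directory are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def split_by_month(df, out_dir, sep=";"):
    """Write one CSV per (year, month) found in df's DatetimeIndex.

    Hypothetical helper, not part of pandas. Returns the paths written.
    """
    paths = []
    # Group on the index's own year/month attributes; rows inside each
    # group keep the order they had in df.
    for (year, month), group in df.groupby([df.index.year, df.index.month]):
        name = f"{int(month):02d}-{int(year)}.csv"
        path = os.path.join(out_dir, name)
        group.to_csv(path, sep=sep)
        paths.append(path)
    return paths


# Small made-up frame that mimics the layout in the question.
sample = pd.DataFrame(
    {"Description": ["a", "b", "c", "d"], "Value1": [10, 12, 16, 29]},
    index=pd.to_datetime(["2015-01-12", "2015-01-12", "2015-02-12", "2017-11-04"]),
)

out_dir = tempfile.mkdtemp()  # assumption: write to a throwaway directory
written = split_by_month(sample, out_dir)
print([os.path.basename(p) for p in written])
# ['01-2015.csv', '02-2015.csv', '11-2017.csv']
```

Grouping on year and month together matters here: grouping on `df.index.month` alone would merge, say, January 2015 with January 2016.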

Answer 2

There may be a better (more pandathonic) way, but this works:

import os

import pandas as pd

# this just assumes that you want to save where the
# current file is located
csv_path = r'path\to\your.csv'  # raw string so '\t' is not read as a tab
data_path = os.path.dirname(csv_path)

# read the csv and add a simple string column for filtering
df = pd.read_csv(csv_path)
df['date_filters'] = pd.to_datetime(df['Date']).dt.strftime('%m-%Y')

# iterate over the months present, in order of first appearance
for month in df['date_filters'].unique():
    # slice out the month and drop the helper column before saving
    month_df = df[df['date_filters'] == month].drop(columns='date_filters')
    # make the path and save
    month_path = os.path.join(data_path, month + '.csv')
    month_df.to_csv(month_path, index=False)
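
A variant of the same idea, sketched under the assumption that the CSV has a parseable `Date` column: grouping on a monthly period avoids the helper column and the repeated boolean filtering, and rows within each group stay in their original order. The inline CSV text, the `;` separator and the temporary output directory are made up so that the example runs on its own.

```python
import io
import os
import tempfile

import pandas as pd

# Made-up CSV text standing in for the large file on disk.
csv_text = """Date;Description;Value1
2015-01-12;stringvalue;10
2015-02-12;stringvalue;16
2015-01-12;stringvalue;12
2017-11-04;stringvalue;29
"""

df = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["Date"])

out_dir = tempfile.mkdtemp()  # assumption: write to a throwaway directory
month_files = {}
# dt.to_period('M') maps every date to its calendar month (e.g. 2015-01).
for period, month_df in df.groupby(df["Date"].dt.to_period("M")):
    name = period.strftime("%m-%Y") + ".csv"
    month_path = os.path.join(out_dir, name)
    month_df.to_csv(month_path, sep=";", index=False)
    # Record the values per file to show that row order is preserved.
    month_files[name] = month_df["Value1"].tolist()

print(month_files)
# {'01-2015.csv': [10, 12], '02-2015.csv': [16], '11-2017.csv': [29]}
```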

Pandas: split a large file into separate files by date, keeping the original order.
