I have the following CSV file:
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())
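I want to group the rows by year+month and by category (i.e. A, B, C).
The final data should be grouped by month and then by category, as a view of the original data: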
2012-04, A
>> array[0,] => 2012-04-01,00:10, A, 10
>> array[3,] => 2012-04-02,00:10, A, 18
2012-04, B
>> array[1,] => 2012-04-01,00:20, B, 11
>> array[2,] => 2012-04-01,00:30, B, 12
2012-05, A
>> array[4,] => 2012-05-02,00:20, A, 14
...
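Then, for each group, I want to iterate over the groups and plot each one with the same function.
I have seen a similar question about splitting dates by day, "Split list of datetimes into days", and I can do that in my case, see a). However, I have trouble adapting it to a year+month split, see b).
Here is a snippet showing the problem I have run into so far: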
#! /usr/bin/python
import numpy as np
import csv
import os
from datetime import datetime
def strToDate(string):
    d = datetime.strptime(string, '%Y-%m-%d')
    return d

def strToMonthDate(string):
    d = datetime.strptime(string, '%Y-%m-%d')
    d_by_month = datetime(d.year, d.month, 1)
    return d_by_month
# simulate a csv file
from StringIO import StringIO
data = StringIO("""
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""".strip())
arr = np.genfromtxt(data, delimiter=',', dtype=object)
# a) If we were to just group by dates
# Get unique dates
#keys = np.unique(arr[:,0])
#keys1 = np.unique(arr[:,2])
# Group by unique dates
#for key in keys:
#    print key
#    for key1 in keys1:
#        group = arr[ (arr[:,0]==key) & (arr[:,2]==key1) ]
#        if group.size:
#            print "\t" + key1
#            print group
#    print "\n"
# b) But if we want to group by year+month in the dates
dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
keys2 = np.unique(dates_by_month)
print dates_by_month
# >> [datetime.datetime(2012, 4, 1, 0, 0), datetime.datetime(2012, 4, 1, 0, 0), ...
print "\n"
print keys2
# >> [2012-04-01 00:00:00 2012-05-01 00:00:00 2012-06-01 00:00:00]
for key in keys2:
    print key
    print type(key)
    group = arr[dates_by_month==key]
    print group
    print "\n"
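Problem: I get the monthly keys correctly, but for group I get [2012-04-01 00:10 A 10] for every group. The keys in keys2 are of type datetime.datetime. Any idea what could be wrong? Suggestions for alternative implementations are welcome. I would rather not use an itertools.groupby solution, because it returns an iterator rather than an array, which is less convenient for plotting.
Edit 1: Problem solved. The dates_by_month that I use for advanced indexing in case b) should have been initialized as an np.array rather than the list that map returns: dates_by_month = np.array(map(strToMonthDate, arr[:,0])). I have fixed this in the snippet above, and the example now works as expected.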
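As a minimal illustration of why the container type matters, here is a sketch written for Python 3 (an assumption; the snippet above is Python 2). The three sample dates and the names months, list_result and array_mask are made up for this example and are not part of the script above.

```python
from datetime import datetime
import numpy as np

# Hypothetical sample: month-truncated dates for three rows.
months = [datetime(2012, 4, 1), datetime(2012, 4, 1), datetime(2012, 5, 1)]
key = datetime(2012, 4, 1)

# A plain list is compared as a whole object, giving one bool.
list_result = (months == key)

# An object array is compared element by element, giving a boolean mask.
array_mask = (np.array(months) == key)

print(list_result)
print(array_mask)
```

Only the second form produces one boolean per row, which is what selecting rows with arr[mask] needs.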
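Answer 0 (score: 4)
I found the problem in my original solution.
In case b),
dates_by_month = map(strToMonthDate, arr[:,0])
returns a list rather than a NumPy array. Comparing a list with a datetime yields a single False instead of an elementwise mask, so the advanced indexing:
group = arr[dates_by_month==key]
does not work. This is consistent with every group coming back as the first row, with False apparently being treated as index 0. Instead, I now have:
dates_by_month = np.array(map(strToMonthDate, arr[:,0]))
and the grouping then works as expected.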
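For readers on Python 3, where map returns an iterator rather than a list, the same idea can be sketched as follows. This is only one possible arrangement, not the code from the question: it assumes the fields are read as stripped strings with the csv module, uses the first seven characters of the date ('YYYY-MM') as the month key, and the names months, categories and groups are introduced here purely for illustration.

```python
import csv
from io import StringIO

import numpy as np

# Simulate the CSV file from the question.
data = StringIO("""\
2012-04-01,00:10, A, 10
2012-04-01,00:20, B, 11
2012-04-01,00:30, B, 12
2012-04-02,00:10, A, 18
2012-05-02,00:20, A, 14
2012-05-02,00:30, B, 11
2012-05-03,00:10, A, 10
2012-06-03,00:20, B, 13
2012-06-03,00:30, C, 12
""")

# Read every field as text and strip the padding spaces.
rows = [[field.strip() for field in row] for row in csv.reader(data)]
arr = np.array(rows)

# Build the keys as arrays so that == yields elementwise masks.
months = np.array([date[:7] for date in arr[:, 0]])  # 'YYYY-MM'
categories = arr[:, 2]

# Collect the rows of each non-empty (month, category) pair.
groups = {}
for month in np.unique(months):
    for category in np.unique(categories):
        mask = (months == month) & (categories == category)
        if mask.any():
            groups[(month, category)] = arr[mask]

for (month, category), group in groups.items():
    print(month, category)
    print(group)
```

Each value in groups is an ordinary 2-D array, so it can be passed to the same plotting function in a loop. Note that boolean-mask indexing returns a copy rather than a view of arr; if the original row positions are needed, np.flatnonzero(mask) could be stored instead.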