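I'm trying to use pandas and groupby to extract the month from a date field for further processing. Line 40 is where I try to apply dateutil to extract the year, month, and day.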
My code:
df = pandas.DataFrame.from_records(defects, columns=headers)
df['date'] = pandas.to_datetime(df['date'], format="%Y-%m-%d")
df['date'] = df['date'].apply(dateutil.parser.parse, yearfirst=True)
....
print df.groupby(['month']).groups.keys()
I get:
Traceback (most recent call last):
File "jira-sandbox.py", line 40, in <module>
defects_df['created'] = defects_df['created'].apply(dateutil.parser.parse, yearfirst=True)
File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2294, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer (pandas/lib.c:66124)
File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2282, in <lambda>
f = lambda x: func(x, *args, **kwds)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 697, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 301, in parse
res = self._parse(timestr, **kwargs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 349, in _parse
l = _timelex.split(timestr)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 143, in split
return list(cls(s))
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 137, in next
token = self.get_token()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 68, in get_token
nextchar = self.instream.read(1)
AttributeError: 'Timestamp' object has no attribute 'read'
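The following minimal sketch shows the cause of the traceback. It assumes a recent pandas and python-dateutil, and the sample dates are made up. After `pandas.to_datetime()` the elements are already `Timestamp` objects, but `dateutil.parser.parse` only accepts strings or character streams, so it fails when it tries to read from a `Timestamp`. Newer dateutil versions raise `TypeError` here instead of the `AttributeError` shown above, so the sketch catches both.

```python
import pandas as pd
from dateutil import parser

# Made-up sample dates standing in for the question's data
s = pd.to_datetime(pd.Series(['2017-03-17', '2017-04-01']), format='%Y-%m-%d')
first = s.iloc[0]

# After to_datetime the elements are pd.Timestamp objects, not strings
is_timestamp = isinstance(first, pd.Timestamp)

# dateutil's parser expects a string or stream, so a Timestamp is rejected
# (AttributeError in old dateutil versions, TypeError in newer ones)
try:
    parser.parse(first)
    parse_failed = False
except (TypeError, AttributeError):
    parse_failed = True

# No re-parsing is needed: date parts are plain attributes of the Timestamp
month = first.month
```

Because the values are already parsed, attributes such as `.year`, `.month` and `.day` can be read directly.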
Answer:
I don't think you need the dateutil step. After the pandas.to_datetime() call, the column already holds datetime values. Here is one way to build a column that groupby() can use.
Code:
# build a test dataframe
import datetime as dt
import pandas as pd

df = pd.DataFrame([dt.datetime.now() + dt.timedelta(days=x*15)
                   for x in range(10)],
                  columns=['date'])
print(df)
# add a year/month column (YYYYMM as an int) to allow grouping
df['month'] = df.date.apply(lambda x: x.year * 100 + x.month)
# show a groupby
print(df.groupby(['month']).groups.keys())
Results:
date
0 2017-03-17 14:30:24.344
1 2017-04-01 14:30:24.344
2 2017-04-16 14:30:24.344
3 2017-05-01 14:30:24.344
4 2017-05-16 14:30:24.344
5 2017-05-31 14:30:24.344
6 2017-06-15 14:30:24.344
7 2017-06-30 14:30:24.344
8 2017-07-15 14:30:24.344
9 2017-07-30 14:30:24.344
[201704, 201705, 201706, 201707, 201703]
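The following alternative sketch is not part of the original answer. On a datetime64 column, the `.dt` accessor extracts year, month and day in a vectorized way, so neither `dateutil` nor a per-row `apply` is needed. The sample dates are made up and stand in for the question's `defects` records. The column name `yyyymm` is hypothetical.

```python
import pandas as pd

# Made-up sample data standing in for the question's records
df = pd.DataFrame({'date': ['2017-03-17', '2017-04-01', '2017-04-16',
                            '2017-05-01', '2017-07-30']})

# Parse the strings once; the column becomes a datetime64 column
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Vectorized year/month/day extraction via the .dt accessor
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day'] = df['date'].dt.day

# Same YYYYMM grouping key as the answer, computed without apply()
df['yyyymm'] = df['date'].dt.year * 100 + df['date'].dt.month

# Group keys as plain ints, sorted for a stable order
month_keys = sorted(int(k) for k in df.groupby('yyyymm').groups)
print(month_keys)
```

Sorting the keys avoids the arbitrary dictionary order seen in the output above, which came from Python 2.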