Question

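I'm trying to extract the month from a date field using pandas and groupby so I can do further processing. Line 40 is where I try to apply dateutil to extract the year, month, and day.

My code: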

df = pandas.DataFrame.from_records(defects, columns=headers)
df['date'] = pandas.to_datetime(df['date'], format="%Y-%m-%d")
df['date'] = df['date'].apply(dateutil.parser.parse, yearfirst=True)
 ....
print df.groupby(['month']).groups.keys()

I get:

Traceback (most recent call last):
 File "jira-sandbox.py", line 40, in <module>
 defects_df['created'] =    defects_df['created'].apply(dateutil.parser.parse, yearfirst=True)
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer (pandas/lib.c:66124)
  File "/Library/Python/2.7/site-packages/pandas/core/series.py", line 2282, in <lambda>
    f = lambda x: func(x, *args, **kwds)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 697, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 301, in parse
    res = self._parse(timestr, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 349, in _parse
    l = _timelex.split(timestr)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 143, in split
    return list(cls(s))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 137, in next
    token = self.get_token()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/dateutil/parser.py", line 68, in get_token
    nextchar = self.instream.read(1)
AttributeError: 'Timestamp' object has no attribute 'read'

Answer 1

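I don't think you need the dateutil step. After the pandas.to_datetime() call the column already holds datetime values (Timestamp objects), and dateutil.parser.parse expects a string, which is why it fails with the 'read' AttributeError. Here is one way to build a column that groupby() can use.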

Code:

# build a test dataframe
import datetime as dt
import pandas as pd
df = pd.DataFrame([dt.datetime.now() + dt.timedelta(days=x*15)
                   for x in range(10)],
                  columns=['date'])
print(df)

# add a year/month column (e.g. 201703) to allow grouping
df['month'] = df.date.apply(lambda x: x.year * 100 + x.month)

# show a groupby
print(df.groupby(['month']).groups.keys())

Results:

                     date
0 2017-03-17 14:30:24.344
1 2017-04-01 14:30:24.344
2 2017-04-16 14:30:24.344
3 2017-05-01 14:30:24.344
4 2017-05-16 14:30:24.344
5 2017-05-31 14:30:24.344
6 2017-06-15 14:30:24.344
7 2017-06-30 14:30:24.344
8 2017-07-15 14:30:24.344
9 2017-07-30 14:30:24.344

[201704, 201705, 201706, 201707, 201703]
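As a possible alternative, the same grouping key can be built without apply() by using the vectorized .dt accessor that pandas provides on datetime columns. This is only a minimal sketch: the sample dates are made up for illustration and stand in for the question's records, and the "period" column name is an assumption, not something from the original code.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's records.
df = pd.DataFrame({"date": ["2017-03-17", "2017-04-01", "2017-04-16", "2017-05-01"]})

# After to_datetime the column holds Timestamps rather than strings,
# so running a string parser such as dateutil over it is unnecessary.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
print(is_datetime)

# Same year * 100 + month key as in the answer, computed column-wise.
df["month"] = df["date"].dt.year * 100 + df["date"].dt.month

# Alternatively, a monthly Period keeps the key date-like ("period" is an assumed name).
df["period"] = df["date"].dt.to_period("M")

month_keys = sorted(int(k) for k in df.groupby("month").groups.keys())
period_counts = df.groupby("period").size()
print(month_keys)
print(period_counts)
```

Grouping on the Period column can be handy if the groups need to be sorted or formatted as dates later, while the integer key matches the output shown above.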
