I have a CSV file that contains a list of holiday dates.
f5_name = r'C:\\holidays.csv'
holidays = pd.read_csv(f5_name, parse_dates=True)
You can reproduce the holidays DataFrame from the following:
nerc_holidays.to_dict()
{'dt': {0: '2016-09-05',
1: '2016-11-24',
2: '2016-12-26',
3: '2017-01-02',
4: '2017-05-29',
5: '2017-07-04',
6: '2017-09-04',
7: '2017-11-23',
8: '2017-12-15',
9: '2018-01-01',
10: '2018-05-28',
11: '2018-07-04',
12: '2018-09-03',
13: '2018-11-22',
14: '2018-12-25'}}
As you can see, I passed the parse_dates=True argument to the pd.read_csv() call.
Now I have another DataFrame named databasedf. I want to filter databasedf so that only the rows whose date column (dt) appears in the holidays DataFrame are kept.
When I run the following:
databasedf[databasedf['dt'].isin(holidays)]
I get this error:
TypeError Traceback (most recent call last)
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\tseries\tools.py in _convert_listlike(arg, box, format, name)
408 try:
--> 409 values, tz = tslib.datetime_to_datetime64(arg)
410 return DatetimeIndex._simple_new(values, name=name, tz=tz)
pandas\tslib.pyx in pandas.tslib.datetime_to_datetime64 (pandas\tslib.c:29768)()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-519-b57c47ecb0e5> in <module>()
----> 1 databasedf[databasedf['dt'].isin(holidays)]
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\series.py in isin(self, values)
2413
2414 """
-> 2415 result = algos.isin(_values_from_object(self), values)
2416 return self._constructor(result, index=self.index).__finalize__(self)
2417
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\core\algorithms.py in isin(comps, values)
129 if com.is_datetime64_dtype(comps):
130 from pandas.tseries.tools import to_datetime
--> 131 values = to_datetime(values)._values.view('i8')
132 comps = comps.view('i8')
133 elif com.is_timedelta64_dtype(comps):
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\util\decorators.py in wrapper(*args, **kwargs)
89 else:
90 kwargs[new_arg_name] = new_arg_value
---> 91 return func(*args, **kwargs)
92 return wrapper
93 return _deprecate_kwarg
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\tseries\tools.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, coerce, unit, infer_datetime_format)
289 yearfirst=yearfirst,
290 utc=utc, box=box, format=format, exact=exact,
--> 291 unit=unit, infer_datetime_format=infer_datetime_format)
292
293
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\tseries\tools.py in _to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, freq, infer_datetime_format)
425 return _convert_listlike(arg, box, format, name=arg.name)
426 elif com.is_list_like(arg):
--> 427 return _convert_listlike(arg, box, format)
428
429 return _convert_listlike(np.array([arg]), box, format)[0]
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\tseries\tools.py in _convert_listlike(arg, box, format, name)
410 return DatetimeIndex._simple_new(values, name=name, tz=tz)
411 except (ValueError, TypeError):
--> 412 raise e
413
414 if arg is None:
C:\Users\XXX\Anaconda3\lib\site-packages\pandas\tseries\tools.py in _convert_listlike(arg, box, format, name)
396 yearfirst=yearfirst,
397 freq=freq,
--> 398 require_iso8601=require_iso8601
399 )
400
pandas\tslib.pyx in pandas.tslib.array_to_datetime (pandas\tslib.c:41972)()
pandas\tslib.pyx in pandas.tslib.array_to_datetime (pandas\tslib.c:41577)()
pandas\tslib.pyx in pandas.tslib.array_to_datetime (pandas\tslib.c:41466)()
pandas\tslib.pyx in pandas.tslib.parse_datetime_string (pandas\tslib.c:31806)()
C:\Users\XXX\Anaconda3\lib\site-packages\dateutil\parser.py in parse(timestr, parserinfo, **kwargs)
1162 return parser(parserinfo).parse(timestr, **kwargs)
1163 else:
-> 1164 return DEFAULTPARSER.parse(timestr, **kwargs)
1165
1166
C:\Users\XXX\Anaconda3\lib\site-packages\dateutil\parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
553
554 if res is None:
--> 555 raise ValueError("Unknown string format")
556
557 if len(res) == 0:
ValueError: Unknown string format
The .isin() function only worked after I did the following:
holidays = pd.to_datetime(holidays['dt'])
Why do I have to coerce the values to datetime when I had already passed the parse_dates=True argument?
Answer 0 (score: 1)
I think you can also use the index_col parameter with the dt column, together with parse_dates, if you need the output as a DatetimeIndex:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas versions
temp=u"""dt
2016-09-05
2016-11-24
2016-12-26
2017-01-02"""
# after testing, replace StringIO(temp) with f5_name
holidays = pd.read_csv(StringIO(temp), index_col=['dt'], parse_dates=['dt'])
print (holidays.index)
DatetimeIndex(['2016-09-05', '2016-11-24', '2016-12-26', '2017-01-02'], dtype='datetime64[ns]', name='dt', freq=None)
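With the holidays held in a DatetimeIndex, the filter from the question can be written against that index directly. The following is a minimal sketch, not the asker's actual data: the databasedf built here and its value column are hypothetical stand-ins, and the CSV text is a shortened copy of the sample above.

```python
from io import StringIO

import pandas as pd

csv_text = """dt
2016-09-05
2016-11-24
2016-12-26
2017-01-02"""

# Read the holidays with dt as a parsed DatetimeIndex.
holidays = pd.read_csv(StringIO(csv_text), index_col="dt", parse_dates=["dt"])

# Hypothetical stand-in for databasedf; the real frame is not shown in the question.
databasedf = pd.DataFrame({
    "dt": pd.to_datetime(["2016-09-05", "2016-09-06", "2016-11-24", "2017-01-03"]),
    "value": [10, 20, 30, 40],
})

# Keep only the rows whose dt is one of the holiday dates.
holiday_rows = databasedf[databasedf["dt"].isin(holidays.index)]
print(holiday_rows)
```

Both sides of the comparison are datetimes here, so no extra pd.to_datetime call should be needed.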
If you need the output as a list of strings:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas versions
temp=u"""dt
2016-09-05
2016-11-24
2016-12-26
2017-01-02"""
# after testing, replace StringIO(temp) with the filename
holidays = pd.read_csv(StringIO(temp), index_col=['dt'])
print (holidays.index.tolist())
['2016-09-05', '2016-11-24', '2016-12-26', '2017-01-02']
Your code also needs holidays['dt'], because the dt column has to be selected first (it is the nested dictionary in the to_dict() output).
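A likely reason the whole DataFrame fails inside isin() is that iterating over a DataFrame yields its column labels rather than its values, so the label 'dt' would be what gets handed to the date parser, which is consistent with the "Unknown string format" error. The sketch below only demonstrates that iteration behaviour, using a two-row frame made up for illustration; it does not reproduce the traceback.

```python
import pandas as pd

# Small illustrative frame with the same shape as holidays.
holidays = pd.DataFrame({"dt": ["2016-09-05", "2016-11-24"]})

# Iterating over a DataFrame yields the column labels, not the cell values.
labels = list(holidays)
print(labels)

# Selecting the column first yields the actual date strings.
values = holidays["dt"].tolist()
print(values)
```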
parse_dates=True is for converting the index to a DatetimeIndex (see the docs). But if no index column is set, it seems to do nothing:
temp=u"""dt
2016-09-05
2016-11-24
2016-12-26
2017-01-02"""
# after testing, replace StringIO(temp) with the filename
holidays = pd.read_csv(StringIO(temp), parse_dates=True)
print (holidays)
dt
0 2016-09-05
1 2016-11-24
2 2016-12-26
3 2017-01-02
print (type(holidays.loc[0,'dt']))
<class 'str'>
print (holidays.dt.to_dict())
{0: '2016-09-05', 1: '2016-11-24', 2: '2016-12-26', 3: '2017-01-02'}
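If dt should stay a regular column rather than become the index, naming the column in parse_dates is one way to have it parsed while reading. This is a minimal sketch on a shortened copy of the sample data; the variable names as_text and as_dates are made up for the comparison.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

csv_text = """dt
2016-09-05
2016-11-24
2016-12-26
2017-01-02"""

# parse_dates=True only targets the index, so the dt column stays as strings.
as_text = pd.read_csv(StringIO(csv_text), parse_dates=True)

# Naming the column asks read_csv to parse that column itself.
as_dates = pd.read_csv(StringIO(csv_text), parse_dates=["dt"])

print(is_datetime64_any_dtype(as_text["dt"]))
print(is_datetime64_any_dtype(as_dates["dt"]))
```

With the column parsed this way, databasedf['dt'].isin(holidays['dt']) compares datetimes with datetimes.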