Question

I am getting a TypeError:

TypeError: '<' not supported between instances of 'datetime.date' and 'str'

when running the following code:

import requests 
import re 
import json 
import pandas as pd 

def retrieve_quotes_historical(stock_code):
    quotes = []
    url = 'https://finance.yahoo.com/quote/%s/history?p=%s' % (stock_code, stock_code)
    r = requests.get(url)
    m = re.findall('"HistoricalPriceStore":{"prices":(.*?),"isPending"', r.text)
    if m:
        quotes = json.loads(m[0])
        quotes = quotes[::-1]
    return [item for item in quotes if 'type' not in item]

quotes = retrieve_quotes_historical('INTC')
df=pd.DataFrame(quotes)

s=pd.Series(pd.to_datetime(df.date,unit='s'))
df.date=s.dt.date
df=df.set_index('date')

This part runs fine, but when I try to run this code:

df['2017-07-07':'2017-07-10']

I get the TypeError.

Can anyone help me?

Answer 1

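Your index holds datetime.date objects, but you are slicing it with strings such as '2017-07-07'. The slice bounds must be of the same type as the index.

You can do this by defining startdate and enddate as follows: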

import pandas as pd

startdate = pd.to_datetime("2017-7-7").date()
enddate = pd.to_datetime("2017-7-10").date()
df.loc[startdate:enddate]

startdate and enddate are now of type datetime.date, so the slice works:

             adjclose      close       high        low       open    volume
date
2017-07-07  33.205006  33.880001  34.119999  33.700001  33.700001  18304500
2017-07-10  32.979588  33.650002  33.740002  33.230000  33.250000  29918400

You can also create datetime.date objects without pandas:

import datetime

startdate = datetime.datetime.strptime('2017-07-07', "%Y-%m-%d").date()
enddate = datetime.datetime.strptime('2017-07-10', "%Y-%m-%d").date()
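To try the fix without downloading anything, here is a minimal sketch on a small hand-made frame. The frame and the close values for 2017-07-06 and 2017-07-11 are made up for illustration (the two middle rows reuse the close prices shown above); the only thing that matters is that the index holds datetime.date objects.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the downloaded quotes: the index holds
# datetime.date objects, as it does after `df.date = s.dt.date`.
dates = [datetime.date(2017, 7, d) for d in (6, 7, 10, 11)]
df = pd.DataFrame({"close": [34.0, 33.880001, 33.650002, 33.5]}, index=dates)
df.index.name = "date"

# Slice bounds of the same type as the index labels.
startdate = datetime.date(2017, 7, 7)
enddate = datetime.date(2017, 7, 10)
subset = df.loc[startdate:enddate]
print(subset)
```

Label-based slicing with .loc includes both endpoints, so the rows for startdate and enddate are both part of the result.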

Answer 2

In addition to Paul's answer, a few points are worth noting:

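pd.to_datetime(df['date'], unit='s') already returns a Series, so you do not need to wrap it.
Also, when parsing succeeds, the Series returned by pd.to_datetime has dtype datetime64[ns] (timezone-naive) or datetime64[ns, tz] (timezone-aware). If parsing fails, it may still return a Series without raising an error, with dtype O for "object" (at least in pandas 1.2.4), which indicates a fallback to Python's stdlib datetime.datetime.
Filtering with strings, as in df['2017-07-07':'2017-07-10'], only works when the index has dtype datetime64[...]; it does not work when the index has dtype O (object).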
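The difference between the two index types can be checked directly. The following is a rough sketch, not the original poster's data: the epoch seconds are taken from the output shown in this answer, while the close values are placeholders. It uses .loc for the string slice.

```python
import pandas as pd

# Epoch seconds taken from the output shown in this answer; the close
# values are placeholders for illustration only.
df = pd.DataFrame({
    "date": [1596461400, 1596547800, 1596634200, 1596720600],
    "close": [1.0, 2.0, 3.0, 4.0],
})

s = pd.to_datetime(df["date"], unit="s")  # already a Series

# Index of datetime.date objects: the dtype is object.
by_date = df.set_index(s.dt.date)
# Index of timestamps: the dtype is datetime64.
by_timestamp = df.set_index(s)

print(by_date.index.dtype)
print(by_timestamp.index.dtype)

# String bounds are parsed as dates against a datetime64 index.
subset = by_timestamp.loc["2020-08-04":"2020-08-05"]
print(subset)
```

The end bound '2020-08-05' covers the whole day, so the row stamped 13:30 on that date is included.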

With all of that, your example works after changing only the last few lines:

df = pd.DataFrame(quotes)
s = pd.to_datetime(df['date'],unit='s')   # no need to wrap in Series
assert str(s.dtype) == 'datetime64[ns]'   # VERY IMPORTANT !!!!
df.index = s
print(df['2020-08-01':'2020-08-10'])    # it now works !

It produces:

                           date       open  ...    volume   adjclose
date                                        ...                     
2020-08-03 13:30:00  1596461400  48.270000  ...  31767100  47.050617
2020-08-04 13:30:00  1596547800  48.599998  ...  29045800  47.859154
2020-08-05 13:30:00  1596634200  49.720001  ...  29438600  47.654583
2020-08-06 13:30:00  1596720600  48.790001  ...  23795500  47.634968
2020-08-07 13:30:00  1596807000  48.529999  ...  36765200  47.105358
2020-08-10 13:30:00  1597066200  48.200001  ...  37442600  48.272457

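Finally, note that if your datetime strings contain a UTC offset of some kind, there seems to be a mandatory utc=True argument to add to pd.to_datetime (in pandas 1.2.4); otherwise the returned dtype will be 'O' even when parsing succeeds. I hope this improves in the future, because it is not intuitive at all.

See the to_datetime documentation for details.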
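As a small sketch of the utc=True case: the timestamp strings and close values below are made up, and what happens without utc=True differs between pandas versions, so only the utc=True path is shown.

```python
import pandas as pd

# Made-up timestamps with differing UTC offsets, for illustration only.
raw = pd.Series(["2020-08-03 09:30:00-04:00", "2020-08-04 15:30:00+02:00"])

# utc=True converts every value to UTC and yields a timezone-aware
# datetime64 dtype rather than object.
s = pd.to_datetime(raw, utc=True)
print(s.dtype)

# Placeholder values; the index is what matters here.
df = pd.DataFrame({"close": [1.0, 2.0]})
df.index = s

# String slicing works again because the index is datetime64.
subset = df.loc["2020-08-03":"2020-08-03"]
print(subset)
```

Both values end up at 13:30 UTC on their respective days, so the slice for 2020-08-03 selects only the first row.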
