Question

When I run this code,

import pandas as pd
import io


df = pd.read_table("./stock.txt", names=["ID", "Date","Open","High","Low","Close"])
df

del df['ID']

df=df.set_index(["Date"])
df

I get this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/py/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3062             try:
-> 3063                 return self._engine.get_loc(key)
   3064             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Date'

I have read the pandas documentation for set_index, but why does this error occur? The first df is displayed as

"Date"  "Open"  "High"  "Low"  "Close"
2010-04-01 3,615  3,615   3,580   3,585
2010-04-02 3,570  3,620   3,570   3,590
　　　　 ・
　　　　 ・
　　　　 ・

The second DataFrame I would ideally like is

    　　　　"Open"  "High"  "Low"  "Close"
"Date"  
2010-04-01 3,615  3,615   3,580   3,585
2010-04-02 3,570  3,620   3,570   3,590
　　　　 ・
　　　　 ・
　　　　 ・

How should I fix my code to produce this ideal DataFrame? What went wrong? My text file looks like

1,1001 2010-04-01 3,615  3,615   3,580   3,585
2,1002 2010-04-02 3,570  3,620   3,570   3,590
　　　　 ・
　　　　 ・
　　　　 ・

Answer 1

I think it is best to pass these parameters to read_table here:

df = pd.read_table("./stock.txt",
                    sep=r'\s+',  # if the separator is whitespace
                    names=["ID", "Date","Open","High","Low","Close"],
                    parse_dates=['Date'],
                    index_col=['Date'],
                    thousands=',')

Alternatively:

df = pd.read_csv("./stock.txt",
                 sep=r'\s+',  # if the separator is whitespace
                 names=["ID", "Date","Open","High","Low","Close"],
                 parse_dates=['Date'],
                 index_col=['Date'],
                 thousands=',')

Difference:

The default separator for read_table is a tab (sep='\t'), whereas the default separator for read_csv is a comma (sep=',').
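To see what those defaults do to data laid out like the file in the question, here is a minimal sketch. The two-line `sample` string and the variable names are my own stand-ins, not the asker's actual file, and the data is assumed to be separated by spaces rather than tabs.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the layout shown in the question
# (assumed to be space-separated, with commas as thousands separators).
sample = (
    "1,1001 2010-04-01 3,615  3,615   3,580   3,585\n"
    "2,1002 2010-04-02 3,570  3,620   3,570   3,590\n"
)

# read_table defaults to sep='\t': the sample has no tabs,
# so every line is read as a single field.
by_tab = pd.read_table(io.StringIO(sample), header=None)

# read_csv defaults to sep=',': the thousands commas are
# treated as field boundaries, so values are cut in the wrong places.
by_comma = pd.read_csv(io.StringIO(sample), header=None)

# An explicit whitespace separator splits each line into the six intended fields.
by_space = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

print(by_tab.shape)
print(by_comma.iloc[0].tolist())
print(by_space.iloc[0].tolist())
```

Printing the first row of each result makes it easy to check whether the date and prices landed in the columns you expected before assigning names.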

Edit:

Tested with the sample data:

import io
import pandas as pd

temp = """1,1001 2010-04-01 3,615  3,615   3,580   3,585
2,1002 2010-04-02 3,570  3,620   3,570   3,590"""
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO was removed from newer pandas; io.StringIO is the stdlib equivalent)
df = pd.read_csv(io.StringIO(temp),
                 sep=r"\s+",  # or sep='\t' if the file is tab-separated
                 usecols=["Date","Open","High","Low","Close"],
                 names=["ID", "Date","Open","High","Low","Close"],
                 parse_dates=['Date'],
                 index_col=['Date'],
                 thousands=',')

print (df)

            Open  High   Low  Close
Date                               
2010-04-01  3615  3615  3580   3585
2010-04-02  3570  3620  3570   3590
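If the KeyError persists, it may help to check which column labels actually exist before calling set_index. The sketch below uses a hypothetical in-memory sample rather than the real file. It also shows one situation that is certain to raise a KeyError: calling set_index a second time after 'Date' has already moved into the index, as can happen when a notebook cell is re-run. Whether that is what happened in the question is an assumption, as the post does not say.

```python
import io
import pandas as pd

# Hypothetical sample standing in for stock.txt (assumed space-separated).
sample = (
    "1,1001 2010-04-01 3,615  3,615   3,580   3,585\n"
    "2,1002 2010-04-02 3,570  3,620   3,570   3,590\n"
)
df = pd.read_csv(io.StringIO(sample),
                 sep=r"\s+",
                 names=["ID", "Date", "Open", "High", "Low", "Close"],
                 thousands=",")

# Inspect the labels first: set_index only works on names listed here.
print(df.columns.tolist())

df = df.set_index("Date")

# 'Date' is now the index name, not a column.
print(df.columns.tolist())
print(df.index.name)

# A second set_index on the same name fails because the column is gone.
try:
    df.set_index("Date")
    second_call_failed = False
except KeyError as err:
    second_call_failed = True
    print("KeyError:", err)
```

If `df.columns.tolist()` does not contain 'Date' before the first set_index call, the problem lies in how the file was read rather than in set_index itself.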

KeyError: 'Date', am I using set_index incorrectly?
