I am trying to read a file using DataFrame.from_csv() in Python pandas. The file contains the following values:
TICKER,date,ASKHI,PRC,BIDLO,PortfolioDate,PortfolioName
MSFT,2012-06-29 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-07-31 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-08-31 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-09-28 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-10-31 00:00:00,28.88,28.54,28.5,2010-12-31 00:00:00,SAP500
However, when I read it into a DataFrame, the frame comes out as follows:
                       date  ASKHI    PRC  BIDLO        PortfolioDate  \
TICKER
MSFT    2012-06-29 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
MSFT    2012-07-31 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
MSFT    2012-08-31 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
MSFT    2012-09-28 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
MSFT    2012-10-31 00:00:00  28.88  28.54   28.5  2010-12-31 00:00:00

        PortfolioName
TICKER
MSFT           SAP500
MSFT           SAP500
MSFT           SAP500
MSFT           SAP500
MSFT           SAP500
When I select the 'date' column with frame['date'], the result is:
TICKER
MSFT 2012-06-29 00:00:00
MSFT 2012-07-31 00:00:00
MSFT 2012-08-31 00:00:00
MSFT 2012-09-28 00:00:00
MSFT 2012-10-31 00:00:00
My code is:
frame = DataFrame.from_csv('/home/raghu/log.txt',sep=',');
I am new to pandas. Is there something I am missing? Why is the first column treated like this?
Edit: pandas version: '0.14.1'
Answer 0 (score: 3)
Don't use from_csv; it is no longer maintained. Use read_csv instead:
In [112]:
import io
import pandas as pd
temp="""TICKER,date,ASKHI,PRC,BIDLO,PortfolioDate,PortfolioName
MSFT,2012-06-29 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-07-31 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-08-31 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-09-28 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-10-31 00:00:00,28.88,28.54,28.5,2010-12-31 00:00:00,SAP500"""
df = pd.read_csv(io.StringIO(temp))
df
Out[112]:
   TICKER                 date  ASKHI    PRC  BIDLO        PortfolioDate  \
0    MSFT  2012-06-29 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
1    MSFT  2012-07-31 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
2    MSFT  2012-08-31 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
3    MSFT  2012-09-28 00:00:00    NaN    NaN    NaN  2010-12-31 00:00:00
4    MSFT  2012-10-31 00:00:00  28.88  28.54   28.5  2010-12-31 00:00:00

   PortfolioName
0         SAP500
1         SAP500
2         SAP500
3         SAP500
4         SAP500
In [113]:
df['date']
Out[113]:
0 2012-06-29 00:00:00
1 2012-07-31 00:00:00
2 2012-08-31 00:00:00
3 2012-09-28 00:00:00
4 2012-10-31 00:00:00
Name: date, dtype: object
The first column looks strange to you because from_csv treats the first column as the index (index_col defaults to 0), whereas read_csv does not (index_col defaults to None).
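To make the difference concrete, here is a minimal sketch that reproduces both behaviours with read_csv alone. It assumes a pandas version that has read_csv; the names csv_text, indexed and flat are made up for illustration, and the data is a shortened copy of the file from the question.

```python
import io

import pandas as pd

# Shortened copy of the question's file (two data rows only).
csv_text = """TICKER,date,ASKHI,PRC,BIDLO,PortfolioDate,PortfolioName
MSFT,2012-09-28 00:00:00,NA,NA,NA,2010-12-31 00:00:00,SAP500
MSFT,2012-10-31 00:00:00,28.88,28.54,28.5,2010-12-31 00:00:00,SAP500"""

# index_col=0 mimics from_csv's default: TICKER becomes the index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# read_csv's own default (index_col=None): TICKER stays an ordinary column
# and the rows get a plain integer index.
flat = pd.read_csv(io.StringIO(csv_text))

print(indexed.index.name)      # TICKER
print(list(flat.columns[:2]))  # ['TICKER', 'date']
```

With index_col=0, selecting indexed['date'] shows MSFT on the left of every row, which is exactly the output in the question.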
Edit
To fix this without upgrading, just pass index_col=None to from_csv:
In [115]:
df = pd.DataFrame.from_csv(io.StringIO(temp), index_col=None)
df['date']
Out[115]:
0 2012-06-29 00:00:00
1 2012-07-31 00:00:00
2 2012-08-31 00:00:00
3 2012-09-28 00:00:00
4 2012-10-31 00:00:00
Name: date, dtype: object