python pandas dataframe: naming columns is creating new columns

Time: 2016-01-15 02:48:38

Tags: python csv pandas dataframe

I have CSV data in a txt file, like:

20050601,      25.22,      25.31,      24.71,      24.71,   27385
20050602,      24.68,      25.71,      24.68,      25.45,   16919
20050603,      25.07,      25.40,      24.72,      24.82,   12632

I want to put this data into a pandas DataFrame with the column names date, close, high, low, open, volume.

When I use this code:

df = pd.read_table(File,header=None,names=['date', 'close', 'high', 'low', 'open', 'volume'])

The output is:

                                             date  close  high  low  \
0     20050601,      25.22,      25.31,      24.71, ...    NaN   NaN  NaN   
1     20050602,      24.68,      25.71,      24.68, ...    NaN   NaN  NaN   
2     20050603,      25.07,      25.40,      24.72, ...    NaN   NaN  NaN   
  open  volume  
0      NaN     NaN  
1      NaN     NaN  
2      NaN     NaN  

I also tried it without names:

df = pd.read_table(File,header=None)

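I think that when header is set to None, the zero in the header ends up on the rightmost column and pushes the new names to the right, which creates the new columns. I'm not sure.

Thanks to anyone who can help!

2 Answers:

Answer 0: (score: 0)

I solved the problem with: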

df = pd.read_table(File,names=['date','close','high','low','open','volume'],sep=',' )

Does anyone know why adding sep=',' makes it take twice as long? Related question: Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?
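
For context on why sep=',' fixes the extra columns, here is a minimal sketch. It assumes a two-line inline sample in place of the real file, and the variable names (sample, cols, one_col, six_cols) are only illustrative. read_table splits on tabs by default, so without sep each line is read as a single field and any additional names have nothing to bind to.

```python
import io
import pandas as pd

# Inline stand-in for the txt file (assumption: same layout as the question's data).
sample = (
    "20050601,      25.22,      25.31,      24.71,      24.71,   27385\n"
    "20050602,      24.68,      25.71,      24.68,      25.45,   16919\n"
)
cols = ['date', 'close', 'high', 'low', 'open', 'volume']

# read_table uses a tab as its default separator, so each whole line
# becomes one field and the frame has a single column.
one_col = pd.read_table(io.StringIO(sample), header=None)

# With sep=',' every line is split into six fields, one per name.
six_cols = pd.read_table(io.StringIO(sample), header=None, names=cols, sep=',')

print(one_col.shape)
print(six_cols.shape)
```

This only illustrates the column count; it does not say anything about the timing difference asked about above.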

Answer 1: (score: 0)

You can use read_csv with the separator ,\s+ to match a comma followed by one or more whitespace characters:

import pandas as pd
import io

temp=u"""20050601,      25.22,      25.31,      24.71,      24.71,   27385
20050602,      24.68,      25.71,      24.68,      25.45,   16919
20050603,      25.07,      25.40,      24.72,      24.82,   12632"""


# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), 
                 sep=r",\s+",  # raw string: comma followed by whitespace
                 header=None, 
                 names=['date','close','high','low','open','volume'], 
                 engine='python')  # regex separators need the Python engine

print(df)

       date  close   high    low   open  volume
0  20050601  25.22  25.31  24.71  24.71   27385
1  20050602  24.68  25.71  24.68  25.45   16919
2  20050603  25.07  25.40  24.72  24.82   12632
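
A possible alternative, sketched under the assumption that the only extra whitespace is the padding after each comma: read_csv has a skipinitialspace parameter that drops spaces following the delimiter, which lets the default parser be used instead of engine='python'. Whether this is faster on a given file is something to measure rather than assume.

```python
import io
import pandas as pd

temp = u"""20050601,      25.22,      25.31,      24.71,      24.71,   27385
20050602,      24.68,      25.71,      24.68,      25.45,   16919
20050603,      25.07,      25.40,      24.72,      24.82,   12632"""

# Plain comma separator; skipinitialspace=True ignores the spaces after each comma.
df = pd.read_csv(io.StringIO(temp),
                 header=None,
                 names=['date', 'close', 'high', 'low', 'open', 'volume'],
                 skipinitialspace=True)

print(df)
```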