I have CSV data in a txt file, for example:
20050601, 25.22, 25.31, 24.71, 24.71, 27385
20050602, 24.68, 25.71, 24.68, 25.45, 16919
20050603, 25.07, 25.40, 24.72, 24.82, 12632
I want to load this data into a pandas DataFrame with the column names date, close, high, low, open, volume.
When I use this code:
df = pd.read_table(File, header=None, names=['date', 'close', 'high', 'low', 'open', 'volume'])
the output is:
date close high low \
0 20050601, 25.22, 25.31, 24.71, ... NaN NaN NaN
1 20050602, 24.68, 25.71, 24.68, ... NaN NaN NaN
2 20050603, 25.07, 25.40, 24.72, ... NaN NaN NaN
open volume
0 NaN NaN
1 NaN NaN
2 NaN NaN
When I instead use:
df = pd.read_table(File, header=None)
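I think that when header is set to None, the zero in the header ends up over the rightmost column and pushes the new names to the right, which creates new columns. I'm not sure.
Thanks to anyone who can help me!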
Answer 0: (score: 0)
I solved the problem with:
df = pd.read_table(File,names=['date','close','high','low','open','volume'],sep=',' )
Does anyone know why adding sep=',' makes it take 2x as long?
Performance difference in pandas read_table vs. read_csv vs. from_csv vs. read_excel?
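The fix above works because read_table splits on tabs by default, so without sep=',' every line is read as a single field. Here is a minimal sketch of that difference, assuming the three sample rows from the question are held in an in-memory string (the names sample, tab_df and comma_df are made up for illustration):

```python
import io
import pandas as pd

# Sample rows copied from the question; a real run would pass the file path instead.
sample = """20050601, 25.22, 25.31, 24.71, 24.71, 27385
20050602, 24.68, 25.71, 24.68, 25.45, 16919
20050603, 25.07, 25.40, 24.72, 24.82, 12632"""

names = ['date', 'close', 'high', 'low', 'open', 'volume']

# Default separator is a tab: the data has none, so each line is one field.
tab_df = pd.read_table(io.StringIO(sample), header=None)
print(tab_df.shape)

# Explicit comma separator: each line is split into six fields.
comma_df = pd.read_table(io.StringIO(sample), header=None, names=names, sep=',')
print(comma_df.shape)
```

With the default separator the frame has a single column, which is why supplying six names leaves five of them filled with NaN.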
Answer 1: (score: 0)
You can use read_csv with the separator ,\s+ which matches a comma followed by one or more whitespace characters:
import pandas as pd
import io

temp = u"""20050601, 25.22, 25.31, 24.71, 24.71, 27385
20050602, 24.68, 25.71, 24.68, 25.45, 16919
20050603, 25.07, 25.40, 24.72, 24.82, 12632"""

# after testing, replace io.StringIO(temp) with the filename
# a regex separator needs a raw string and the python engine
df = pd.read_csv(io.StringIO(temp),
                 sep=r",\s+",
                 header=None,
                 names=['date', 'close', 'high', 'low', 'open', 'volume'],
                 engine='python')
print(df)
       date  close   high    low   open  volume
0  20050601  25.22  25.31  24.71  24.71   27385
1  20050602  24.68  25.71  24.68  25.45   16919
2  20050603  25.07  25.40  24.72  24.82   12632
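An alternative that avoids the regex separator is the skipinitialspace option of read_csv, which drops the spaces that follow each comma and does not require engine='python'. This is a sketch under the same assumption as above (the sample rows in an in-memory string), not a benchmarked recommendation:

```python
import io
import pandas as pd

# Sample rows copied from the question; replace io.StringIO(temp) with the filename.
temp = """20050601, 25.22, 25.31, 24.71, 24.71, 27385
20050602, 24.68, 25.71, 24.68, 25.45, 16919
20050603, 25.07, 25.40, 24.72, 24.82, 12632"""

df = pd.read_csv(io.StringIO(temp),
                 header=None,
                 names=['date', 'close', 'high', 'low', 'open', 'volume'],
                 skipinitialspace=True)  # ignore spaces right after each comma

print(df.dtypes)
```

Because the separator stays a plain comma, pandas can keep using its default parser here.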