I have a large .csv file that is continuously updated in real time, with thousands of rows like the following:
time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2
What is the fastest way to read this into a DataFrame that looks like this:
time   stock   bid  ask
time1  stockA  1
time2  stockA       1.1
time3  stockB       2.1
time4  stockB  2.0
time5  stockA  1.1
time6  stockA       1.2
Any help is appreciated.
Answer 0 (score: 1)
You can use read_csv, specify header=None, and pass the column names as a list:
In [124]:
import io
import pandas as pd
t="""time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0"""
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
df
Out[124]:
    time   stock  bid  ask
0  time1  stockA  bid  1.0
1  time2  stockA  ask  1.1
2  time3  stockB  ask  2.1
3  time4  stockB  bid  2.0
You would then have to recode the bid column as 1 or 2:
In [126]:
df['bid'] = df['bid'].replace('bid', 1)
df['bid'] = df['bid'].replace('ask', 2)
df
Out[126]:
    time   stock  bid  ask
0  time1  stockA    1  1.0
1  time2  stockA    2  1.1
2  time3  stockB    2  2.1
3  time4  stockB    1  2.0
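The two replace calls above can also be written as a single lookup. A minimal sketch, assuming the same four sample rows; the `side_codes` name is only illustrative and is not part of the original answer:

```python
import io

import pandas as pd

t = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0"""
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])

# Hypothetical lookup table: 'bid' -> 1, 'ask' -> 2
side_codes = {'bid': 1, 'ask': 2}

# Series.map looks each value up in the dict; a value missing from the dict would become NaN
df['bid'] = df['bid'].map(side_codes)
print(df)
```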
Edit
Based on your updated sample data and desired output, the following works:
In [29]:
t="""time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
df
Out[29]:
    time   stock  bid  ask
0  time1  stockA  bid  1.0
1  time2  stockA  ask  1.1
2  time3  stockB  ask  2.1
3  time4  stockB  bid  2.0
4  time5  stockA  bid  1.1
5  time6  stockA  ask  1.2
In [30]:
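# Copy the price into 'bid' on the rows tagged 'bid'
df.loc[df['bid'] == 'bid', 'bid'] = df['ask']
# Blank out 'ask' on those rows (every row whose 'bid' is not the 'ask' tag)
df.loc[df['bid'] != 'ask', 'ask'] = ''
# Blank out 'bid' on the rows still tagged 'ask'
df.loc[df['bid'] == 'ask', 'bid'] = ''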
df
Out[30]:
    time   stock  bid  ask
0  time1  stockA    1
1  time2  stockA       1.1
2  time3  stockB       2.1
3  time4  stockB    2
4  time5  stockA  1.1
5  time6  stockA       1.2
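If numeric columns with NaN for the missing side are acceptable instead of empty strings, the same layout can be built without overwriting columns in place. This is a sketch under that assumption, not the answerer's method; the column names `side` and `price` are my own labels for the third and fourth fields:

```python
import io

import pandas as pd

t = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""
# 'side' and 'price' are assumed names for the bid/ask tag and its value
raw = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'side', 'price'])

df = raw[['time', 'stock']].copy()
# Series.where keeps the value where the condition holds and puts NaN elsewhere
df['bid'] = raw['price'].where(raw['side'] == 'bid')
df['ask'] = raw['price'].where(raw['side'] == 'ask')
print(df)
```

Both price columns stay floats this way, so later arithmetic on them does not require converting back from strings.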
Answer 1 (score: 1)
I think this is a cleaner way.
import pandas as pd
df = pd.read_csv('prices.csv', header=None, names=['time', 'stock', 'type', 'prices'],
                 index_col=['time', 'stock', 'type'])
In [1062]:
df
Out[1062]:
                   prices
time  stock  type
time1 stockA bid      1.0
time2 stockA ask      1.1
time3 stockB ask      2.1
time4 stockB bid      2.0
time5 stockA bid      1.1
time6 stockA ask      1.2
time7 stockA high     1.5
time8 stockA low      0.5
This is what I think the DataFrame should look like. Then do:
In [1064]:
df.unstack()
Out[1064]:
             prices
type            ask  bid high  low
time  stock
time1 stockA    NaN  1.0  NaN  NaN
time2 stockA    1.1  NaN  NaN  NaN
time3 stockB    2.1  NaN  NaN  NaN
time4 stockB    NaN  2.0  NaN  NaN
time5 stockA    NaN  1.1  NaN  NaN
time6 stockA    1.2  NaN  NaN  NaN
time7 stockA    NaN  NaN  1.5  NaN
time8 stockA    NaN  NaN  NaN  0.5
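You can use df.fillna to fill the NaN values with whatever you like. In general, turning column values into column headers is called pivoting. .unstack pivots a level of a MultiIndex. You can also look at .pivot.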
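For comparison, here is roughly how the .pivot route and a fillna call might look on the six-row sample from the question. This is a sketch rather than the answerer's code: it assumes pandas 1.1 or later (where pivot accepts a list for `index`), reads from an in-memory string instead of prices.csv, and reuses the `type` and `prices` column names from the read_csv call above.

```python
import io

import pandas as pd

t = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'type', 'prices'])

# Each distinct value in 'type' becomes its own column, filled from 'prices'
wide = df.pivot(index=['time', 'stock'], columns='type', values='prices')

# Replace the NaN gaps with empty strings for display,
# then turn the time/stock index levels back into ordinary columns
out = wide.fillna('').reset_index()
print(out)
```

Unlike unstack on the three-level index, pivot works on plain columns, so no index_col is needed when reading the file.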