Question

I have a large .csv file that is continuously updated in real time and contains thousands of rows like these:

time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2

What is the fastest way to read this into a DataFrame that looks like this:

   time     stock       bid    ask
   time1    stockA      1      
   time2    stockA             1.1
   time3    stockB             2.1
   time4    stockB      2.0    
   time5    stockA      1.1
   time6    stockA             1.2

Thanks for any help.

Answer 1

You can use read_csv with header=None and pass the column names as a list:

In [124]:

import io
import pandas as pd

t="""time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0"""

df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
df
Out[124]:
     time   stock  bid  ask
0   time1  stockA  bid  1.0
1   time2  stockA  ask  1.1
2   time3  stockB  ask  2.1
3   time4  stockB  bid  2.0
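In the frame above, the column named bid actually holds the bid/ask label and ask holds the price for every row. A minimal sketch of the same read with more descriptive names; "side" and "price" are illustrative labels chosen here, not the ones the answer uses:

```python
import io

import pandas as pd

# Sample rows from the question; with a real file, pass its path instead.
raw = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0"""

# header=None: the data has no header row, so names supplies the labels.
# 'side' (bid/ask label) and 'price' are illustrative names.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["time", "stock", "side", "price"],
)
print(df)
print(df.dtypes)  # 'price' is parsed as float64, 'side' stays object
```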

You then have to recode the bid column (which holds the bid/ask labels) as 1 or 2:

In [126]:

df['bid'] = df['bid'].replace('bid', 1)
df['bid'] = df['bid'].replace('ask', 2)
df
Out[126]:
     time   stock  bid  ask
0   time1  stockA    1  1.0
1   time2  stockA    2  1.1
2   time3  stockB    2  2.1
3   time4  stockB    1  2.0
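The two replace calls can also be collapsed into a single lookup. A minimal sketch using Series.map with a dict, a swapped-in alternative rather than the answer's code; unlike replace, map turns any label missing from the dict into NaN, so it assumes the column only ever contains "bid" and "ask":

```python
import io

import pandas as pd

raw = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0"""

df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["time", "stock", "bid", "ask"])

# One pass over the column: 'bid' -> 1, 'ask' -> 2.
# Any other label would become NaN (an assumption: only bid/ask occur).
df["bid"] = df["bid"].map({"bid": 1, "ask": 2})
print(df)
```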

Edit

Based on your updated sample data and desired output, the following works:

In [29]:

t="""time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
df
Out[29]:
     time   stock  bid  ask
0   time1  stockA  bid  1.0
1   time2  stockA  ask  1.1
2   time3  stockB  ask  2.1
3   time4  stockB  bid  2.0
4   time5  stockA  bid  1.1
5   time6  stockA  ask  1.2

In [30]:

df.loc[df['bid'] == 'bid', 'bid'] = df['ask']
df.loc[df['bid'] != 'ask', 'ask'] = ''
df.loc[df['bid'] == 'ask', 'bid'] = ''
df
Out[30]:
     time   stock  bid  ask
0   time1  stockA    1
1   time2  stockA       1.1
2   time3  stockB       2.1
3   time4  stockB    2
4   time5  stockA  1.1
5   time6  stockA       1.2
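The edit mixes floats and empty strings in the same columns, which leaves them as object dtype, and recent pandas releases emit a FutureWarning when '' is assigned into a float column. A minimal alternative sketch, assuming NaN is acceptable for a missing quote: Series.where keeps the price where the condition holds and fills NaN elsewhere. The names "side" and "price" are illustrative:

```python
import io

import pandas as pd

raw = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""

# 'side' holds the bid/ask label, 'price' the quote (illustrative names).
df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["time", "stock", "side", "price"])

# Keep the price only on matching rows; other rows get NaN,
# so both new columns remain float64.
df["bid"] = df["price"].where(df["side"] == "bid")
df["ask"] = df["price"].where(df["side"] == "ask")
out = df[["time", "stock", "bid", "ask"]]
print(out)

# Blank cells for display only (this converts the columns to object):
print(out.fillna(""))
```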

Answer 2

I think this is a cleaner way:

import pandas as pd

df = pd.read_csv('prices.csv', header=None, names=['time', 'stock', 'type', 'prices'],
                 index_col=['time', 'stock', 'type'])

In [1062]:

df
Out[1062]:
                    prices
time    stock   type    
time1   stockA  bid 1.0
time2   stockA  ask 1.1
time3   stockB  ask 2.1
time4   stockB  bid 2.0
time5   stockA  bid 1.1
time6   stockA  ask 1.2
time7   stockA  high 1.5
time8   stockA  low 0.5

This is how I think the DataFrame should look. Then do:

In [1064]:

df.unstack()
Out[1064]:
                prices
type            ask bid high low
time    stock               
time1   stockA  NaN 1.0 NaN NaN
time2   stockA  1.1 NaN NaN NaN
time3   stockB  2.1 NaN NaN NaN
time4   stockB  NaN 2.0 NaN NaN
time5   stockA  NaN 1.1 NaN NaN
time6   stockA  1.2 NaN NaN NaN
time7   stockA  NaN NaN 1.5 NaN
time8   stockA  NaN NaN NaN 0.5
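The unstacked result keeps an outer "prices" column level above the type names. A minimal sketch that unstacks the "type" level by name and then selects that outer level away; the rows are inlined with io.StringIO as a stand-in for prices.csv, copied from the answer's output:

```python
import io

import pandas as pd

# Stand-in for prices.csv, using the rows shown in the answer's output.
raw = """time1,stockA,bid,1.0
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2
time7,stockA,high,1.5
time8,stockA,low,0.5"""

df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["time", "stock", "type", "prices"],
                 index_col=["time", "stock", "type"])

# Move the 'type' index level into the columns, then drop the
# redundant outer 'prices' level by selecting it.
wide = df.unstack("type")["prices"]
wide.columns.name = None  # optional: remove the leftover 'type' label
print(wide)
```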

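You can use df.fillna to fill the gaps with whatever you like. In general, turning column values into column headers is called pivoting. .unstack pivots a level of a MultiIndex into the columns. You can also look at .pivot.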
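For comparison, .pivot does the same reshape while "type" stays an ordinary column. A minimal sketch on the question's sample rows; passing a list to index= assumes pandas 1.1 or later, and pivot raises a ValueError if the same (time, stock, type) combination occurs twice:

```python
import io

import pandas as pd

raw = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""

# No index_col here: 'type' is kept as a regular column.
df = pd.read_csv(io.StringIO(raw), header=None,
                 names=["time", "stock", "type", "prices"])

# Each distinct value of 'type' becomes a column header.
wide = df.pivot(index=["time", "stock"], columns="type", values="prices")

# fillna("") blanks the missing cells for display.
print(wide.fillna(""))
```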

Convert column elements to column names in pandas