I am trying to use
data = data.loc['bids:']
to get all rows that belong to that index label.
Sample data from the text file:
{"offset":"14726172634","bids":[["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"],["871073.96500","0.00100000","0.00100000","0","1081537185","29194","5","14726172231","1"]]],
"asks":[["875644.72000","0.00200000","0.00200000","0","1081606189","29194","5","14726356256","1"],["875669.77637","0.01000000","0.01000000","0","1081606227","29194","5","14726356379","1"],["875678.92000","0.00600000","0.00600000","0","1081606263","29194","5","14726356488","1"],["875731.74364","0.03000000","0.03000000","0","1081606233","29194","5","14726356393","1"],
Code example:
data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt', lineterminator= str(']'), low_memory= False, error_bad_lines=False, header= None)#, names= ['a','d','f','r','y','h','n','m','k'])
new = data[1].str.split("[", n = 1, expand = True)
data[1]= new[0]
data[10]= new[1]
data.drop(data.index[-1], inplace=True)
data[10]= new[1].str.strip('[').str.strip('"')
data = data.set_index([1,2])
data = data.loc[:,[10]]
data = data.loc['bids:']
Sample of the resulting data:
bids: 0.002000 871094.22000
0.008080 871076.11000
0.001000 871073.96500
bids: 0.005000 871042.87000
0.005000 871038.55000
0.001000 871032.90156
Output of the code:
bids: 0.002000 871094.22000
bids: 0.005000 871042.87000
How can I get all 6 rows? The goal is to filter the rows that lie between the other index labels.
The index looks like this:
Index(['bids:', '', '', '', '', '', '', '', '', '',
...
'asks:', '', '', '', '', '', '', '', '', '',
...
'bids:', '', '', '', '', '', '', '', '', '',
...'],
dtype='object', name=1, length=505148)
Answer 0 (score: 0)
I believe you need:
print (data)
10
1 2
bids 0.00200000 871094.22000
0.00808000 871076.11000
0.00100000 871073.96500
asks 0.00200000 875644.72000
0.01000000 875669.77637
0.00600000 875678.92000
0.03000000 875731.74364
bids 0.00200000 871094.22000
0.00808000 871076.11000
0.00100000 871073.96500
print (data.index)
MultiIndex(levels=[['asks', 'bids'],
['0.00100000', '0.00200000', '0.00600000',
'0.00808000', '0.01000000', '0.03000000']],
codes=[[1, 1, 1, 0, 0, 0, 0, 1, 1, 1], [1, 3, 0, 1, 4, 2, 5, 1, 3, 0]],
names=[1, 2])
Solution for selecting the first row of each run of repeated values in the first MultiIndex level:
s = data.index.get_level_values(0).to_series()
mask = s.ne(s.shift())
print (mask)
1
bids True
bids False
bids False
asks True
asks False
asks False
asks False
bids True
bids False
bids False
Name: 1, dtype: bool
df = data[mask.values]
print (df)
10
1 2
bids 0.00200000 871094.22000
asks 0.00200000 875644.72000
bids 0.00200000 871094.22000
df = df.xs('bids', drop_level=False)
print (df)
10
1 2
bids 0.00200000 871094.22000
0.00200000 871094.22000
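The steps above can be reproduced end to end with a small self-contained sketch. The frame is rebuilt by hand from the printed output, and the names `labels`, `run_start` and `is_bids` are illustrative only. Instead of `xs`, it combines two boolean masks, which should give the same rows here:

```python
import pandas as pd

# Toy frame rebuilt from the printed example above (values copied as strings).
index = pd.MultiIndex.from_arrays(
    [
        ["bids"] * 3 + ["asks"] * 4 + ["bids"] * 3,
        ["0.00200000", "0.00808000", "0.00100000",
         "0.00200000", "0.01000000", "0.00600000", "0.03000000",
         "0.00200000", "0.00808000", "0.00100000"],
    ],
    names=[1, 2],
)
data = pd.DataFrame(
    {10: ["871094.22000", "871076.11000", "871073.96500",
          "875644.72000", "875669.77637", "875678.92000", "875731.74364",
          "871094.22000", "871076.11000", "871073.96500"]},
    index=index,
)

# A row starts a new run when its level-0 label differs from the previous one.
labels = pd.Series(data.index.get_level_values(0))
run_start = labels.ne(labels.shift()).to_numpy()

# Keep only run starts whose label is 'bids'.
is_bids = labels.eq("bids").to_numpy()
first_bids = data[run_start & is_bids]
print(first_bids)
```

Note that this keeps one row per run, so it returns the first row of each bids block, not every row in those blocks.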
If there is no MultiIndex:
print (data)
2 10
1
bids 0.00200000 871094.22000
bids 0.00808000 871076.11000
bids 0.00100000 871073.96500
asks 0.00200000 875644.72000
asks 0.01000000 875669.77637
asks 0.00600000 875678.92000
asks 0.03000000 875731.74364
bids 0.00200000 871094.22000
bids 0.00808000 871076.11000
bids 0.00100000 871073.96500
print (data.index)
Index(['bids', 'bids', 'bids', 'asks', 'asks', 'asks', 'asks', 'bids', 'bids',
'bids'],
dtype='object', name=1)
s = data.index.to_series()
mask = s.ne(s.shift())
print (mask)
1
bids True
bids False
bids False
asks True
asks False
asks False
asks False
bids True
bids False
bids False
Name: 1, dtype: bool
df = data[mask.values].loc['bids']
print (df)
2 10
1
bids 0.00200000 871094.22000
bids 0.00200000 871094.22000
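If the goal is instead to get every row of each bids block, as in the question, where only the first row of a block carries a label and the rest have an empty string, one possible approach is to forward-fill the labels before filtering. This is a minimal sketch under that assumption. The single-level index and the column names `size` and `price` are hypothetical, and the asks rows are taken from the sample text file:

```python
import pandas as pd

# Hypothetical miniature of the question's frame: only the first row of each
# block is labelled ('bids:' or 'asks:'); the following rows have ''.
data = pd.DataFrame(
    {
        "size": ["0.002000", "0.008080", "0.001000",
                 "0.002000", "0.010000",
                 "0.005000", "0.005000", "0.001000"],
        "price": ["871094.22000", "871076.11000", "871073.96500",
                  "875644.72000", "875669.77637",
                  "871042.87000", "871038.55000", "871032.90156"],
    },
    index=pd.Index(["bids:", "", "", "asks:", "", "bids:", "", ""], name=1),
)

labels = pd.Series(data.index.to_numpy())

# Treat empty labels as missing and carry the last seen label forward.
filled = labels.mask(labels.eq("")).ffill()

# Number the blocks: a new block starts wherever the original label is non-empty.
block_id = labels.ne("").cumsum()

# All rows that belong to any 'bids:' block.
bids = data[filled.eq("bids:").to_numpy()]
print(bids)

# Only the rows of the first block (which is a 'bids:' block here).
first_block = data[(filled.eq("bids:") & block_id.eq(1)).to_numpy()]
print(first_block)
```

With the question's MultiIndex, the same idea would apply to `data.index.get_level_values(0)` instead of `data.index`.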