Select rows based on index

Time: 2019-05-27 03:05:31

Tags: python pandas

I tried using

data = data.loc['bids:']

to get all the rows that belong to that index label.

Sample data from the text file:

 {"offset":"14726172634","bids":[["871094.22000","0.00200000","0.00200000","0","1081537351","29194","5","14726172633","1"],["871076.11000","0.00808000","0.00808000","0","1081537130","623964","5","14726172043","1"],["871073.96500","0.00100000","0.00100000","0","1081537185","29194","5","14726172231","1"]]],
"asks":[["875644.72000","0.00200000","0.00200000","0","1081606189","29194","5","14726356256","1"],["875669.77637","0.01000000","0.01000000","0","1081606227","29194","5","14726356379","1"],["875678.92000","0.00600000","0.00600000","0","1081606263","29194","5","14726356488","1"],["875731.74364","0.03000000","0.03000000","0","1081606233","29194","5","14726356393","1"],

Code sample:

 data = pd.read_csv('20190523-012523_product_5_snapshot_14726172634_14728561053.txt', lineterminator= str(']'), low_memory= False, error_bad_lines=False, header= None)#, names= ['a','d','f','r','y','h','n','m','k'])

 new = data[1].str.split("[", n = 1, expand = True)
 data[1]= new[0]
 data[10]= new[1]
 data.drop(data.index[-1], inplace=True)
 data[10]= new[1].str.strip('[').str.strip('"')

 data = data.set_index([1,2])
 data = data.loc[:,[10]]
 data = data.loc['bids:']

Data sample:

bids:  0.002000   871094.22000
       0.008080   871076.11000
       0.001000   871073.96500
bids:  0.005000   871042.87000
       0.005000   871038.55000
       0.001000   871032.90156

Code output:

bids:  0.002000   871094.22000
bids:  0.005000   871042.87000

How can I get all 6 rows? The goal is to filter the rows that sit between the other index labels.

The index output is:

Index(['bids:', '', '', '', '', '', '', '', '', '',
       ...
       'asks:', '', '', '', '', '', '', '', '', '',
       ...
       'bids:', '', '', '', '', '', '', '', '', '',
       ...'],
      dtype='object', name=1, length=505148)
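
The index above suggests why only two rows come back: only the first row of each block is labelled 'bids:' and the rows after it carry an empty string. One possible way to get all six rows is to forward-fill the empty labels before selecting. The following is a minimal sketch on a hypothetical miniature of the frame (the small DataFrame is made up from the sample values above, and the names labels and bids are illustrative only), not a tested fix for the full 505148-row file:

```python
import pandas as pd

# Hypothetical miniature of the frame in the question: only the first row of
# each block carries a label, the remaining rows have '' in the index.
data = pd.DataFrame(
    {
        2: ["0.002000", "0.008080", "0.001000", "0.002000", "0.010000",
            "0.005000", "0.005000", "0.001000"],
        10: ["871094.22000", "871076.11000", "871073.96500", "875644.72000",
             "875669.77637", "871042.87000", "871038.55000", "871032.90156"],
    },
    index=pd.Index(["bids:", "", "", "asks:", "", "bids:", "", ""], name=1),
)

# Blank out the empty labels (they become NaN) and forward-fill them, so that
# every row carries the label of the block it belongs to.
s = data.index.to_series()
labels = s.where(s != "").ffill()
data.index = pd.Index(labels.to_numpy(), name=1)

# Label-based selection now matches every row of every 'bids:' block.
bids = data.loc["bids:"]
print(bids)
```

On this miniature, the selection should return the six 'bids:' rows and skip the two 'asks:' rows.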

1 answer:

Answer 0: (score: 0)

I believe you need:

print (data)
                           10
1    2                       
bids 0.00200000  871094.22000
     0.00808000  871076.11000
     0.00100000  871073.96500
asks 0.00200000  875644.72000
     0.01000000  875669.77637
     0.00600000  875678.92000
     0.03000000  875731.74364
bids 0.00200000  871094.22000
     0.00808000  871076.11000
     0.00100000  871073.96500

print (data.index)
MultiIndex(levels=[['asks', 'bids'], 
                   ['0.00100000', '0.00200000', '0.00600000', 
                    '0.00808000', '0.01000000', '0.03000000']],
           codes=[[1, 1, 1, 0, 0, 0, 0, 1, 1, 1], [1, 3, 0, 1, 4, 2, 5, 1, 3, 0]],
           names=[1, 2])

Solution that keeps the first row of each run of consecutive repeated values in the first MultiIndex level:

s = data.index.get_level_values(0).to_series()
mask = s.ne(s.shift())
print (mask)
1
bids     True
bids    False
bids    False
asks     True
asks    False
asks    False
asks    False
bids     True
bids    False
bids    False
Name: 1, dtype: bool

df = data[mask.values]
print (df)
                           10
1    2                       
bids 0.00200000  871094.22000
asks 0.00200000  875644.72000
bids 0.00200000  871094.22000

df = df.xs('bids', drop_level=False)
print (df)
                           10
1    2                       
bids 0.00200000  871094.22000
     0.00200000  871094.22000
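
The outputs above start from an existing frame. For anyone who wants to run the steps, here is a self-contained sketch that rebuilds the ten-row MultiIndex sample and applies the same mask. The construction with MultiIndex.from_arrays and the names level_1, level_2, prices and bids_first are assumptions for illustration and are not part of the answer:

```python
import pandas as pd

# Rebuild the ten-row sample shown above (values copied from the printout).
level_1 = ["bids"] * 3 + ["asks"] * 4 + ["bids"] * 3
level_2 = ["0.00200000", "0.00808000", "0.00100000",
           "0.00200000", "0.01000000", "0.00600000", "0.03000000",
           "0.00200000", "0.00808000", "0.00100000"]
prices = ["871094.22000", "871076.11000", "871073.96500",
          "875644.72000", "875669.77637", "875678.92000", "875731.74364",
          "871094.22000", "871076.11000", "871073.96500"]

index = pd.MultiIndex.from_arrays([level_1, level_2], names=[1, 2])
data = pd.DataFrame({10: prices}, index=index)

# True wherever the first-level label differs from the label one row earlier,
# i.e. on the first row of every consecutive run.
s = data.index.get_level_values(0).to_series()
mask = s.ne(s.shift())

# Keep the first row of each run, then only the runs labelled 'bids'.
df = data[mask.to_numpy()]
bids_first = df.xs("bids", drop_level=False)
print(bids_first)
```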

If there is no MultiIndex:

print (data)
              2             10
1                             
bids  0.00200000  871094.22000
bids  0.00808000  871076.11000
bids  0.00100000  871073.96500
asks  0.00200000  875644.72000
asks  0.01000000  875669.77637
asks  0.00600000  875678.92000
asks  0.03000000  875731.74364
bids  0.00200000  871094.22000
bids  0.00808000  871076.11000
bids  0.00100000  871073.96500

print (data.index)
Index(['bids', 'bids', 'bids', 'asks', 'asks', 'asks', 'asks', 'bids', 'bids',
       'bids'],
      dtype='object', name=1)

s = data.index.to_series()
mask = s.ne(s.shift())
print (mask)
1
bids     True
bids    False
bids    False
asks     True
asks    False
asks    False
asks    False
bids     True
bids    False
bids    False
Name: 1, dtype: bool

df = data[mask.values].loc['bids']
print (df)
              2             10
1                             
bids  0.00200000  871094.22000
bids  0.00200000  871094.22000
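
The mask keeps only the first row of each run. If whole blocks are needed instead, one possible extension (not part of the answer above) is to take the cumulative sum of the same mask, which numbers the runs 1, 2, 3, and so on. A minimal sketch on the ten-row sample, where block_id and second_bids_block are hypothetical names:

```python
import pandas as pd

# Ten-row sample with a plain (non-MultiIndex) index, as printed above.
data = pd.DataFrame(
    {
        2: ["0.00200000", "0.00808000", "0.00100000",
            "0.00200000", "0.01000000", "0.00600000", "0.03000000",
            "0.00200000", "0.00808000", "0.00100000"],
        10: ["871094.22000", "871076.11000", "871073.96500",
             "875644.72000", "875669.77637", "875678.92000", "875731.74364",
             "871094.22000", "871076.11000", "871073.96500"],
    },
    index=pd.Index(["bids"] * 3 + ["asks"] * 4 + ["bids"] * 3, name=1),
)

s = data.index.to_series()
mask = s.ne(s.shift())

# Each True starts a new run, so the running total is a per-run block number.
block_id = mask.cumsum().to_numpy()

# All rows of the third run, which is the second 'bids' block in this sample.
second_bids_block = data[block_id == 3]
print(second_bids_block)
```

Combining block_id with the index label, for example data[(data.index == 'bids') & (block_id > 1)], would filter by label and position at the same time.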