I have some data that looks like this:
>>> print totals.sample(4)
                                                 start            end  \
time                    region_type
2016-01-24 02:17:10.238 STACK GUARD        79940452352    79940665344
2016-01-23 20:14:17.043 MALLOC metadata    64688259072    64688996352
2016-01-22 23:20:53.752 IOKit              47857778688    47861174272
2016-01-23 08:17:06.561 __DATA           3711964667904  3711979212800

                                            vsize    rsdnt   dirty     swap
time                    region_type
2016-01-24 02:17:10.238 STACK GUARD        212992        0       0        0
2016-01-23 20:14:17.043 MALLOC metadata    737280    81920   81920     8192
2016-01-22 23:20:53.752 IOKit             3395584    24576   24576  3371008
2016-01-23 08:17:06.561 __DATA           14544896  4907008  618496  4780032
I want to know the region_type of every row where dirty + swap is greater than 1e7.
This works, but it seems verbose:
>>> print totals[(totals.dirty + totals.swap) > 1e7].groupby(level='region_type').\
apply(lambda x: 'lol').index.tolist()
['MALLOC_NANO', 'MALLOC_SMALL']
Is there a better way?
I expected the following to work, but it returns every region_type in the dataset rather than only the ones I selected:
totals[(totals.dirty + totals.swap) > 1e7].index.levels[1].tolist()
Answer 0: (score: 2)
Use index.get_level_values (which returns the values actually in use) rather than index.levels (which returns every value the index knows about):
mask = totals['dirty']+totals['swap'] > 1e7
result = mask.loc[mask]
region_types = result.index.get_level_values('region_type').unique()
For example, using a lower threshold of 1e3 so that some of the sampled rows match:
In [243]: mask = totals['dirty']+totals['swap'] > 1e3; mask
Out[243]:
time region_type
2016-01-24 02:17:10.238 STACK GUARD False
2016-01-23 20:14:17.043 MALLOC metadata True
2016-01-22 23:20:53.752 IOKit True
2016-01-23 08:17:06.561 __DATA True
dtype: bool
In [244]: result = mask.loc[mask]; result
Out[244]:
time region_type
2016-01-23 20:14:17.043 MALLOC metadata True
2016-01-22 23:20:53.752 IOKit True
2016-01-23 08:17:06.561 __DATA True
dtype: bool
In [245]: result.index.get_level_values('region_type').unique()
Out[245]: array(['MALLOC metadata', 'IOKit', '__DATA'], dtype=object)
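The difference between the two accessors can be reproduced in isolation. The sketch below rebuilds only the dirty and swap columns of the four sampled rows shown in the question; the 1e6 threshold and the variable names (selected, all_known, used, pruned) are assumptions chosen for illustration. It also shows remove_unused_levels(), which is one way to make index.levels agree with the filtered rows, assuming a pandas version that provides it.

```python
import pandas as pd

# Rebuild the dirty/swap columns of the four sampled rows from the question.
index = pd.MultiIndex.from_tuples(
    [
        ("2016-01-24 02:17:10.238", "STACK GUARD"),
        ("2016-01-23 20:14:17.043", "MALLOC metadata"),
        ("2016-01-22 23:20:53.752", "IOKit"),
        ("2016-01-23 08:17:06.561", "__DATA"),
    ],
    names=["time", "region_type"],
)
totals = pd.DataFrame(
    {"dirty": [0, 81920, 24576, 618496], "swap": [0, 8192, 3371008, 4780032]},
    index=index,
)

# Hypothetical threshold (1e6) so that only some of the sample rows match.
selected = totals[(totals["dirty"] + totals["swap"]) > 1e6]

# .levels still lists every label the original index knew about.
all_known = selected.index.levels[1].tolist()

# get_level_values returns only the labels of the rows that remain.
used = selected.index.get_level_values("region_type").unique().tolist()

# Pruning unused levels makes .levels match the filtered rows.
pruned = selected.index.remove_unused_levels().levels[1].tolist()

print(all_known)
print(used)
print(pruned)
```

Note that in recent pandas versions unique() on the level values returns an Index rather than the NumPy array shown in Out[245]; calling tolist() gives a plain list either way.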