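I have a DataFrame with a MultiIndex. I need to process various subsets of the data by schema and/or script (the index levels are named schema and script). The DataFrame looks like this: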
tx_id step step_id start_time
schema_10 cmc_v2_file 19-3 10 279 2015-09-04 00:46:30
cmc_v2_file 2-7 10 423 2015-09-04 00:46:22
cmc_v2_file 29-1 10 20 2015-09-04 00:46:34
cmc_v2_file 35-1 4 63 2015-09-04 00:46:51
cmc_v2_file 31-2 10 79 2015-09-04 00:46:54
cmc_v2_file 5-8 10 536 2015-09-04 00:46:57
cmc_v2_file 5-9 10 610 2015-09-04 00:47:13
cmc_v2_file 39-1 10 178 2015-09-04 00:47:12
cmc_v2_file 41-1 10 211 2015-09-04 00:47:22
cmc_v2_file 21-4 10 678 2015-09-04 00:47:28
cmc_v2_file 23-4 10 698 2015-09-04 00:47:31
cmc_v2_file 31-5 10 399 2015-09-04 00:47:45
cmc_v2_file 35-4 3 453 2015-09-04 00:47:54
cmc_v2_file 29-5 4 461 2015-09-04 00:47:54
cmc_v2_file 29-5 8 465 2015-09-04 00:47:55
cmc_v2_file 42-3 1 467 2015-09-04 00:47:57
cmc_v2_file 22-5 8 866 2015-09-04 00:47:53
cmc_v2_file 16-6 8 893 2015-09-04 00:47:51
cmc_v2_file 17-6 4 938 2015-09-04 00:47:54
cmc_v2_file 17-6 8 942 2015-09-04 00:47:55
cmc_v2_file 6-2 10 707 2015-09-04 00:47:50
cmc_v2_file 4-11 10 730 2015-09-04 00:47:54
cmc_v2_file 6-3 2 745 2015-09-04 00:47:53
cmc_v2_file 5-11 1 762 2015-09-04 00:47:55
cmc_v2_file 4-12 1 763 2015-09-04 00:47:56
cmc_v2_file 5-12 10 782 2015-09-04 00:48:16
cmc_v2_file 31-6 4 471 2015-09-04 00:47:55
cmc_v2_file 38-3 4 520 2015-09-04 00:47:51
cmc_v2_file 39-3 4 551 2015-09-04 00:47:55
cmc_v2_file 31-7 10 570 2015-09-04 00:48:20
... ... ... ... ...
schema_9 hcs-vbu 1332-132 14 197542 2015-09-04 00:29:46
hcs-vbu 515-143 5 196309 2015-09-04 00:29:01
hcs-vbu 552-126 13 196333 2015-09-04 00:29:19
hcs-vbu 559-116 12 197068 2015-09-04 00:29:33
hcs-vbu 566-115 13 197201 2015-09-04 00:29:47
hcs-vbu 523-152 3 197443 2015-09-04 00:29:33
hcs-vbu 790-136 2 200774 2015-09-04 00:28:46
hcs-vbu 790-136 4 200776 2015-09-04 00:28:56
hcs-vbu 790-136 12 200784 2015-09-04 00:29:13
hcs-vbu 206-148 5 198213 2015-09-04 00:29:04
To get the data for a particular script, I do this:
df.loc(axis=0)[:,[script]]
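When I print the resulting DataFrame, it looks correct. The problem is that I am also writing unit tests for all of this, and in some of the tests I want to verify that the data contains only one script: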
scripts = df.index.levels[df.index.names.index('script')]
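However, instead of a list with a single entry as I expected, I get a list of 6, which is the number of scripts in the original, unfiltered data. Once I have filtered the DataFrame with .loc, should I be retrieving the script index in a different way?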
Answer 0 (score: 0)
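Your second statement, df.index.levels, gets all the levels in the index. You then subset it by saying: give me all the values of the second level of the MultiIndex (the one called 'script').
I think what you want is something like this, where you say: for the index level named 'script', give me one specific value.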
import pandas as pd

## here we set the specific value you want to filter with (a string, so it must be quoted)
specific_script_value = 'cmc_v2_file'
## and then we filter on the second level of the index.
## pd.IndexSlice helps slice across several index levels
idx = pd.IndexSlice
df.loc[idx[:, specific_script_value], :]
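
On the unit-test part of the question: as far as I know, filtering a MultiIndex with .loc does not prune index.levels, so the levels still list every script from the original data. Below is a minimal sketch of two ways to check what is actually present after filtering. The toy data, the step column values, and the variable names (filtered, all_levels, present, pruned) are made up for illustration; only the level names schema and script come from the question.

```python
import pandas as pd

# Hypothetical toy frame; only the level names match the question.
index = pd.MultiIndex.from_tuples(
    [
        ("schema_10", "cmc_v2_file"),
        ("schema_10", "cmc_v2_file"),
        ("schema_9", "hcs-vbu"),
        ("schema_9", "hcs-vbu"),
    ],
    names=["schema", "script"],
)
df = pd.DataFrame({"step": [10, 4, 14, 5]}, index=index)

# Keep only the rows for one script.
idx = pd.IndexSlice
filtered = df.loc[idx[:, "cmc_v2_file"], :]

# .levels is not pruned by filtering, so it still lists every script.
script_pos = filtered.index.names.index("script")
all_levels = list(filtered.index.levels[script_pos])

# Option 1: look at the values actually present in the filtered rows.
present = list(filtered.index.get_level_values("script").unique())

# Option 2: drop the unused levels first, then read .levels as before.
pruned = list(filtered.index.remove_unused_levels().levels[script_pos])

print(all_levels)
print(present)
print(pruned)
```

In a unit test, an assertion on get_level_values('script').unique() having length 1 is probably the most direct check, since it does not depend on whether the levels were pruned.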