Extracting and plotting data from a MultiIndex DataFrame in pandas

Date: 2014-10-14 14:59:26

Tags: python pandas multi-index

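I managed to load the table below into a pandas DataFrame. It has a multi-level index (file_type, server_count, file_count, thread_count, cacheclear_type) that describes the configuration of a performance measurement. I ran each configuration 5 times.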

+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
|           |              |            |              |                 | run_001 | run_002 | run_003 | run_004 | run_005 |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| file_type | server_count | file_count | thread_count | cacheclear_type |         |         |         |         |         |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| gor       | 01servers    | 05files    | 20threads    | ccALWAYS        | 15.918  | 16.275  | 15.807  | 17.781  | 16.233  |
|           | 08servers    | 05files    | 20threads    | ccALWAYS        | 17.061  | 15.414  | 16.819  | 15.597  | 16.818  |
| gorz      | 01servers    | 05files    | 20threads    | ccALWAYS        | 12.285  | 11.218  | 12.009  | 14.122  | 10.991  |
|           | 08servers    | 05files    | 20threads    | ccALWAYS        | 9.881   | 9.405   | 9.322   | 10.184  | 9.924   |
| gor       | 01servers    | 10files    | 20threads    | ccALWAYS        | 17.322  | 17.636  | 16.096  | 16.484  | 16.715  |
|           | 08servers    | 10files    | 20threads    | ccALWAYS        | 17.167  | 17.666  | 15.950  | 18.867  | 16.569  |
| gorz      | 01servers    | 10files    | 20threads    | ccALWAYS        | 14.718  | 19.553  | 17.930  | 21.415  | 21.495  |
|           | 08servers    | 10files    | 20threads    | ccALWAYS        | 10.236  | 9.948   | 12.605  | 9.780   | 10.320  |
| gor       | 01servers    | 15files    | 20threads    | ccALWAYS        | 19.265  | 17.128  | 17.630  | 18.739  | 16.833  |
|           | 08servers    | 15files    | 20threads    | ccALWAYS        | 23.083  | 22.084  | 25.024  | 24.677  | 20.648  |
| gorz      | 01servers    | 15files    | 20threads    | ccALWAYS        | 15.401  | 28.282  | 28.727  | 24.645  | 27.509  |
|           | 08servers    | 15files    | 20threads    | ccALWAYS        | 10.307  | 12.217  | 13.005  | 12.277  | 12.224  |
| gor       | 01servers    | 20files    | 20threads    | ccALWAYS        | 23.744  | 20.539  | 21.416  | 22.921  | 22.794  |
|           | 08servers    | 20files    | 20threads    | ccALWAYS        | 35.393  | 36.218  | 35.949  | 35.157  | 37.342  |
| gorz      | 01servers    | 20files    | 20threads    | ccALWAYS        | 19.505  | 23.756  | 25.767  | 26.575  | 25.239  |
|           | 08servers    | 20files    | 20threads    | ccALWAYS        | 11.398  | 11.332  | 15.086  | 16.115  | 13.479  |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+

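I want to take all of the gor, 01servers, 20threads, ccALWAYS configurations and produce one data point for each XXfiles configuration. As a first step, I would like to get a DataFrame that looks like this: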

+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
|           |              |            |              |                 | run_001 | run_002 | run_003 | run_004 | run_005 |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| file_type | server_count | file_count | thread_count | cacheclear_type |         |         |         |         |         |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+
| gor       | 01servers    | 05files    | 20threads    | ccALWAYS        | 15.918  | 16.275  | 15.807  | 17.781  | 16.233  |
| gor       | 01servers    | 10files    | 20threads    | ccALWAYS        | 17.322  | 17.636  | 16.096  | 16.484  | 16.715  |
| gor       | 01servers    | 15files    | 20threads    | ccALWAYS        | 19.265  | 17.128  | 17.630  | 18.739  | 16.833  |
| gor       | 01servers    | 20files    | 20threads    | ccALWAYS        | 23.744  | 20.539  | 21.416  | 22.921  | 22.794  |
+-----------+--------------+------------+--------------+-----------------+---------+---------+---------+---------+---------+

How can I do this?

1 answer:

Answer 0 (score: 0)

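I managed to filter the data with the query() function. The following code produces a result that looks like the second table in the question: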

df.query('file_type == "gor" and server_count == "01servers"').sort_index(level="file_count")
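
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds four rows of the table from the question, applies the same filter, and then reduces each row to a single data point. Taking the mean over the five runs is my assumption about what "one data point per XXfiles configuration" means; the question does not say which statistic is wanted. The sketch uses sort_index(level=...) because sortlevel() is not available in pandas 1.0 and later.

```python
import pandas as pd

# Four rows copied from the question's table, enough to exercise the filter.
index = pd.MultiIndex.from_tuples(
    [
        ("gor", "01servers", "05files", "20threads", "ccALWAYS"),
        ("gor", "08servers", "05files", "20threads", "ccALWAYS"),
        ("gorz", "01servers", "05files", "20threads", "ccALWAYS"),
        ("gor", "01servers", "10files", "20threads", "ccALWAYS"),
    ],
    names=[
        "file_type",
        "server_count",
        "file_count",
        "thread_count",
        "cacheclear_type",
    ],
)
runs = ["run_001", "run_002", "run_003", "run_004", "run_005"]
df = pd.DataFrame(
    [
        [15.918, 16.275, 15.807, 17.781, 16.233],
        [17.061, 15.414, 16.819, 15.597, 16.818],
        [12.285, 11.218, 12.009, 14.122, 10.991],
        [17.322, 17.636, 16.096, 16.484, 16.715],
    ],
    index=index,
    columns=runs,
)

# query() can refer to index level names as if they were columns.
subset = df.query('file_type == "gor" and server_count == "01servers"')
subset = subset.sort_index(level="file_count")

# Assumption: one data point per file_count is the mean over the five runs.
# Drop the levels that are constant after filtering so file_count is the index.
means = subset.mean(axis=1).droplevel(
    ["file_type", "server_count", "thread_count", "cacheclear_type"]
)
print(means)
```

With matplotlib installed, means.plot(marker="o") should draw one point per file_count. If the constant index levels are not needed in the filtered frame, df.xs(("gor", "01servers"), level=["file_type", "server_count"]) is another way to select the same rows; it drops the two levels it selects on.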