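I am trying to write a script that takes a DataFrame with an arbitrary number of experimental conditions (e.g., 3 different concentrations of a drug) and an arbitrary number of replicates per condition (i.e., trials 1-3), which looks like this: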
100_uM_Drug_Trial_1 100_uM_Drug_Trial_2 10_uM_Drug_Trial_1 \
0 459.924747 635.685284 518.163653
1 459.458934 636.249568 518.445279
2 460.006374 636.435523 518.743388
3 460.002453 636.794022 518.895792
4 460.598404 636.103206 518.836557
5 460.309564 637.187444 518.976234
6 460.609499 636.335023 519.005662
7 460.843505 637.123839 519.041012
8 460.969187 637.047453 518.880728
9 460.832477 637.231533 519.108122
10 461.255201 638.176752 518.979086
11 461.310764 636.924448 518.979923
12 461.507783 637.824450 519.117064
13 461.116555 637.145600 519.106675
14 461.891845 638.136241 519.531348
15 461.746859 637.819223 519.161308
16 461.840650 637.977134 519.203945
17 462.028374 638.474671 519.184845
18 461.726244 638.039615 519.225926
19 462.128634 638.624309 519.177030
20 461.242868 637.636891 519.460114
21 462.201164 638.493620 519.469176
22 464.078771 637.749872 519.505141
23 464.605662 639.119425 519.654590
24 464.352002 638.789306 519.947157
25 464.485028 638.656634 519.822459
26 464.506035 639.428889 519.906759
27 464.834154 638.481042 520.143631
28 464.886412 639.267176 520.218972
29 465.414446 638.661687 520.384017
...and gives its columns a MultiIndex by condition and trial, so that it looks like this:
Condition 100_uM_Drug 10_uM_Drug
Trial 1 2 1
0 459.924747 635.685284 518.163653
1 459.458934 636.249568 518.445279
2 460.006374 636.435523 518.743388
3 460.002453 636.794022 518.895792
4 460.598404 636.103206 518.836557
5 460.309564 637.187444 518.976234
6 460.609499 636.335023 519.005662
7 460.843505 637.123839 519.041012
8 460.969187 637.047453 518.880728
9 460.832477 637.231533 519.108122
10 461.255201 638.176752 518.979086
11 461.310764 636.924448 518.979923
12 461.507783 637.824450 519.117064
13 461.116555 637.145600 519.106675
14 461.891845 638.136241 519.531348
15 461.746859 637.819223 519.161308
16 461.840650 637.977134 519.203945
17 462.028374 638.474671 519.184845
18 461.726244 638.039615 519.225926
19 462.128634 638.624309 519.177030
20 461.242868 637.636891 519.460114
21 462.201164 638.493620 519.469176
22 464.078771 637.749872 519.505141
23 464.605662 639.119425 519.654590
24 464.352002 638.789306 519.947157
25 464.485028 638.656634 519.822459
26 464.506035 639.428889 519.906759
27 464.834154 638.481042 520.143631
28 464.886412 639.267176 520.218972
29 465.414446 638.661687 520.384017
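I have tried a few approaches, including filtering the column names with regular expressions, but I have not gotten anything to work yet. Is there a quick and easy way to do this that I am missing?
Thanks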
Answer 0 (score: 1)
You can use MultiIndex.from_tuples() (see docs) on the split column names:
df.columns = pd.MultiIndex.from_tuples([('_'.join(col.split('_')[:3]), col.split('_')[-1]) for col in df.columns], names=['Drug', 'Trial'])
which yields:
Drug 100_uM_Drug 10_uM_Drug
Trial 1 2 1
0 0 459.924747 635.685284
1 1 459.458934 636.249568
2 2 460.006374 636.435523
3 3 460.002453 636.794022
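The one-liner above assumes every condition name has exactly three underscore-separated parts. As a minimal sketch of a variant that does not depend on that, the example below splits each column name once from the right on the literal "_Trial_" marker. The sample frame and the helper name to_condition_trial are hypothetical and only for illustration. It also assumes every column name contains "_Trial_" followed by an integer.

```python
import pandas as pd

# Hypothetical sample frame using the column-naming pattern from the question.
df = pd.DataFrame(
    {
        "100_uM_Drug_Trial_1": [459.924747, 459.458934],
        "100_uM_Drug_Trial_2": [635.685284, 636.249568],
        "10_uM_Drug_Trial_1": [518.163653, 518.445279],
    }
)


def to_condition_trial(frame):
    """Return a copy whose columns are a (Condition, Trial) MultiIndex.

    Assumes each column name has the form "<condition>_Trial_<integer>".
    """
    # Split once, from the right, so the condition part may itself
    # contain any number of underscores.
    pairs = [col.rsplit("_Trial_", 1) for col in frame.columns]
    tuples = [(condition, int(trial)) for condition, trial in pairs]

    out = frame.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples, names=["Condition", "Trial"])
    return out


result = to_condition_trial(df)

# Selecting one condition returns a frame with one column per trial.
one_condition = result["100_uM_Drug"]
```

Converting the trial to int is a design choice that lets trials sort numerically. Keep it as a string if trial labels are not always numeric.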