I have a dataframe that looks like this:
+---+-------------+-------------+-------------+-------------+------------+------------+------------+------------+
| | cat1 - Sept | cat2 - Sept | cat3 - Sept | cat4 - Sept | cat1 - Aug | cat2 - Aug | cat3 - Aug | cat4 - Aug |
+---+-------------+-------------+-------------+-------------+------------+------------+------------+------------+
| 0 | 49 | 6 | 35 | 24 | 15 | 10 | 45 | 37 |
| 1 | 4 | 2 | 32 | 20 | 21 | 26 | 43 | 41 |
| 2 | 3 | 42 | 22 | 8 | 15 | 17 | 45 | 3 |
| 3 | 3 | 5 | 32 | 14 | 28 | 11 | 45 | 3 |
| 4 | 4 | 22 | 9 | 50 | 1 | 8 | 16 | 23 |
| 5 | 10 | 15 | 9 | 41 | 3 | 35 | 30 | 34 |
| 6 | 21 | 4 | 12 | 44 | 43 | 32 | 12 | 10 |
| 7 | 4 | 49 | 42 | 30 | 11 | 25 | 27 | 24 |
| 8 | 46 | 18 | 46 | 29 | 36 | 5 | 46 | 23 |
+---+-------------+-------------+-------------+-------------+------------+------------+------------+------------+
In reality there are 15 categories for each month. What I want to do is transform the dataframe into:
+---+-------------+----------+-------------+----------+-------------+----------+-------------+----------+
| | cat1 - Sept | % Change | cat2 - Sept | % Change | cat3 - Sept | % Change | cat4 - Sept | % Change |
+---+-------------+----------+-------------+----------+-------------+----------+-------------+----------+
| 0 | 49 | 227% | 6 | -40% | 35 | -22% | 24 | -35% |
| 1 | 4 | -81% | 2 | -92% | 32 | -26% | 20 | -51% |
| 2 | 3 | -80% | 42 | 147% | 22 | -51% | 8 | 167% |
| 3 | 3 | -89% | 5 | -55% | 32 | -29% | 14 | 367% |
| 4 | 4 | 300% | 22 | 175% | 9 | -44% | 50 | 117% |
| 5 | 10 | 233% | 15 | -57% | 9 | -70% | 41 | 21% |
| 6 | 21 | -51% | 4 | -88% | 12 | 0% | 44 | 340% |
| 7 | 4 | -64% | 49 | 96% | 42 | 56% | 30 | 25% |
| 8 | 46 | 28% | 18 | 260% | 46 | 0% | 29 | 26% |
+---+-------------+----------+-------------+----------+-------------+----------+-------------+----------+
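To be explicit about the target values: each % Change figure is the Sept value divided by the Aug value for the same category, minus one. A minimal check against row 0 of the tables above (the helper name `pct_change` is only for illustration, it is not a pandas function):

```python
def pct_change(new, old):
    """Relative change from old to new, e.g. 0.25 means +25%."""
    return new / old - 1

# Row 0: cat1 went from 15 (Aug) to 49 (Sept), cat2 from 10 to 6.
print(f"{pct_change(49, 15):.0%}")  # 227%
print(f"{pct_change(6, 10):.0%}")   # -40%
```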
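This is easy enough to do, but my current approach takes a lot of code and is very manual.
I am looking for a specific pandas function or idiom for organizing the columns that would cut down the code and be more efficient.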
Answer 0: (score: 1)
One approach is to use MultiIndex columns.
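In [101]: import pandas as pd
In [102]: idx = pd.IndexSlice
# Split 'cat1 - Sept' into a (month, category) pair for each column
In [222]: df.columns = pd.MultiIndex.from_tuples([(b,a) for (a,b) in df.columns.str.split(' - ')])
# DataFrame.sortlevel was removed from pandas; sort_index(axis=1) sorts the column levels
In [223]: df = df.sort_index(axis=1)
In [224]: new_cols = [('% Change', cat) for cat in df.columns.levels[1]]
# Sept / Aug - 1 is aligned on the category labels
In [225]: df[new_cols] = df['Sept'] / df['Aug'] - 1
# Keep only the Sept and % Change blocks, then put the category on top
In [226]: df = df.loc[:, idx[['Sept', '% Change'], :]]
In [227]: df.columns = df.columns.swaplevel(0,1)
# Order the columns explicitly so that Sept comes before % Change within each category
In [228]: df = df[[(cat, col) for cat in df.columns.levels[0] for col in ('Sept', '% Change')]]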
In [229]: df
Out[229]:
cat1 cat2 cat3 cat4
Sept % Change Sept % Change Sept % Change Sept % Change
0 49 2.266667 6 -0.400000 35 -0.222222 24 -0.351351
1 4 -0.809524 2 -0.923077 32 -0.255814 20 -0.512195
2 3 -0.800000 42 1.470588 22 -0.511111 8 1.666667
3 3 -0.892857 5 -0.545455 32 -0.288889 14 3.666667
4 4 3.000000 22 1.750000 9 -0.437500 50 1.173913
5 10 2.333333 15 -0.571429 9 -0.700000 41 0.205882
6 21 -0.511628 4 -0.875000 12 0.000000 44 3.400000
7 4 -0.636364 49 0.960000 42 0.555556 30 0.250000
8 46 0.277778 18 2.600000 46 0.000000 29 0.260870
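
For reference, here is a self-contained sketch of the same idea that can be pasted into a script. It is a variation rather than the exact session above: it assembles the result with pd.concat and an explicit column order, and it uses only the first three rows of cat1 and cat2 from the question as sample data. The variable names (sept, change, out, pretty) are illustrative assumptions.

```python
import pandas as pd

# Sample data: first three rows of cat1 and cat2 from the question.
df = pd.DataFrame({
    "cat1 - Sept": [49, 4, 3],
    "cat2 - Sept": [6, 2, 42],
    "cat1 - Aug": [15, 21, 15],
    "cat2 - Aug": [10, 26, 17],
})

# Turn 'cat1 - Sept' into the pair ('Sept', 'cat1') and use the pairs
# as a two-level column index (month on top, category below).
df.columns = pd.MultiIndex.from_tuples(
    [tuple(reversed(name.split(" - "))) for name in df.columns]
)
df = df.sort_index(axis=1)

sept = df["Sept"]                   # columns: cat1, cat2
change = df["Sept"] / df["Aug"] - 1  # aligned on the category labels

# Place the two blocks side by side under new top-level labels,
# then move the category to the top level.
out = pd.concat({"Sept": sept, "% Change": change}, axis=1)
out = out.swaplevel(0, 1, axis=1)

# Interleave explicitly: Sept, then % Change, for each category.
order = [(cat, kind) for cat in sept.columns for kind in ("Sept", "% Change")]
out = out[order]

# Optional: a display copy with the changes formatted as percentages.
pretty = out.copy()
for cat in sept.columns:
    pretty[(cat, "% Change")] = out[(cat, "% Change")].map("{:.0%}".format)

print(pretty)
```

Building the column order by hand avoids depending on how the level labels happen to sort, which matters here because '% Change' sorts before 'Sept' alphabetically. With 15 categories the same code applies unchanged, since the category names are read from the columns.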