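I have a df whose columns are all numeric. I want to compute the cumprod of each column and place each result right next to its source column, so that I can compare them side by side. How can I do this?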
For example:
My input df:
col1 col2 col3
0 1.000000 1.000000 1.000000
1 0.998766 0.999490 0.998892
2 0.997779 0.999081 0.998005
3 0.996299 0.998469 0.996676
4 0.994573 0.997754 0.995126
5 0.993095 0.997140 0.993797
6 0.991125 0.996322 0.992027
7 0.989648 0.995708 0.990699
8 0.988171 0.995094 0.989372
9 0.986695 0.994480 0.988045
10 0.984729 0.993660 0.986276
11 0.983010 0.992943 0.984730
Cumulative product (cumprod) of df:
col1 col2 col3
0 1.000000 1.000000 1.000000
1 0.998766 0.999490 0.998892
2 0.996547 0.998572 0.996899
3 0.992859 0.997043 0.993585
4 0.987471 0.994803 0.988742
5 0.980653 0.991958 0.982609
6 0.971949 0.988310 0.974775
7 0.961887 0.984069 0.965708
8 0.950509 0.979241 0.955444
9 0.937863 0.973836 0.944022
10 0.923541 0.967662 0.931066
11 0.907850 0.960833 0.916849
Expected output:
col1 col1 col2 col2 col3 col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996547 0.999081 0.998572 0.998005 0.996899
3 0.996299 0.992859 0.998469 0.997043 0.996676 0.993585
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988742
5 0.993095 0.980653 0.997140 0.991958 0.993797 0.982609
6 0.991125 0.971949 0.996322 0.988310 0.992027 0.974775
7 0.989648 0.961887 0.995708 0.984069 0.990699 0.965708
8 0.988171 0.950509 0.995094 0.979241 0.989372 0.955444
9 0.986695 0.937863 0.994480 0.973836 0.988045 0.944022
10 0.984729 0.923541 0.993660 0.967662 0.986276 0.931066
11 0.983010 0.907850 0.992943 0.960833 0.984730 0.916849
Note: for the cumprod columns I would prefer names like cum_of_coln instead of coln.
The code I used to get the cumprod:
print(df)
print(df.cumprod())
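To make the question reproducible, here is a minimal sketch that rebuilds only the first three rows of the sample df (the full data has twelve rows) and checks that each row of df.cumprod() is the running product of all rows up to and including it:

```python
import numpy as np
import pandas as pd

# First three rows of the sample data from the question
df = pd.DataFrame({
    'col1': [1.000000, 0.998766, 0.997779],
    'col2': [1.000000, 0.999490, 0.999081],
    'col3': [1.000000, 0.998892, 0.998005],
})

cum = df.cumprod()  # running product down each column

# Row 2 of cum should equal row 0 * row 1 * row 2 of df, column by column
manual = df.iloc[0] * df.iloc[1] * df.iloc[2]
assert np.allclose(cum.iloc[2], manual)

print(df)
print(cum)
```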
Answer 0 (score: 2)
Compute the cumprod, then use interleave from toolz (cytoolz offers the same function) to interleave the column headers:
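import pandas as pd
from toolz import interleave

df2 = df.cumprod().add_prefix('cum_of_')
# Iterating a DataFrame yields its column names, so interleave alternates col1, cum_of_col1, ...
df3 = pd.concat([df, df2], axis=1)[list(interleave([df, df2]))]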
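If you would rather not depend on toolz, the same interleaving can be sketched with the standard library alone. This swaps toolz.interleave for itertools.chain.from_iterable over zip, assuming both frames have the same number of columns (zip stops at the shorter one); the two-column frame is a stand-in for the question's data:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'col1': [1.000000, 0.998766, 0.997779],
    'col2': [1.000000, 0.999490, 0.999081],
})

df2 = df.cumprod().add_prefix('cum_of_')

# zip pairs each original column name with its cumprod counterpart;
# chain.from_iterable flattens the pairs into one interleaved sequence
order = list(chain.from_iterable(zip(df.columns, df2.columns)))
df3 = pd.concat([df, df2], axis=1)[order]
print(df3.columns.tolist())
```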
Alternatively, you can use sorted:
df2 = df.cumprod().add_prefix('cum_of_')
df3 = pd.concat([df, df2], axis=1)
df3 = df3[sorted(df3, key=lambda x: x.split('_')[-1])]
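The split('_')[-1] key assumes the original column names contain no underscores of their own. With hypothetical names such as price_a and volume_a, both keys become 'a' and the pairs no longer line up. A sketch of a sturdier key, assuming no original column name already starts with cum_of_, strips the prefix and sorts by the source column's original position:

```python
import pandas as pd

# Hypothetical column names where split('_')[-1] would collide
df = pd.DataFrame({'price_a': [1.0, 0.99, 0.98], 'volume_a': [1.0, 0.97, 0.95]})

prefix = 'cum_of_'
df2 = df.cumprod().add_prefix(prefix)
df3 = pd.concat([df, df2], axis=1)

# Position of each source column in the original frame
pos = {name: i for i, name in enumerate(df.columns)}

def pair_key(col):
    # Recover the source column by stripping the prefix, then sort by its
    # original position; False < True keeps the original before its cumprod
    is_cum = col.startswith(prefix)
    source = col[len(prefix):] if is_cum else col
    return (pos[source], is_cum)

df3 = df3[sorted(df3.columns, key=pair_key)]
print(df3.columns.tolist())
```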
A third option is to sort the columns first and then rename the column headers afterwards. This should be very efficient.
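# A stable sort keeps each original column ahead of its same-named cumprod copy
df3 = pd.concat([df, df.cumprod()], axis=1).sort_index(axis=1, kind='mergesort')
# Copy the labels rather than mutating the Index's underlying array in place
c = df3.columns.to_numpy(copy=True)
# Every second column is now a cumprod column
c[1::2] = 'cum_of_' + c[1::2]
df3.columns = c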
df3.head()
col1 cum_of_col1 col2 cum_of_col2 col3 cum_of_col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996548 0.999081 0.998571 0.998005 0.996899
3 0.996299 0.992860 0.998469 0.997043 0.996676 0.993586
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988743
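The third option relies on the sort placing each original column before its same-named cumprod copy. A variant sketch, on a small stand-in frame, that requests a stable sort explicitly and builds the new names as a plain list instead of writing into a labels array:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1.000000, 0.998766, 0.997779],
    'col2': [1.000000, 0.999490, 0.999081],
})

# After concat the columns are col1, col2, col1, col2; a stable sort
# keeps the original of each pair ahead of its cumprod copy
df3 = pd.concat([df, df.cumprod()], axis=1).sort_index(axis=1, kind='mergesort')

# Rename every odd position by building a new list of names
df3.columns = [f'cum_of_{c}' if i % 2 else c for i, c in enumerate(df3.columns)]
print(df3.head())
```

Renaming every odd position matches the layout only because each name appears exactly twice after the concat.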
Answer 1 (score: 1)
Use concat and reorder the columns with a list built by a list comprehension:
cols = [item for x in df.columns for item in (x, 'cum_of_' + x)]
df = pd.concat([df, df.cumprod().add_prefix('cum_of_')], axis=1)[cols]
print(df)
col1 cum_of_col1 col2 cum_of_col2 col3 cum_of_col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996548 0.999081 0.998571 0.998005 0.996899
3 0.996299 0.992860 0.998469 0.997043 0.996676 0.993586
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988743
5 0.993095 0.980653 0.997140 0.991958 0.993797 0.982610
6 0.991125 0.971949 0.996322 0.988310 0.992027 0.974775
7 0.989648 0.961888 0.995708 0.984068 0.990699 0.965709
8 0.988171 0.950510 0.995094 0.979240 0.989372 0.955445
9 0.986695 0.937863 0.994480 0.973835 0.988045 0.944023
10 0.984729 0.923541 0.993660 0.967661 0.986276 0.931067
11 0.983010 0.907850 0.992943 0.960832 0.984730 0.916850
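The list-comprehension approach can be wrapped in a small reusable helper. with_cumprod is a hypothetical name, and prefix + x assumes string column names; unlike the answer above, it returns a new frame rather than overwriting df:

```python
import pandas as pd

def with_cumprod(frame, prefix='cum_of_'):
    """Return frame with each column followed by its cumulative product.

    Hypothetical helper wrapping the concat + list-comprehension approach.
    """
    cum = frame.cumprod().add_prefix(prefix)
    # Build (col, prefix + col) pairs in the original column order
    cols = [item for x in frame.columns for item in (x, prefix + x)]
    return pd.concat([frame, cum], axis=1)[cols]

df = pd.DataFrame({
    'col1': [1.000000, 0.998766, 0.997779],
    'col2': [1.000000, 0.999490, 0.999081],
})
result = with_cumprod(df, prefix='cumprod_')
print(result)
```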
Answer 2 (score: 1)
Append the columns directly with DataFrame.assign:
df.assign(**df.cumprod().add_prefix('cumprod_'))
col1 col2 col3 cumprod_col1 cumprod_col2 cumprod_col3
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.999490 0.998892 0.998766 0.999490 0.998892
2 0.997779 0.999081 0.998005 0.996548 0.998571 0.996899
3 0.996299 0.998469 0.996676 0.992860 0.997043 0.993586
4 0.994573 0.997754 0.995126 0.987471 0.994803 0.988743
5 0.993095 0.997140 0.993797 0.980653 0.991958 0.982610
6 0.991125 0.996322 0.992027 0.971949 0.988310 0.974775
7 0.989648 0.995708 0.990699 0.961888 0.984068 0.965709
8 0.988171 0.995094 0.989372 0.950510 0.979240 0.955445
9 0.986695 0.994480 0.988045 0.937863 0.973835 0.944023
10 0.984729 0.993660 0.986276 0.923541 0.967661 0.931067
11 0.983010 0.992943 0.984730 0.907850 0.960832 0.916850
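assign accepts new columns as keyword arguments, and ** unpacks the cumprod frame as a mapping from column name to Series. A quick sketch, on a small stand-in frame, confirming the result matches a plain concat:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1.000000, 0.998766, 0.997779],
    'col2': [1.000000, 0.999490, 0.999081],
})

cum = df.cumprod().add_prefix('cumprod_')

# ** uses the frame's keys() (column names) and __getitem__ (columns),
# so each cumprod column becomes one keyword argument to assign
via_assign = df.assign(**cum)
via_concat = pd.concat([df, cum], axis=1)

assert via_assign.equals(via_concat)
print(via_assign.columns.tolist())
```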
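If you want the columns ordered col1, cumprod_col1, ..., you can sort the columns alphabetically by reindexing them. In that case, use add_suffix instead of add_prefix, so that each cumprod column sorts directly after its source column: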
df = df.assign(**df.cumprod().add_suffix('_cumprod'))
df = df.reindex(columns=sorted(df.columns))
col1 col1_cumprod col2 col2_cumprod col3 col3_cumprod
0 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 0.998766 0.998766 0.999490 0.999490 0.998892 0.998892
2 0.997779 0.996548 0.999081 0.998571 0.998005 0.996899
3 0.996299 0.992860 0.998469 0.997043 0.996676 0.993586
4 0.994573 0.987471 0.997754 0.994803 0.995126 0.988743
5 0.993095 0.980653 0.997140 0.991958 0.993797 0.982610
6 0.991125 0.971949 0.996322 0.988310 0.992027 0.974775
7 0.989648 0.961888 0.995708 0.984068 0.990699 0.965709
8 0.988171 0.950510 0.995094 0.979240 0.989372 0.955445
9 0.986695 0.937863 0.994480 0.973835 0.988045 0.944023
10 0.984729 0.923541 0.993660 0.967661 0.986276 0.931067
11 0.983010 0.907850 0.992943 0.960832 0.984730 0.916850
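Alphabetical sorting keeps the pairs adjacent only while the names sort the way you expect. With eleven hypothetical columns col1 to col11, 'col10' sorts between 'col1' and 'col1_cumprod'. A sketch that orders by the original column position instead, assuming no original column name already ends in _cumprod:

```python
import pandas as pd

# Hypothetical frame with eleven columns, col1 to col11
df = pd.DataFrame({f'col{i}': [1.0, 0.99, 0.98] for i in range(1, 12)})
df = df.assign(**df.cumprod().add_suffix('_cumprod'))

# Alphabetical order splits the pairs: 'col10' lands between 'col1' and 'col1_cumprod'
print(sorted(df.columns)[:3])

# Order by original position instead: each source column, then its cumprod
base = [c for c in df.columns if not c.endswith('_cumprod')]
order = [name for c in base for name in (c, c + '_cumprod')]
df = df.reindex(columns=order)
print(df.columns.tolist()[:6])
```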