我想在数据框中插入2列。
原始数据框
card auth month order_number
Amex A 2017-11 1234
Visa A 2017-12 2345
Amex D 2017-12 3456
我想按月分解auth_status。我使用了以下代码:
bin_month_df = monthly_df.pivot_table(index='card', columns=['month', 'auth'],values='order_number', aggfunc='count')
按月划分的数据框
month 2017-11 2017-12
auth A D A D
card
mastercard 10 11 11 10
amex 19 20 10 11
visa 50 30 50 1
目标结果
我想为subtotal
和auth_rate
month 2017-11 2017-12
auth A D total pct A D total pct
card
mastercard 10 11 21 .47 11 10 21 .52
amex 19 20 39 .49 10 11 21 .47
visa 50 30 80 .63 50 1 51 .98
我在创建这些列时遇到问题。This链接按行显示小计,但它不会将我转换为列或计算列。
感谢任何帮助!
答案 0 :(得分:1)
使用:
#create sum by first level of MultiIndex
df1 = df.sum(axis=1, level=0)
df1.columns = [df1.columns, ['total'] * len(df1.columns)]
print (df1)
month 2017-11 2017-12
total total
card
mastercard 21 21
amex 39 21
visa 80 51
#select by second level and divide
df2 = df.xs('A', axis=1, level=1).div(df1.xs('total', axis=1, level=1)).round(2)
df2.columns = [df2.columns, ['pct'] * len(df2.columns)]
print (df2)
month 2017-11 2017-12
pct pct
card
mastercard 0.48 0.52
amex 0.49 0.48
visa 0.62 0.98
#join all together, sort MultiIndex
df3 = pd.concat([df, df1, df2], axis=1).sort_index(axis=1)
print (df3)
month 2017-11 2017-12
auth A D pct total A D pct total
card
mastercard 10 11 0.48 21 11 10 0.52 21
amex 19 20 0.49 39 10 11 0.48 21
visa 50 30 0.62 80 50 1 0.98 51
#for custom order reindex by custom MultiIndex
c = df.columns.levels[1].tolist() + ['total', 'pct']
mux = pd.MultiIndex.from_product([df.columns.levels[0], c], names=df.columns.names)
df4 = df3.reindex(columns=mux)
print(df4)
month 2017-11 2017-12
auth A D total pct A D total pct
card
mastercard 10 11 21 0.48 11 10 21 0.52
amex 19 20 39 0.49 10 11 21 0.48
visa 50 30 80 0.62 50 1 51 0.98
答案 1 :(得分:0)
刚刚在 Pandas 0.17.0 和 Python 2.7.5 上测试过,我现在可以理解为什么你问我reindex(轴= 1)和'*的问题'df1.columns.levels[1]
之前。这确实是Pandas和Python的版本问题。我修改了代码以运行上面提到的旧版本,并修复了一个潜在的错误,以防多个常见的描述性统计需要在同一个数据透视表中进行后期计算。展望未来,在未来的帖子中提及软件版本(如果它们是旧版本)会更容易,因此会产生更少的误解:
import pandas as pd
str = """card auth month order_number
Amex A 2017-11 1234
Visa A 2017-12 2345
Amex D 2017-12 3416
MC A 2017-12 3426
Visa A 2017-11 3436
Amex D 2017-12 3446
Visa A 2017-11 3466
Amex D 2017-12 3476
Visa D 2017-11 3486
"""
# create dataframe from the above sample data
df = pd.read_table(pd.io.common.StringIO(str), sep='\s+')
# create the pivot_table using the method OP supplied
df1 = df.pivot_table(index='card', columns=['month', 'auth'], values='order_number', aggfunc='count')
print(df1)
# month 2017-11 2017-12
# auth A D A D
# card
# Amex 1.0 NaN NaN 3.0
# MC NaN NaN 1.0 NaN
# Visa 2.0 1.0 1.0 NaN
# create an empty dataframe with the same index/column layout as df1
# except the level-1 in columns
idx = pd.MultiIndex.from_product([df1.columns.levels[0], ['total', 'avg', 'std', 'pct']], names=df1.columns.names)
df2 = pd.DataFrame(columns=idx, index=df1.index).sort_index(axis=1)
print(df2)
# month 2017-11 2017-12
# auth avg pct std total avg pct std total
# card
# Amex NaN NaN NaN NaN NaN NaN NaN NaN
# MC NaN NaN NaN NaN NaN NaN NaN NaN
# Visa NaN NaN NaN NaN NaN NaN NaN NaN
# Calculate the common stats:
df2.loc[:,(slice(None),'total')] = df1.groupby(level=0, axis=1).sum().values
df2.loc[:,(slice(None),'avg')] = df1.groupby(level=0, axis=1).mean().values
df2.loc[:,(slice(None),'std')] = df1.groupby(level=0, axis=1).std().values
# join df2 with df1 and assign the result to df3 (can also overwrite df1):
df3 = df1.join(df2).sort_index(axis=1)
# calculate `pct` which needs both a calculated field and an original field
# auth-rate = A / total
df3.loc[:,(slice(None),'pct')] = df3.groupby(level=0, axis=1)\
.apply(lambda x: x.loc[:,(slice(None),'A')].values/x.loc[:,(slice(None),'total')].values) \
.values
print(df3)
# month 2017-11 2017-12
# auth A D avg pct std total A D avg pct std total
# card
# Amex 1 NaN 1.0 1.000000 NaN 1 NaN 3 3 NaN NaN 3
# MC NaN NaN NaN NaN NaN NaN 1 NaN 1 1 NaN 1
# Visa 2 1 1.5 0.666667 0.707107 3 1 NaN 1 1 NaN 1
# rounding if needed:
df3.loc[:,(slice(None),'pct')] = df3.loc[:,(slice(None),'pct')].round(decimals=2)
如果要按特定顺序对level-1列进行排序,可以执行reindex()。
# create a ordered list of level-1 on columns
column_level_1 = list(df1.columns.levels[1]) + ['total', 'avg', 'std', 'pct']
# create MultiIndex for columns and reindex_axis accordingly
midx = pd.MultiIndex.from_product([df1.columns.levels[0], column_level_1], names=df1.columns.names)
df3 = df3.reindex_axis(midx, axis=1)
print(df3)
# month 2017-11 2017-12
# auth A D total avg std pct A D total avg std pct
# card
# Amex 1 NaN 1 1.0 NaN 1.000000 NaN 3 3 3 NaN NaN
# MC NaN NaN NaN NaN NaN NaN 1 NaN 1 1 NaN 1
# Visa 2 1 3 1.5 0.707107 0.666667 1 NaN 1 1 NaN 1