Pandas: sum with a step

Time: 2018-06-18 14:05:38

Tags: pandas

I have the following table:

           A  A_pct     B   B_pct
Player1  1.0   12.5  15.0   18.75
Player2  7.0   87.5  65.0   81.25
Total    8.0  100.0  80.0  100.00

I am trying to add a column at the end that holds the sum of all columns that do not have the _pct suffix.

I can add the column by calling sum on the DataFrame without the pct columns, but I end up with a NaN value in the last row:

           A  A_pct     B   B_pct  Total
Player1  1.0   12.5  15.0   18.75   16.0
Player2  7.0   87.5  65.0   81.25   72.0
Total    8.0  100.0  80.0  100.00    NaN

I can fix that with df['Total'].fillna(100, inplace=True), but it seems cumbersome...

Is there an option to sum with a step? Something like sum([i for i in df.columns[::2]])

1 answer:

Answer 0 (score: 2)

This selects all columns whose names do not contain '_pct' and sums them row by row:

df['Total'] = df[df.columns[~df.columns.str.contains('_pct')]].sum(axis=1)

df
Out[]:
           A  A_pct     B   B_pct  Total
Player1  1.0   12.5  15.0   18.75   16.0
Player2  7.0   87.5  65.0   81.25   72.0
Total    8.0  100.0  80.0  100.00   88.0
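To reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the variable name `value_columns` is only illustrative.

```python
import pandas as pd

# Rebuild the table from the question by hand
df = pd.DataFrame(
    {
        "A": [1.0, 7.0, 8.0],
        "A_pct": [12.5, 87.5, 100.0],
        "B": [15.0, 65.0, 80.0],
        "B_pct": [18.75, 81.25, 100.0],
    },
    index=["Player1", "Player2", "Total"],
)

# Keep only the column names that do NOT contain '_pct'
value_columns = df.columns[~df.columns.str.contains("_pct")]

# Row-wise sum over the selected columns only
df["Total"] = df[value_columns].sum(axis=1)

print(df)
```

Because the sum runs over every row, including the "Total" row, no NaN appears and no fillna is needed.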

Step by step:

# Get the names of all columns without the '_pct' string
columns_names_without_pct = df.columns[~df.columns.str.contains('_pct')]

# Select only the part of the dataframe that contains these columns
df_without_pct = df[columns_names_without_pct]

# Sum along axis 1, the horizontal axis
df_without_pct.sum(axis=1)

# Assign the result to a new column called 'Total'
df['Total'] = df_without_pct.sum(axis=1)
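As for the literal question about summing with a step: if the value columns and the _pct columns strictly alternate, as in the table above, a positional slice with iloc should also work. This is only a sketch under that assumption. It picks the wrong columns as soon as the column order changes, so the name-based selection is usually the safer choice. The name `every_other` is illustrative.

```python
import pandas as pd

# Same table as in the question, rebuilt by hand
df = pd.DataFrame(
    {
        "A": [1.0, 7.0, 8.0],
        "A_pct": [12.5, 87.5, 100.0],
        "B": [15.0, 65.0, 80.0],
        "B_pct": [18.75, 81.25, 100.0],
    },
    index=["Player1", "Player2", "Total"],
)

# Take every second column by position, starting at the first one (A, B).
# Do this before adding 'Total', otherwise the slice would pick it up too.
every_other = df.iloc[:, ::2]

# Row-wise sum of the selected columns
df["Total"] = every_other.sum(axis=1)

print(df)
```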