Question

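I'd like to know how to calculate the percentage of each of these columns and save it in a new column right next to it, for N columns. Example: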

import pandas as pd

d1 = [['0.00', '10','11','15'], ['2.99', '30','40','0'], ['4.99', '5','0','2']]

df1 = pd.DataFrame(d1, columns = ['Price', '1','2','3'])

I want the following operation to run over every column (except Price, of course):

df1['1%'] = df1['1'] / df1['1'].sum()

(I got an error when I tried this.)

Desired result:

d2 = [['0.00', '10','0.22','11','0.2156','15','0.8823'], ['2.99', '30','0.66','40','0.7843','0','0'], ['4.99', '5','0.11','0','0','2','0.1176']]

df2 = pd.DataFrame(d2, columns = ['Price', '1','1%','2','2%','3','3%'])

(There can be N such columns, so I need to loop over all of them.)
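A quick way to see why the one-liner fails is to inspect the column dtypes: every value in `d1` is written as a quoted string, so none of the columns is numeric and `/` and `.sum()` cannot do arithmetic on them. A minimal check using the question's own data (`numeric` is just an illustrative name):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

d1 = [['0.00', '10', '11', '15'], ['2.99', '30', '40', '0'], ['4.99', '5', '0', '2']]
df1 = pd.DataFrame(d1, columns=['Price', '1', '2', '3'])

# Every value was written as a quoted string, so no column is numeric
numeric = {col: is_numeric_dtype(df1[col]) for col in df1.columns}
print(df1.dtypes)
print(numeric)
```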

Answer 1

To get that output, you need to convert the strings to numbers with pd.to_numeric:

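import pandas as pd
# Convert each non-Price column to numbers, divide it by its total, then append '%' to its name
pd.concat([df1, df1.drop(columns='Price').apply(lambda x: pd.to_numeric(x).div(pd.to_numeric(x).sum()))
               .rename(columns=lambda x: x + '%')], axis=1)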

Output:

    Price   1   2   3   1%                2%          3%
0   0.00    10  11  15  0.222222    0.215686    0.882353
1   2.99    30  40  0   0.666667    0.784314    0.000000
2   4.99    5   0   2   0.111111    0.000000    0.117647
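The lambda above calls pd.to_numeric twice per column. A sketch of a variant that converts once and then divides the whole frame by its column sums, assuming (as in the question) that every column except Price holds numeric strings; `nums` and `pct` are illustrative names:

```python
import pandas as pd

d1 = [['0.00', '10', '11', '15'], ['2.99', '30', '40', '0'], ['4.99', '5', '0', '2']]
df1 = pd.DataFrame(d1, columns=['Price', '1', '2', '3'])

# Convert every non-Price column to numbers once
nums = df1.drop(columns='Price').apply(pd.to_numeric)

# DataFrame.div with a Series aligns on column labels,
# so each column is divided by its own total
pct = nums.div(nums.sum()).add_suffix('%')

out = pd.concat([df1, pct], axis=1)
print(out)
```

The column order matches the output shown above: the percentage columns are appended after the originals.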

Answer 2

IIUC, you need:

m=df1.set_index('Price').div(df1.set_index('Price').sum()).add_suffix('%')
df2=pd.concat([df1.set_index('Price'),m],axis=1).sort_index(axis=1).reset_index()

   Price   1        1%   2        2%   3        3%
0   0.00  10  0.222222  11  0.215686  15  0.882353
1   2.99  30  0.666667  40  0.784314   0  0.000000
2   4.99   5  0.111111   0  0.000000   2  0.117647

Note: this assumes the dtypes are:

df1.dtypes
Price    float64
1          int32
2          int32
3          int32
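Because `d1` in the question holds strings, the frame has to be converted before this answer's code applies. A minimal sketch, assuming every column (including Price) should simply be parsed with pd.to_numeric:

```python
import pandas as pd

d1 = [['0.00', '10', '11', '15'], ['2.99', '30', '40', '0'], ['4.99', '5', '0', '2']]
df1 = pd.DataFrame(d1, columns=['Price', '1', '2', '3'])

# Parse all string columns: Price becomes float, the counts become integers
df1 = df1.apply(pd.to_numeric)

m = df1.set_index('Price').div(df1.set_index('Price').sum()).add_suffix('%')
df2 = pd.concat([df1.set_index('Price'), m], axis=1).sort_index(axis=1).reset_index()
print(df2)
```

Note that sort_index(axis=1) sorts the labels as strings, so with ten or more columns '10' would land before '2'.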

Answer 3

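a = df1.columns[1:]  # every column except Price
# Cast the strings to float, divide each column by its total, write to new '<col>%' columns
df1[a + '%'] = df1[a].astype(float) / df1[a].astype(float).sum()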

Output:

Price   1   2   3   1%           2%         3%
0.00    10  11  15  0.222222    0.215686    0.882353
2.99    30  40  0   0.666667    0.784314    0.000000
4.99    5   0   2   0.111111    0.000000    0.117647
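The question's `d2` places each percentage right after its source column, while this answer appends them at the end. One way to get the interleaved layout is to build the column order explicitly, which also avoids string sorting once there are ten or more columns. A sketch reusing this answer's code; `order` is an illustrative name:

```python
import pandas as pd

d1 = [['0.00', '10', '11', '15'], ['2.99', '30', '40', '0'], ['4.99', '5', '0', '2']]
df1 = pd.DataFrame(d1, columns=['Price', '1', '2', '3'])

a = df1.columns[1:]  # every column except Price
df1[a + '%'] = df1[a].astype(float) / df1[a].astype(float).sum()

# Interleave: Price, then each source column followed by its percentage
order = ['Price'] + [c for col in a for c in (col, col + '%')]
df1 = df1[order]
print(df1)
```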

Answer 4

Let's split your question into two parts:

1) Why you get an error when calculating the percentage of each column:

Your columns are of string type. You can either convert the columns to float, or use numeric types when you define the DataFrame:

Change the column type: df1['1'] = df1['1'].astype(float)
Use numeric types when defining the DataFrame:

d1 = [[0.00, 10, 11, 15], [ 2.99, 30, 40, 0], [ 4.99, 5, 0, 2]]
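With numeric literals in `d1`, the question's original one-liner runs unchanged. A minimal sketch reusing the question's column names:

```python
import pandas as pd

# Numeric literals instead of quoted strings
d1 = [[0.00, 10, 11, 15], [2.99, 30, 40, 0], [4.99, 5, 0, 2]]
df1 = pd.DataFrame(d1, columns=['Price', '1', '2', '3'])

# The question's original line now works, because '1' is an integer column
df1['1%'] = df1['1'] / df1['1'].sum()
print(df1[['1', '1%']])
```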

2) Applying the formula to every column:

The following code applies your formula to each column and adds the result as a new column in the original DataFrame:

for column in df1.drop(['Price'], axis=1).columns:
    df1[column + '%'] = df1[column] / df1[column].sum()
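The loop above appends the new columns at the end. Since the question asks for each percentage next to its source column, a variant using DataFrame.insert could look like this (a sketch, assuming the columns are first converted with pd.to_numeric):

```python
import pandas as pd

d1 = [['0.00', '10', '11', '15'], ['2.99', '30', '40', '0'], ['4.99', '5', '0', '2']]
df1 = pd.DataFrame(d1, columns=['Price', '1', '2', '3'])

# Convert all columns from strings to numbers first
df1 = df1.apply(pd.to_numeric)

# Snapshot the source columns before inserting anything
for column in list(df1.columns.drop('Price')):
    # Insert the percentage immediately after its source column
    loc = df1.columns.get_loc(column) + 1
    df1.insert(loc, column + '%', df1[column] / df1[column].sum())

print(df1)
```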

Creating new columns to calculate percentages, repeated N times
