Question

这与我昨天问的问题非常相似。目的是能够添加一种功能，该功能将允许根据另一个中显示的值来创建列。例如，当它在指定的文件中找到国家/地区代码时，我希望它创建一个名称为“ 国家/地区代码总计”的列，并对具有相同列的每一行的单位数量求和国家代码

这是我的脚本当前输出：

我想看的东西

我的脚本：

df['Sum of Revenue'] = df['Units Sold'] * df['Dealer Price']
    df['AR Revenue'] = df[]
    df = df.sort_values(['End Consumer Country', 'Currency Code'])
    # Sets first value of index by position
    df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
    # Sets first value of index by position
    df.loc[df.index[0], 'Total Revenue'] = df['Sum of Revenue'].sum()
    # Sums the amout of Units with the End Consumer Country AR
    df['AR Total'] = df.loc[df['End Consumer Country'] == 'AR', 'Units Sold'].sum()
    # Sums the amount of Units with the End Consumer Country AU
    df['AU Total'] = df.loc[df['End Consumer Country'] == 'AU', 'Units Sold'].sum()
    # Sums the amount of Units with the End Consumer Country NZ
    df['NZ Total'] = df.loc[df['End Consumer Country'] == 'NZ', 'Units Sold'].sum()

但是，据我所知该文件中将出现的国家/地区，我已将它们相应地添加到要查找的脚本中。我将如何编写脚本，以便如果找到另一个国家代码（例如GB），它将创建一个名为“ GB Total”的列，并对国家代码设置为GB的每一行的单位求和。

任何帮助将不胜感激！

Answer 1

如果您确实需要这种格式，那么我将按照以下步骤进行操作（从下面开始数据）：

# Get those first two columns
d = {'Sum of Revenue': 'Total Revenue', 'Units Sold': 'Total Sold'}
for col, newcol in d.items():
    df.loc[df.index[0], newcol] = df[col].sum()

# Add the rest for every country:
s = df.groupby('End Consumer Country')['Units Sold'].sum().to_frame().T.add_suffix(' Total')
s.index = [df.index[0]]

df  = pd.concat([df, s], 1, sort=False)

输出：`df`：

  End Consumer Country  Sum of Revenue  Units Sold  Total Revenue  Total Sold  AR Total  AU Total  NZ Total  US Total
a                   AR       13.486216           1     124.007334        28.0       3.0       7.0      11.0       7.0
b                   AR       25.984073           2            NaN         NaN       NaN       NaN       NaN       NaN
c                   AU       21.697871           3            NaN         NaN       NaN       NaN       NaN       NaN
d                   AU       10.962232           4            NaN         NaN       NaN       NaN       NaN       NaN
e                   NZ       16.528398           5            NaN         NaN       NaN       NaN       NaN       NaN
f                   NZ       29.908619           6            NaN         NaN       NaN       NaN       NaN       NaN
g                   US        5.439925           7            NaN         NaN       NaN       NaN       NaN       NaN

如您所见，pandas添加了一堆NaN值，因为我们只为第一行分配了内容，而DataFrame必须是矩形的

使用不同的DataFrame来汇总总数以及每个国家/地区内的内容要简单得多。如果可以的话，那么一切都简化为一个.pivot_table

df.pivot_table(index='End Consumer Country', 
               values=['Sum of Revenue', 'Units Sold'],
               margins=True,
               aggfunc='sum').T.add_suffix(' Total)

输出：

End Consumer Country   AR Total   AU Total   NZ Total  US Total   All Total
Sum of Revenue        39.470289  32.660103  46.437018  5.439925  124.007334
Units Sold             3.000000   7.000000  11.000000  7.000000   28.000000

相同的信息，更容易编写代码。

样本数据：

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'AU', 'NZ', 'NZ', 'US'],
                   'Sum of Revenue': np.random.normal(20,6,7),
                   'Units Sold': np.arange(1,8,1)},
                   index = list('abcdefg'))

  End Consumer Country  Sum of Revenue  Units Sold
a                   AR       13.486216           1
b                   AR       25.984073           2
c                   AU       21.697871           3
d                   AU       10.962232           4
e                   NZ       16.528398           5
f                   NZ       29.908619           6
g                   US        5.439925           7

根据另一个中的值自行创建列

1 个答案:

输出：`df`：

输出：

样本数据：

根据另一个中的值自行创建列

1 个答案:

输出：df：

输出：

样本数据：

输出：`df`：