Automatically create columns based on the values in another column

Date: 2019-01-24 15:55:22

Tags: python pandas

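This is very similar to the question I asked yesterday. The goal is to add a feature that creates columns based on the values found in another column. For example, when the script finds a country code in the specified file, I want it to create a column named "<country code> Total" and sum the units sold across every row that has that country code.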

Here is my script's current output:

Script Output

What I would like to see:

Goal

My script:

df['Sum of Revenue'] = df['Units Sold'] * df['Dealer Price']
# df['AR Revenue'] = df[]  # incomplete in the original post; disabled so the script parses
df = df.sort_values(['End Consumer Country', 'Currency Code'])
# Store the grand total of units sold in the first row only
df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
# Store the grand total of revenue in the first row only
df.loc[df.index[0], 'Total Revenue'] = df['Sum of Revenue'].sum()
# Sum the units for rows whose End Consumer Country is AR (the scalar is written to every row)
df['AR Total'] = df.loc[df['End Consumer Country'] == 'AR', 'Units Sold'].sum()
# Sum the units for rows whose End Consumer Country is AU
df['AU Total'] = df.loc[df['End Consumer Country'] == 'AU', 'Units Sold'].sum()
# Sum the units for rows whose End Consumer Country is NZ
df['NZ Total'] = df.loc[df['End Consumer Country'] == 'NZ', 'Units Sold'].sum()

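However, because I know which countries will appear in this file, I have hard-coded each of them in the script. How would I write the script so that if it finds another country code (for example GB), it creates a column named "GB Total" and sums the units for every row whose country code is GB?

Any help would be greatly appreciated!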

1 answer:

Answer 0 (score: 1)

If you really need that format, this is how I would proceed (the sample data is at the bottom):

# Get those first two columns
d = {'Sum of Revenue': 'Total Revenue', 'Units Sold': 'Total Sold'}
for col, newcol in d.items():
    df.loc[df.index[0], newcol] = df[col].sum()

# Add the rest for every country:
s = df.groupby('End Consumer Country')['Units Sold'].sum().to_frame().T.add_suffix(' Total')
s.index = [df.index[0]]

# Pass axis by keyword; recent pandas versions no longer accept it positionally
df = pd.concat([df, s], axis=1, sort=False)

Output: df

  End Consumer Country  Sum of Revenue  Units Sold  Total Revenue  Total Sold  AR Total  AU Total  NZ Total  US Total
a                   AR       13.486216           1     124.007334        28.0       3.0       7.0      11.0       7.0
b                   AR       25.984073           2            NaN         NaN       NaN       NaN       NaN       NaN
c                   AU       21.697871           3            NaN         NaN       NaN       NaN       NaN       NaN
d                   AU       10.962232           4            NaN         NaN       NaN       NaN       NaN       NaN
e                   NZ       16.528398           5            NaN         NaN       NaN       NaN       NaN       NaN
f                   NZ       29.908619           6            NaN         NaN       NaN       NaN       NaN       NaN
g                   US        5.439925           7            NaN         NaN       NaN       NaN       NaN       NaN

As you can see, pandas fills in a lot of NaN values, because we only assigned values to the first row and a DataFrame must be rectangular.
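The NaN values come from index alignment in `pd.concat`. A minimal sketch that makes this visible, assuming a tiny made-up frame (the data and the names `totals` and `wide` are hypothetical, not taken from the asker's file):

```python
import pandas as pd

# Hypothetical toy data: three rows, two country codes
df = pd.DataFrame({'End Consumer Country': ['AR', 'AU', 'AU'],
                   'Units Sold': [1, 2, 3]},
                  index=list('abc'))

# One-row frame holding a total per country code found in the data
totals = (df.groupby('End Consumer Country')['Units Sold']
            .sum().to_frame().T.add_suffix(' Total'))
# Relabel that single row so it lines up with the first row of df
totals.index = [df.index[0]]

# Column-wise concat aligns on index labels; rows 'b' and 'c'
# have no match in totals, so their total columns become NaN
wide = pd.concat([df, totals], axis=1, sort=False)
print(wide)
```

Only row `a` receives the totals; every other row has nothing to align with, which is why the wide layout is mostly empty.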


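It is much simpler to use a separate DataFrame that holds the overall totals together with the per-country totals. If that works for you, the whole thing reduces to a single .pivot_table call: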

df.pivot_table(index='End Consumer Country', 
               values=['Sum of Revenue', 'Units Sold'],
               margins=True,
               aggfunc='sum').T.add_suffix(' Total')

Output:

End Consumer Country   AR Total   AU Total   NZ Total  US Total   All Total
Sum of Revenue        39.470289  32.660103  46.437018  5.439925  124.007334
Units Sold             3.000000   7.000000  11.000000  7.000000   28.000000

Same information, and much easier to code.
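Because the country codes are read from the data rather than written into the script, a previously unseen code such as GB should get its own column with no code change. A minimal sketch, assuming a small made-up frame that includes one GB row (the values and the name `summary` are hypothetical):

```python
import pandas as pd

# Hypothetical data with a country code (GB) the script never mentions
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'GB'],
                   'Sum of Revenue': [10.0, 20.0, 5.0, 7.5],
                   'Units Sold': [1, 2, 3, 4]})

# One column per country code found in the data, plus an 'All' margin
summary = df.pivot_table(index='End Consumer Country',
                         values=['Sum of Revenue', 'Units Sold'],
                         margins=True,
                         aggfunc='sum').T.add_suffix(' Total')
print(summary)
```

Here `summary` contains a `GB Total` column alongside `AR Total`, `AU Total`, and `All Total`, even though GB is never named in the code.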


Sample data:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'AU', 'NZ', 'NZ', 'US'],
                   'Sum of Revenue': np.random.normal(20,6,7),
                   'Units Sold': np.arange(1,8,1)},
                   index = list('abcdefg'))

  End Consumer Country  Sum of Revenue  Units Sold
a                   AR       13.486216           1
b                   AR       25.984073           2
c                   AU       21.697871           3
d                   AU       10.962232           4
e                   NZ       16.528398           5
f                   NZ       29.908619           6
g                   US        5.439925           7