这与我昨天问的问题非常相似。目的是能够添加一种功能,该功能将允许根据另一个中显示的值来创建列。例如,当它在指定的文件中找到国家/地区代码时,我希望它创建一个名称为“ 国家/地区代码总计”的列,并对具有相同列的每一行的单位数量求和国家代码
这是我的脚本当前输出:
我想看的东西
我的脚本:
df['Sum of Revenue'] = df['Units Sold'] * df['Dealer Price']
df['AR Revenue'] = df[]
df = df.sort_values(['End Consumer Country', 'Currency Code'])
# Sets first value of index by position
df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()
# Sets first value of index by position
df.loc[df.index[0], 'Total Revenue'] = df['Sum of Revenue'].sum()
# Sums the amout of Units with the End Consumer Country AR
df['AR Total'] = df.loc[df['End Consumer Country'] == 'AR', 'Units Sold'].sum()
# Sums the amount of Units with the End Consumer Country AU
df['AU Total'] = df.loc[df['End Consumer Country'] == 'AU', 'Units Sold'].sum()
# Sums the amount of Units with the End Consumer Country NZ
df['NZ Total'] = df.loc[df['End Consumer Country'] == 'NZ', 'Units Sold'].sum()
但是,据我所知该文件中将出现的国家/地区,我已将它们相应地添加到要查找的脚本中。我将如何编写脚本,以便如果找到另一个国家代码(例如GB),它将创建一个名为“ GB Total”的列,并对国家代码设置为GB的每一行的单位求和。
任何帮助将不胜感激!
答案 0 :(得分:1)
如果您确实需要这种格式,那么我将按照以下步骤进行操作(从下面开始数据):
# Get those first two columns
d = {'Sum of Revenue': 'Total Revenue', 'Units Sold': 'Total Sold'}
for col, newcol in d.items():
df.loc[df.index[0], newcol] = df[col].sum()
# Add the rest for every country:
s = df.groupby('End Consumer Country')['Units Sold'].sum().to_frame().T.add_suffix(' Total')
s.index = [df.index[0]]
df = pd.concat([df, s], 1, sort=False)
df
: End Consumer Country Sum of Revenue Units Sold Total Revenue Total Sold AR Total AU Total NZ Total US Total
a AR 13.486216 1 124.007334 28.0 3.0 7.0 11.0 7.0
b AR 25.984073 2 NaN NaN NaN NaN NaN NaN
c AU 21.697871 3 NaN NaN NaN NaN NaN NaN
d AU 10.962232 4 NaN NaN NaN NaN NaN NaN
e NZ 16.528398 5 NaN NaN NaN NaN NaN NaN
f NZ 29.908619 6 NaN NaN NaN NaN NaN NaN
g US 5.439925 7 NaN NaN NaN NaN NaN NaN
如您所见,pandas
添加了一堆NaN
值,因为我们只为第一行分配了内容,而DataFrame
必须是矩形的
使用不同的DataFrame
来汇总总数以及每个国家/地区内的内容要简单得多。如果可以的话,那么一切都简化为一个.pivot_table
df.pivot_table(index='End Consumer Country',
values=['Sum of Revenue', 'Units Sold'],
margins=True,
aggfunc='sum').T.add_suffix(' Total)
End Consumer Country AR Total AU Total NZ Total US Total All Total
Sum of Revenue 39.470289 32.660103 46.437018 5.439925 124.007334
Units Sold 3.000000 7.000000 11.000000 7.000000 28.000000
相同的信息,更容易编写代码。
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'End Consumer Country': ['AR', 'AR', 'AU', 'AU', 'NZ', 'NZ', 'US'],
'Sum of Revenue': np.random.normal(20,6,7),
'Units Sold': np.arange(1,8,1)},
index = list('abcdefg'))
End Consumer Country Sum of Revenue Units Sold
a AR 13.486216 1
b AR 25.984073 2
c AU 21.697871 3
d AU 10.962232 4
e NZ 16.528398 5
f NZ 29.908619 6
g US 5.439925 7