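Show a column's total without repeating the value

Time: 2019-01-23 11:02:26

Tags: python pandas

I have a script that outputs a CSV with five columns. I added two lines of code to append two total columns. That works, but the totals are repeated on every row, and I only want them to appear on a single row.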

df['Unit Total'] = df['Units Sold'].sum()
df['Total Revenue'] = df['data_revenue'].sum()

This is what my script produces

8   0.013207    AR  ARS 0.105656    74012   575.2779
10  0.013207    AR  ARS 0.13207     74012   575.2779
6   0.013207    AR  ARS 0.079242    74012   575.2779
6   0.013207    AR  ARS 0.079242    74012   575.2779

What I actually want to see

8   0.013207    AR  ARS 0.105656    74012   575.2779
10  0.013207    AR  ARS 0.13207     
6   0.013207    AR  ARS 0.079242    
6   0.013207    AR  ARS 0.079242    

My script

for filename in filelist:
    print(filename)
    df = pandas.read_csv('SYB_M_20171001_20171031.txt', header=None, encoding='utf-8', sep='\t', names=colnames,
                         skiprows=3, usecols=['Units Sold', 'Dealer Price', 'End Consumer Country', 'Currency Code']
                         )
    df['data_revenue'] = df['Units Sold'] * df['Dealer Price']
    df = df.sort_values(['End Consumer Country', 'Currency Code'])
    df['Unit Total'] = df['Units Sold'].sum()
    df['Total Revenue'] = df['data_revenue'].sum()
    df.to_csv(outfile + r"\output.csv", index=None)
    dflist.append(filename)

3 answers:

Answer 0 (score: 1)

Set the value at the first index label, looked up by position:

df.loc[df.index[0], 'Unit Total'] = df['Units Sold'].sum()

df.loc[df.index[0], 'Total Revenue'] = df['data_revenue'].sum()
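
A minimal, self-contained sketch of this approach, assuming a four-row sample built from the rows shown in the question (so the totals here are not the 74012 and 575.2779 of the full file). Assigning to a single cell of a column that does not exist yet creates the column and leaves the other rows as NaN, which to_csv writes as empty fields by default:

```python
import pandas as pd

# Sample rows modelled on the question's output (illustrative data only).
df = pd.DataFrame({
    "Units Sold": [8, 10, 6, 6],
    "Dealer Price": [0.013207] * 4,
    "End Consumer Country": ["AR"] * 4,
    "Currency Code": ["ARS"] * 4,
})
df["data_revenue"] = df["Units Sold"] * df["Dealer Price"]

# df.index[0] is the label of the first row, whatever the index looks like.
first = df.index[0]
df.loc[first, "Unit Total"] = df["Units Sold"].sum()
df.loc[first, "Total Revenue"] = df["data_revenue"].sum()

# NaN cells are written as empty strings, so only the first row shows totals.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because the new columns hold NaN in the remaining rows, they are stored as floats, so the unit total is written as a float in the CSV.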

Another solution is to create a default index with reset_index(drop=True), so the first row can be addressed with the label 0:

df = df.sort_values(['End Consumer Country', 'Currency Code']).reset_index(drop=True)

df.loc[0, 'Unit Total'] = df['Units Sold'].sum()
df.loc[0, 'Total Revenue'] = df['data_revenue'].sum()
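
As a rough illustration of the reset_index variant, the sketch below uses made-up rows for three countries (the values are assumptions, not data from the question) so that sorting actually reorders the frame. After reset_index(drop=True) the labels run 0, 1, 2 in sorted order, so label 0 is the first row:

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame({
    "Units Sold": [6, 8, 10],
    "End Consumer Country": ["US", "AR", "BR"],
    "Currency Code": ["USD", "ARS", "BRL"],
    "data_revenue": [0.6, 0.8, 1.0],
})

# Sort, then replace the shuffled labels with a fresh 0..n-1 index.
df = df.sort_values(["End Consumer Country", "Currency Code"]).reset_index(drop=True)

# Label 0 now refers to the first row of the sorted frame.
df.loc[0, "Unit Total"] = df["Units Sold"].sum()
df.loc[0, "Total Revenue"] = df["data_revenue"].sum()
print(df)
```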

Answer 1 (score: 0)

Try this:

df.loc[0,'Unit Total']=df['Units Sold'].sum()
df.loc[0,'Total Revenue']=df['data_revenue'].sum()
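
One caveat worth checking: df.loc[0, ...] selects by label, not by position. The asker's script calls sort_values first, which keeps the original labels, so label 0 is not necessarily the first row. The sketch below uses hypothetical three-row data to compare label-based and position-based targeting on a sorted frame:

```python
import pandas as pd

# Hypothetical data; sorting by country moves the row labelled 0 to the end.
df = pd.DataFrame({
    "Units Sold": [6, 8, 10],
    "End Consumer Country": ["US", "AR", "BR"],
})
df = df.sort_values("End Consumer Country")  # index labels are now 1, 2, 0

# Label-based: the total lands on the row whose label is 0 (the last row here).
by_label = df.copy()
by_label.loc[0, "Unit Total"] = by_label["Units Sold"].sum()

# Position-based: look up the first label, then assign to it.
by_position = df.copy()
by_position.loc[by_position.index[0], "Unit Total"] = by_position["Units Sold"].sum()

print(by_label)
print(by_position)
```

If the frame is not sorted, or the index is reset after sorting, both forms target the same row.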

Answer 2 (score: 0)

You can assign both values with a single iloc call:

# Both target columns must already exist; use the total columns, not data_revenue.
label_positions = list(map(df.columns.get_loc, ['Unit Total', 'Total Revenue']))
df.iloc[0, label_positions] = df[['Units Sold', 'data_revenue']].sum().values
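
A runnable sketch of the single iloc assignment, under two assumptions: the sample values are invented, and the two total columns are created as empty (NaN) columns beforehand, because get_loc raises a KeyError for a column that does not exist yet:

```python
import numpy as np
import pandas as pd

# Illustrative data; only the column names come from the question.
df = pd.DataFrame({
    "Units Sold": [8, 10, 6, 6],
    "data_revenue": [0.8, 1.0, 0.6, 0.6],
})

# Create the empty target columns first so get_loc can find them.
df["Unit Total"] = np.nan
df["Total Revenue"] = np.nan

# Translate column labels into integer positions for iloc.
positions = [df.columns.get_loc(c) for c in ["Unit Total", "Total Revenue"]]

# Write both sums into the first row in one assignment.
df.iloc[0, positions] = df[["Units Sold", "data_revenue"]].sum().to_numpy()
print(df)
```

Since iloc works purely by position, this form targets the first row regardless of how the index is labelled after sorting.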