Sum column values in pandas and append or merge the total into the DataFrame?

Date: 2016-03-04 21:13:29

Tags: python pandas

I have this function:

def source_revenue(self):
    items = self.data.items()
    df = pandas.DataFrame(
        {'SOURCE OF BUSINESS': [i[0] for i in items], 'INCOME': [i[1] for i in items]})
    pivoting = pd.pivot_table(df, index=['SOURCE OF BUSINESS'], values=['INCOME'])
    suming = pivoting.sum(index=(0), columns=(1))

This function produces:

INCOME    216424.9
dtype: float64

Without the sum, it returns the full DataFrame, like this:

                               INCOME
SOURCE OF BUSINESS                    
BYD - Other                      500.0
BYD - Retail                    1584.0
BYD - Transport                42498.0
BYD Beverage - A La Carte      39401.5
BYD Food - A La Carte 瓦厂食品-零点  68365.0
BYD Food - Catering Banquet    53796.0
BYD Rooms 瓦厂房间                  5148.0
GS - Retail                      386.0
GS Food - A La Carte              48.0
Orchard Retail                   130.0
SCH - Food - A La Carte           96.0
SCH - Retail                     375.4
SCH - Transport                  888.0
SCH Beverage - A La Carte        119.0
Spa                             3052.0
XLM Beverage - A La Carte         38.0

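The reason I'm doing this is that I want to take all the returned rows, add them up, and append the total to the DataFrame.

Initially I tried margins=True (I read here that it sums the values and appends the total to the DataFrame, but that is not really what happened).

So I'd like to know whether there is a way to return the DataFrame but also sum the values and append the total to the end of the DataFrame, the way margins=True is supposed to.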

2 answers:

Answer 0 (score: 1)

I think you can use groupby instead of pivot_table, because groupby is faster here.

You can use pivot_table, but the default aggfunc is np.mean. That is easy to forget:

pivoting = pd.pivot_table(df, 
                          index=['SOURCE OF BUSINESS'], 
                          values=['INCOME'], 
                          aggfunc=np.mean)
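
To make the difference concrete, here is a minimal sketch comparing the default aggregation with an explicit sum. The three-row sample data is hypothetical (not the asker's real data), chosen so that one source appears twice.

```python
import pandas as pd

# Hypothetical sample data: "Spa" appears twice so mean and sum differ.
df = pd.DataFrame({
    "SOURCE OF BUSINESS": ["Spa", "Spa", "Orchard Retail"],
    "INCOME": [100.0, 300.0, 130.0],
})

# No aggfunc given: pivot_table averages the rows of each group.
by_mean = pd.pivot_table(df, index=["SOURCE OF BUSINESS"], values=["INCOME"])

# Explicit aggfunc: pivot_table adds up the rows of each group.
by_sum = pd.pivot_table(df, index=["SOURCE OF BUSINESS"], values=["INCOME"],
                        aggfunc="sum")

print(by_mean.loc["Spa", "INCOME"])  # 200.0
print(by_sum.loc["Spa", "INCOME"])   # 400.0
```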

I think you need aggfunc=np.sum:

print(df)
     A    B      C  D
0  zoo  one  small  1
1  zoo  one  large  2
2  zoo  one  large  2
3  foo  two  small  3
4  foo  two  small  3
5  bar  one  large  4
6  bar  one  small  5
7  bar  two  small  6
8  bar  two  large  7

print(pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum))
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64
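
As a side note on the margins=True attempt from the question: the margin row is computed with the same aggfunc as the groups, so with the default mean it holds an overall mean rather than a total. The sketch below combines aggfunc="sum" with margins=True. It uses the question's column names with hypothetical sample values, and margins_name is only used to rename the default "All" label.

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
df = pd.DataFrame({
    "SOURCE OF BUSINESS": ["Spa", "Spa", "Orchard Retail"],
    "INCOME": [100.0, 300.0, 130.0],
})

# aggfunc="sum" makes both the group rows and the margin row sums;
# margins_name renames the margin row from "All" to "Total".
with_total = pd.pivot_table(
    df,
    index=["SOURCE OF BUSINESS"],
    values=["INCOME"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(with_total)
```

This may be enough on its own if the only goal is a grand-total row in the pivoted output.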

df1 = df.groupby('A')['D'].sum()
print(df1)
A
bar    22
foo     6
zoo     5
Name: D, dtype: int64

If you need to add a Total to the Series, use loc and sum:

print(df1.sum())
33

df1.loc['Total'] = df1.sum()
print(df1)
A
bar      22
foo       6
zoo       5
Total    33
Name: D, dtype: int64

Timings:

In [111]: %timeit df.groupby('A')['D'].sum()
1000 loops, best of 3: 581 µs per loop

In [112]: %timeit pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum)
100 loops, best of 3: 2.28 ms per loop
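
Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper names with_groupby and with_pivot_table are made up for this illustration, and the absolute numbers will vary by machine and pandas version, so no timings are claimed here.

```python
import timeit

import pandas as pd

# Same A and D columns as the example frame above.
df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})


def with_groupby():
    # Returns a Series of per-group sums.
    return df.groupby("A")["D"].sum()


def with_pivot_table():
    # values is passed as a list so the result is a DataFrame with column "D".
    return pd.pivot_table(df, values=["D"], index=["A"], aggfunc="sum")


# Time 200 calls of each approach and report the totals in seconds.
groupby_seconds = timeit.timeit(with_groupby, number=200)
pivot_seconds = timeit.timeit(with_pivot_table, number=200)
print("groupby:     %.4f s" % groupby_seconds)
print("pivot_table: %.4f s" % pivot_seconds)
```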

Add Total to df by setting with enlargement:

print(df)
                              INCOME
SOURCE OF BUSINESS                  
BYD - Other                    500.0
BYD - Retail                  1584.0
BYD - Transport              42498.0
BYD Beverage - A La Carte    39401.5
BYD Food - A La Carte        68365.0
BYD Food - Catering Banquet  53796.0
BYD Rooms                     5148.0
GS - Retail                    386.0
GS Food - A La Carte            48.0
Orchard Retail                 130.0
SCH - Food - A La Carte         96.0
SCH - Retail                   375.4
SCH - Transport                888.0
SCH Beverage - A La Carte      119.0
Spa                           3052.0
XLM Beverage - A La Carte       38.0
df.loc['Total', 'INCOME'] = df['INCOME'].sum()
print(df)
                               INCOME
SOURCE OF BUSINESS                   
BYD - Other                     500.0
BYD - Retail                   1584.0
BYD - Transport               42498.0
BYD Beverage - A La Carte     39401.5
BYD Food - A La Carte         68365.0
BYD Food - Catering Banquet   53796.0
BYD Rooms                      5148.0
GS - Retail                     386.0
GS Food - A La Carte             48.0
Orchard Retail                  130.0
SCH - Food - A La Carte          96.0
SCH - Retail                    375.4
SCH - Transport                 888.0
SCH Beverage - A La Carte       119.0
Spa                            3052.0
XLM Beverage - A La Carte        38.0
Total                        216424.9
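
Putting the pieces together, the question's function could be sketched roughly as follows. This is an illustration under assumptions, not the asker's actual class: it takes a plain iterable of (source, income) pairs instead of self.data, and the sample values are hypothetical.

```python
import pandas as pd


def source_revenue(pairs):
    """Sum income per source and append a 'Total' row.

    `pairs` is assumed to be an iterable of (source, income) tuples,
    for example the result of calling .items() on a dict.
    """
    df = pd.DataFrame(list(pairs), columns=["SOURCE OF BUSINESS", "INCOME"])
    # Double brackets keep the result a DataFrame rather than a Series.
    totals = df.groupby("SOURCE OF BUSINESS")[["INCOME"]].sum()
    # Setting with enlargement: assigning to a new label adds a row.
    totals.loc["Total", "INCOME"] = totals["INCOME"].sum()
    return totals


# Hypothetical input with one repeated source.
result = source_revenue([("Spa", 3052.0), ("Orchard Retail", 130.0), ("Spa", 48.0)])
print(result)
```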

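Answer 1 (score: 1)

The code below adds a row to the end of the DataFrame. Your data then needs to match the correct number of columns. Also, I would not recommend adding this row to your data, because any subsequent analysis would then be invalid. It is probably better to create a new Series and concatenate it as needed for display.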

df.loc[len(df)] = ...  # .ix was removed in pandas 1.0; .loc is its replacement
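
One way to follow that advice is to leave the analysis data untouched and build a separate object only for display. The sketch below uses pd.concat; the names income, total_row and for_display, as well as the sample values, are hypothetical.

```python
import pandas as pd

# Hypothetical analysis data, indexed by source like the pivoted frame.
income = pd.DataFrame(
    {"INCOME": [3052.0, 130.0, 386.0]},
    index=pd.Index(["Spa", "Orchard Retail", "GS - Retail"],
                   name="SOURCE OF BUSINESS"),
)

# Build the total as its own one-row frame.
total_row = pd.DataFrame({"INCOME": [income["INCOME"].sum()]}, index=["Total"])

# Concatenate only for display; `income` itself is not modified.
for_display = pd.concat([income, total_row])
print(for_display)
```

Because income has no Total row, later calls such as income["INCOME"].mean() still describe only the real sources.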