I have this function:
import pandas as pd

def source_revenue(self):
    items = self.data.items()
    df = pd.DataFrame(
        {'SOURCE OF BUSINESS': [i[0] for i in items], 'INCOME': [i[1] for i in items]})
    pivoting = pd.pivot_table(df, index=['SOURCE OF BUSINESS'], values=['INCOME'])
    # Sum down the rows, collapsing INCOME to a single total
    suming = pivoting.sum()
    return suming
This function produces:
INCOME 216424.9
dtype: float64
Without the sum, it returns the full DataFrame, like this:
INCOME
SOURCE OF BUSINESS
BYD - Other 500.0
BYD - Retail 1584.0
BYD - Transport 42498.0
BYD Beverage - A La Carte 39401.5
BYD Food - A La Carte 瓦厂食品-零点 68365.0
BYD Food - Catering Banquet 53796.0
BYD Rooms 瓦厂房间 5148.0
GS - Retail 386.0
GS Food - A La Carte 48.0
Orchard Retail 130.0
SCH - Food - A La Carte 96.0
SCH - Retail 375.4
SCH - Transport 888.0
SCH Beverage - A La Carte 119.0
Spa 3052.0
XLM Beverage - A La Carte 38.0
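The reason I am doing this is that I am trying to get the total of all returned rows: add them up and append the total to the DataFrame.
Originally I tried margins=True (I read here that it sums the values and appends the total to the DataFrame, but it did not).
So I would like to know whether there is a way to return the DataFrame while also summing the values and appending the total to the end of the DataFrame, the way margins=True is supposed to.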
Answer 0 (score: 1)
I think you can use groupby instead of pivot_table, because groupby is faster here.
You can use pivot_table, but its default aggfunc is np.mean, which is easy to forget:
pivoting = pd.pivot_table(df,
index=['SOURCE OF BUSINESS'],
values=['INCOME'],
aggfunc=np.mean)
I think you need aggfunc=np.sum:
print(df)
A B C D
0 zoo one small 1
1 zoo one large 2
2 zoo one large 2
3 foo two small 3
4 foo two small 3
5 bar one large 4
6 bar one small 5
7 bar two small 6
8 bar two large 7
print(pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum))
A
bar 22
foo 6
zoo 5
Name: D, dtype: int64
df1 = df.groupby('A')['D'].sum()
print(df1)
A
bar 22
foo 6
zoo 5
Name: D, dtype: int64
print(df1.sum())
33
df1.loc['Total'] = df1.sum()
print(df1)
A
bar 22
foo 6
zoo 5
Total 33
Name: D, dtype: int64
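The answer prints the sample frame but does not show how it was built. A minimal self-contained sketch of the groupby approach follows; the DataFrame construction is an assumption reconstructed from the printed output, and the name totals is illustrative.

```python
import pandas as pd

# Rebuild the sample frame printed above (construction assumed, not from the answer).
df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# Sum D per group; the result is a Series indexed by A.
totals = df.groupby("A")["D"].sum()

# Setting with enlargement: assigning to a label that does not exist appends a row.
totals.loc["Total"] = totals.sum()
print(totals)
```

Note that the total is computed before the new label is assigned, so the "Total" row is not counted twice.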
Timings:
In [111]: %timeit df.groupby('A')['D'].sum()
1000 loops, best of 3: 581 µs per loop
In [112]: %timeit pd.pivot_table(df, values='D', index=['A'], aggfunc=np.sum)
100 loops, best of 3: 2.28 ms per loop
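The timings above use IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The frame contents, the helper names, and the repeat count of 200 are assumptions for illustration, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Small sample frame (assumed, matching the A and D columns of the answer's example).
df = pd.DataFrame({
    "A": ["zoo", "zoo", "zoo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

def with_groupby():
    # Per-group sum as a Series.
    return df.groupby("A")["D"].sum()

def with_pivot_table():
    # Same aggregation through pivot_table; select column D to get a Series.
    return pd.pivot_table(df, values=["D"], index=["A"], aggfunc="sum")["D"]

runs = 200  # hypothetical repeat count
t_groupby = timeit.timeit(with_groupby, number=runs)
t_pivot = timeit.timeit(with_pivot_table, number=runs)

print(f"groupby:     {t_groupby / runs * 1e6:.0f} µs per call")
print(f"pivot_table: {t_pivot / runs * 1e6:.0f} µs per call")
```

Both helpers compute the same per-group sums, so the comparison measures only the overhead of each code path.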
Add Total to df by setting with enlargement:
print(df)
INCOME
SOURCE OF BUSINESS
BYD - Other 500.0
BYD - Retail 1584.0
BYD - Transport 42498.0
BYD Beverage - A La Carte 39401.5
BYD Food - A La Carte 68365.0
BYD Food - Catering Banquet 53796.0
BYD Rooms 5148.0
GS - Retail 386.0
GS Food - A La Carte 48.0
Orchard Retail 130.0
SCH - Food - A La Carte 96.0
SCH - Retail 375.4
SCH - Transport 888.0
SCH Beverage - A La Carte 119.0
Spa 3052.0
XLM Beverage - A La Carte 38.0
df.loc['Total', 'INCOME'] = df['INCOME'].sum()
print(df)
INCOME
SOURCE OF BUSINESS
BYD - Other 500.0
BYD - Retail 1584.0
BYD - Transport 42498.0
BYD Beverage - A La Carte 39401.5
BYD Food - A La Carte 68365.0
BYD Food - Catering Banquet 53796.0
BYD Rooms 5148.0
GS - Retail 386.0
GS Food - A La Carte 48.0
Orchard Retail 130.0
SCH - Food - A La Carte 96.0
SCH - Retail 375.4
SCH - Transport 888.0
SCH Beverage - A La Carte 119.0
Spa 3052.0
XLM Beverage - A La Carte 38.0
Total 216424.9
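Since the question asks about margins=True specifically: with the default mean aggregation the margin row holds an overall mean rather than a total, which may be why it looked wrong. A minimal sketch of getting a total row from pivot_table directly, using a hypothetical three-row subset of the question's data:

```python
import pandas as pd

# Hypothetical subset of the question's data, for illustration only.
data = {"BYD - Other": 500.0, "BYD - Retail": 1584.0, "Spa": 3052.0}
df = pd.DataFrame({
    "SOURCE OF BUSINESS": list(data.keys()),
    "INCOME": list(data.values()),
})

table = pd.pivot_table(
    df,
    index=["SOURCE OF BUSINESS"],
    values=["INCOME"],
    aggfunc="sum",         # override the default mean
    margins=True,          # append an aggregate row
    margins_name="Total",  # label for that row (the default label is "All")
)
print(table)
```

This keeps everything in a single pivot_table call, at the cost of the slower code path measured in the timings.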
Answer 1 (score: 1)
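df.loc[len(df)] = ...
will add a row at the end of the DataFrame. Your data then needs to match the correct number of columns. Also, I do not recommend adding this row to your data, because any subsequent analysis would then be invalid. It is probably better to create a new Series and concatenate it only when needed for display.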
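One way to keep the total out of the working data might look like the sketch below. It uses a one-row DataFrame rather than a Series to keep the concatenation simple; the three-row sample and the names total_row and display_df are hypothetical.

```python
import pandas as pd

# Hypothetical subset of the question's data.
df = pd.DataFrame(
    {"INCOME": [500.0, 1584.0, 3052.0]},
    index=pd.Index(["BYD - Other", "BYD - Retail", "Spa"],
                   name="SOURCE OF BUSINESS"),
)

# Keep the total in its own one-row frame so df stays clean for analysis.
total_row = pd.DataFrame(
    {"INCOME": [df["INCOME"].sum()]},
    index=pd.Index(["Total"], name=df.index.name),
)

# Concatenate only when building the version meant for display.
display_df = pd.concat([df, total_row])
print(display_df)

# Analysis on the original frame is unaffected by the total row.
print(df["INCOME"].mean())
```

Because df never contains the total, statistics such as the mean are computed over the real rows only.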