I have the following code:
import pandas as pd
df=pd.read_csv("https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1", encoding="latin-1")
pvt_received=df.pivot_table(index=['site'], values = ['received','sent'], aggfunc = { 'received' : 'count' ,'sent': 'count'}, fill_value=0, margins=True)
pvt_received['to_send']=pvt_received['received']-pvt_received['sent']
column_order = ['received', 'sent','to_send']
pvt_received_ordered = pvt_received.reindex(columns=column_order)  # reindex_axis was removed in pandas 1.0
pvt_received_ordered.to_csv("test_pivot.csv")
table_to_send = pd.read_csv('test_pivot.csv', encoding='latin-1')
table_to_send.rename(columns={'site':'Site','received':'Date Received','sent':'Date Sent','to_send':'Date To Send'}, inplace=True)
table_to_send.set_index('Site', inplace=True)
table_to_send
which generates this table:
      Date Received  Date Sent  Date To Send
Site
2              32.0       27.0           5.0
3              20.0       17.0           3.0
4              33.0       31.0           2.0
5              40.0       31.0           9.0
All           106.0      106.0           0.0
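But the margins=True argument does not give the correct total for each column. For example, "Date Received" should be 125, not 106; "Date Sent" should be 106 (correct); and "Date To Send" should be 19, not 0.0 (zero). Question: what should I change to get the correct numbers? Also, the All row is missing the sum that should be there. Thanks in advance.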
Answer 0 (score: 2)
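From your code, it looks like you create Date To Send after building the pivot table, so for the All row it simply gives you 106.0 - 106.0. In addition, the margin values are calculated after grouping, and pivot_table defaults to dropna=True: when margins=True, any row that has NaN or NaT in one of its columns is left out of the All totals, so the All row only counts rows where both dates are present. Setting dropna=False should fix this.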
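To see the effect in isolation, here is a minimal sketch on a small, made-up DataFrame (the toy data and the helper all_row are hypothetical, not taken from the asker's file). One row has no sent value, so with the default dropna=True the All row counts only the complete rows, while dropna=False counts every non-missing value in each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the asker's CSV): site 2 has one row whose
# 'sent' value is missing, standing in for an item that has not been sent yet.
toy = pd.DataFrame({
    "site": [2, 2, 3],
    "received": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "sent": ["2018-01-05", np.nan, "2018-01-06"],
})

def all_row(dropna):
    """Return the margin ('All') row of a count pivot for the given dropna setting."""
    pvt = toy.pivot_table(index=["site"], values=["received", "sent"],
                          aggfunc="count", margins=True, dropna=dropna)
    return pvt.loc["All"]

print(all_row(dropna=True))   # received 2, sent 2: the incomplete row is excluded from the totals
print(all_row(dropna=False))  # received 3, sent 2: every non-missing value is counted
```

The per-site rows are the same in both cases; only the All row changes, which matches the 106 vs 125 discrepancy in the question.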
I refactored your code to convert the received and sent columns to datetime format before creating the pivot table and the to_send column.
import pandas as pd
df2 = pd.read_csv(
    "https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1",
    encoding="latin-1")
# Parse both date columns; missing entries become NaT
df2['received'] = pd.to_datetime(df2['received'])
df2['sent'] = pd.to_datetime(df2['sent'])
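Converting to datetime does not change the counts by itself: missing entries become NaT, and count() skips NaT just as it skips NaN. The sketch below uses a made-up Series; errors="coerce" is an assumption that only matters if the real file contains malformed dates, which would otherwise make to_datetime raise.

```python
import pandas as pd

# Hypothetical raw values; the real file's date format is not shown,
# so this only illustrates how missing values behave after conversion.
raw = pd.Series(["2018-03-01 10:15", None, "2018-03-02 09:00"])

# errors="coerce" turns anything unparseable into NaT instead of raising.
as_dates = pd.to_datetime(raw, errors="coerce")

print(as_dates.isna().sum())  # 1: the missing entry became NaT
print(as_dates.count())       # 2: count() skips NaT, so only real dates are counted
```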
Then create the pivot table you originally intended:
pvt_received = df2.pivot_table(index=['site'], values=['received', 'sent'],
                               aggfunc='count', margins=True, dropna=False)
pvt_received['to_send'] = pvt_received['received'] - pvt_received['sent']
pvt_received.rename(columns={'received': 'Date Received',
                             'sent': 'Date Sent',
                             'to_send': 'Date To Send'},
                    inplace=True)
# 'site' is the index name, not a column, so rename it separately
pvt_received.index.name = 'Site'
pvt_received
      Date Received  Date Sent  Date To Send
Site
2                32         27             5
3                20         17             3
4                33         31             2
5                40         31             9
All             125        106            19
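As a cross-check, here is a sketch of an alternative that avoids margins entirely: count per site with groupby, compute to_send per site, then append the All row as the sum of the per-site rows. This is a different technique from the pivot_table answer, shown on hypothetical toy data that uses the same column names; because the totals are built from the per-site rows, they always match the column sums.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's CSV (same column names).
toy = pd.DataFrame({
    "site": [2, 2, 3, 3, 3],
    "received": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"],
    "sent": ["2018-01-05", np.nan, "2018-01-06", np.nan, "2018-01-07"],
})

# count() per site ignores missing values, so each column counts only real dates.
counts = toy.groupby("site")[["received", "sent"]].count()
counts["to_send"] = counts["received"] - counts["sent"]

# Totals row built from the per-site rows, so it always equals the column sums.
counts.loc["All"] = counts.sum()

counts = counts.rename(columns={"received": "Date Received",
                                "sent": "Date Sent",
                                "to_send": "Date To Send"}).rename_axis("Site")
print(counts)
```

Here Date To Send in the All row is the sum of the per-site values, which equals total received minus total sent.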