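Hopefully this is a simple one.
When I try to group and sum at the end of my notebook, both columns of the DF end up with unexpected data.
Edited to add more context: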
import pandas as pd
df=pd.read_csv('historicaldata.csv', parse_dates = False)
df.dtypes
Unnamed: 0 int64
STYLE_AUDIT_ID int64
MACHINE_ID int64
STYLE_ID int64
SKU_ID int64
SKU_BUCKET_ID int64
DTE object
STORE_ID int64
ATTR2 object
ATTR1 float64
SIZ object
QTY float64
COST float64
RETAIL float64
TRAN_TYPE object
TRAN_NUM int64
DISPLAY_NUM int64
SUB_STYLE float64
EMPLOYEE_ID float64
SERIAL_NUM float64
dtype: object
df = df.loc[:, df.columns.isin(['DTE','QTY'])]
df.head()
DTE QTY
0 2014-03-13 17:22:15 24.0
1 2014-04-10 14:52:32 24.0
2 2014-05-01 11:34:56 24.0
3 2014-05-21 12:27:04 24.0
4 2014-05-29 11:54:24 24.0
# Adding the "new_row" to allow for .shift() function later that will hopefully sum everything up
new_row = pd.DataFrame({'DTE':'2014-03-12 00:00:00', 'QTY':0.0}, index =[0])
df = pd.concat([new_row, df], ignore_index=True)
df.head()
DTE QTY
0 2014-03-12 00:00:00 0.0
1 2014-03-13 17:22:15 24.0
2 2014-04-10 14:52:32 24.0
3 2014-05-01 11:34:56 24.0
4 2014-05-21 12:27:04 24.0
df['DTE'] = pd.to_datetime(df['DTE'])
df['DTE'] = df['DTE'].dt.date
df['DTE'] = pd.to_datetime(df['DTE'])
df.dtypes
DTE datetime64[ns]
QTY float64
dtype: object
df.head()
DTE QTY
0 2014-03-12 0.0
1 2014-03-13 24.0
2 2014-04-10 24.0
3 2014-05-01 24.0
4 2014-05-21 24.0
5 2014-05-29 24.0
6 2014-06-06 48.0
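As an aside, the three conversion lines above can probably be collapsed into one. A minimal sketch, assuming the only goal is to drop the time of day while keeping a datetime64 column (the `sample` frame is made up for illustration and is not the real data):

```python
import pandas as pd

# Hypothetical sample standing in for the real DTE/QTY columns
sample = pd.DataFrame({
    "DTE": ["2014-03-13 17:22:15", "2014-04-10 14:52:32"],
    "QTY": [24.0, 24.0],
})

# Parse the strings, then reset every timestamp to midnight.
# The column stays datetime64, so no second to_datetime call is needed.
sample["DTE"] = pd.to_datetime(sample["DTE"]).dt.normalize()
```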
...and finally, where I think the problem lies:
df = df.groupby(['DTE'], as_index=False)['QTY'].sum()
Now: df.head()
DTE QTY
0 2014-01-30 16.0
1 2014-01-31 -1.0
2 2014-02-16 -1.0
3 2014-02-23 -2.0
4 2014-02-27 -2.0
5 2014-03-02 -3.0
6 2014-03-07 -2.0
7 2014-03-08 -1.0
8 2014-03-10 -4.0
9 2014-03-12 0.0
10 2014-03-13 24.0
11 2014-03-14 -2.0
Notably, the non-null values from before running the groupby function carry through into the new df, but the other QTY numbers seem arbitrary.... so what am I missing?
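One thing that may explain this: `groupby` sorts the group keys by default, so `head()` after grouping shows the earliest dates in the whole file, not the first rows of the original frame. A minimal sketch with made-up rows (not the real data), assuming the CSV is not in date order and has negative quantities further down:

```python
import pandas as pd

# Made-up rows: the file order is deliberately not chronological
demo = pd.DataFrame({
    "DTE": pd.to_datetime(["2014-03-13", "2014-04-10", "2014-01-30", "2014-01-31"]),
    "QTY": [24.0, 24.0, 16.0, -1.0],
})

# Default behaviour: group keys come back sorted, so 2014-01-30 is first
grouped = demo.groupby("DTE", as_index=False)["QTY"].sum()

# sort=False keeps the keys in order of first appearance instead
unsorted = demo.groupby("DTE", as_index=False, sort=False)["QTY"].sum()

print(grouped.head())
print(unsorted.head())
```

If that is what is happening, the sums are not arbitrary. They are the totals for dates that only appear later in the ungrouped frame.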
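If it matters, this is only a small part of a larger notebook, but I tried to keep it concise for clarity.
Specifically, the output I'm looking for:
Starting DF (sample data):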
DTE QTY
0 2014-03-12 0.0
1 2014-03-13 24.0
2 2014-03-13 24.0
3 2014-05-01 24.0
Desired output:
DTE QTY
0 2014-03-12 0.0
1 2014-03-13 48.0
2 2014-05-01 24.0
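For reference, here is the same groupby call run on only the four sample rows above. A minimal sketch, with the frame typed in by hand rather than read from the CSV:

```python
import pandas as pd

# The four sample rows from the "Starting DF" listing
start = pd.DataFrame({
    "DTE": pd.to_datetime(["2014-03-12", "2014-03-13", "2014-03-13", "2014-05-01"]),
    "QTY": [0.0, 24.0, 24.0, 24.0],
})

# Same call as in the notebook: one row per date, QTY summed within each date
result = start.groupby(["DTE"], as_index=False)["QTY"].sum()
print(result)
```

On this sample the two 2014-03-13 rows collapse into a single row with QTY 48.0, which suggests the call itself does what I want and the difference comes from the rest of the data.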