Pandas: df.groupby(['daterange'])['QTY'].sum() messes up both columns

Time: 2019-06-13 04:56:32

Tags: python pandas dataframe pandas-groupby

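Hopefully this is a simple one.

When I try to group and sum at the end of my notebook, both columns in the DataFrame end up with unexpected data.

Edited to add more context: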

import pandas as pd
df=pd.read_csv('historicaldata.csv', parse_dates = False)
df.dtypes
Unnamed: 0          int64
STYLE_AUDIT_ID      int64
MACHINE_ID          int64
STYLE_ID            int64
SKU_ID              int64
SKU_BUCKET_ID       int64
DTE                object
STORE_ID            int64
ATTR2              object
ATTR1             float64
SIZ                object
QTY               float64
COST              float64
RETAIL            float64
TRAN_TYPE          object
TRAN_NUM            int64
DISPLAY_NUM         int64
SUB_STYLE         float64
EMPLOYEE_ID       float64
SERIAL_NUM        float64
dtype: object
df = df.loc[:, df.columns.isin(['DTE','QTY'])]
df.head()
                 DTE    QTY
0   2014-03-13 17:22:15 24.0
1   2014-04-10 14:52:32 24.0
2   2014-05-01 11:34:56 24.0
3   2014-05-21 12:27:04 24.0
4   2014-05-29 11:54:24 24.0

# Adding the "new_row" to allow for .shift() function later that will hopefully sum everything up
new_row = pd.DataFrame({'DTE':'2014-03-12 00:00:00', 'QTY':0.0}, index =[0]) 
df = pd.concat([new_row, df], ignore_index=True)
df.head()
      DTE    QTY
0     2014-03-12 00:00:00    0.0
1     2014-03-13 17:22:15   24.0
2     2014-04-10 14:52:32   24.0
3     2014-05-01 11:34:56   24.0
4     2014-05-21 12:27:04   24.0
df['DTE'] = pd.to_datetime(df['DTE'])
df['DTE'] = df['DTE'].dt.date
df['DTE'] = pd.to_datetime(df['DTE'])

df.dtypes
DTE    datetime64[ns]
QTY           float64
dtype: object

df.head()

            DTE    QTY
0    2014-03-12    0.0
1    2014-03-13   24.0
2    2014-04-10   24.0
3    2014-05-01   24.0
4    2014-05-21   24.0
5    2014-05-29   24.0
6    2014-06-06   48.0
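
The three-step conversion above (parse, take `.dt.date`, parse again) truncates each timestamp to midnight. A minimal sketch of a shorter route, assuming the only goal is to drop the time of day, is `Series.dt.normalize()`. The sample rows below are hypothetical and only mirror the DTE/QTY layout shown in the question.

```python
import pandas as pd

# Hypothetical sample rows mirroring the DTE/QTY columns from the question.
df = pd.DataFrame({
    "DTE": ["2014-03-12 00:00:00", "2014-03-13 17:22:15", "2014-04-10 14:52:32"],
    "QTY": [0.0, 24.0, 24.0],
})

# Parse the strings, then reset the time-of-day to midnight in one chained step.
# The column stays a datetime dtype, so no second pd.to_datetime call is needed.
df["DTE"] = pd.to_datetime(df["DTE"]).dt.normalize()

print(df.dtypes)
print(df)
```

The result should match what the three original lines produce: a datetime column whose values all sit at 00:00:00.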

...and finally, where I think the problem is:

df = df.groupby(['DTE'], as_index=False)['QTY'].sum()

Now: df.head()

            DTE   QTY
0    2014-01-30  16.0
1    2014-01-31  -1.0
2    2014-02-16  -1.0
3    2014-02-23  -2.0
4    2014-02-27  -2.0
5    2014-03-02  -3.0
6    2014-03-07  -2.0
7    2014-03-08  -1.0
8    2014-03-10  -4.0
9    2014-03-12   0.0
10   2014-03-13  24.0
11   2014-03-14  -2.0

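Notably, the non-null values from before running the groupby carry over into the new df, but the other QTY numbers seem arbitrary. So what am I missing?

In case it matters, this is only a small part of a larger notebook, but I tried to keep it concise for clarity.

Specifically, this is the output I am looking for:

Starting DF (sample data):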
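
One possible reading, offered as an assumption because the full CSV is not shown: `groupby` sorts the result by group key by default (`sort=True`), so `head()` after grouping shows the earliest dates in the whole file, not the first rows in file order. If the file is not in chronological order, dates such as 2014-01-30 would surface at the top even though `head()` did not show them before. The sketch below uses made-up rows to illustrate that effect and the `sort=False` alternative.

```python
import pandas as pd

# Made-up rows, deliberately NOT in chronological order
# (an assumption about how the real CSV might be laid out).
df = pd.DataFrame({
    "DTE": pd.to_datetime(
        ["2014-03-13", "2014-04-10", "2014-01-30", "2014-03-13", "2014-01-30"]
    ),
    "QTY": [24.0, 24.0, 10.0, -2.0, 6.0],
})

# Default behaviour (sort=True): groups come back ordered by DTE,
# so the earliest date in the whole frame is now row 0.
sorted_sums = df.groupby("DTE", as_index=False)["QTY"].sum()

# sort=False keeps the groups in order of first appearance instead.
unsorted_sums = df.groupby("DTE", as_index=False, sort=False)["QTY"].sum()

print(sorted_sums)
print(unsorted_sums)
```

Comparing the row count per date before and after grouping (for example with `df["DTE"].value_counts()`) is one way to check whether the sums are really wrong or merely reordered.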

            DTE    QTY
0    2014-03-12    0.0
1    2014-03-13   24.0
2    2014-03-13   24.0
3    2014-05-01   24.0

Desired output:

            DTE    QTY
0    2014-03-12    0.0
1    2014-03-13   48.0
2    2014-05-01   24.0
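
As a sanity check, here is a minimal sketch that rebuilds the sample data above by hand and applies the same `groupby(...)['QTY'].sum()` call. The frame name `start` is just an illustrative choice.

```python
import pandas as pd

# Sample data copied from the "Starting DF" table above.
start = pd.DataFrame({
    "DTE": pd.to_datetime(["2014-03-12", "2014-03-13", "2014-03-13", "2014-05-01"]),
    "QTY": [0.0, 24.0, 24.0, 24.0],
})

# One row per distinct date, with QTY summed within each date.
result = start.groupby("DTE", as_index=False)["QTY"].sum()

print(result)
```

On this sample the two 2014-03-13 rows collapse into a single row with QTY 48.0, which matches the desired output. That suggests the call itself is sound and that any surprise comes from the rest of the data.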

0 answers:

No answers