Adding specific days in a Python table

Time: 2018-05-31 05:30:12

Tags: python python-3.x pandas numpy pivot-table

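I have a dataset (Product_ID, date_time, Sold) containing products sold on different dates. The dates are not consistent: the data covers 9 months, with 13 or more random days given from each month. I have to split the data so that it shows, for each product, how many units were sold on days 1-3, 4-7, 8-15 and >16 of the month. How can I code this in Python using pandas and other packages?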

PRODUCT_ID   DATE_LOCATION    Sold
0E4234       01-08-16 0:00    2
0E4234       02-08-16 0:00    7
0E4234       04-08-16 0:00    3
0E4234       08-08-16 0:00    1
0E4234       09-08-16 0:00    2
.
.(same product for 9 months sold data)
.
0G2342       02-08-16 0:00    1
0G2342       03-08-16 0:00    2
0G2342       06-08-16 0:00    1
0G2342       09-08-16 0:00    1
0G2342       11-08-16 0:00    3
0G2342       15-08-16 0:00    3
.
.(goes for 64 products each with 9 months of data)
.

I don't even know how to start coding this in Python. The required output is:

PRODUCT_ID      Days   Sold
0E4234          1-3      9
                4-7      3
                8-15     16
                 >16     (remaining values sum)
0G2342          1-3      3
                4-7      1
                8-15     7
                 >16    (remaining values sum)
.
.(for 64 products)
.

I would be glad if someone could at least post a link on where to start.

2 Answers:

Answer 0: (score: 2)

You can first convert the dates to datetimes with to_datetime, then get the day of the month with dt.day:

df['DATE_LOCATION'] = pd.to_datetime(df['DATE_LOCATION'], dayfirst=True)
days = df['DATE_LOCATION'].dt.day
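Why dayfirst=True matters can be seen on a single value. This is a minimal sketch, assuming the strings are day-month-year as the question's sample suggests (so 01-08-16 means 1 August 2016); the variable names are illustrative only:

```python
import pandas as pd

raw = "01-08-16 0:00"  # first timestamp from the question's sample

# Without dayfirst, an ambiguous string is read month-first (January).
month_first = pd.to_datetime(raw)
# With dayfirst=True the same string is read day-first (August).
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first.day, day_first.day)
```

Only the second reading puts this row on day 1 of the month, which is what the later binning relies on.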

Then bin the days with cut:

rng = pd.cut(days, bins=[0,3,7,15,31], labels=['1-3', '4-7','8-15', '>=16'])
print (rng)
0      1-3
1      1-3
2      4-7
3     8-15
4     8-15
5      1-3
6      1-3
7      4-7
8     8-15
9     8-15
10    8-15
Name: DATE_LOCATION, dtype: category
Categories (4, object): [1-3 < 4-7 < 8-15 < >=16]
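The bin edges can be checked in isolation. This is a small sketch with made-up day values chosen to sit on the boundaries; it relies on pd.cut treating bins as right-inclusive by default, so [0,3,7,15,31] means (0, 3], (3, 7], (7, 15], (15, 31]:

```python
import pandas as pd

# Hypothetical day-of-month values placed on the bin boundaries.
edge_days = pd.Series([1, 3, 4, 7, 8, 15, 16, 31])

# Each interval excludes its left edge and includes its right edge.
edge_bins = pd.cut(edge_days, bins=[0, 3, 7, 15, 31],
                   labels=['1-3', '4-7', '8-15', '>=16'])

print(edge_bins.tolist())
```

Starting the first bin at 0 rather than 1 is what keeps day 1 inside the '1-3' bucket.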

Then aggregate with sum, grouping by product and the binned Series:

# store the result under a new name so df stays a DataFrame
df1 = df.groupby(["PRODUCT_ID",rng])['Sold'].sum()
print (df1)
PRODUCT_ID  DATE_LOCATION
0E4234      1-3              9
            4-7              3
            8-15             3
0G2342      1-3              3
            4-7              1
            8-15             7
Name: Sold, dtype: int64

If you also need to count by year:

# df must still be the original DataFrame here, not the grouped Series
df2 = df.groupby([df['DATE_LOCATION'].dt.year.rename('YEAR'), "PRODUCT_ID",rng])['Sold'].sum()
print (df2)

YEAR  PRODUCT_ID  DATE_LOCATION
2016  0E4234      1-3              9
                  4-7              3
                  8-15             3
      0G2342      1-3              3
                  4-7              1
                  8-15             7
Name: Sold, dtype: int64
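For anyone who wants to run the steps above end to end, here is a self-contained sketch. It uses only the eleven rows visible in the question (the elided rows are not reconstructed), and the explicit format string is an assumption about how the timestamps are written; observed=True is passed so that only bins that actually occur are listed:

```python
import pandas as pd

# The eleven rows shown in the question; nothing else is filled in.
df = pd.DataFrame({
    "PRODUCT_ID": ["0E4234"] * 5 + ["0G2342"] * 6,
    "DATE_LOCATION": ["01-08-16 0:00", "02-08-16 0:00", "04-08-16 0:00",
                      "08-08-16 0:00", "09-08-16 0:00", "02-08-16 0:00",
                      "03-08-16 0:00", "06-08-16 0:00", "09-08-16 0:00",
                      "11-08-16 0:00", "15-08-16 0:00"],
    "Sold": [2, 7, 3, 1, 2, 1, 2, 1, 1, 3, 3],
})

# Assumed layout: day-month-two-digit-year, then hour:minute.
df["DATE_LOCATION"] = pd.to_datetime(df["DATE_LOCATION"],
                                     format="%d-%m-%y %H:%M")

# Bucket the day of the month, then sum Sold per product and bucket.
days = df["DATE_LOCATION"].dt.day
rng = pd.cut(days, bins=[0, 3, 7, 15, 31],
             labels=["1-3", "4-7", "8-15", ">=16"])
totals = df.groupby(["PRODUCT_ID", rng], observed=True)["Sold"].sum()

print(totals)
```

With only these rows no sale falls on day 16 or later, so the '>=16' bucket does not appear in the result.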

Answer 1: (score: 0)

Assume your DataFrame is named df.

# dayfirst=True: the sample dates are day-month-year (e.g. 15-08-16)
df["DATE_LOCATION"] = pd.to_datetime(df.DATE_LOCATION, dayfirst=True)
df["DAY"] = df.DATE_LOCATION.dt.day

def flag(x):
    if 1<=x<=3:
        return '1-3'
    elif 4<=x<=7:
        return '4-7'
    elif 8<=x<=15:
        return '8-15'
    else:
        return '>16' # maybe you mean '>=16'.

df["Days"] = df.DAY.apply(flag)

df.groupby(["PRODUCT_ID","Days"]).Sold.sum()
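As a quick sanity check, the hand-written flag function and a pd.cut call with matching labels should put every day of the month in the same bucket. The sketch below compares the two over days 1 to 31; the names by_apply and by_cut are illustrative only:

```python
import pandas as pd

def flag(x):
    # Same rules as in the answer above.
    if 1 <= x <= 3:
        return '1-3'
    elif 4 <= x <= 7:
        return '4-7'
    elif 8 <= x <= 15:
        return '8-15'
    else:
        return '>16'

all_days = pd.Series(range(1, 32))

# Row-by-row labelling with apply.
by_apply = all_days.apply(flag)
# Vectorised labelling with cut, using the same label text.
by_cut = pd.cut(all_days, bins=[0, 3, 7, 15, 31],
                labels=['1-3', '4-7', '8-15', '>16'])

print(by_apply.tolist() == by_cut.tolist())
```

Both approaches are interchangeable for valid days of the month; cut avoids a Python-level function call per row.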