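I'm new to Pandas and I'm trying to learn how to create columns based on conditions applied to existing columns. I'm working with cellular data, and this is what my source data looks like (the two rightmost columns are empty):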
DEVICE_ID | MONTH | TYPE | DAY | COUNT | LAST_MONTH | SEASONAL_AVG
8129 | 201601 | VOICE | 1 | 8 | |
8129 | 201502 | VOICE | 1 | 5 | |
8129 | 201501 | VOICE | 1 | 2 | |
8321 | 201403 | DATA | 3 | 1 | |
2908 | 201302 | TEXT | 5 | 4 | |
8129 | 201406 | VOICE | 2 | 3 | |
8129 | 201306 | VOICE | 2 | 7 | |
3096 | 201501 | DATA | 5 | 6 | |
8129 | 201301 | VOICE | 1 | 2 | |
I created a DataFrame from this data and named it df.
import pandas as pd

df = pd.DataFrame({
    'DEVICE_ID': [8129, 8129, 8129, 8321, 2908, 8129, 8129, 3096, 8129],
    'MONTH': [201601, 201502, 201501, 201403, 201302, 201406, 201306, 201501, 201301],
    'TYPE': ['VOICE', 'VOICE', 'VOICE', 'DATA', 'TEXT', 'VOICE', 'VOICE', 'DATA', 'VOICE'],
    'DAY': [1, 1, 1, 3, 5, 2, 2, 5, 1],
    'COUNT': [8, 5, 2, 1, 4, 3, 7, 6, 2],
})
I'm trying to add two more columns to df: 'LAST_MONTH' and 'SEASONAL_AVG'. The logic for these two columns:
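LAST_MONTH: the previous month's COUNT for the same DEVICE_ID, TYPE and DAY combination. For example (row numbers are the zero-based DataFrame index): for row 1 (DEVICE_ID: 8129, TYPE: VOICE, DAY: 1, MONTH: 201502), LAST_MONTH would be the COUNT of row 2 (DEVICE_ID: 8129, TYPE: VOICE, DAY: 1, MONTH: 201501). If there is no record for the previous month, LAST_MONTH should be zero.
SEASONAL_AVG: the average COUNT for the same calendar month in previous years, for the same DEVICE_ID, TYPE and DAY combination (the data starts at 201301). For example, SEASONAL_AVG for row 0 is the average of the COUNT values in rows 2 and 8. There will always be at least one record for the corresponding month in the past. It does not need to work for every TYPE and DAY combination, but at least some of the possible combinations will appear for every DEVICE_ID.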
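To make the SEASONAL_AVG rule concrete, here is a minimal sketch of one way it could be computed. It is not a definitive implementation. It assumes there is at most one row per DEVICE_ID, TYPE, DAY and MONTH combination, and the helper columns YEAR and MOY (month of year) are names introduced here purely for illustration. Rows with no earlier year for the same month are left as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'DEVICE_ID': [8129, 8129, 8129, 8321, 2908, 8129, 8129, 3096, 8129],
    'MONTH': [201601, 201502, 201501, 201403, 201302, 201406, 201306, 201501, 201301],
    'TYPE': ['VOICE', 'VOICE', 'VOICE', 'DATA', 'TEXT', 'VOICE', 'VOICE', 'DATA', 'VOICE'],
    'DAY': [1, 1, 1, 3, 5, 2, 2, 5, 1],
    'COUNT': [8, 5, 2, 1, 4, 3, 7, 6, 2],
})

# Hypothetical helper columns: split YYYYMM into year and month of year.
df['YEAR'] = df['MONTH'] // 100
df['MOY'] = df['MONTH'] % 100

keys = ['DEVICE_ID', 'TYPE', 'DAY', 'MOY']

# Order each group chronologically; the original index is preserved,
# so the result aligns back onto df when assigned.
ordered = df.sort_values(keys + ['YEAR'])

# shift(1) drops the current year from the window, then the expanding
# mean averages every earlier year within the same group.
df['SEASONAL_AVG'] = (
    ordered.groupby(keys)['COUNT']
    .transform(lambda s: s.shift(1).expanding().mean())
)

print(df[['DEVICE_ID', 'MONTH', 'TYPE', 'DAY', 'COUNT', 'SEASONAL_AVG']])
```

With the sample data, row 0 (201601) averages the COUNT values of rows 2 and 8, which matches the example given above.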
Any help would be greatly appreciated. Thanks!
EDIT1:
def last_month(record):
    # Split YYYYMM into year and month.
    year, month = divmod(int(record['MONTH']), 100)
    # Step back one month, rolling over to December of the previous year.
    prev_month = (year - 1) * 100 + 12 if month == 1 else year * 100 + month - 1
    # record is a single row, so the lookup has to be done against df itself.
    match = df.loc[(df['DEVICE_ID'] == record['DEVICE_ID'])
                   & (df['MONTH'] == prev_month)
                   & (df['DAY'] == record['DAY'])
                   & (df['TYPE'] == record['TYPE']), 'COUNT']
    # Zero when there is no record for the previous month.
    return match.iloc[0] if not match.empty else 0

df['LAST_MONTH'] = df.apply(last_month, axis=1)
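The row-by-row lookup for LAST_MONTH can also be expressed without apply, as a left merge of the frame against itself. The following is only a sketch under the assumption that each DEVICE_ID, TYPE, DAY and MONTH combination appears at most once. PREV_MONTH, lookup and out are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'DEVICE_ID': [8129, 8129, 8129, 8321, 2908, 8129, 8129, 3096, 8129],
    'MONTH': [201601, 201502, 201501, 201403, 201302, 201406, 201306, 201501, 201301],
    'TYPE': ['VOICE', 'VOICE', 'VOICE', 'DATA', 'TEXT', 'VOICE', 'VOICE', 'DATA', 'VOICE'],
    'DAY': [1, 1, 1, 3, 5, 2, 2, 5, 1],
    'COUNT': [8, 5, 2, 1, 4, 3, 7, 6, 2],
})

keys = ['DEVICE_ID', 'TYPE', 'DAY']

# Previous month as YYYYMM: subtract 1, except January, which rolls back
# to December of the previous year (e.g. 201601 - 89 = 201512).
is_january = df['MONTH'] % 100 == 1
df['PREV_MONTH'] = (df['MONTH'] - 1).where(~is_january, df['MONTH'] - 89)

# Lookup table: each row's own MONTH becomes the key that other rows
# match against through their PREV_MONTH.
lookup = df[keys + ['MONTH', 'COUNT']].rename(
    columns={'MONTH': 'PREV_MONTH', 'COUNT': 'LAST_MONTH'}
)

# A left merge keeps every original row; unmatched rows get NaN, then 0.
out = df.merge(lookup, on=keys + ['PREV_MONTH'], how='left')
out['LAST_MONTH'] = out['LAST_MONTH'].fillna(0).astype(int)

print(out[['DEVICE_ID', 'MONTH', 'TYPE', 'DAY', 'COUNT', 'LAST_MONTH']])
```

In the sample data only row 1 (201502) has a matching record for the previous month, so it receives the COUNT of row 2 and every other row gets zero.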