I have a df and I need to create a new column in it.
i,ID,url,used_at,active_seconds,domain,search_term, diff_time
322015,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-10-31 09:16:05,35,vk.com,None, 108
838267,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed,2015-10-31 09:16:38,54,vk.com,None, 79
838271,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-11-30 09:17:32,34,vk.com,None, 513
322026,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos&z=photo143297356_397216312%2Ffeed1_143297356_1451504298,2015-11-30 09:18:06,4,vk.com,None, 242
838275,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:18:10,4,vk.com,None, 131
322028,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:14,8,vk.com,None, 317
322029,f85ce4b2f8787d48edc8612b2ccaca83,megarand.ru/contest/121070,2015-12-31 09:18:22,16,megarand.ru,None, 17
1870917,f85ce4b2f8787d48edc8612b2ccaca83,eldorado.ru/cat/1461428,2015-12-31 09:18:38,6,vk.com,None, 129
1354612,f85ce4b2f8787d48edc8612b2ccaca83,vk.com/antoninaribina,2015-12-31 19:18:44,56,vk.com,None, 417
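I want to add a column period with the following rules:
if diff_time < 500, period = i;
if diff_time > 500, period = i + 1 from the next row on;
if ID != the ID from the previous row, period = i + 1.
Desired output: a period column that follows these rules.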
Answer 0 (score: 2)
Construct a switch variable that holds True wherever the period needs to be incremented and False otherwise, then call cumsum() on the resulting Series:
switch = (df.diff_time > 500) | (df.ID != df.ID.shift().fillna(df.ID[0]))
switch.cumsum() + 1
# 0 1
# 1 1
# 2 2
# 3 2
# 4 2
# 5 2
# 6 3
# 7 3
# 8 3
# dtype: int64
Assigning it back to the data frame should give you what you need:
df['period'] = switch.cumsum() + 1
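For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes only the i, ID and diff_time columns from the question are needed, and the names csv_text, long_gap and id_changed are illustrative, not from the original post. It uses .iloc[0] rather than [0] for the fill value so that it should also work when the index does not start at 0.

```python
import io

import pandas as pd

# Sample rows from the question, reduced to the columns the rule uses.
csv_text = """i,ID,diff_time
322015,0120bc30e78ba5582617a9f3d6dfd8ca,108
838267,0120bc30e78ba5582617a9f3d6dfd8ca,79
838271,0120bc30e78ba5582617a9f3d6dfd8ca,513
322026,0120bc30e78ba5582617a9f3d6dfd8ca,242
838275,0120bc30e78ba5582617a9f3d6dfd8ca,131
322028,0120bc30e78ba5582617a9f3d6dfd8ca,317
322029,f85ce4b2f8787d48edc8612b2ccaca83,17
1870917,f85ce4b2f8787d48edc8612b2ccaca83,129
1354612,f85ce4b2f8787d48edc8612b2ccaca83,417
"""
df = pd.read_csv(io.StringIO(csv_text))

# True where the gap exceeds the 500 threshold.
long_gap = df["diff_time"] > 500

# True where the ID differs from the previous row; the first row is
# compared against itself so it never triggers a new period.
prev_id = df["ID"].shift().fillna(df["ID"].iloc[0])
id_changed = df["ID"] != prev_id

# Each True starts a new period; the cumulative sum numbers the periods.
switch = long_gap | id_changed
df["period"] = switch.cumsum() + 1

print(df[["ID", "diff_time", "period"]])
```

Note that, as in the answer, a row with diff_time > 500 already belongs to the new period. If the increment should only take effect on the following row, shifting long_gap by one row before combining it would be one way to do that.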