假设我的数据框看起来像这样:
date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0
最后的count
列是累积计数。
我需要做的是找到特定的实际计数
日期+网站+国家/种类+ ID元组,这将导致:
date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0
我知道这会涉及groupby
电话,但我不知道该怎么办。我们假设元组的第一个实例的计数为0
。
任何帮助都会很棒。感谢
答案 0 :(得分:2)
使用diff
+ cumsum
,cols = ['site', 'country_code', 'kind', 'ID']
df['count'] = df.groupby(cols)['count'].diff().fillna(0)
print(df['count'])
0 0.0
1 0.0
2 0.0
3 1.0
4 0.0
5 0.0
6 0.0
7 0.0
8 3.0
9 0.0
10 3.0
11 2.0
Name: count, dtype: float64
的倒数。
s = 'azcbobobegghakl'
substrings = [s[i:] for i in range(0, len(s))]
filtered_s = filter(substrings, lambda s: s.startswith("bob"))
result = len(filtered_s)
感谢MaxU指出错误!