I want to min-max normalize the pageviews for each page in a pandas DataFrame. Given this data:

page           days since publishing  pageviews
example.com/a  1                      5000
example.com/a  2                      10000
example.com/a  3                      7500
example.com/b  1                      10000
example.com/b  2                      20000
example.com/b  3                      15000

I want to scale pageviews by each page's own minimum and maximum, like this:

page           days since publishing  pageviews
example.com/a  1                      0
example.com/a  2                      1
example.com/a  3                      0.5
example.com/b  1                      0
example.com/b  2                      1
example.com/b  3                      0.5
The dataset has about 100,000 rows. Any help doing this efficiently would be much appreciated.
Answer 0 (score: -1)
import pandas as pd
a = pd.read_csv('input.csv')  # read_csv already returns a DataFrame
# Per-page minimum and maximum of pageviews
b = a.groupby('page')['pageviews'].agg(['min', 'max']).reset_index()
a = pd.merge(a, b, how='left', on='page')
# Min-max scale: (x - min) / (max - min)
a['minmaxscale'] = (a['pageviews'] - a['min']) / (a['max'] - a['min'])

This produces:

            page  days since publishing  pageviews    min    max  minmaxscale
0  example.com/a                      1       5000   5000  10000          0.0
1  example.com/a                      2      10000   5000  10000          1.0
2  example.com/a                      3       7500   5000  10000          0.5
3  example.com/b                      1      10000  10000  20000          0.0
4  example.com/b                      2      20000  10000  20000          1.0
5  example.com/b                      3      15000  10000  20000          0.5
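
For what it's worth, the same per-page scaling can probably be done without a merge by using groupby().transform(), which returns values aligned with the original rows. The following is only a minimal sketch: it builds the question's sample data inline instead of reading input.csv, and the column name pageviews_scaled is a made-up name for illustration.

```python
import pandas as pd

# Sample data from the question, built inline so the sketch is self-contained.
df = pd.DataFrame({
    "page": ["example.com/a"] * 3 + ["example.com/b"] * 3,
    "days since publishing": [1, 2, 3, 1, 2, 3],
    "pageviews": [5000, 10000, 7500, 10000, 20000, 15000],
})

# transform() returns a Series aligned with df's rows, so no merge is needed.
grouped = df.groupby("page")["pageviews"]
page_min = grouped.transform("min")
page_max = grouped.transform("max")

# Min-max scale within each page: (x - min) / (max - min).
# A page whose pageviews are all equal has max == min, giving 0/0 = NaN here.
# "pageviews_scaled" is a hypothetical column name chosen for this example.
df["pageviews_scaled"] = (df["pageviews"] - page_min) / (page_max - page_min)

print(df)
```

Because the min and max are computed once per group and the arithmetic is vectorized, this should be comfortable at around 100,000 rows, though that is an expectation rather than something measured here.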