Normalizing a column in pandas

Time: 2018-05-22 17:14:53

Tags: python python-2.7 pandas

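I want to min-max normalize pageviews within each page of a pandas DataFrame. The input looks like this:

page                days since publishing   pageviews
example.com/a       1                       5000
example.com/a       2                       10000
example.com/a       3                       7500
example.com/b       1                       10000
example.com/b       2                       20000
example.com/b       3                       15000

The desired result scales each page's pageviews to the range 0 to 1, as shown below: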

page                days since publishing   pageviews   
example.com/a       1                       0
example.com/a       2                       1
example.com/a       3                       0.5
example.com/b       1                       0
example.com/b       2                       1
example.com/b       3                       0.5

The dataset has about 100,000 rows. Any help with doing this efficiently would be much appreciated.

1 answer:

Answer 0: (score: -1)

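import pandas as pd

# Load the data; 'input.csv' is assumed to hold the columns shown in the question.
a = pd.read_csv('input.csv')
# Per-page minimum of every column, with 'page' restored as a regular column.
b = a.groupby('page').min().reset_index()
# Attach each page's minimum to its rows (_x = original value, _y = page minimum).
a = pd.merge(a, b, how='left', on='page')
# Note: this divides by the page minimum, not by (max - min).
a['minmaxscale'] = (a.pageviews_x - a.pageviews_y) / a.pageviews_y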

This produces:

            page  days since publishing_x  pageviews_x  \
0  example.com/a                        1         5000   
1  example.com/a                        2        10000   
2  example.com/a                        3         7500   
3  example.com/b                        1        10000   
4  example.com/b                        2        20000   
5  example.com/b                        3        15000   

   days since publishing_y  pageviews_y  minmaxscale  
0                        1         5000          0.0  
1                        1         5000          1.0  
2                        1         5000          0.5  
3                        1        10000          0.0  
4                        1        10000          1.0  
5                        1        10000          0.5
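
The merge-based code above divides by each page's minimum rather than by its range (max - min), so it reproduces the desired output only because every page's maximum in the sample happens to be exactly twice its minimum. A minimal sketch of per-page min-max scaling with groupby and transform, assuming the column names from the question and using a hypothetical output column name, pageviews_scaled, might look like this:

```python
import pandas as pd

# Sample data copied from the question; real data would be loaded from a file.
df = pd.DataFrame({
    "page": ["example.com/a"] * 3 + ["example.com/b"] * 3,
    "days since publishing": [1, 2, 3, 1, 2, 3],
    "pageviews": [5000, 10000, 7500, 10000, 20000, 15000],
})

# Per-page minimum and maximum, aligned row by row with the original frame.
grouped = df.groupby("page")["pageviews"]
group_min = grouped.transform("min")
group_max = grouped.transform("max")

# Min-max scaling: (x - min) / (max - min).
# A page whose values are all equal has a zero range; mask it so those rows are NaN.
value_range = group_max - group_min
df["pageviews_scaled"] = (df["pageviews"] - group_min) / value_range.where(value_range != 0)

print(df)
```

Because transform returns a result aligned with the original rows, no merge or suffixed columns are needed, and the work stays vectorized, which should be adequate for a dataset of roughly 100,000 rows.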