Apply np.random.rand to groups - optimization issue

Time: 2018-12-27 12:55:13

Tags: pandas pandas-groupby pandas-apply

I need to optimize a single line of code that is executed tens of thousands of times during my calculations, so its running time has become an issue. It seems simple, but I am really stuck.

The line is:

df['Random'] = df['column'].groupby(level=0).transform(lambda x: np.random.rand())

So I want to assign the same random number to every row within a group and then "ungroup". Since this implementation calls rand() separately for every group, the code is very inefficient.

Can someone help me vectorize this?

1 answer:

Answer 0 (score: 2)

Try this!

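import numpy as np
import pandas as pd
# Example frame: 50 sorted integers in [2, 5) serve as the group keys
df = pd.DataFrame(np.sort(np.random.randint(2, 5, 50)), columns=['column'])
uniques = df['column'].unique()
# One random number per unique key, joined back onto the rows by key
final = df.merge(pd.Series(np.random.rand(len(uniques)), index=uniques, name='Random').to_frame(),
                 left_on='column', right_index=True)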

You can compute the uniques once and store them, then rerun only the last line (the merge) each time you need a fresh set of random numbers joined to df.
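
If the groups come from the first index level, as in the question, the same "compute once, redraw many times" idea can probably be done without a merge. The following is only a sketch under a few assumptions: the example frame and the helper name draw_group_randoms are made up for illustration, it uses np.random.default_rng (NumPy 1.17+) rather than np.random.rand, and it assumes the index level contains no missing values, since pd.factorize marks those with -1.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the first index level plays the role of the group key.
idx = pd.MultiIndex.from_product([["a", "b", "c"], range(4)], names=["group", "item"])
df = pd.DataFrame({"column": np.arange(len(idx))}, index=idx)

# Done once: map every row to an integer group id (0 .. n_groups - 1).
codes, uniques = pd.factorize(df.index.get_level_values(0))

def draw_group_randoms(codes, n_groups, rng):
    """Draw one uniform number per group and repeat it across that group's rows."""
    return rng.random(n_groups)[codes]

rng = np.random.default_rng()
# Done on every iteration: a single vectorized draw plus an integer-array lookup.
df["Random"] = draw_group_randoms(codes, len(uniques), rng)
```

Because codes and uniques depend only on the index, they can be kept around as long as the index does not change, and only the last line needs to run inside the hot loop.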