I need to optimize a single line of code that will be executed tens of thousands of times during the calculation, so its timing has become an issue. It seems simple, but I am really stuck.
The line is:
df['Random'] = df['column'].groupby(level=0).transform(lambda x: np.random.rand())
So I want to assign the same random number to every row in each group and then "ungroup". Since rand() is called separately for every group with this implementation, the code is very inefficient.
Can someone help in vectorizing this?
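For reference, a toy setup along these lines reproduces the situation (the index layout here is only illustrative, not my actual data):

import numpy as np
import pandas as pd

# Illustrative data: level 0 of the MultiIndex defines the groups.
idx = pd.MultiIndex.from_product([range(1000), range(50)], names=['group', 'obs'])
df = pd.DataFrame({'column': np.random.rand(len(idx))}, index=idx)

# Current approach: the lambda is invoked once per group, which is slow.
df['Random'] = df['column'].groupby(level=0).transform(lambda x: np.random.rand())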
Answer 0 (score: 2)
Try this!
df = pd.DataFrame(np.sort(np.random.randint(2, 5, 50)), columns=['column'])

# Draw one random value per unique group key, then broadcast it back with a merge.
uniques = df['column'].unique()
final = df.merge(
    pd.Series(np.random.rand(len(uniques)), index=uniques).to_frame('Random'),
    left_on='column', right_index=True
)
You can compute and store uniques once, and then rerun just the last statement whenever you need a fresh set of random numbers to merge with df.
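As a rough sketch of that reuse pattern (the helper name is made up, and it uses map instead of merge to broadcast the values):

# Compute the group keys once and keep them around.
uniques = df['column'].unique()

# Hypothetical helper: draws a fresh random number per group key and
# broadcasts it to every row via map (an alternative to the merge above).
def fresh_randoms(df, uniques):
    draws = pd.Series(np.random.rand(len(uniques)), index=uniques)
    return df['column'].map(draws)

df['Random'] = fresh_randoms(df, uniques)

map with a Series looks up each value of 'column' in the Series index, so every row belonging to the same group receives the same draw.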