I am trying to transform a DataFrame for market basket analysis. The data looks like this:
Sales Order Number, Product Category, Product, SKU Quantity
A123, Book, book of python, 1
A123, Book, book of java, 2
A123, Book, how to sleep well, 1
A300, Book, English speaking, 1
...............................
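When I try to convert the DataFrame to the following format with the code below, it raises an error because there are more than 10208 distinct "Product" values.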
Sales Order Number, book of python, book of java,how to sleep well,English speaking .....
A123, 1, 2, 1, 0, 0, 0,...0
A300, 0, 0, 0, 1, 0, 0,...0
(more than 10,000 columns -> memory problem)
basket = (df[df['Product Category'] == "Book"]
          .groupby(['Sales Order Number', 'Product'])['SKU Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('Sales Order Number'))
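One direction that might avoid the dense table is to build the basket as a sparse matrix instead of calling unstack(). The following is only a minimal sketch, assuming scipy is installed and the column names match those above; the four sample rows and the names book, matrix, order_codes and product_codes are illustrative, and it has not been run against the full data.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Tiny illustrative sample with the same columns as the data above.
df = pd.DataFrame({
    "Sales Order Number": ["A123", "A123", "A123", "A300"],
    "Product Category": ["Book", "Book", "Book", "Book"],
    "Product": ["book of python", "book of java",
                "how to sleep well", "English speaking"],
    "SKU Quantity": [1, 2, 1, 1],
})

book = df[df["Product Category"] == "Book"]

# Map every order and every product to an integer position
# (positions follow order of first appearance).
order_codes, orders = pd.factorize(book["Sales Order Number"])
product_codes, products = pd.factorize(book["Product"])

# Build an orders x products matrix that stores only non-zero cells.
# Duplicate (row, col) pairs are summed, mirroring groupby(...).sum().
matrix = csr_matrix(
    (book["SKU Quantity"].to_numpy(), (order_codes, product_codes)),
    shape=(len(orders), len(products)),
)

# Optional: wrap the matrix in a DataFrame backed by sparse columns.
basket = pd.DataFrame.sparse.from_spmatrix(matrix, index=orders, columns=products)
basket.index.name = "Sales Order Number"
```

Only the cells that actually contain a quantity are stored, so memory should grow with the number of order lines rather than with orders x products. Whether the downstream basket-analysis step accepts a sparse input is something I would still need to check.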
Does anyone have a good idea for solving this problem so that the program can transform large data?
Thanks