I have copied the dataset below and now want to put it into a standardized form. I am a beginner in data science, so how can I do the further work using Python code?
IS_MOBILE,n_products_viewed,visit_duration,is_returning_visitor,TIME_OF_DAY,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
0,0,1.440327098,1,1,0
1,0,0.035260233,0,3,0
0,1,1.490764094,0,0,1
0,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
0,0,0.955889466,0,3,0
Answer 0 (score: 0)
I assume you are tidying your data. Below is the generally accepted definition of tidy data.
Each variable you measure should be in one column.
Each different observation of that variable should be in a different row.
There should be one table for each "kind" of variable.
If you have multiple tables, they should include a column in the table that allows them to be linked.
https://en.wikipedia.org/wiki/Tidy_data
I don't see any problem with using commas as the separator. pandas can load a CSV file with pandas.read_csv().
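As a minimal sketch of that loading step, the snippet below reads the sample rows from the question with pandas.read_csv(). The data is embedded as a string only so the example is self-contained; with a real file you would pass its path instead. Lower-casing the column names is an optional extra of mine, not something the question requires.

```python
import io

import pandas as pd

# Sample rows from the question, embedded as text so the sketch runs on its own.
# With a file on disk you would call pd.read_csv("your_file.csv") instead
# (the file name here is hypothetical).
csv_text = """IS_MOBILE,n_products_viewed,visit_duration,is_returning_visitor,TIME_OF_DAY,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
0,0,1.440327098,1,1,0
1,0,0.035260233,0,3,0
0,1,1.490764094,0,0,1
0,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
0,0,0.955889466,0,3,0
"""

# The comma is read_csv's default separator, so no extra arguments are needed.
df = pd.read_csv(io.StringIO(csv_text))

# Optional: make the mixed-case header consistent.
df.columns = df.columns.str.lower()

print(df.shape)
print(df.dtypes)
print(df.head())
```

Each column already holds one variable and each row one visit, so the table is in tidy form as soon as it is loaded.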
If you want to clean up and rearrange the data, you can use the pivot_table and unstack methods from the pandas library.
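The sketch below shows one way those reshaping calls could be applied to the sample data. Which columns to group by and which value to aggregate are my own choices for illustration; the question does not say what final layout is wanted.

```python
import io

import pandas as pd

# Same sample rows as in the question, embedded so the sketch is self-contained.
csv_text = """IS_MOBILE,n_products_viewed,visit_duration,is_returning_visitor,TIME_OF_DAY,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
0,0,1.440327098,1,1,0
1,0,0.035260233,0,3,0
0,1,1.490764094,0,0,1
0,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
0,0,0.955889466,0,3,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# pivot_table: mean visit duration for each TIME_OF_DAY (rows) and
# IS_MOBILE value (columns). Combinations with no rows show up as NaN.
mean_duration = df.pivot_table(
    index="TIME_OF_DAY",
    columns="IS_MOBILE",
    values="visit_duration",
    aggfunc="mean",
)

# groupby + unstack: count visits per (TIME_OF_DAY, user_action) pair, then
# move user_action from the row index into columns, filling gaps with 0.
action_counts = (
    df.groupby(["TIME_OF_DAY", "user_action"])
    .size()
    .unstack(fill_value=0)
)

print(mean_duration)
print(action_counts)
```

Both results are summaries of the data, not replacements for it, so keep the original DataFrame for any later modelling.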