How to convert a raw dataset into a standardized dataset

Time: 2017-10-14 19:55:44

Tags: python-2.7 data-science

How can I convert these datasets into valid datasets so that I can use them for a further case study?

I have copied these datasets, and now I want to put them into a standardized form. I am a beginner in data science, so how can I do the further work using Python code?

IS_MOBILE,n_products_viewed,visit_duration,is_returning_visitor,TIME_OF_DAY,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
0,1,2.014744849,1,1,2
1,1,0.512447387,1,1,2
0,0,1.440327098,1,1,0
1,0,0.035260233,0,3,0
0,1,1.490764094,0,0,1
0,0,0.005837521,1,3,0
0,4,2.04604049,1,0,3
0,0,0.955889466,0,3,0

1 answer:

Answer 0 (score: 0)

I assume you are trying to tidy your data. Here is the general consensus on what tidy data means:

Each variable you measure should be in one column.
Each different observation of that variable should be in a different row.
There should be one table for each "kind" of variable.
If you have multiple tables, they should include a column in the table that allows them to be linked.

https://en.wikipedia.org/wiki/Tidy_data

I don't see any problem with using commas as the delimiter. pandas can load a CSV file with pandas.read_csv().
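As a minimal sketch of that loading step: the snippet below reads the first few rows from the question out of an in-memory string. It assumes Python 3 and an installed pandas; io.StringIO only stands in for a real file, and lower-casing the column names is my own suggestion for consistency, not something the data requires.

```python
import io

import pandas as pd

# First rows of the data from the question, pasted as CSV text.
RAW = """IS_MOBILE,n_products_viewed,visit_duration,is_returning_visitor,TIME_OF_DAY,user_action
1,0,0.657509946,0,3,0
1,1,0.568571234,0,2,1
1,0,0.042245997,1,1,0
1,1,1.659793381,1,1,2
"""

# io.StringIO stands in for a file here; with a real file, pass its path instead.
# skipinitialspace=True tolerates stray spaces after the commas.
df = pd.read_csv(io.StringIO(RAW), skipinitialspace=True)

# Assumption: make the header consistent, since it mixes upper and lower case.
df.columns = [name.strip().lower() for name in df.columns]

print(df.dtypes)
print(df.head())
```

Each column is already one variable and each row one visit, so after loading the frame is in the tidy shape described above.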

If you want to clean up and rearrange the data, you can use the pivot_table and unstack methods from the pandas library.
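A rough sketch of what that rearranging could look like, using three of the columns from the question. Which columns to summarize is my assumption (mean visit_duration per is_mobile and user_action); the question does not say what the case study needs. Assumes Python 3 and an installed pandas.

```python
import pandas as pd

# Three columns from the 12 rows in the question (names lower-cased).
df = pd.DataFrame({
    "is_mobile":   [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0],
    "user_action": [0, 1, 0, 2, 2, 2, 0, 0, 1, 0, 3, 0],
    "visit_duration": [
        0.657509946, 0.568571234, 0.042245997, 1.659793381,
        2.014744849, 0.512447387, 1.440327098, 0.035260233,
        1.490764094, 0.005837521, 2.04604049, 0.955889466,
    ],
})

# pivot_table: one row per is_mobile value, one column per user_action,
# each cell holding the mean visit_duration for that combination.
pivoted = df.pivot_table(
    index="is_mobile",
    columns="user_action",
    values="visit_duration",
    aggfunc="mean",
)

# The same table built by hand: group, aggregate, then unstack the
# inner index level (user_action) into columns.
unstacked = (
    df.groupby(["is_mobile", "user_action"])["visit_duration"]
    .mean()
    .unstack()
)

print(pivoted)
print(unstacked)
```

Combinations that never occur in the data (for example is_mobile 1 with user_action 3) show up as NaN in both tables.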