Pandas - Formatting a csv file and adding names to the columns

Time: 2017-08-01 23:35:12

Tags: python pandas csv

I downloaded a dataset (.data) from a machine learning repository and saved it as a csv file. Then I read it with pandas:

dataset = pd.read_csv('mileage.csv')

The printed output looks like this:

[screenshot of the printed DataFrame]

But now I need to add names to the columns, so I tried:

dataset = pd.read_csv('mileage.csv', names=["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"])

However, this prints:

[screenshot of the printed DataFrame]

and all the data is squeezed into a single column...

Should I first add commas to the csv data?

How can I preprocess this data properly so that each value ends up in its own column?
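A likely explanation, assuming the original .data file separates its fields with spaces and tabs rather than commas: read_csv defaults to sep=',', so each line is read as one field and the remaining named columns are filled with NaN. The sketch below uses two made-up rows in that assumed layout. Passing sep=r"\s+" splits on any run of whitespace while keeping the quoted car names together, and na_values="?" (an assumed missing-value marker) turns "?" into NaN.

```python
import io

import pandas as pd

# Column names taken from the question.
names = ["mpg", "cylinders", "displacement", "horsepower", "weight",
         "acceleration", "model year", "origin", "car name"]

# Two made-up rows in an ASSUMED whitespace-separated layout; the car
# name is quoted because it contains spaces.
raw = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)

# Default sep=',' finds no commas, so every line lands in the first
# column and the other eight columns are NaN (the symptom in the question).
one_col = pd.read_csv(io.StringIO(raw), names=names)
print(one_col.head())

# sep=r"\s+" splits on runs of spaces/tabs; quoted fields stay intact.
dataset = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=names, na_values="?")
print(dataset[["mpg", "horsepower", "car name"]])
```

With the real file, the equivalent call would be pd.read_csv('mileage.csv', sep=r"\s+", names=[...]), provided the whitespace assumption holds; no commas would need to be added by hand.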

1 answer:

Answer 0 (score: 0)

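You can use assign to initialize new columns. It looks like some of the columns already exist in the original data, so I use a conditional dict comprehension to add only the new ones.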

import pandas as pd

new_cols = ["mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model year", "origin", "car name"]

dataset = pd.read_csv('mileage.csv')
# Add each column from new_cols that is not already present, filled with None
dataset = dataset.assign(**{c: None for c in new_cols if c not in dataset})
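To see what the assign call does, here is a minimal sketch on a small hypothetical DataFrame that already contains two of the wanted columns. The test c not in dataset checks column labels, so only the missing names are added, each filled with None.

```python
import pandas as pd

new_cols = ["mpg", "cylinders", "displacement", "horsepower", "weight",
            "acceleration", "model year", "origin", "car name"]

# Hypothetical frame that already has two of the wanted columns.
dataset = pd.DataFrame({"mpg": [18.0, 25.0], "cylinders": [8, 4]})

# `c not in dataset` tests column labels; names with spaces such as
# "model year" are fine because they are passed via ** unpacking.
dataset = dataset.assign(**{c: None for c in new_cols if c not in dataset})

print(list(dataset.columns))
```

This only appends empty columns; it does not split values that read_csv has already packed into a single column.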

To load some sample data directly:

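import pandas as pd
from urllib.request import urlopen  # Python 3; urllib2 exists only in Python 2

url = 'https://raw.githubusercontent.com/chrisjameskirkham/car-mpg/master/auto-mpg-nameless.csv'
response = urlopen(url)
dataset = pd.read_csv(response)
# Check the columns of the frame just read, not those of an earlier dataset
dataset = dataset.assign(**{c: None for c in new_cols if c not in dataset})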