How do I open an online CSV file?

Time: 2017-09-29 09:21:00

Tags: python csv

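I have a link to a CSV file (the URL in the code below). The problem is that I don't know how to open it and work with the data it contains.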

from urllib.request import urlopen

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
x = urlopen(url)
data = x.read()

I want to build the column index myself:

names = []
firstLine = True
for line in data:
    if firstLine:
        names = line.strip().split(';')
        firstLine = False

However, names ends up as ['"'], which is not what I expected.

4 answers:

Answer 0 (score: 2)

You can use pandas:

import pandas as pd
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
df = pd.read_csv(url, sep=';')
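Once the data is in a DataFrame, the columns are already parsed as numbers and can be inspected directly. A minimal sketch, assuming a small inline sample in the same semicolon-separated layout instead of the real URL (the sample text is illustrative, not the full file):

```python
import io
import pandas as pd

# Small inline sample in the same semicolon-separated layout (stands in for the URL).
sample = (
    '"fixed acidity";"volatile acidity";"alcohol";"quality"\n'
    '7.4;0.7;9.4;5\n'
    '7.8;0.88;9.8;5\n'
    '11.2;0.28;9.8;6\n'
)

df = pd.read_csv(io.StringIO(sample), sep=';')

print(df.shape)              # (rows, columns)
print(list(df.columns))      # column names taken from the header line
print(df['quality'].mean())  # numeric columns support arithmetic directly
```

Without sep=';', pandas would treat each line as a single comma-separated field, so the delimiter argument matters for this file.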

Answer 1 (score: 1)

When you read the file (data = x.read()), you get a bytes object, so you first need to decode it as UTF-8:

text = data.decode('utf-8')

Then you can use StringIO and csv to read the data:

import csv, io
# The file is semicolon-separated, so override the dialect's comma delimiter.
reader = csv.reader(io.StringIO(text), csv.excel, delimiter=';')

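Now reader is an iterator over the file's rows, each one a list of strings; call list(reader) if you need them all in a list.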
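The decode-then-parse steps can be sketched end to end on a short inline bytes value standing in for the downloaded data (the name raw and the sample contents are illustrative). Because this file is semicolon-separated, the sketch passes delimiter=';'. It also shows why looping over the undecoded data does not produce lines:

```python
import csv
import io

# Stand-in for x.read(): urlopen(...).read() returns bytes, not str.
raw = b'"fixed acidity";"volatile acidity";"quality"\n7.4;0.7;5\n7.8;0.88;5\n'

# Iterating over bytes yields integers (one per byte), not lines.
first_item = next(iter(raw))

# Decode to text, then let csv split the text into rows and fields.
text = raw.decode('utf-8')
reader = csv.reader(io.StringIO(text), delimiter=';')
rows = list(reader)  # the reader is an iterator; list() collects all rows

print(first_item)
print(rows[0])
```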

Answer 2 (score: 1)

Using csv, this can be done as follows:

from urllib.request import urlopen
import csv
import io

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
x = urlopen(url)
csv_data = x.read().decode('utf-8')
csv_input = csv.reader(io.StringIO(csv_data), delimiter=';')
header = next(csv_input)

print("Header is:", header)
data = list(csv_input)

# Display start of data
for row in data[:5]:
    print(row)

Which will give you:

Header is: ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol', 'quality']
['7.4', '0.7', '0', '1.9', '0.076', '11', '34', '0.9978', '3.51', '0.56', '9.4', '5']
['7.8', '0.88', '0', '2.6', '0.098', '25', '67', '0.9968', '3.2', '0.68', '9.8', '5']
['7.8', '0.76', '0.04', '2.3', '0.092', '15', '54', '0.997', '3.26', '0.65', '9.8', '5']
['11.2', '0.28', '0.56', '1.9', '0.075', '17', '60', '0.998', '3.16', '0.58', '9.8', '6']
['7.4', '0.7', '0', '1.9', '0.076', '11', '34', '0.9978', '3.51', '0.56', '9.4', '5']
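The rows above are lists of strings. If numeric values are needed, one possible follow-up is to convert each field with float and pair it with the header. The sketch below does this on the first two rows, abbreviated to three columns; the names header and data mirror the code above, while records and the conversion step are additions for illustration:

```python
# Header and rows in the shape produced by csv.reader (abbreviated sample).
header = ['fixed acidity', 'volatile acidity', 'quality']
data = [['7.4', '0.7', '5'], ['7.8', '0.88', '5']]

# Convert every field to float and key it by column name.
records = [dict(zip(header, map(float, row))) for row in data]

# Average of one column across all rows.
mean_quality = sum(r['quality'] for r in records) / len(records)

print(records[0])
print(mean_quality)
```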

Answer 3 (score: 0)

The following code downloads the CSV file and saves it to the current directory:

from urllib.request import urlretrieve

url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'

urlretrieve(url, 'wine.csv')

You can then read it into a DataFrame:

import pandas as pd
df = pd.read_csv('wine.csv', delimiter=';')
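If pandas is not available, a saved file can also be read with the standard csv module. A minimal sketch, which writes a small sample file to a temporary directory so that it runs without a download (the helper name read_semicolon_csv, the file name wine_sample.csv, and the sample contents are all illustrative):

```python
import csv
import os
import tempfile

def read_semicolon_csv(path):
    """Return (header, rows) from a semicolon-separated CSV file."""
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f, delimiter=';')
        header = next(reader)
        rows = list(reader)
    return header, rows

# Write a tiny sample file in place of the downloaded wine.csv.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'wine_sample.csv')
with open(path, 'w', newline='', encoding='utf-8') as f:
    f.write('"alcohol";"quality"\n9.4;5\n9.8;6\n')

header, rows = read_semicolon_csv(path)
print(header)
print(rows)
```

With the real download, the same helper would be called as read_semicolon_csv('wine.csv').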