Converting a data list from a URL to CSV in Python

Date: 2017-09-25 16:51:20

Tags: python list csv dataframe

I am trying to convert the Breast Cancer Wisconsin dataset from a list into a DataFrame with named columns.

Here is the dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

These are the column names:

   #  Attribute                     Domain
   -- -----------------------------------------
   1. Sample code number            id number
   2. Clump Thickness               1 - 10
   3. Uniformity of Cell Size       1 - 10
   4. Uniformity of Cell Shape      1 - 10
   5. Marginal Adhesion             1 - 10
   6. Single Epithelial Cell Size   1 - 10
   7. Bare Nuclei                   1 - 10
   8. Bland Chromatin               1 - 10
   9. Normal Nucleoli               1 - 10
  10. Mitoses                       1 - 10
  11. Class:                        (2 for benign, 4 for malignant)

I imported the dataset into Python like this:

import requests

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)

print (f.text)

and I see the data as comma-separated lines:

1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2

I need to split the comma-separated values into columns and give the columns names.

I tried this, but it did not work:

import requests
import pandas as pd
import io

urlData = requests.get(f.text).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))

4 Answers:

Answer 0: (score: 0)

This will solve the problem:

import requests

# Column names, in the order given in the dataset description
headers = ['sample', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
r = requests.get("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data").text
# Write the header row first, then the raw comma-separated data
with open('c:\\users\\user\\desktop\\data.csv', 'w') as csvFile:
    csvFile.write(','.join(headers) + "\n")
    csvFile.write(r)
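
The header row can also be produced with the standard csv module instead of formatting the string by hand. The following is only a minimal sketch: `raw_text` is a stand-in for the downloaded text (the first rows shown in the question, so no network call is made), and `to_csv_text` is a hypothetical helper name, not part of the answer above.

```python
import csv
import io

# Stand-in for requests.get(link).text: the first rows shown in the question
raw_text = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
"""

headers = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
           'Uniformity of Cell Shape', 'Marginal Adhesion',
           'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
           'Normal Nucleoli', 'Mitoses', 'Class']


def to_csv_text(raw, column_names):
    """Return CSV text: one header row followed by the data rows (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(column_names)
    # csv.reader splits each line on commas; blank lines come back as empty lists
    for row in csv.reader(io.StringIO(raw)):
        if row:
            writer.writerow(row)
    return out.getvalue()


csv_text = to_csv_text(raw_text, headers)
print(csv_text)
```

To save the result, the returned string can be written to a file opened with `open(path, 'w', newline='')`.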

Answer 1: (score: 0)

The following worked for me:

import pandas as pd
import requests
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)
# separate each line
newf = f.text.splitlines()
# create pandas dataframe
df = pd.DataFrame([x.split(",") for x in newf])
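
A DataFrame built this way has integer column labels (0 to 10), and every value is still a string. The sketch below shows one way to name the columns and convert the values to numbers. It assumes the five rows from the question stand in for `f.text`, and the `errors="coerce"` setting is my own choice: it turns any non-numeric entry into NaN (the dataset's documentation reports missing values marked with `?`).

```python
import pandas as pd

# Stand-in for f.text: the five rows shown in the question
sample_text = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2"""

names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
         'Uniformity of Cell Shape', 'Marginal Adhesion',
         'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
         'Normal Nucleoli', 'Mitoses', 'Class']

# Split each non-empty line on commas and attach the column names
rows = [line.split(",") for line in sample_text.splitlines() if line]
df = pd.DataFrame(rows, columns=names)

# Every cell is a string at this point; convert column by column.
# Assumption: non-numeric entries should become NaN rather than raise.
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```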

Answer 2: (score: -1)

import requests
import pandas as pd
import io

names = ['Sample code number',
         'Clump Thickness',
         'Uniformity of Cell Size',
         'Uniformity of Cell Shape',
         'Marginal Adhesion',
         'Single Epithelial Cell Size',
         'Bare Nuclei',
         'Bland Chromatin',
         'Normal Nucleoli',
         'Mitoses',
         'Class']

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
csv_text = requests.get(link).text
# if you don't care about column names, omit names=names and pass header=None instead
df = pd.read_csv(io.StringIO(csv_text), names=names)
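
Since the title asks for a CSV, here is a short sketch of the remaining step: writing the named DataFrame back out with `to_csv`. It assumes the five rows from the question stand in for `csv_text`, and passing `na_values="?"` is an assumption based on the dataset's documentation, which reports missing values marked with `?`.

```python
import io

import pandas as pd

# Stand-in for requests.get(link).text: the five rows shown in the question
csv_text = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
"""

names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
         'Uniformity of Cell Shape', 'Marginal Adhesion',
         'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
         'Normal Nucleoli', 'Mitoses', 'Class']

# names= supplies the column labels; na_values="?" (an assumption) reads "?" as NaN
df = pd.read_csv(io.StringIO(csv_text), names=names, na_values="?")

# With no path argument, to_csv returns the CSV as a string;
# index=False leaves out the row index column
csv_out = df.to_csv(index=False)
print(csv_out)
```

Passing a file name, for example `df.to_csv("breast_cancer.csv", index=False)` (a hypothetical name), writes the same content to disk.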

Answer 3: (score: -1)

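I'm sure there is a better way to do this, but I sent the output to a CSV with a static header row. Since the data is already comma-delimited, I thought this would be the simplest approach.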

import requests

def main():
    outputFile = 'someName.csv'
    link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
    f = requests.get(link)
    # Static header row: each column name followed by its domain in parentheses
    headerLine = "Sample code number(id number),Clump Thickness(1 - 10),Uniformity of Cell Size(1 - 10),Uniformity of Cell Shape(1 - 10),Marginal Adhesion(1 - 10),Single Epithelial Cell Size(1 - 10),Bare Nuclei(1 - 10),Bland Chromatin(1 - 10),Normal Nucleoli(1 - 10),Mitoses(1 - 10),Class:(2 for benign - 4 for malignant)"
    data = f.text
    # Write the header row first, then the already comma-delimited data
    with open(outputFile, "w") as ofile:
        ofile.write(headerLine + '\n')
        ofile.write(data)
    print("Success")

if __name__ == '__main__':
    main()