Parsing a tab-delimited file in Python

Time: 2018-05-22 09:25:16

Tags: python

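I'm fairly new to Python.

I want to parse a file whose values are separated by tabs (\t); screenshots are below. How can I get rid of the \t characters and split the values into columns? My code is below.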

import pandas as pd
import io
import requests
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt"
s = requests.get(url).content
df = pd.read_csv(io.StringIO(s.decode('utf-8')))

How it looks right now

How I want it to look

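2 answers:

Answer 0 (score: 1)

Add sep="\t" to pd.read_csv. The data is messy (some fields are separated by two tabs instead of one), so the double tabs need to be replaced first: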

df = pd.read_csv(
    io.StringIO(s.decode('utf-8').replace("\t\t", "\t")), 
    header=None, sep="\t")
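
To see what the replacement does without downloading anything, here is a minimal sketch on a small inline sample. The sample string is made up to imitate the layout (it is not the real dataset), and the `sep=r"\s+"` variant is an alternative I am assuming fits this file, not something the answer itself proposes.

```python
import io

import pandas as pd

# Hypothetical sample imitating the messy layout: some fields are
# separated by two tabs instead of one.
sample = (
    "15.26\t14.84\t0.871\t1\n"
    "14.88\t\t14.57\t0.8811\t1\n"
    "14.29\t14.09\t\t0.905\t1\n"
)

# Approach from the answer: collapse double tabs, then split on single tabs.
df_replace = pd.read_csv(
    io.StringIO(sample.replace("\t\t", "\t")), header=None, sep="\t"
)

# Alternative (an assumption, not from the answer): treat any run of
# whitespace as a single separator, so no replacement is needed.
df_regex = pd.read_csv(io.StringIO(sample), header=None, sep=r"\s+")

print(df_replace)
print(df_regex)
```

Note that replacing "\t\t" only handles runs of exactly two tabs, and the whitespace variant would be wrong for data where an empty field between two tabs is meaningful.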

Answer 1 (score: 1)

If using the csv library is an option, you could try:

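import csv

import pandas as pd
import requests

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt"
raw_data = requests.get(url).content

# requests returns bytes, so write the download in binary mode
with open("raw_data.txt", "wb") as f:
    f.write(raw_data)

# Read it back as text; newline="" is what the csv module expects
with open("raw_data.txt", newline="") as f:
    data = list(csv.reader(f, delimiter="\t"))

# Every value is still a string at this point
df = pd.DataFrame.from_records(data)
print(df)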
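
One thing to watch with csv.reader: two tabs in a row yield an empty-string field, and all values come back as strings. The following is a rough sketch of one way to deal with both, again on a made-up inline sample rather than the real file. Dropping empty fields and converting to float are my assumptions about what is wanted here.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the second row contains a doubled tab.
sample = "15.26\t14.84\t0.871\t1\n14.88\t\t14.57\t0.8811\t1\n"

rows = []
for fields in csv.reader(io.StringIO(sample), delimiter="\t"):
    # Drop the empty strings produced by consecutive tabs,
    # then convert the remaining values from str to float.
    rows.append([float(x) for x in fields if x != ""])

df_clean = pd.DataFrame.from_records(rows)
print(df_clean)
```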