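I am trying to read text data from the URL shown in the code below, but it raises an error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2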
url="https://cdn.upgrad.com/UpGrad/temp/d934844e-5182-4b58-b896-4ba2a499aa57/companies.txt"
c=pd.read_csv(url, encoding='utf-8')
Answer 0 (score: 1)
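pd.read_csv() seems to have some encoding issue with this file: it never splits the rows into columns. The following code downloads the file and splits it manually: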
#!/usr/bin/env python3
import sys

import requests
import pandas as pd

url = "https://cdn.upgrad.com/UpGrad/temp/d934844e-5182-4b58-b896-4ba2a499aa57/companies.txt"
r = requests.get(url)
df = None
if r.status_code == 200:
    # Lines end with CRLF; the first line holds the column names.
    rows = r.text.split('\r\n')
    header = rows[0].split('\t')
    # Split every remaining line on tabs to get its columns.
    data = []
    for n in range(1, len(rows)):
        cols = rows[n].split('\t')
        data.append(cols)
    df = pd.DataFrame(columns=header, data=data)
else:
    print("error: unable to load {}".format(url))
    sys.exit(-1)
print(df.shape)
print(df.head(2))
$ ./test.py
(66369, 10)
                permalink      name            homepage_url                                      category_list     status  country_code  state_code      region           city  founded_at
0     /Organization/-Fame     #fame      http://livfame.com                                              Media  operating           IND          16      Mumbai         Mumbai
1  /Organization/-Qounter  :Qounter  http://www.qounter.com  Application Platforms|Real Time|Social Network...  operating           USA          DE  DE - Other  Delaware City  04-09-2014
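The error in the question is also consistent with the file being tab-separated while pd.read_csv() defaults to a comma separator: with commas as the delimiter, every line is a single field until the first line that happens to contain a comma. Below is a minimal sketch of that explanation, assuming the file really is tab-delimited (as the manual split on '\t' above suggests). The SAMPLE text and both helper functions are hypothetical and only imitate the layout of the real file; nothing here downloads it.

```python
import io

import pandas as pd

# Hypothetical sample imitating the file layout: tab-separated columns,
# with a comma inside one field on line 4.
SAMPLE = (
    "permalink\tname\tstatus\n"
    "/Organization/-Fame\t#fame\toperating\n"
    "/Organization/-Qounter\t:Qounter\toperating\n"
    "/Organization/Example\tExample, Inc.\tclosed\n"
)


def parse_tab_separated(text):
    """Parse tab-separated text into a DataFrame, keeping all values as strings."""
    return pd.read_csv(io.StringIO(text), sep="\t", dtype=str)


def default_parse_fails(text):
    """Return True if parsing with the default comma separator raises ParserError."""
    try:
        pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError:
        return True
    return False


df = parse_tab_separated(SAMPLE)
print(df.shape)                     # (3, 3)
print(default_parse_fails(SAMPLE))  # True
```

If that assumption holds, pd.read_csv(url, sep='\t') may be enough for the real file. Whether an explicit encoding argument is also needed has not been checked here.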