How do I read data from a URL in Python using Pandas?

Time: 2019-01-28 14:49:11

Tags: python pandas

I am trying to read text data from the URL in the code below, but it raises an error:

  

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

import pandas as pd
url = "https://cdn.upgrad.com/UpGrad/temp/d934844e-5182-4b58-b896-4ba2a499aa57/companies.txt"
c = pd.read_csv(url, encoding='utf-8')

1 Answer:

Answer 0: (score: 1)

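pd.read_csv() never splits the columns here. The file is tab-delimited, while read_csv splits on commas by default, and there also seems to be an encoding issue. Fetching the text and splitting it manually works: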

#!/usr/bin/env python3
import sys

import requests
import pandas as pd

url = "https://cdn.upgrad.com/UpGrad/temp/d934844e-5182-4b58-b896-4ba2a499aa57/companies.txt"
r = requests.get(url)
df = None
if r.status_code == 200:
    # Rows are separated by CRLF; the first row holds the column names.
    rows = r.text.split('\r\n')
    header = rows[0].split('\t')
    data = []
    for n in range(1, len(rows)):
        # Columns within a row are separated by tabs.
        cols = rows[n].split('\t')
        data.append(cols)
    df = pd.DataFrame(columns=header, data=data)
else:
    print("error: unable to load {}".format(url))
    sys.exit(-1)
print(df.shape)
print(df.head(2))

$ ./test.py
(66369, 10)
                permalink      name            homepage_url                                      category_list     status country_code state_code      region           city  founded_at
0     /Organization/-Fame     #fame      http://livfame.com                                              Media  operating          IND         16      Mumbai         Mumbai
1  /Organization/-Qounter  :Qounter  http://www.qounter.com  Application Platforms|Real Time|Social Network...  operating          USA         DE  DE - Other  Delaware City  04-09-2014
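
Since the rows are split on tabs, the file appears to be tab-delimited, and read_csv splits on commas unless told otherwise. A minimal sketch of letting pandas do the parsing by passing sep="\t" follows. SAMPLE is made-up data standing in for the real file, and read_tab_delimited is a hypothetical helper name, not part of pandas. The try block only demonstrates that a comma inside a field is enough to trigger the ParserError with the default separator.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the first lines of companies.txt.
# Assumption: the real file is tab-delimited, as the manual split implies.
SAMPLE = (
    "permalink\tname\tcategory_list\n"
    "/Organization/-Fame\t#fame\tMedia\n"
    "/Organization/-Qounter\t:Qounter\tApplication Platforms|Real Time\n"
    "/Organization/Example\tExample, Inc.\tSoftware\n"
)


def read_tab_delimited(source, encoding=None):
    """Parse tab-delimited text from a URL, a path, or a file-like object."""
    return pd.read_csv(source, sep="\t", encoding=encoding)


# With the default comma separator, every line is a single field until the
# comma in "Example, Inc." makes line 4 look like two fields.
try:
    pd.read_csv(io.StringIO(SAMPLE))
    default_error = None
except pd.errors.ParserError as exc:
    default_error = str(exc)

df = read_tab_delimited(io.StringIO(SAMPLE))
print(default_error)
print(df.shape)
```

For the real file, the same helper could be called with the URL in place of the StringIO object. If decoding then fails, passing an explicit encoding argument is worth trying, though which encoding the file actually uses is not confirmed here.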