Python requests.get returns ValueError when looping through URLs

Time: 2019-01-14 21:12:40

Tags: python csv http python-requests

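I am writing a script that does the following:

  1. Reads in a CSV file
  2. Loops through the values in the URL column
  3. Returns the status code for each URL field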

My data comes from a CSV file that I wrote myself. The URL field contains a string with 1 or 2 URLs.

The CSV file is structured as follows:

id,site_id,url_check,js_pixel_json
12187,333304,"[""http://www.google.com"", ""http://www.facebook.com""]",[]
12187,333304,"[""http://www.google.com""]",[]

I have a function that loops through the column correctly, but when I try to fetch the status code I get:

Traceback (most recent call last):
  File "help.py", line 29, in <module>
    loopUrl(inputReader)
  File "help.py", line 26, in loopUrl
    urlStatus = requests.get(url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 498, in request
    prep = self.prepare_request(req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 441, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 309, in prepare
    self.prepare_url(url, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 375, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/util/url.py", line 185, in parse_url
    host, url = url.split(']', 1)
ValueError: not enough values to unpack (expected 2, got 1)

The traceback ends inside the library modules, but I think something is going wrong in the loop in my own code (the `loopUrl` function in `help.py`).

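2 Answers:

Answer 0: (score: 1)

["http://www.google.com", "http://www.facebook.com"] is a string, not a list. You are iterating over it character by character, which is why you get the error above. You need to safely evaluate the string to get a list of URLs instead of a string.

Example: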

>>> import ast
>>> x = u'[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

Reference: Convert string representation of list to list

In your code, that would be:

    urlList = ast.literal_eval(checkUrl)  # instead of str(checkUrl)
    for url in urlList:
        urlStatus = requests.get(url)
        print(urlStatus.status_code)
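
To make the cause concrete, here is a small sketch built on the first data row from the question. The `csv_text` variable and the use of `io.StringIO` in place of the real file are assumptions for illustration only. It shows that iterating over the raw field yields single characters, while the value returned by `ast.literal_eval` yields whole URLs.

```python
import ast
import csv
import io

# Stand-in for the CSV file from the question (assumption: same columns and quoting).
csv_text = (
    'id,site_id,url_check,js_pixel_json\n'
    '12187,333304,"[""http://www.google.com"", ""http://www.facebook.com""]",[]\n'
)

reader = csv.reader(io.StringIO(csv_text))
next(reader)            # skip the header row
row = next(reader)
raw_field = row[2]      # still a plain string at this point

# Iterating over a string walks it one character at a time.
first_chars = list(raw_field)[:3]
print(first_chars)

# literal_eval turns the string into a real list of URL strings.
urls = ast.literal_eval(raw_field)
print(urls)
```

In the failing loop, the first value handed to `requests.get` would therefore be the single character `[`, which appears to be what leads to the `url.split(']', 1)` line at the bottom of the traceback.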

Answer 1: (score: 0)

It needs a little cleanup, but this should help you:

import ast
import csv

import requests


def loopUrl(inputReader):
    for row in inputReader:
        if inputReader.line_num == 1:
            continue  # skip the header row

        try:
            # url_check holds a string such as '["http://...", "http://..."]'
            checkUrl = ast.literal_eval(row[2])
        except (ValueError, SyntaxError):
            continue  # not a valid Python literal (for example NULL)

        # nothing to check for an empty list or a quoted 'NULL'
        if not checkUrl or checkUrl == 'NULL':
            continue

        for url in checkUrl:
            urlStatus = requests.get(url)
            print(urlStatus.status_code)


# the with block closes the file once the loop has finished
with open('stackoverflow_help.csv', newline='') as csvFile:
    loopUrl(csv.reader(csvFile))

Output:

200
200
200
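
As one possible cleanup of the approach above, the parsing step can be separated from the HTTP call so that it can be exercised without network access. This is only a sketch: `parse_url_field`, `check_urls`, and the `fetch` parameter are hypothetical names that do not appear in the original answers.

```python
import ast


def parse_url_field(field):
    """Return the list of URLs stored in one CSV cell, or [] if it cannot be parsed."""
    try:
        value = ast.literal_eval(field)
    except (ValueError, SyntaxError):
        return []       # e.g. the bare text NULL or an empty cell
    if not isinstance(value, list):
        return []
    # Drop non-string entries and trim stray whitespace around each URL.
    return [u.strip() for u in value if isinstance(u, str)]


def check_urls(fields, fetch):
    """Yield (url, status) pairs; `fetch` is any callable mapping a URL to a status code."""
    for field in fields:
        for url in parse_url_field(field):
            yield url, fetch(url)
```

With the requests library installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).status_code`, and `fields` could be `row[2]` for each data row produced by `csv.reader`.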