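I am writing a script that reads URLs from a CSV file and requests each one to get its HTTP status code.
My data comes from a CSV file that I wrote. The url_check field contains a string holding one or two URLs.
The CSV file is structured as follows:
id,site_id,url_check,js_pixel_json
12187,333304,"[""http://www.google.com"", ""http://www.facebook.com""]",[]
12187,333304,"[""http://www.google.com""]",[]
I have a function that loops over each column correctly, but when I try to fetch the status code I get the following error: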
Traceback (most recent call last):
  File "help.py", line 29, in <module>
    loopUrl(inputReader)
  File "help.py", line 26, in loopUrl
    urlStatus = requests.get(url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 498, in request
    prep = self.prepare_request(req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 441, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 309, in prepare
    self.prepare_url(url, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 375, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/util/url.py", line 185, in parse_url
    host, url = url.split(']', 1)
ValueError: not enough values to unpack (expected 2, got 1)
The traceback ends inside the requests module, but I think something is going wrong in my loop.
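Answer 0 (score: 1)
["http://www.google.com", "http://www.facebook.com"] is a string, not a list. You are iterating over it character by character, so the first value handed to requests.get is the single character "[". urllib3 appears to treat a leading "[" as the start of a bracketed host and then fails to find the closing "]", which matches the url.split(']', 1) line at the bottom of the traceback. You need to safely evaluate the string so that you get a list of URLs instead of a string.
Example: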
>>> import ast
>>> x = u'[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']
Reference: Convert string representation of list to list
In your code, that would be:
urlList = ast.literal_eval(checkUrl)  # not str(checkUrl)
for url in urlList:
    urlStatus = requests.get(url)
    print(urlStatus.status_code)  # the attribute is status_code, not response_code
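To make the difference concrete, here is a minimal sketch that uses the sample rows from the question and needs no network access. It assumes the url_check cell always holds a Python-style list literal; io.StringIO stands in for the real file, and parse_url_cell is a hypothetical helper name, not something from the original code.

```python
import ast
import csv
import io

# Sample rows copied from the question; io.StringIO stands in for the real file.
SAMPLE = '''id,site_id,url_check,js_pixel_json
12187,333304,"[""http://www.google.com"", ""http://www.facebook.com""]",[]
12187,333304,"[""http://www.google.com""]",[]
'''


def parse_url_cell(cell):
    """Hypothetical helper: turn the text of a url_check cell into a list of URLs."""
    return [url.strip() for url in ast.literal_eval(cell)]


rows = list(csv.reader(io.StringIO(SAMPLE)))
cell = rows[1][2]  # first data row, url_check column

# Looping over the raw cell yields single characters, starting with "[".
first_chars = list(cell)[:3]

# After literal_eval, the loop yields whole URLs.
urls = parse_url_cell(cell)

print(first_chars)
print(urls)
```

The csv module already undoes the doubled quotes, so the cell arrives as ordinary list syntax and ast.literal_eval can parse it without executing any code.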
Answer 1 (score: 0)
This needs a little cleanup, but it should help you:
import ast
import csv

import requests


def loopUrl(inputReader):
    for row in inputReader:
        if inputReader.line_num == 1:
            continue  # skip the header row
        checkUrl = row[2]
        try:
            # turn the string "[...]" into a real Python list
            checkUrl = ast.literal_eval(checkUrl)
        except (ValueError, SyntaxError):
            continue  # the cell is not a valid Python literal
        if checkUrl == [] or checkUrl == 'NULL':
            continue  # nothing to request for this row
        for url in checkUrl:
            urlStatus = requests.get(url)
            print(urlStatus.status_code)


# "with" closes the file when the loop is done
with open('stackoverflow_help.csv', newline='') as csvFile:
    loopUrl(csv.reader(csvFile))
Output:
200
200
200