I'm not sure about this error I'm getting.
Traceback (most recent call last):
  File "C:\Users\MICHAEL\Desktop\Project X\dataprod.py", line 30, in <module>
    status, response = http.request(quote_page)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\httplib2\__init__.py", line 1368, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\httplib2\__init__.py", line 175, in urlnorm
    (scheme, authority, path, query, fragment) = parse_uri(uri)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\httplib2\__init__.py", line 171, in parse_uri
    groups = URI.match(uri).groups()
TypeError: expected string or bytes-like object
My code is below. Could this be a permissions error? I'm still new to coding, so apologies in advance if this is a newbie mistake and my code is hacky. Basically I'm trying to find the links in the page I'm scraping.
import shelve
f = open("data.txt", 'w')
print("...")
from urllib.request import urlopen
from urllib.request import urlopen
from bs4 import BeautifulSoup, SoupStrainer
import httplib2

quote_page = ['https://www.auspost.com']

#ERROR BELOW
http = httplib2.Http()
status, response = http.request(quote_page)

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])

info = []
for pg in quote_page:
    page = urlopen(pg)
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find('html')
    name = name_box.text.strip()
    info.append((name))
    print("PULLED DATA")

import csv
from datetime import datetime

with open("index.csv", 'a', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    for name in info:
        writer.writerow([name])
        f.write(name)
        print(f, name)

Exit = input("Press '1' to save and close: ")
if Exit == 1:
    f.close()
    exit()
Answer 0 (score: 1)
Try making it quote_page = 'https://www.auspost.com'
instead, without the brackets.
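For context on why the list triggers this: httplib2's parse_uri splits the URI with a regular expression, and re.match only accepts a string or bytes-like object, so passing a list fails before any network request is made. A minimal sketch of the mechanism (the pattern below is the generic RFC 3986 appendix B URI regex, reproduced here for illustration rather than copied from your installed httplib2):

```python
import re

# Generic URI-splitting regex (RFC 3986, appendix B); httplib2 uses a
# pattern like this internally to normalize the URI before requesting it.
URI = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Passing a list (like quote_page) raises the same TypeError as the traceback:
try:
    URI.match(['https://www.auspost.com'])
except TypeError as err:
    print(err)

# A plain string works, and group 2 is the scheme:
print(URI.match('https://www.auspost.com').group(2))  # https
```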
Edit: try changing this:
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])

info = []
for pg in quote_page:
    page = urlopen(pg)
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find('html')
    name = name_box.text.strip()
    info.append((name))
    print("PULLED DATA")
to this:
quotes = []
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        quotes.append(link['href'])

info = []
for pg in quotes:
    page = urlopen(pg)
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find('html')
    name = name_box.text.strip()
    info.append((name))
    print("PULLED DATA")
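One caveat with looping over the collected hrefs (my own note, not part of the answer above): anchor hrefs scraped from a page are often relative paths like '/parcels', which urlopen cannot fetch on their own. A sketch using urllib.parse.urljoin to resolve them against the base URL first (the example hrefs are made up):

```python
from urllib.parse import urljoin

base = 'https://www.auspost.com'
# Hypothetical hrefs as they might come back from the <a> tags:
hrefs = ['/parcels', 'help/faq', 'https://example.com/external']

# Resolve relative links against the base before calling urlopen();
# absolute URLs pass through urljoin unchanged.
quotes = [urljoin(base, h) for h in hrefs]
print(quotes)
```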