TypeError: expected string or bytes-like object when using httplib2 with links

Time: 2018-04-16 10:14:23

Tags: python web-scraping

I'm not sure about this error I'm getting:

Traceback (most recent call last):
  File "C:\Users\MICHAEL\Desktop\Project X\dataprod.py", line 30, in <module>
    status, response = http.request(quote_page)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\httplib2\__init__.py", line 1368, in request
    (scheme, authority, request_uri, defrag_uri) = urlnorm(uri)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\httplib2\__init__.py", line 175, in urlnorm
    (scheme, authority, path, query, fragment) = parse_uri(uri)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\httplib2\__init__.py", line 171, in parse_uri
    groups = URI.match(uri).groups()
TypeError: expected string or bytes-like object

My code is below. Could this be a permissions error? I'm still new to coding, so apologies in advance if this is a newbie mistake and my code is hackish. I'm basically trying to find the links in the page I'm scraping.

import shelve

f = open("data.txt", 'w')
print("...")

from urllib.request import urlopen
from bs4 import BeautifulSoup, SoupStrainer
import httplib2

quote_page = ['https://www.auspost.com']

#ERROR BELOW

http = httplib2.Http()
status, response = http.request(quote_page)

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])

info = []
for pg in quote_page:

    page = urlopen(pg)

    soup = BeautifulSoup(page, 'html.parser')

    name_box = soup.find('html')

    name = name_box.text.strip()

    info.append((name))

    print("PULLED DATA")

import csv
from datetime import datetime

with open("index.csv", 'a', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)

    for name in info:
        writer.writerow([name])
f.write(name)
print(f, name)


Exit=input("Press '1' to save and close: ")

if Exit == 1:
    f.close()
    exit()

1 Answer:

Answer 0 (score: 1)

Try making it quote_page = 'https://www.auspost.com' instead of a list in brackets.
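The reason this matters: httplib2 parses the URI with a regular expression, and calling a regex match on a list (instead of a string) raises exactly the TypeError in the traceback. A minimal repro sketch, using a simplified stand-in pattern rather than httplib2's actual URI regex:

```python
import re

# Simplified stand-in for the URI regex httplib2 applies in parse_uri
pattern = re.compile(r"^(\w+)://(.+)$")

url = 'https://www.auspost.com'
print(pattern.match(url) is not None)  # True: a plain string parses fine

try:
    pattern.match(['https://www.auspost.com'])  # a one-element list does not
except TypeError as err:
    print(err)  # expected string or bytes-like object
```

Passing the list itself, rather than the string inside it, is what triggers the error on line 30 of dataprod.py.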

Edit: Try changing this:

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])    
info = []
for pg in quote_page:

    page = urlopen(pg)

    soup = BeautifulSoup(page, 'html.parser')

    name_box = soup.find('html')

    name = name_box.text.strip()

    info.append((name))

    print("PULLED DATA")

to:

quotes = []
for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        quotes.append(link['href'])

info = []
for pg in quotes:

    page = urlopen(pg)

    soup = BeautifulSoup(page, 'html.parser')

    name_box = soup.find('html')

    name = name_box.text.strip()

    info.append((name))

    print("PULLED DATA")
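For the link-extraction step itself, here is a self-contained sketch that runs without network access by parsing an inline HTML snippet (the snippet and its URL are made up for illustration):

```python
from bs4 import BeautifulSoup, SoupStrainer

# Stand-in for the response body fetched from the live page
html = '<a href="https://example.com/a">A</a><p>x</p><a>no href</a>'

# SoupStrainer('a') tells the parser to keep only <a> tags
quotes = []
for link in BeautifulSoup(html, 'html.parser',
                          parse_only=SoupStrainer('a')).find_all('a'):
    if link.has_attr('href'):       # skip anchors with no href attribute
        quotes.append(link['href'])

print(quotes)  # ['https://example.com/a']
```

Note that `parse_only` is the current BeautifulSoup 4 keyword; `parseOnlyThese`, used above, is the deprecated BS3-era alias that bs4 still accepts with a warning.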