Python TypeError: expected string or bytes-like object

Date: 2018-09-20 18:23:51

Tags: python web-scraping python-requests

I keep getting this error when I run code that scrapes Amazon products listed in an existing .CSV file. Here is the code:

Here I import the required modules

import re
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import sys
import warnings
from requests_html import HTMLSession
import io
from io import StringIO
from PIL import Image
from html.parser import HTMLParser

Here I declare a session object

session = HTMLSession()

#ignore warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")

url_array=[] #array for urls
asin_array=[] #array for asin numbers
with open('asin_list.csv', 'r') as csvfile:
    asin_reader = csv.reader(csvfile)
    for row in asin_reader:
        url_array.append(row[0]) #This url list is an array containing all the urls from the excel sheet

#The ASIN Number will be between the dp/ and another /
start = 'dp/'
end = '/'
for url in url_array:
    asin_array.append(url[url.find(start)+len(start):url.rfind(end)]) #this array has all the asin numbers

#declare the header.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}

all_items=[] #The final 2D list containing prices and details of products, that will be converted to a consumable csv

for asin in asin_array:
    item_array=[] #An array to store details of a single product.
    amazon_url="https://www.amazon.com/dp/"+asin #The general structure of a url
    response = session.get(amazon_url, headers=headers, verify=False) #get the response

    item_array.append(response.html.search('a-color-price">${}<')[0]) #Extracting the price


    #Extracting the text containing the product details
    details = response.html


    details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))
    details_arr=[] #Declaring an array to store individual details
    details=re.sub("\n|\r", "", details) #Separate the details from text
    #details_arr=re.findall(r'\>(.*?)\<', details) #Store details in the array.

This is the error:

Traceback (most recent call last):
  File "C:/Users/xxx/prueba.py", line 54, in <module>
    details=re.sub("\n|\r", "", details) #Separate the details from text
  File "C:\Users\Usuario\AppData\Local\Programs\Python\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

Thanks for your support

1 answer:

Answer 0: (score: 2)

It's hard to know exactly what you are searching for, but there are two very similar lines in your code:

item_array.append(response.html.search('a-color-price">${}<')[0])

details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))

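After some experimenting with the code and the test page https://www.amazon.com/dp/B01J6RPGKG/ref=nav_shopall_1_k_ods_tab_sz, the first line above returns the price and the second one produces your error. Looking at it more closely, I think you have a bracket-placement mistake on that line: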

details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))

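Take a look at the end of this line, where I think the brackets are in the wrong place: [0])) should be )[0]). As written, the [0] indexes the template string itself, so search() only receives its first character, "P", and whatever it returns is not a string, which is why re.sub raises the TypeError. Moving the bracket fixes that error and raises a new one: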

details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<')[0])
TypeError: 'NoneType' object is not subscriptable
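
To make the bracket problem concrete, here is a minimal sketch that needs no network access. It only shows what `[0]` does when applied to the template string, and how `re.sub` reacts to a value that is not a string. The helper `try_sub` is a hypothetical name introduced for illustration, and the exact wording of the exception message can vary slightly between Python versions.

```python
import re

# The shorter template from the question; any string behaves the same way.
template = 'a-color-price">${}<'

# Misplaced bracket: [0] is applied to the template string itself,
# so only its first character would be handed to search().
misplaced = template[0]
print(repr(misplaced))  # 'a'


def try_sub(value):
    """Hypothetical helper: run the question's re.sub call and report a TypeError."""
    try:
        return re.sub("\n|\r", "", value)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A real string works; anything else (None, a result object, a list) does not.
print(try_sub("line one\r\nline two"))  # line oneline two
print(try_sub(None))
```

In other words, the first TypeError is about the type of `details`, not about the regular expression itself.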

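I think the 'NoneType' error occurs for me because the search did not return anything. To troubleshoot, try reusing the template from your earlier line, like this:

Change this: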

details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))

To this:

details= response.html.search('a-color-price">${}<')[0]

and the code seems to work.

So I would say the problem is first the bracket placement, and second what you are searching for.
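
Both failure modes can be guarded against in one place. The following is a rough sketch, not the asker's code: `clean_details` is a hypothetical helper, and it assumes the search result is either None (no match) or something indexable whose item 0 is the captured text. A plain list stands in for that result here so the example runs without requests_html.

```python
import re


def clean_details(result):
    """Return the first captured field without line breaks, or None if nothing matched.

    `result` is assumed to be None or an indexable object whose item 0
    holds the captured text (a list is used below as a stand-in).
    """
    if result is None:
        # The template was not found on the page; nothing to clean.
        return None
    text = str(result[0])  # make sure re.sub receives a string
    return re.sub(r"[\r\n]", "", text)


print(clean_details(None))  # None
print(clean_details(["<li>Item\r\n one</li>"]))  # <li>Item one</li>
```

Checking for None before subscripting avoids the "'NoneType' object is not subscriptable" error, and converting to str avoids the original "expected string or bytes-like object" error.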

Good luck with your program, and I hope this helps.