I keep getting this error when I try to run code that scrapes products on Amazon from an existing .CSV file. Here is the code:
# I import the required modules here
import re
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import sys
import warnings
from requests_html import HTMLSession
import io
from io import StringIO
from PIL import Image
from html.parser import HTMLParser
# Here I declare a session object
session = HTMLSession()
#ignore warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")
url_array=[] #array for urls
asin_array=[] #array for asin numbers
with open('asin_list.csv', 'r') as csvfile:
    asin_reader = csv.reader(csvfile)
    for row in asin_reader:
        url_array.append(row[0]) #This url list is an array containing all the urls from the excel sheet
#The ASIN Number will be between the dp/ and another /
start = 'dp/'
end = '/'
for url in url_array:
    asin_array.append(url[url.find(start)+len(start):url.rfind(end)]) #this array has all the asin numbers
#declare the header.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
all_items=[] #The final 2D list containing prices and details of products, that will be converted to a consumable csv
for asin in asin_array:
    item_array=[] #An array to store details of a single product.
    amazon_url="https://www.amazon.com/dp/"+asin #The general structure of a url
    response = session.get(amazon_url, headers=headers, verify=False) #get the response
    item_array.append(response.html.search('a-color-price">${}<')[0]) #Extracting the price
    #Extracting the text containing the product details
    details = response.html
    details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))
    details_arr=[] #Declaring an array to store individual details
    details=re.sub("\n|\r", "", details) #Separate the details from text
    #details_arr=re.findall(r'\>(.*?)\<', details) #Store details in the array.
This is the error:
Traceback (most recent call last):
  File "C:/Users/xxx/prueba.py", line 54, in <module>
    details=re.sub("\n|\r", "", details) #Separate the details from text
  File "C:\Users\Usuario\AppData\Local\Programs\Python\Python37\lib\re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
Thanks for the support.
Answer (score: 2)
It's hard to know exactly what you are searching for, but there are two very similar lines in your code:
item_array.append(response.html.search('a-color-price">${}<')[0])
and
details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))
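After some experimenting with your code and the test page https://www.amazon.com/dp/B01J6RPGKG/ref=nav_shopall_1_k_ods_tab_sz, the first of those lines returns the price, while the second produces your error. Looking more closely, I think the problem is in that line: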
details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))
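Look at the end of this line in particular: I think the brackets are in the wrong place, and [0])) should be )[0]). Strictly speaking this is not a syntax error, since the line parses fine. As written, [0] indexes the string literal itself, so search() receives only the character 'P' and returns a parse Result object (or None) rather than a string, and re.sub() rejects anything that is not a string. Moving the bracket fixes that error but raises a new one: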
details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<')[0])
TypeError: 'NoneType' object is not subscriptable
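Both errors can be reproduced without contacting Amazon. The sketch below uses Python's built-in re.search() as a stand-in for response.html.search(); both return a result object when something is found and None otherwise. The HTML snippet and patterns are made up for illustration. Note that a re match needs [1] for its first captured group, whereas a parse Result uses [0].

```python
import re

# Made-up snippet standing in for the product page (assumption, not real Amazon HTML).
html = '<span class="a-color-price">$19.99</span>'
pattern = r'a-color-price">\$(.*?)<'

# 1) Misplaced bracket: pattern[0] is just the character "a", so the search
#    succeeds but returns a Match object, which re.sub() rejects.
wrong = re.search(pattern[0], html)
try:
    re.sub("\n|\r", "", wrong)
    wrong_error = ""
except TypeError as exc:
    wrong_error = str(exc)

# 2) Correct placement: index the search result, not the pattern string.
price = re.search(pattern, html)[1]

# 3) A pattern that does not occur in the page returns None,
#    and indexing None raises the second error.
missing = re.search(r'ReplacementPartsBulletLoader(.*?)</ul>', html)
try:
    missing[1]
    missing_error = ""
except TypeError as exc:
    missing_error = str(exc)

print(wrong_error)    # "expected string or bytes-like object" (exact wording varies by Python version)
print(price)          # 19.99
print(missing_error)  # 'NoneType' object is not subscriptable
```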
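I think that error happens for me because the search returns nothing: search() returns None when the template is not found on the page, and None cannot be indexed with [0]. To troubleshoot, try reusing the price line you already have.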
Change this:
details=(response.html.search('P.when("ReplacementPartsBulletLoader").execute(function(module){ module.initializeDPX(); }){}</ul>;<'[0]))
to this:
details= response.html.search('a-color-price">${}<')[0]
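and the code seems to work.
So the first problem is the misplaced bracket, and the second is the search template itself, which found nothing on the page I tested.
Good luck with your program; I hope this helps.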
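Once the template is fixed, it helps to check for a missing match before cleaning the text, so one product page without the expected block does not stop the whole loop. A minimal sketch of that guard-then-clean sequence, assuming the page HTML is already available as a string; it uses the built-in re module in place of requests_html's search() (which uses parse templates, not regular expressions), and the helper name extract_details and the sample HTML are hypothetical.

```python
import re


def extract_details(html, pattern):
    """Hypothetical helper: return the text items inside the first match of
    `pattern` (which must have one capturing group), or [] if nothing matches."""
    match = re.search(pattern, html, flags=re.DOTALL)
    if match is None:
        # Nothing matched: return an empty list instead of indexing None.
        return []
    block = match[1]
    # Remove line breaks, as re.sub("\n|\r", "", details) in the question intends.
    block = re.sub("\n|\r", "", block)
    # Take the text between each ">" and the next "<" (the question's commented-out
    # findall), then drop empty or whitespace-only pieces.
    items = re.findall(r">(.*?)<", block)
    return [item.strip() for item in items if item.strip()]


# Made-up sample, not real Amazon markup.
sample_html = """
<ul class="details">
  <li><span>Fits model X</span></li>
  <li><span>Stainless steel</span></li>
</ul>
"""

print(extract_details(sample_html, r'<ul class="details">(.*?)</ul>'))
# prints ['Fits model X', 'Stainless steel']
print(extract_details(sample_html, r'ReplacementPartsBulletLoader(.*?)</ul>'))
# prints []
```

Returning an empty list keeps the result type consistent, so the caller can append it to item_array either way.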