I am trying to get the prices of some monitors from this website. This is my code:
import json
from time import sleep

import requests
from lxml import html


def noteBooksBillgerParser(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
    page = requests.get(url, headers=headers)
    while True:
        print("test")
        sleep(3)
        try:
            doc = html.fromstring(page.content)
            XPATH_PRICE = '//div[@id="product_detail_price"]//content()'
            RAW_PRICE = doc.xpath(XPATH_PRICE)
            # Join the matched nodes and collapse runs of whitespace
            PRICE = ' '.join(''.join(RAW_PRICE).split()).strip() if RAW_PRICE else None
            data = {
                'PRICE': PRICE,
                'URL': url,
            }
            return data
        except Exception as e:
            print(e)
def ReadIDs():
    # AsinList = csv.DictReader(open(os.path.join(os.path.dirname(__file__),"Asinfeed.csv")))
    IDList = ['vp248qg',
              'vz239he',
              'vs247hr+164581'
              ]
    extracted_data = []
    for i in IDList:
        url = "https://www.notebooksbilliger.de/asus+" + i
        print("Processing: ", url)
        extracted_data.append(noteBooksBillgerParser(url))
        sleep(2)
    # Write all results to a JSON file; the file is closed on leaving the block
    with open('notebooksbilliger.json', 'w') as f:
        json.dump(extracted_data, f, indent=4)
I'm fairly sure most of the code works, but I'm not sure how to get the price into my XPATH_PRICE variable. I think there may be an error there.
Answer 0 (score: 1)
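It's not so much "getting the price into my XPATH_PRICE variable" as getting it into your RAW_PRICE variable. If you use

RAW_PRICE = doc.xpath('//div[@id="product_detail_price"]')[0].values()[4]

your output will be (picking vz239he from IDList at random):

156.99

Here .values() returns the attribute values of the matched div, so index 4 relies on the price being that element's fifth attribute. The rest should then be processed as expected.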
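To see why the rest of the parser still works after this change, here is a minimal sketch of the question's whitespace-normalizing expression, wrapped in a hypothetical helper named normalize_price (that name and the sample inputs are assumptions for illustration, not part of the original code). It behaves the same whether RAW_PRICE is a single string, as returned by the attribute lookup, or a list of text fragments:

```python
def normalize_price(raw_price):
    """Collapse whitespace in a price given as a string or a list of strings."""
    # Same expression as the PRICE line in the question's parser.
    return ' '.join(''.join(raw_price).split()).strip() if raw_price else None


# A single attribute value, as produced by the .values()[4] lookup.
print(normalize_price('156.99'))

# A list of text fragments with stray whitespace (made-up sample data).
print(normalize_price(['\n  156.99', ' \u20ac\n']))

# An empty XPath result yields None rather than an empty string.
print(normalize_price([]))
```

Joining a plain string with ''.join() simply rebuilds the same string, so a string input passes through unchanged apart from whitespace cleanup.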