I found the following Python code on the internet, but I'm not sure how to make it work. The author says: "Navigate to the script's path in the terminal and enter: python aliexpresscrape.py. Then type out the path to your file: Path to File: /path/to/your/url/file"
I'm a bit confused about where to enter the AliExpress product URLs.
The message I get back from the terminal is:
Traceback (most recent call last):
  File "aliexpresscrape.py", line 70, in <module>
    read(selection)
  File "aliexpresscrape.py", line 64, in read
    with open(selection) as f:
IOError: [Errno 2] No such file or directory: '/Users/MyMacbookAir/Desktop/Songs\\ '
Here is the code:
from lxml import html
import lxml.html
import requests
import csv
from csv import writer

#variables
selection = raw_input("Path to File: ")
csv_header = ("post_title","post_name","ID","post_excerpt","post_content","post_status","menu_order","post_date","post_parent","post_author","comment_status","sku","downloadable","virtual","visibility","stock","stock_status","backorders","manage_stock","regular_price","sale_price","weight","length","width","height","tax_status","tax_class","upsell_ids","crosssell_ids","featured","sale_price_dates_from","sale_price_dates_to","download_limit","download_expiry","product_url","button_text","meta:_yoast_wpseo_focuskw","meta:_yoast_wpseo_title","meta:_yoast_wpseo_metadesc","meta:_yoast_wpseo_metakeywords","images","downloadable_files","tax:product_type","tax:product_cat","tax:product_tag","tax:product_shipping_class","meta:total_sales","attribute:pa_color","attribute_data:pa_color","attribute_default:pa_color","attribute:size","attribute_data:size","attribute_default:size")

#write header to output file (runs once)
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(csv_header)

def scrape(url):
    page = requests.get(url)
    tree = html.fromstring(page.content)
    title2 = str(lxml.html.parse(url).find(".//title").text)
    title2 = title2.replace('-' + title2.split("-", 1)[1], '')
    price = tree.xpath("//span[@itemprop='price']//text()")
    i = 0
    for span in tree.cssselect('span'):
        clas = span.get('class')
        rel = span.get('rel')
        if clas == "packaging-des":
            if rel != None:
                if i == 0:
                    weight = rel
                elif i == 1:
                    dim = str(rel)
                i = i + 1
    weight = weight
    height = dim.split("|", 3)[0]
    length = dim.split("|", 3)[1]
    width = dim.split("|", 3)[2]
    #Sometimes aliexpress doesn't list a price
    #This dumps a 0 into price in that case to stop the errors
    if len(price) == 1:
        price = float(str(price[0]))
    elif len(price) == 0:
        price = int(0)
    for inpu in tree.cssselect('input'):
        if inpu.get("id") == "hid-product-id":
            sku = inpu.get('value')
    for meta in tree.cssselect('meta'):
        name = meta.get("name")
        prop = meta.get("property")
        content = meta.get('content')
        if prop == 'og:image':
            image = meta.get('content')
        if name == 'keywords':
            keywords = meta.get('content')
        if name == 'description':
            desc = meta.get('content')
    listvar = ([str(title2),str(name), '', '', str(desc), 'publish', '', '', '0', '1', 'open', str(sku), 'no', 'no', 'visible', '', 'instock', 'no', 'no', str(price*2),str(price*1.5), str(weight), str(length), str(width), str(height), 'taxable', '', '', '', 'no', '', '', '', '', '', '', '', '', '', str(keywords), str(image), '', 'simple', '', '', '', '0', '', '', '', '', '', '', '', ''])
    with open("output.csv", 'ab') as f:
        writer = csv.writer(f)
        writer.writerow(listvar)

def read(selection):
    lines = []
    j = 0
    with open(selection) as f:
        for line in f:
            lines.append(line)
    lines = map(lambda s: s.strip(), lines)
    for j in range(len(lines)):
        scrape(str(lines[j]))

read(selection)
Answer (score: 0)
Without seeing the script's documentation (can you provide a link?), it looks like you are supposed to create a file containing the URLs of all the pages you want to scrape, one per line. For example, suppose I create a file that looks like this:
https://www.aliexpress.com/category/100003070/men-clothing-accessories.html?spm=2114.11010108.102.1.jGW0U0
https://www.aliexpress.com/category/100003109/women-clothing-accessories.html?spm=2114.11010108.101.1.jGW0U0
https://www.aliexpress.com/category/509/phones-telecommunications.html?spm=2114.11010108.103.1.jGW0U0
and save it as C:\ali.txt
To start scraping those three links, I then type in my terminal:
python aliexpresscrape.py
The first thing the program does is ask for the file name:
selection = raw_input("Path to File: ")
Now you can specify the path to your file by typing C:\ali.txt and pressing Enter.
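Judging from your traceback, the path you typed ended in a backslash-escaped space ('Songs\ '), which open() takes literally, so no file by that name exists on disk. A small, illustrative sketch of cleaning up such input before opening it (this sanitizing step is not part of the original script):

```python
# Path exactly as it appeared in the traceback: a backslash followed by a space
raw = '/Users/MyMacbookAir/Desktop/Songs\\ '

# Drop shell-style backslash escapes and trailing whitespace so that
# open() sees the real on-disk name
clean = raw.replace('\\', '').strip()
print(clean)  # /Users/MyMacbookAir/Desktop/Songs
```

In other words, when the script prompts "Path to File:", type the path plainly, without shell escaping.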
Then, for each line the script finds in the text file, it calls scrape with the line's content (which is a URL) and starts scraping that URL.
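The whole read-then-scrape flow can be sketched like this (scrape is stubbed out so the example runs without network access; the file name ali.txt matches the example above):

```python
# Create the URL file described above: one URL per line
urls = [
    "https://www.aliexpress.com/category/100003070/men-clothing-accessories.html",
    "https://www.aliexpress.com/category/100003109/women-clothing-accessories.html",
]
with open("ali.txt", "w") as f:
    f.write("\n".join(urls) + "\n")

visited = []

def scrape(url):
    # Stub: the real script downloads and parses the product page here
    visited.append(url)

# Mirrors the script's read(): strip each line and scrape non-empty ones
with open("ali.txt") as f:
    for line in f:
        line = line.strip()
        if line:
            scrape(line)

print(len(visited))  # 2
```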