Question

I am trying to extract some attributes from this web page. URL = 'http://m.search.allheart.com/?q=stethoscope'

I wrote the following XPaths for this:

XPATH,ATTRIBUTE='XPATH','ATTRIBUTE'
NUM_RESULTS='NUM_RESULTS'
URL='URL'
TITLE='TITLE'
PROD_ID='PROD_ID'
IS_SALE='IS_SALE'
CURRENCY='CURRENCY'
REGULAR_PRICE='REGULAR_PRICE'
SALE_PRICE='SALE_PRICE'

conf_key={

NUM_RESULTS : {XPATH :'//div[@id="sort-page"]//div[@id="options" and @class="narrowed"]//hgroup[@id="sort-info" and @class="clearfix"]/h2', ATTRIBUTE:''} ,
URL : {XPATH:'//span[@class="info"]//span[@class="swatches clearfix product-colors"]//span[@class="price"]',ATTRIBUTE:'href'} ,
TITLE : {XPATH:'//div[@id="sort-results"]//li[@class="item product-box"]//span[@class="info"]//span[@class="title"]',ATTRIBUTE:''} ,
PROD_ID : {XPATH:'//div[@id="sort-results"]//li[@class="item product-box"]//span[@class="info"]//span[@class="swatches clearfix product-colors"]',ATTRIBUTE:'id'} ,
IS_SALE : {XPATH :'//div[@id="sort-results"]//li[@class="item product-box sale"]', ATTRIBUTE:''} ,
REGULAR_PRICE : {XPATH :'//div[@id="sort-results"]//li[@class="item product-box"]//span[@class="info"]//span[@class="price"]' , ATTRIBUTE:''} ,
SALE_PRICE : {XPATH :'//div[@id="sort-results"]//li[@class="item product-box sale"]//span[@class="info"]//span[@class="price"]' , ATTRIBUTE: '' } ,
}

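import os
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

chromedriver = "/usr/local/CHROMEDRIVER"  # path to the chromedriver binary
desired_capabilities = DesiredCapabilities.CHROME
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver, desired_capabilities=desired_capabilities)
driver.get(url)  # url is the search URL given above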

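The idea is to extract attributes from the first search page: the name, URL, title, regular price and sale price.

Skipping the rest of the code: the text is then extracted in a for loop. When I try to get the items on sale,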

driver.find_elements_by_xpath(conf_key[SALE_PRICE][XPATH])
driver.find_elements_by_xpath(conf_key[REGULAR_PRICE][XPATH])

However, this gives me regular_price, sale_price and is_sale as:

['$5.98', '$5.98', '$24.98', '$3.98', '$6.98', '$13.98', '$24.98', '$19.98', '$18.98', '$3.98', '$5.98', '$24.98', '$12.98', '$24.98']
['$49.99', '$96.99']
[1, 1]

Whereas I want:

['$5.98', '$5.98', '$24.98','$49.99', '$3.98', '$6.98', '$13.98', '$24.98', '$19.98', '$18.98', '$3.98', '$5.98',  '$96.99', '$24.98', '$12.98', '$24.98']
['','', '24.98', '' , '' ....]
[0, 0, 1, 0 , 0 ...]

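Question: I want to force the driver to return '' (or any placeholder) so that I can signal that a product is not on sale. Each item on the web page will have either the class "item product-box" or "item product-box-sale".

Also, I don't want to hard-code anything, because I need to repeat this logic for a set of web pages. How can I do this better, without looping through li[0], li[1] and so on? When scanning sequentially, is there any way to signal that the class is absent?

With the XPaths defined above, I do get the rest of the containers correctly: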

SEARCH_PAGE
244 Items ['ah426010', 'ahdst0100', 'ahdst0500blk', 'ahd000090', 'ahdst0600', 'pms1125', 'ahdst0400bke', 'ahdst0400blk', 'adc609', 'ma10448', 'ma10428', 'pm121', 'pm108', 'pm122']  ['allheart Discount Dual Head Stethoscope', 'allheart Discount Single Head Stethoscope', 'allheart Cardiology Stethoscope', 'allheart Disposable Stethoscope', 'allheart Discount Pediatric / Infant Stethoscope With Interchangeable Heads Stethoscope', 'Prestige Medical Ultra-Sensitive Dualhead Latex Free Stethoscope', 'allheart Smoke Black Edition Clinical Stainless Steel Stethoscope', 'allheart Clinical Stainless Steel Stethoscope', 'ADC Adscope-Lite 609 Lightweight Double-Sided Stethoscope', 'Mabis Dispos-A-Scope Nurse Stethoscope', 'Mabis Spectrum Nurse Stethoscope', 'Prestige Medical Clinical Lite Stethoscope', 'Prestige Medical Dual Head Stethoscope', 'Prestige Medical Sprague Rappaport Stethoscope']

I need to get lists of the same length, matching each of the lists above, for the regular and sale prices (and the is_sale flag).

Answer 1

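find_elements_by_X returns a list of WebElements, and each WebElement can itself call find_elements_by_X.

Use find_elements_by_X to get a list of all the products on the page.
Iterate over all of them:
1. Use find_elements_by_X (on the current product) to get specific elements such as cur_price or is_on_sale.
2. Don't forget to initialize default values.
3. Store the information in a structure (map, class, tuple). Note that with a class, the default values can be set in __init__().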
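The steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the XPath constants are adapted from the question's conf_key and are hypothetical, the helper names first_text and extract_products are made up for illustration, and the rule "regular price for every item, sale price only when the li class mentions sale" is my reading of the desired output in the question. It uses the same find_elements_by_xpath call as the question; newer Selenium releases spell this find_elements(By.XPATH, ...).

```python
# Hypothetical XPaths adapted from the question's conf_key; adjust per site.
# contains() matches both "item product-box" and the sale variant.
PRODUCT_XPATH = '//div[@id="sort-results"]//li[contains(@class, "product-box")]'
# The leading "." makes the search relative to the current product element.
TITLE_XPATH = './/span[@class="info"]//span[@class="title"]'
PRICE_XPATH = './/span[@class="info"]//span[@class="price"]'


def first_text(element, xpath, default=''):
    """Return the text of the first match under element, or default if none."""
    matches = element.find_elements_by_xpath(xpath)
    return matches[0].text if matches else default


def extract_products(driver):
    """Build aligned lists with exactly one entry per product."""
    result = {'title': [], 'regular_price': [], 'sale_price': [], 'is_sale': []}
    for product in driver.find_elements_by_xpath(PRODUCT_XPATH):
        # Substring check covers both "product-box sale" and "product-box-sale".
        on_sale = 'sale' in (product.get_attribute('class') or '')
        price = first_text(product, PRICE_XPATH)
        result['title'].append(first_text(product, TITLE_XPATH))
        result['regular_price'].append(price)
        # '' is the placeholder for "not on sale", so the lists stay aligned.
        result['sale_price'].append(price if on_sale else '')
        result['is_sale'].append(1 if on_sale else 0)
    return result
```

Because every product appends exactly one value to every list, the lists always have the same length, and nothing needs to be indexed as li[0], li[1] by hand.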

IMO, CSS selectors are easier to read than XPath. Try the Google Chrome console (F12) + right-click + Copy CSS path. https://selenium-python.readthedocs.org/locating-elements.html#locating-elements-by-css-selectors
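For comparison, here is how the same per-product loop might look with CSS selectors and a small class whose __init__() supplies the defaults. Again just a sketch: the selectors are my guess at CSS equivalents of the question's XPaths, and Product and parse_products are hypothetical names. find_elements_by_css_selector is the older call style used throughout this thread.

```python
class Product:
    """One search result; every field starts at a placeholder default."""

    def __init__(self, title='', price='', is_sale=0):
        self.title = title
        self.price = price
        self.is_sale = is_sale


# Hypothetical CSS selectors mirroring the question's XPaths.
PRODUCT_CSS = '#sort-results li[class*="product-box"]'
TITLE_CSS = 'span.info span.title'
PRICE_CSS = 'span.info span.price'


def parse_products(driver):
    """Return one Product per li, keeping defaults for anything missing."""
    products = []
    for li in driver.find_elements_by_css_selector(PRODUCT_CSS):
        product = Product()  # defaults are already in place
        titles = li.find_elements_by_css_selector(TITLE_CSS)
        prices = li.find_elements_by_css_selector(PRICE_CSS)
        if titles:
            product.title = titles[0].text
        if prices:
            product.price = prices[0].text
        if 'sale' in (li.get_attribute('class') or ''):
            product.is_sale = 1
        products.append(product)
    return products
```

If you still need parallel lists afterwards, something like [p.price for p in products] gives one entry per product.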
