Question

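I am trying to scrape some links from a web page in Python using Scrapy and XPath, but the elements I want to scrape sit between ::before and ::after, so XPath cannot see them: they do not exist in the HTML but are created dynamically with JavaScript. Is there a way to scrape these elements?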

::before
<div class="well-white">...</div>
<div class="well-white">...</div>
<div class="well-white">...</div>
::after

This is the actual page: http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/amif/calls/amif-2018-ag-inte.html#c,topics=callIdentifier/t/AMIF-2018-AG-INTE/1/1/1/default-group&callStatus/t/Forthcoming/1/1/0/default-group&callStatus/t/Open/1/1/0/default-group&callStatus/t/Closed/1/1/0/default-group&+identifier/desc

Answer 1

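I could not reproduce your exact document state.
However, if you load the page, you can see that some template markup is loaded in the same format as your sample data.

Also, if you check the XHR tab of the network inspector, you can see that some AJAX requests are made for JSON data.

So you can download all the data you need, in a convenient JSON format, here: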

http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json

scrapy shell "http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json"
> import json
> data = json.loads(response.text)  # response.text replaces the deprecated body_as_unicode()
> data['topicData']['Topics'][0]
{'topicId': 1259874, 'ccm2Id': 31081390, 'subCallId': 910867, ...
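
The shell session above can also be sketched as a small standalone helper. This is only an illustration: the function names `extract_topics` and `summarize` and the `sample_body` string are hypothetical, and the only keys assumed are the ones visible in the output above (`topicData`, `Topics`, `topicId`, `ccm2Id`, `subCallId`). The real payload has more fields, which are not reproduced here.

```python
import json


def extract_topics(json_text):
    """Return the list of topic dicts from a payload shaped like amif_topics.json."""
    data = json.loads(json_text)
    # 'topicData' and 'Topics' are the keys used in the shell session above;
    # fall back to an empty list if either is missing.
    return data.get("topicData", {}).get("Topics", [])


def summarize(topics):
    """Keep only the id fields that appear in the sample output above."""
    wanted = ("topicId", "ccm2Id", "subCallId")
    return [{key: topic.get(key) for key in wanted} for topic in topics]


# Hypothetical stand-in for the response body, built from the values shown above.
sample_body = (
    '{"topicData": {"Topics": ['
    '{"topicId": 1259874, "ccm2Id": 31081390, "subCallId": 910867}'
    "]}}"
)

topics = extract_topics(sample_body)
print(summarize(topics))
```

Inside a Scrapy callback, the same helper could presumably be called as `extract_topics(response.text)`, so the spider requests the JSON URL directly instead of the JavaScript-rendered page.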

Answer 2

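It's very easy! You just need to use "absolute XPath" and "relative XPath" (https://www.guru99.com/xpath-selenium.html) together. With this trick you can get past ::before (and perhaps ::after). For example, in your case (I am assuming that //div[@id='"+FindField+"'] // following :: td[@class='KKKK'] comes before your "div"):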

FindField='your "id" associated to the "div"'
driver.find_element_by_xpath ( "//div[@id='"+FindField+"']  // following :: td[@class='KKKK'] / div")

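Note: only a single "/" may be used. Also, you can use just "absolute XPath" for the whole address (note: the first step of the address must start with "//").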

Scraping HTML elements between ::before and ::after with Scrapy and XPath

2 answers: