Question

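I have some HTML code from http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0, taken from my previous post "How to set up XPath query for HTML parsing?", and now want to build some logic around it, because many other pages are similar but not identical. So, given: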

<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<button class="toggle1Col" title="Toggle display between 1 column of wider results and multiple columns.">&#8596;</button>
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
<h3>MeSH Heading</h3>
<ul>
<li id="ds3"><div>Acetaldehyde</div></li>
</ul>
</div>

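Now, in my Python script, I would like to select the "Name of Substance" and "MeSH Heading" nodes and check whether they exist. If they do, select the data in them; otherwise return an empty string. Is there a way to do this in Python, the way in JavaScript I would use Node myNode = doc.DocumentNode.SelectNode(/[text()="Name of Substance"/)?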

from lxml import html
import requests 
import csv 
page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text) 

if( Name of substance is there )
    chem_name = tree.xpath('//*[text()="Name of Substance"]/..//div')[0].text_content()
else
    chem_name = [] 
if ( MeSH Heading there )
    mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
else 
    mesh_name = []

names1 = [chem_name, mesh_name]
with open('testchem.csv', 'wb') as myfile:
    wr = csv.writer(myfile) 
    wr.writerow(names1)

Answer 1

You can simply check whether "Name of Substance" or "MeSH Heading" appears in the text of the page, and if it does, select the content.

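from lxml import html
import requests
import csv

# Fetch the page and parse it into an element tree.
page = requests.get('http://chem.sis.nlm.nih.gov/chemidplus/rn/75-07-0')
tree = html.fromstring(page.text)

# Only run the XPath query if the heading text occurs somewhere in the page.
if "Name of Substance" in page.text:
    chem_name = tree.xpath('//*[text()="Name of Substance"]/..//div')[0].text_content()
else:
    chem_name = ""

# Index [1] assumes the "Name of Substance" <div> comes first under the same parent.
if "MeSH Heading" in page.text:
    mesh_name = tree.xpath('//*[text()="MeSH Heading"]/..//div')[1].text_content()
else:
    mesh_name = ""

names1 = [chem_name, mesh_name]
# Python 3: open in text mode with newline='' ('wb' only works with Python 2's csv module).
with open('testchem.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile)
    wr.writerow(names1)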
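
Since xpath() returns a list, the existence check can also be done on the query result itself instead of on the raw page text. Below is a minimal sketch of that idea, not a tested solution for every ChemIDplus page. The helper name first_value_under and the inline SAMPLE string are made up for illustration, and the query assumes each <h3> heading is directly followed by its own <ul>, as in the snippet from the question.

```python
from lxml import html

# Hypothetical sample modelled on the snippet in the question; it contains
# "Name of Substance" but deliberately omits "MeSH Heading".
SAMPLE = """
<div id="names">
<h2>Names and Synonyms</h2>
<div class="ds">
<h3>Name of Substance</h3>
<ul>
<li id="ds2"><div>Acetaldehyde</div></li>
</ul>
</div>
</div>
"""


def first_value_under(tree, heading):
    """Return the first entry listed under the <h3> with this exact text, or ''."""
    # following-sibling::ul[1] is the first <ul> after the heading, so the
    # lookup does not depend on how many other headings precede it.
    hits = tree.xpath(
        '//h3[text()=$heading]/following-sibling::ul[1]/li/div',
        heading=heading,  # lxml passes keyword arguments as XPath variables
    )
    # An empty list means the heading (or its list) is not on the page.
    return hits[0].text_content().strip() if hits else ""


tree = html.fromstring(SAMPLE)
chem_name = first_value_under(tree, "Name of Substance")
mesh_name = first_value_under(tree, "MeSH Heading")
```

With a real page, tree would come from html.fromstring(page.text) as in the code above. Because each heading is looked up on its own, a page that lacks "Name of Substance" but has "MeSH Heading" should not hit an IndexError from a hard-coded [1].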

How to select nodes in HTML with lxml?
