What XPath selector retrieves the full text of a Tweet div (including link text) as a single return value?
<div class="lead_table">
<table id="lead_table" style="width:100%">
<tr>
<th width="3%" id="i_d">ID</th>
<th width="35%" id="assessment">Assessment</th>
<th width="17%" id="risk_scale">Risk Scale<br>(1=Low Risk; 5=High Risk)</th>
<th width="5%" id="score">Score</th>
<th width="35%" id="notes">Explanation/Notes/Proposed Action</th>
</tr>
<tr>
<td align="center">L1</td>
<td>Question 1</td>
<td align="center">Scale 1</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L2</td>
<td>Question 2</td>
<td align="center">Scale 2</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L3</td>
<td>Question 3</td>
<td align="center">Scale 3</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L4</td>
<td>Question 4</td>
<td align="center">Scale 4</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L5</td>
<td>Question 5</td>
<td align="center">Scale 5</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L6</td>
<td>Question 6</td>
<td align="center">Scale 6</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L7</td>
<td>Question 7</td>
<td align="center">Scale 7</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L8</td>
<td>Question 8</td>
<td align="center">Scale 8</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L9</td>
<td>Question 9</td>
<td align="center">Scale 9</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L10</td>
<td>Question 10</td>
<td align="center">Scale 10</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L11</td>
<td>Question 11</td>
<td align="center">Scale 11</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L12</td>
<td>Question 12</td>
<td align="center">Scale 12</td>
<td id="ans"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr>
<td align="center">L13</td>
<td>Question 13</td>
<td align="center">Scale 13</td>
<td id="ans13"><input type="number" min="1" max="5" style="text-align: center;"></td>
<td><input type="text"></td>
</tr>
<tr style="background-color:#4f81bd;">
<td colspan="3" align="left" style="color:white"><strong>Results</strong></td>
<td colspan="2" style="color:white" id="lead_res_num"><strong></strong></td>
</tr>
</table>
</div>
The above works for divs without links, but when the tweet contains a link, it returns only the first string segment.
Answer 0 (score: 0)
"The above works for divs without links, but when the tweet contains a link, it returns only the first string segment."
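This is because of the /text() part: it matches only the text nodes that are direct children of the element. To match all text nodes inside the element, at any depth, use // before text():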
//*[contains(@class, 'tweet-text')][2]//text()
This is usually what HTML parsers do when asked for the "text" value of a node: they recursively visit all child nodes, collect their "text" values, and join them.
A demonstration of all of the above using Python and the lxml parser:
In [1]: from lxml.html import fromstring
In [2]: html = """
...: <div>
...: div text here
...: <a href="https://google.com">link text</a>
...: </div>"""
In [3]: root = fromstring(html)
In [4]: root.xpath('//div/text()') # <- No text of the a element
Out[4]: ['\n div text here\n ', '\n']
In [5]: root.xpath('//div//text()') # <- We've got all the texts now
Out[5]: ['\n div text here\n ', 'link text', '\n']
In [6]: root.xpath("//div")[0].text_content() # <- but this would do that for us
Out[6]: '\n div text here\n link text\n'
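Since the question asks for a single return value, here is a minimal sketch of three ways to get one string rather than a list of text nodes. The markup is a made-up stand-in for a tweet div (the `tweet-text` class is taken from the XPath above, everything else is an assumption), and `string()` and `normalize-space()` are standard XPath 1.0 functions that lxml evaluates directly.

```python
from lxml.html import fromstring

# Hypothetical tweet markup, for illustration only.
html = (
    '<div class="tweet-text">div text here '
    '<a href="https://google.com">link text</a></div>'
)
root = fromstring(html)

# Option 1: select every descendant text node, then join them in Python.
parts = root.xpath("//*[contains(@class, 'tweet-text')]//text()")
joined = "".join(parts)

# Option 2: let XPath compute the string value of the first matching node.
as_string = root.xpath("string(//*[contains(@class, 'tweet-text')])")

# Option 3: same as option 2, but with whitespace trimmed and collapsed.
normalized = root.xpath("normalize-space(//*[contains(@class, 'tweet-text')])")

print(joined)
print(as_string)
print(normalized)
```

Note that `string()` and `normalize-space()` only consider the first node of the matched set, so if several elements match, joining the `//text()` results or calling `text_content()` on each element is probably the safer route.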