First, a brief statement of the problem: I have an unordered list with many list items, and each list item corresponds to a "flashcard":
<ul>
<li>
<p><span>can you slice columns in a 2d list? </span></p>
<pre><code class='language-python' lang='python'>queryMatrixTranspose[a-1:b][i] = queryMatrix[i][a-1:b] </code></pre>
<ul>
<li>
<span>No: can't do this because python doesn't support multi-axis slicing, only multi-list slicing; see the article </span><a href='http://ilan.schnell-web.net/prog/slicing/' target='_blank' class='url'>http://ilan.schnell-web.net/prog/slicing/</a><span> for more info.</span>
</li>
</ul>
</li>
</ul>
The answer on a flashcard is always the list item located under the XPath /html/body/ul/li/ul. I want to retrieve the answer in the format shown here:
<li>
<span>No: can't do this because python doesn't support multi-axis slicing, only multi-list slicing; see the article </span><a href='http://ilan.schnell-web.net/prog/slicing/' target='_blank' class='url'>http://ilan.schnell-web.net/prog/slicing/</a><span> for more info.</span>
</li>
The question on a flashcard is everything that remains at the XPath /html/body/ul/li once the answer has been extracted:
<li>
<p><span>can you slice columns in a 2d list? </span></p>
<pre><code class='language-python' lang='python'>queryMatrixTranspose[a-1:b][i] = queryMatrix[i][a-1:b] </code></pre>
</li>
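For each flashcard in the unordered list of flashcards, I want to extract the UTF-8 encoded HTML content of the question and answer list items. That is, I want both the text and the HTML tags.
I tried to solve this by iterating over each flashcard and its corresponding answer, and removing the answer child node from the flashcard parent node.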
from lxml import html

flashcard_list = []
htmlTree = html.fromstring(htmlString)
for flashcardTree, answerTree in zip(htmlTree.xpath("/html/body/ul/li"),
                                     htmlTree.xpath('/html/body/ul/li/ul')):
    flashcard = html.tostring(flashcardTree,
                              pretty_print=True).decode("utf-8")
    answer = html.tostring(answerTree,
                           pretty_print=True).decode("utf-8")
    # This is the call that raises the TypeError described below.
    question = html.tostring(flashcardTree.remove(answerTree),
                             pretty_print=True).decode("utf-8")
    flashcard_list.append((question, answer))
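However, when I try to remove the answer child node with flashcardTree.remove(answerTree), I get the error TypeError: Type 'NoneType' cannot be serialized. I don't understand why that function would return nothing. I am trying to remove the node at /html/body/ul/li/ul, which is a valid child of /html/body/ul/li.
Any suggestions would be appreciated. I am not attached to the code from my first attempt; I will accept any answer whose output is a list of (question, answer) tuples, one per flashcard.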
Answer 0 (score: 0)
If I understand correctly what you are looking for, this should work:
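flashcard_list = []
# Pair each question <span> with its answer <li>. Selecting the <li> itself
# (rather than descendant-or-self::*) yields exactly one answer node per flashcard.
for flashcardTree, answerTree in zip(htmlTree.xpath("/html/body/ul/li/p/span"),
                                     htmlTree.xpath('/html/body/ul/li/ul/li')):
    question = flashcardTree.text
    answer = answerTree.text_content().strip()
    flashcard_list.append((question, answer))
for i in flashcard_list:
    print(i[0], '\n', i[1])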
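Output:
can you slice columns in a 2d list?
No: can't do this because python doesn't support multi-axis slicing, only multi-list slicing; see the article http://ilan.schnell-web.net/prog/slicing/ for more info.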