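Trying to print only "state failed", but Python is printing everything

Time: 2018-12-25 04:58:22

Tags: python python-3.x

I am iterating over the HTML content of a web page and trying to print only the strings that contain the substring "state failed". However, Python prints every string, even those that do not contain the substring "state failed".

Here is my code: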

soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    if "state failed" in link:
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')   
outF.close()

This is the kind of element I want printed:

<rect class="state failed" data-original-title="Task_id: failure_cleanup&lt;br&gt;Run: 2018-12-22T04:00:00&lt;br&gt;Operator: CruxCleanupOperator&lt;br&gt;Started: 2018-12-24T18:34:39.149434&lt;br&gt;Ended: 2018-12-24T18:34:45.935977&lt;br&gt;Duration: 6.78654&lt;br&gt;State: failed&lt;br&gt;" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>

This is what I do not want printed, but for some strange reason it is being printed:

<rect class="state success" data-original-title="Task_id: join_cleanup&lt;br&gt;Run: 2018-12-22T04:00:00&lt;br&gt;Operator: CompletionBranchOperator&lt;br&gt;Started: 2018-12-24T18:33:30.834983&lt;br&gt;Ended: 2018-12-24T18:33:33.037330&lt;br&gt;Duration: 2.20235&lt;br&gt;State: success&lt;br&gt;" data-toggle="tooltip" height="10" rx="0" ry="0" style="shape-rendering: crispedges; stroke-width: 1; stroke-opacity: 1;" title="" width="10" x="984" y="-5"></rect>

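I have tried every combination of single, double, and even triple quotes. It makes no difference: everything is printed, even strings that do not contain "state failed". Any idea what is wrong here? Thanks.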

2 Answers:

Answer 0 (score: 1)

You could try converting link to a string:

soup = bs(html_page, 'lxml')
outF = open('C:/Users/ryans/OneDrive/Desktop/test.csv', 'w')
for link in soup.findAll('rect'):
    if "state failed" in str(link):
        if link.isoweekday() in range(1, 6):
            outF.write(str(link))
            outF.write('\n')   
outF.close()

Then it should work.
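A minimal sketch of why the conversion matters, assuming `bs` in the code above is `BeautifulSoup` from `bs4`, that the built-in `html.parser` is an acceptable stand-in for `lxml`, and using sample markup trimmed from the question. Applied to a tag object, `in` looks through the tag's children rather than its markup, so the substring test only becomes meaningful on `str(link)`. The sketch leaves out the `link.isoweekday()` check, because a bs4 tag does not provide that method and the question does not say which date it should apply to.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the two <rect> elements in the question.
html_page = (
    '<svg>'
    '<rect class="state failed" data-original-title="State: failed"></rect>'
    '<rect class="state success" data-original-title="State: success"></rect>'
    '</svg>'
)

soup = BeautifulSoup(html_page, "html.parser")

matches = []
for link in soup.find_all("rect"):
    # `"state failed" in link` would search the tag's children, not its markup,
    # so the substring test is run on the serialized tag instead.
    if "state failed" in str(link):
        matches.append(str(link))

for line in matches:
    print(line)
```

Collecting the matches in a list first keeps the filtering separate from the file writing, which makes the condition easier to check on its own.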

Answer 1 (score: 1)

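Instead of if "state failed" in link:, test the tag's class attribute, for example if "state failed" == link.get('class'). Compare with == rather than is, since is checks object identity rather than equality. Keep in mind that link.get('class') returns None when the tag has no class attribute, and that with an HTML parser BeautifulSoup returns class as a list of values (here ['state', 'failed']), so the comparison may need to be made against that list.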

You can also do it like this:
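One possibility, sketched under assumptions rather than taken from the answer: `bs` is `BeautifulSoup` from `bs4`, the built-in `html.parser` stands in for `lxml`, and the sample markup is trimmed from the question, with one extra `<rect>` that has no `class` added for illustration. With an HTML parser, `class` is treated as a multi-valued attribute, so `link.get('class')` gives a list such as `['state', 'failed']`, or `None` when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question; the last <rect> has no class attribute.
html_page = (
    '<svg>'
    '<rect class="state failed" data-original-title="State: failed"></rect>'
    '<rect class="state success" data-original-title="State: success"></rect>'
    '<rect></rect>'
    '</svg>'
)

soup = BeautifulSoup(html_page, "html.parser")

# Inspect what link.get('class') actually returns for each tag.
classes = [link.get("class") for link in soup.find_all("rect")]
print(classes)

# Option 1: compare the class list explicitly; None never equals the list.
failed = [
    link for link in soup.find_all("rect")
    if link.get("class") == ["state", "failed"]
]

# Option 2: let find_all filter on a single CSS class.
# class_="failed" matches any <rect> whose class list contains "failed".
failed_via_filter = soup.find_all("rect", class_="failed")

for link in failed:
    print(link)
```

Option 2 is looser than Option 1: it would also match a tag whose classes are, say, `failed` alone, so the explicit list comparison is the closer match to the string "state failed".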
