Question

This prints the total number of lines:

def links(htmlfile):
    infile = open('twolinks.html', 'r')
    content = infile.readlines()
    infile.close()
    return len(content)
    print("# of lines: " + str(content.count('</a>')))

But I only need the number of lines that end with </a>.

Answer 1

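The loop way:

with open('twolinks.html') as f:
    count = 0
    for line in f:
        # Lines keep their trailing newline, so strip it before testing.
        if line.rstrip().endswith('</a>'):
            count += 1

Using a comprehension:

with open('twolinks.html') as f:
    count = sum(1 for line in f if line.rstrip().endswith('</a>'))

Or even shorter (summing the booleans, treating them as 0s and 1s):

with open('twolinks.html') as f:
    count = sum(line.rstrip().endswith('</a>') for line in f)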
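
Lines read by iterating over a file keep their trailing newline, which affects what endswith sees. Here is a small self-contained check of that. The helper name count_lines_ending_with and the sample content are made up for illustration, and rstrip() is assumed to be acceptable even though it also drops trailing spaces:

```python
import io

def count_lines_ending_with(lines, suffix='</a>'):
    """Count lines whose text, ignoring trailing whitespace, ends with suffix."""
    # rstrip() removes the trailing newline (and any other trailing
    # whitespace) that file iteration leaves on each line.
    return sum(line.rstrip().endswith(suffix) for line in lines)

# io.StringIO stands in for an open file; the content is illustrative only.
sample = io.StringIO(
    '<a href="one.html">one</a>\n'
    '<p><a href="two.html">two</a> and more text</p>\n'
    '<a href="three.html">three</a>'
)
print(count_lines_ending_with(sample))

# Without stripping, a line that still carries its newline does not match.
print('<a href="one.html">one</a>\n'.endswith('</a>'))
```

Passing any iterable of lines (an open file, a list, a StringIO) keeps the helper easy to try without touching the real file.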

Answer 2

import re
with open('data') as f:
    print(sum(1 for line in f if re.search('</a>', line)))
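
Note that re.search('</a>', line) matches the tag anywhere in the line. If only lines that end with the tag should count, the pattern can be anchored to the end of the line. This is a sketch that assumes trailing whitespace should be ignored; the function name and the sample lines are illustrative:

```python
import re

# '</a>' followed only by optional whitespace up to the end of the line.
ENDS_WITH_A = re.compile(r'</a>\s*$')

def count_anchor_endings(lines):
    """Count lines where '</a>' is the last non-whitespace text."""
    return sum(1 for line in lines if ENDS_WITH_A.search(line))

lines = [
    '<a href="x.html">x</a>\n',
    '<a href="y.html">y</a> trailing text\n',
    'no link here\n',
]
print(count_anchor_endings(lines))                          # lines ending with </a>
print(sum(1 for line in lines if re.search('</a>', line)))  # lines containing </a>
```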

Answer 3

with open('file') as f:
    num_lines = sum(1 for line in f if '</a>' in line)
print(num_lines)

Answer 4

My answer is a bit longer in terms of lines of code, I think, but why not use an HTML parser, since you know you are parsing HTML? For example:

from html.parser import HTMLParser  # Python 3; in Python 2 the module was HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.count = 0

    def handle_endtag(self, tag):
        if tag == "a":
            self.count += 1
        print("Encountered an end tag :", tag)
        print(self.count)

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1><a></a></body></html>')

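This is modified code from the Python documentation page. It is easier to adapt if you find you need to collect other tags, or data that comes with the tags, and so on.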
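
As one possible extension along those lines, the sketch below also records the href attribute of each opening <a> tag. It is written for Python 3 (where the class lives in html.parser); the class name LinkCollector and the sample HTML are hypothetical, not part of the original answer:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Counts closing </a> tags and records href values of opening <a> tags."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.hrefs.append(value)

    def handle_endtag(self, tag):
        if tag == 'a':
            self.count += 1

collector = LinkCollector()
collector.feed('<body><a href="one.html">one</a> '
               '<a href="two.html">two</a></body>')
collector.close()
print(collector.count, collector.hrefs)
```

Keeping the counters on the parser instance means the same object can be fed a file's contents in chunks and inspected afterwards.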

Answer 5

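Or you could do it like this:

count = 0
with open("file.txt", "r") as f:
    for line in f:
        # Strip the newline first, then compare the last four characters.
        if line.rstrip('\n')[-4:] == '</a>':
            count += 1

Works great for me.

In general, you go through the file line by line and check whether the last characters (without the \n) match </a>. Keep an eye on the \n stripping, since the last line of a file may not end with a newline.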
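
The order of slicing and stripping matters for a final line that has no trailing newline. Here is a small sketch of the difference, using a hypothetical helper ends_with_anchor and made-up sample strings:

```python
def ends_with_anchor(line):
    """True if the line, without its trailing newline, ends with '</a>'."""
    # Remove the newline first so the slice always covers the last four
    # characters of the actual text.
    return line.rstrip('\n')[-4:] == '</a>'

with_newline = '<a href="x.html">x</a>\n'
without_newline = '<a href="x.html">x</a>'

print(ends_with_anchor(with_newline))
print(ends_with_anchor(without_newline))

# Slicing five characters first and stripping afterwards leaves an extra
# character in front of the tag when there is no newline to remove.
print(without_newline[-5:].rstrip('\n') == '</a>')
```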

How do I print the number of lines in a file that contain a specific word using Python?

5 answers: