Question

I am writing a program that searches a web page for items using RegEx statements: emails, phone numbers and images. I only started learning Python recently.

The code I use to scrape the site is:

import re
import urllib2

def main():
    url = "URL in here!"
    webpage = urllib2.urlopen(url)
    content = webpage.read()
    f = open('CSN08115-TestPage.txt', 'w')
    f.write(content)
    f.close()
    print content
    print GetLink()

def GetLink():
    with open('CSN08115-TestPage.txt') as f: 
        for line in f: 
            c = re.findall(r'a\shref="/?(.*)">', line)
            #Code to find total number of Lines of c
            if c:
                print c, 'Total number of emails: 6' #Output should adjust to different websites

if __name__ == "__main__":
    main()

My question is: how do I count the total number of outputs from the RegEx statement?

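I tried print c, len(c), but that only prints 1 next to each output! There are 6 emails in total. My reasoning is that c = re.findall creates a separate list for each email found, so each of those lists gives a result of 1?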
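To make the symptom concrete, here is a minimal sketch of what the line-by-line loop does. It uses Python 3 syntax rather than the Python 2 of the code above, and the sample HTML is made up for illustration (it is not the real test page). Each call to re.findall returns its own list for that one line, so len(c) describes a single line, never the whole file.

```python
import re

# Hypothetical sample standing in for the saved page; not the real test data.
sample = (
    '<a href="mailto:a@example.com">A</a>\n'
    '<p>no link here</p>\n'
    '<a href="mailto:b@example.com">B</a>\n'
)

# Same pattern as in GetLink().
pattern = r'a\shref="/?(.*)">'

# One findall call per line: each call returns a separate list.
per_line = [re.findall(pattern, line) for line in sample.splitlines()]

# Lengths of the non-empty lists, i.e. what print c, len(c) would report.
lengths = [len(c) for c in per_line if c]

print(per_line)
print(lengths)
```

In this sample every matching line holds exactly one link, so every reported length is 1, which matches the output described above.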

Answer 1

Without seeing the input I can't be sure, but I suspect you should call re.findall on the entire page content rather than one line at a time:

   ...
   content = webpage.read()
   ...
   c = re.findall(r'a\shref="/?(.*)">', content)
   number_of_items = len(c)
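A self-contained sketch of this idea, written in Python 3 syntax and run against made-up sample HTML (the helper names count_links and count_links_per_line are hypothetical, not part of the original code). It also shows an alternative that keeps the line loop but adds up the per-line counts, in case the file has to be read line by line.

```python
import re

# Same pattern as in the question, compiled once for reuse.
LINK_PATTERN = re.compile(r'a\shref="/?(.*)">')


def count_links(content):
    """Run findall once over the whole text and return (matches, total)."""
    matches = LINK_PATTERN.findall(content)
    return matches, len(matches)


def count_links_per_line(lines):
    """Alternative: keep the line loop but accumulate a running total."""
    total = 0
    for line in lines:
        total += len(LINK_PATTERN.findall(line))
    return total


# Hypothetical sample page; the real input is unknown.
sample = (
    '<a href="mailto:a@example.com">A</a>\n'
    '<p>no link here</p>\n'
    '<a href="mailto:b@example.com">B</a>\n'
)

matches, total = count_links(sample)
print(matches, 'Total number of links:', total)
print('Running total, line by line:', count_links_per_line(sample.splitlines()))
```

One caveat, assuming the page can contain more than one link per line: (.*) is greedy, so two links on the same line would be captured as a single match and the count would come out too low. A non-greedy (.*?) is one possible adjustment.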

How to count the total number of items in a regex