Question

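I'm getting the wrong output for this problem. Basically, I have a text file (https://www.py4e.com/code3/mbox.txt), and I'm trying to get Python to first print how many email addresses it finds in the file, and then print each address on the lines that follow. A sample of my output looks like this: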

Received: (from apache@localhost)

There were 22003 email addresses in mbox.txt
    for source@collab.sakaiproject.org; Thu, 18 Oct 2007 11:31:49 -0400

There were 22004 email addresses in mbox.txt

X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to zach.thomas@txstate.edu using -f

There were 22005 email addresses in mbox.txt

What am I doing wrong here? Here is my code:

fhand = open('mbox.txt')
count = 0
for line in fhand:
    line = line.rstrip()
    if '@' in line:
        count = count + 1
        print('There were', count, 'email addresses in mbox.txt')
    if '@' in line:
        print(line)

Answer 1

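Could you compare the expected output with the actual output more clearly?

You have two "if '@' in line" statements that should be merged; there is no reason to ask the same question twice.

Your code counts the lines that contain an @ symbol and prints the running count every time it finds one, which is why the count message repeats throughout your output.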

If you want to print the count only once, put that print call outside (after) the for loop.
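A minimal sketch of those two changes, assuming the goal is to report the number of lines containing @ once and then list those lines. The helper name lines_with_at and the three sample lines are made up for illustration; in the real script the lines would come from mbox.txt.

```python
def lines_with_at(lines):
    """Return the right-stripped lines that contain an '@' character."""
    matches = []
    for line in lines:
        line = line.rstrip()
        if '@' in line:  # one test instead of two identical ones
            matches.append(line)
    return matches


# Hypothetical sample input standing in for the contents of mbox.txt
sample = [
    "Received: (from apache@localhost)\n",
    "Subject: no address here\n",
    "    for source@collab.sakaiproject.org; Thu, 18 Oct 2007 11:31:49 -0400\n",
]

matches = lines_with_at(sample)

# The count is printed once, after the loop has finished
print('There were', len(matches), 'lines containing @ in the sample')
for line in matches:
    print(line)
```

Collecting the matching lines first is what allows the count to appear before them; printing inside the loop can only ever show a running total.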

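If you want to print the email addresses rather than the whole lines that contain them, you will need to do some more string processing to extract the address from each line.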
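One possible form of that string processing, as a rough sketch: split each line on whitespace, keep the words that contain @, and strip the punctuation that tends to surround addresses in mail headers. The function name extract_addresses and the set of stripped characters are assumptions chosen to fit the sample lines in the question, not a complete address parser.

```python
def extract_addresses(line):
    """Return the whitespace-separated words in line that contain '@',
    with surrounding brackets and punctuation removed."""
    addresses = []
    for word in line.split():
        if '@' in word:
            # Assumed set of wrapper characters seen in the sample output
            addresses.append(word.strip('<>();,:'))
    return addresses


print(extract_addresses("Received: (from apache@localhost)"))
print(extract_addresses(
    "    for source@collab.sakaiproject.org; Thu, 18 Oct 2007 11:31:49 -0400"
))
```

Any word containing @ is kept, so this still picks up things that are not real mailboxes; it only trims the noise around them.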

Don't forget to close the file when you are done.
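A common way to make sure the file gets closed is a with block, which closes it when the block exits, even if an error occurs part-way through. The sketch below assumes a hypothetical helper, count_lines_with_at, that takes the path as a parameter.

```python
def count_lines_with_at(path):
    """Count the lines in the file at path that contain an '@' character."""
    count = 0
    # The with statement closes fhand automatically when the block ends
    with open(path) as fhand:
        for line in fhand:
            if '@' in line:
                count += 1
    return count
```

With mbox.txt in the working directory, this could be called as count_lines_with_at('mbox.txt').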

Answer 2

The following modifies your code to use a regular expression to find the email addresses in each line of text.

import re

# Pattern for email
# (see https://www.geeksforgeeks.org/extracting-email-addresses-using-regular-expressions-python/)
pattern = re.compile(r'\S+@\S+')

emails = []
with open('mbox.txt') as fhand:
    for line in fhand:
        # Detect all emails in line using regex pattern
        emails.extend(pattern.findall(line))

print('There were', len(emails), 'email addresses in mbox.txt')
if emails:
    print(*emails, sep="\n")

Output

There were 44018 email addresses in mbox.txt
stephen.marquard@uct.ac.za
<postmaster@collab.sakaiproject.org>
<200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>
<source@collab.sakaiproject.org>;
<source@collab.sakaiproject.org>;
<source@collab.sakaiproject.org>;
apache@localhost)
source@collab.sakaiproject.org;
stephen.marquard@uct.ac.za
source@collab.sakaiproject.org
....
....
...etc...
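Because \S+@\S+ accepts any run of non-whitespace characters around the @, the matches above keep wrappers such as <, >, ; and ). If bare addresses are wanted, a tighter pattern is one option. The pattern below is an assumption for illustration, not a full address validator: it allows word characters, dots, plus signs and hyphens before the @, and dot-separated labels after it.

```python
import re

# Assumed, deliberately simple pattern; not an RFC-complete address matcher
ADDRESS = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*')


def find_addresses(line):
    """Return all substrings of line that look like name@host."""
    return ADDRESS.findall(line)


# Samples taken from the output listed above
for text in [
    "<postmaster@collab.sakaiproject.org>",
    "apache@localhost)",
    "source@collab.sakaiproject.org;",
]:
    print(find_addresses(text))
```

Message-ID style strings such as 200801051412.m05ECIaH010327@nakamura.uits.iupui.edu still match, so the total may still include entries that are not mailboxes.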

How do I count and print specific strings from a .txt file in Python?
