Question

I have a text file whose contents look like this, with each item on its own line:

Email: jonsmith@emailaddie.com 
Name: Jon Smith
Phone Number: 555-1212
Email: jonsmith@emailaddie.com
Name: Jon Smith
Phone Number: 555-1212
Email: jonsmith@emailaddie.com
Name: Jon Smith
Phone Number: 555-1212

I'm trying to combine these into groups of [email, name, phone] and export them to another text file, with each group on its own line.

Here is what I have tried so far (if I can get it to print to the terminal correctly, I know how to write it to another file):

I'm running Ubuntu Linux.

import re

stuff = list()

#get line
with open("a2.txt", "r") as ins:
    array = []
    for line in ins:
        if re.match("Email Address: ", line):
            array.append(line)
            if re.match("Phone Number: ", line):
                array.append(line)
                if re.match("Name: ", line):
                    array.append(line)
                    print(line)

Answer 1

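As noted in the comments, your nested if statements all examine the same line. No line in your sample matches all three regular expressions, so the code never extracts anything. In any case, there is no need for regular expressions here; a simple line.startswith() is enough to look for a single static string or a small number of static strings.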

Instead, you want something like this:

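# Assumes the same "a2.txt" input file as in the question.
with open("a2.txt") as ins:
    array = []
    for line in ins:
        # Keep only the text after the label, without the trailing newline.
        # (The question's code looks for 'Email Address: '; use 'Email: '
        # instead if that is what your file actually contains.)
        if line.startswith('Email Address: '):
            array.append(line[len('Email Address: '):].rstrip('\n'))
        elif line.startswith('Name: '):
            array.append(line[len('Name: '):].rstrip('\n'))
        elif line.startswith('Phone Number: '):
            array.append(line[len('Phone Number: '):].rstrip('\n'))
            # The phone number is the last line of a group: print and reset.
            print(array)
            array = []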

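If the lines always appear in exactly the same order, this simple structure is enough. If you have to cope with missing optional lines or a mixed order, the program will need to be somewhat more complex.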

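You'll notice that this code is still repetitive. You want to avoid repeating yourself, so a slightly better program might loop over the expected phrases in order.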

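fields = ('Email Address: ', 'Name: ', 'Phone Number: ')
index = 0
array = []
# Assumes the same "a2.txt" input file as in the question.
with open("a2.txt") as ins:
    for line in ins:
        if line.startswith(fields[index]):
            # Keep the text after the label; -1 drops the trailing newline.
            array.append(line[len(fields[index]):-1])
        else:
            raise ValueError('Expected {0}, got {1}'.format(fields[index], line))
        index += 1
        if index >= len(fields):
            # A full group has been collected: print it and start over.
            print(array)
            array = []
            index = 0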

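This looks a bit harder at first glance, but you should be able to make sense of it fairly quickly. We have an index that tells us which value from fields to expect next; when index runs past the end of fields, we print the collected information and wrap index back to zero. This also conveniently lets us refer to the length of the expected string, which we need in order to extract the substring that follows it on the line. (The -1 gets rid of the newline character that is present at the end of every line we read.)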
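
For the case mentioned earlier, where optional lines may be missing or the order may vary, one option is to collect each group into a dictionary and start a new group whenever a label repeats. This is only a sketch under stated assumptions: the names `parse_records` and `FIELDS` and the rule "a repeated label starts a new group" are made up for illustration, and the labels follow the sample data (`Email`) rather than the `Email Address` used in the code above.

```python
FIELDS = ("Email", "Name", "Phone Number")  # labels as they appear in the sample data


def parse_records(lines):
    """Group 'Label: value' lines into dicts, tolerating missing or reordered fields."""
    records = []
    current = {}
    for line in lines:
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        label = label.strip()
        if not sep or label not in FIELDS:
            continue  # skip blank lines and anything we do not recognize
        if label in current:
            # Assumption: seeing the same label again means a new group has started.
            records.append(current)
            current = {}
        current[label] = value.strip()
    if current:
        records.append(current)
    return records


sample = [
    "Email: jonsmith@emailaddie.com\n",
    "Name: Jon Smith\n",
    "Phone Number: 555-1212\n",
    "Name: Jon Smith\n",  # this group has no Email line
    "Phone Number: 555-1212\n",
]
for record in parse_records(sample):
    # Missing fields are written as empty strings.
    print(",".join(record.get(field, "") for field in FIELDS))
```

The repeated-label rule is a heuristic. It cannot tell where one group ends if two consecutive groups are each missing different fields, so a file that has a real separator between groups would be better split on that separator.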

Answer 2

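If you are sure the fields (email, name, and phone number) always appear in the same given order, this code will work; otherwise, handle that case in the else branch. There you can either save the incomplete values or raise an exception.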

with open("path to the file") as fh:
    # Track the status for a group
    counter = 0
    # List of all information
    all_info = []
    # Containing information of current group
    current_info = ""
    for line in fh:
        if line.startswith("Email:") and counter == 0:
            counter = 1
            current_info = "{}{},".format(current_info, line)
        elif line.startswith("Name:") and counter == 1:
            counter = 2
            current_info = "{}{},".format(current_info, line)
        elif line.startswith("Phone Number:") and counter == 2:
            counter = 0
            # Last field of the group: no trailing comma; drop the newlines.
            all_info.append("{}{}".format(current_info, line).replace("\n", ""))
            current_info = ""
        else:
            # You can handle incomplete information here.
            counter = 0
            current_info = ""
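
The question's end goal is a second file with one group per line, and the else branch is where malformed input would be handled. A minimal sketch of both, assuming the fields always come in the order Email, Name, Phone Number. The function name `export_groups`, the `PREFIXES` tuple, and the choice to raise `ValueError` with the line number are illustrative assumptions, not part of the answer above.

```python
PREFIXES = ("Email: ", "Name: ", "Phone Number: ")


def export_groups(in_path, out_path):
    """Write each Email/Name/Phone Number group as one comma-separated line.

    Returns the number of groups written. Raises ValueError on an
    unexpected line or on a group that is cut off at the end of the file.
    """
    count = 0
    group = []
    with open(in_path) as src, open(out_path, "w") as dst:
        for lineno, line in enumerate(src, start=1):
            if not line.strip():
                continue  # ignore blank lines
            # The number of values collected so far tells us which prefix is next.
            expected = PREFIXES[len(group)]
            if not line.startswith(expected):
                raise ValueError(
                    "line {}: expected {!r}, got {!r}".format(lineno, expected, line)
                )
            group.append(line[len(expected):].strip())
            if len(group) == len(PREFIXES):
                dst.write(",".join(group) + "\n")
                group = []
                count += 1
    if group:
        raise ValueError("input ended in the middle of a group")
    return count
```

It could be called as `export_groups("a2.txt", "out.txt")`, where `a2.txt` is the input file from the question and `out.txt` is a hypothetical output name. Raising on the first bad line is the strict choice; to save incomplete groups instead, the `raise` could be replaced by writing out whatever is in `group` and resetting it.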

I'm looking for guidance on completing this project.
