Question

I have a text file containing the names and other information of all the students in my programming course, like this:

Smith, John sj0012@uni.edu smjo0012@student.edu Student  
Lester, Moe mole0025@uni.edu    mole0025@student.edu    Student
Christ, Jesus jech0020@uni.edu    jech@student.edu  Student

...

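Some lines contain tabs and other unnecessary spaces between the fields, so the first and second email addresses are separated by a tab, and sometimes there is also one between those two and 'Student'. My only goal is to produce a new text file that contains just Name, Lastname in a neat column. I did manage to get the result, but only by repeatedly converting the text into a list and back into a string. Is there a better way to do this? (Python 2.7)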

peps = open('ppl.txt', 'r')

for line in peps.readlines():
    line = line.strip()                   # Strip leading/trailing whitespace and the newline
    line = line.split('\t')               # Split at tabs into a list of fields
    line = map(lambda s: s.strip(), line) # Strip stray whitespace around each field
    del line [1:]                         # Keep only the first field, "Lastname, Name"
    line = ','.join(line)                 # Turn the one-item list back into a string
    line = line.split(',')                # Split "Lastname, Name" into ['Lastname', ' Name']
    line[0], line[-1] = line[-1], line[0] # Swap Lastname and Name
    line = ', '.join(line)                # Join back into a string: " Name, Lastname"
    print line

Answer 1

If you're trying to process a file where each line is a list of comma-separated values, that's exactly what the csv module is for.

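In your updated version, it looks like they're actually a list of tab-separated values... but that's just a dialect of CSV (called TSV), which the module handles fine as well: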

import csv

peps = open('ppl.txt', 'r')
reader = csv.reader(peps, delimiter='\t')
for row in reader:
    # here, row is a list of column values
    print(row)

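You can also use csv.writer to write rows back out in CSV format. You can even use csv.writer(sys.stdout) if you want to write them to the terminal. You never have to deal with splitting and joining; it's all taken care of for you.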
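A minimal sketch of csv.writer, assuming Python 3 (in Python 2.7, io.StringIO only accepts unicode). It writes to an in-memory io.StringIO instead of a file or sys.stdout so the result can be inspected; the rows are made-up sample data. Note how the writer quotes a field that contains the delimiter, which is the kind of detail you would otherwise have to handle by hand when joining strings.

```python
import csv
import io

# Hypothetical (firstname, lastname) rows; in practice they would come from csv.reader.
rows = [("John", "Smith"), ("Moe", "Lester"), ("Sammy", "Davis, Jr.")]

buf = io.StringIO()
# lineterminator='\n' overrides the default '\r\n' to keep the output easy to read.
writer = csv.writer(buf, lineterminator="\n")
for row in rows:
    writer.writerow(row)  # joins the fields with ',' and quotes them only where needed

print(buf.getvalue())
# John,Smith
# Moe,Lester
# Sammy,"Davis, Jr."
```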

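However, the first column is itself "lastname, first", so you need to parse that as well. For that I'd use str.split or str.partition (depending on the behavior you want, e.g., if Cher is in your class). I'm also not sure whether you want to split on ', ', or split on ',' and then strip the whitespace. Either way is easy. For example: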

lastname, _, firstname = row[0].partition(',')
writer.writerow((firstname.strip(), lastname.strip()))
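To make the Cher remark concrete, here is a small sketch comparing the two approaches on a single-name entry. The helper names are hypothetical and exist only for this illustration. str.partition always returns a 3-tuple, so unpacking never fails, whereas unpacking the result of str.split(',') raises ValueError unless there is exactly one comma.

```python
def name_via_partition(field):
    """Return (firstname, lastname) from a 'Lastname, Firstname' field."""
    # partition always returns 3 parts, even when ',' is missing
    lastname, _, firstname = field.partition(',')
    return firstname.strip(), lastname.strip()


def name_via_split(field):
    """Same idea with split; raises ValueError unless there is exactly one ','."""
    lastname, firstname = field.split(',')
    return firstname.strip(), lastname.strip()


print(name_via_partition('Smith, John'))  # ('John', 'Smith')
print(name_via_partition('Cher'))         # ('', 'Cher')
print(name_via_split('Smith, John'))      # ('John', 'Smith')
try:
    name_via_split('Cher')
except ValueError as exc:
    print('split failed: {0}'.format(exc))
```

Which one is "right" depends on whether you would rather get an empty first name or an error for such entries.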

While we're at it, it's always better to use a with statement for files, so let's do that too.

"My only goal is to produce a new text file that contains just Name, Lastname in a neat column."

import csv
with open('ppl.txt') as infile, open('names.txt', 'w') as outfile:
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile)
    for row in reader:
        lastname, _, firstname = row[0].partition(',')
        writer.writerow((firstname.strip(), lastname.strip()))
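If you want to try this loop without touching the disk, one option is to wrap it in a function that accepts any file-like objects. This is a sketch assuming Python 3; `convert` is a hypothetical name, io.StringIO stands in for ppl.txt and names.txt, and the sample lines are adapted from the question with tabs between the fields. The blank-line check is an addition not in the answer above: csv.reader yields an empty list for an empty line, and row[0] would then raise IndexError.

```python
import csv
import io


def convert(infile, outfile):
    """Read tab-separated 'Lastname, Firstname' rows and write 'Firstname,Lastname'."""
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, lineterminator='\n')
    for row in reader:
        if not row:  # added guard: skip blank lines instead of failing on row[0]
            continue
        lastname, _, firstname = row[0].partition(',')
        writer.writerow((firstname.strip(), lastname.strip()))


src = io.StringIO('Smith, John\tsj0012@uni.edu\tsmjo0012@student.edu\tStudent\n'
                  '\n'
                  'Lester, Moe\tmole0025@uni.edu\tmole0025@student.edu\tStudent\n')
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
# John,Smith
# Moe,Lester
```

With real files you would call convert(infile, outfile) inside the same with statement shown above.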

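I'm not entirely sure what your whitespace problem is. If in some cases there are spaces after the tabs and you want to ignore them, look at the skipinitialspace option of the csv module. For example: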

reader = csv.reader(infile, delimiter='\t', skipinitialspace=True)
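A quick sketch of what skipinitialspace changes, assuming Python 3 and a hypothetical input line with extra spaces after some of the tabs. Note that the option only skips space characters immediately following the delimiter; it does not strip tabs or trailing spaces.

```python
import csv
import io

# Hypothetical line: two of the tabs are followed by stray spaces.
line = 'Smith, John\t  sj0012@uni.edu\t smjo0012@student.edu\tStudent\n'

plain = next(csv.reader(io.StringIO(line), delimiter='\t'))
skipping = next(csv.reader(io.StringIO(line), delimiter='\t', skipinitialspace=True))

print(plain)     # ['Smith, John', '  sj0012@uni.edu', ' smjo0012@student.edu', 'Student']
print(skipping)  # ['Smith, John', 'sj0012@uni.edu', 'smjo0012@student.edu', 'Student']
```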

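But if there are tabs and spaces in the middle of the actual columns and you want to remove them, you probably want str.replace or a regular expression. For example: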

import re

lastname, _, firstname = row[0].partition(',')
firstname = re.sub(r'\s', '', firstname)
lastname = re.sub(r'\s', '', lastname)
writer.writerow((firstname, lastname))
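One caveat worth checking: re.sub(r'\s', '', ...) removes every whitespace character, which also glues together legitimate two-part names. A small sketch with a hypothetical first name contrasting that with collapsing each whitespace run to a single space, either with a regex or with str.split() and str.join():

```python
import re

raw = 'Mary\t Ann'  # hypothetical first name with a stray tab inside

print(re.sub(r'\s', '', raw))    # -> MaryAnn   (every whitespace character removed)
print(re.sub(r'\s+', ' ', raw))  # -> Mary Ann  (each whitespace run collapsed to one space)
print(' '.join(raw.split()))     # -> Mary Ann  (same result without a regex)
```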

Answer 2

You can use a regular expression ('(\w+),\W+(\w+)') to get Lastname, Name from each line.

Something like this:

import re
# raw string so that \w and \W reach the regex engine unchanged
re.match(r'(\w+(?:-\w+)*),\W+(\w+(?:-\w+)*)', 'Lastname, Name, uniname@uni.edu, uniname@student.edu, Student/Teacher').groups()
# returns ('Lastname', 'Name')
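Applied to whole lines, the same pattern can be compiled once and used in a loop. A sketch, where `swap_names` is a hypothetical helper and the second input line is made up to show the hyphen handling; lines that don't start with "Last, First" are silently skipped here, which is a design choice of this sketch rather than of the answer.

```python
import re

# Last name and first name, each allowing hyphenated parts such as "Lester-Moe".
NAME_RE = re.compile(r'(\w+(?:-\w+)*),\W+(\w+(?:-\w+)*)')

lines = [
    'Smith, John sj0012@uni.edu smjo0012@student.edu Student',
    'Lester-Moe, Anne-Marie\tam0001@uni.edu\tam0001@student.edu\tStudent',  # hypothetical
]


def swap_names(lines):
    """Return 'Firstname, Lastname' for every line that starts with 'Lastname, Firstname'."""
    result = []
    for line in lines:
        match = NAME_RE.match(line)
        if match:  # skip lines that don't match the pattern
            lastname, firstname = match.groups()
            result.append('{0}, {1}'.format(firstname, lastname))
    return result


print(swap_names(lines))  # ['John, Smith', 'Anne-Marie, Lester-Moe']
```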

Got help from here (for the regex that handles hyphenated names).

Answer 3

The other answers here will certainly work for you, but here's a simpler way to accomplish your task:

# we can open both the input and output files at the same time
with open('ppl.txt', 'r') as fi, open('output.txt', 'w') as fo:
    for line in fi:
        split_line = line.split()
        fo.write("{0}, {1}\n".format(split_line[1], split_line[0].strip(',')))
        # the explicit indices are optional in Python 2.7 and 3.1+, where "{}, {}" works too
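This works because str.split() with no arguments splits on any run of whitespace (spaces, tabs, newlines) and never returns empty strings, so the mixed tabs and spaces in the input don't matter. A quick check on a line modeled on the question's second example (the exact whitespace is assumed), compared with splitting on a single space:

```python
line = 'Lester, Moe mole0025@uni.edu    mole0025@student.edu\t\tStudent\n'

fields = line.split()
print(fields)
# ['Lester,', 'Moe', 'mole0025@uni.edu', 'mole0025@student.edu', 'Student']

print('{0}, {1}'.format(fields[1], fields[0].strip(',')))
# Moe, Lester

# Splitting on one space instead keeps empty strings and leaves the tabs in place.
print(line.split(' '))
# ['Lester,', 'Moe', 'mole0025@uni.edu', '', '', '', 'mole0025@student.edu\t\tStudent\n']
```

The fixed indices do assume one-word first and last names; a name such as "Van Berg, John" would shift the fields.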

If you don't like magic numbers, you can use operator.itemgetter (a function from the operator module):

import operator
retriever = operator.itemgetter(1, 0)

with open('ppl.txt', 'r') as fi, open('output.txt', 'w') as fo:
    for line in fi:
        f_name, l_name = retriever(line.split())
        fo.write("{0}, {1}\n".format(f_name, l_name.strip(',')))
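To see what `retriever` actually returns, here is a small sketch on the question's first sample line. operator.itemgetter(1, 0) builds a callable that fetches items 1 and 0, in that order, as a tuple; with a single index it returns the bare item instead of a 1-tuple, which matters if you later change the indices.

```python
import operator

retriever = operator.itemgetter(1, 0)

fields = 'Smith, John sj0012@uni.edu smjo0012@student.edu Student'.split()
print(retriever(fields))  # ('John', 'Smith,')

f_name, l_name = retriever(fields)
print('{0}, {1}'.format(f_name, l_name.strip(',')))  # John, Smith

# A single index gives the item itself, not a tuple.
print(operator.itemgetter(0)(fields))  # Smith,
```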

Why keep converting back and forth?
