Question

I received this huge TXT file in which each line has a different number of columns.

jero, kor@gmail.com, 44d448e4d, team, 0, 6, 5, 2, s, s, s, none, none
jader, lda@gmail.com, d44a88x, team, 0, none, 48, 95, oled
etc. for 15000 lines

I want to cut off everything after the word "team" on every line. I tried several regular expressions but couldn't get any of them to work.

Thanks!

Answer 1

There is no need for a regular expression; there is a straightforward solution.

with open('file.txt') as f:
    for line in f:
        # Keep everything up to and including the first 'team'
        # (assumes every line contains it)
        i = line.split('team')[0] + "team"
        print(i)
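
One caveat with the split approach is that it appends "team" even to a line that never contained it. A minimal sketch of a guarded version follows; the helper name cut_after_word is hypothetical and not part of the original answer.

```python
def cut_after_word(line, word="team"):
    """Return `line` truncated after the first `word`, or unchanged if absent."""
    if word not in line:
        return line
    # maxsplit=1 stops splitting after the first occurrence of `word`.
    return line.split(word, 1)[0] + word


print(cut_after_word("jader, lda@gmail.com, d44a88x, team, 0, none, 48, 95, oled"))
print(cut_after_word("a line without the word"))
```

Lines that lack the word pass through untouched instead of gaining a spurious "team" suffix.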

Answer 2

Well, if you're parsing a CSV file, use the dedicated module:

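import csv

with open('file.txt', newline='') as your_file:
    for row in csv.reader(your_file, skipinitialspace=True):
        if 'team' in row:
            # Keep the fields up to and including the first 'team' field
            row = row[:row.index('team') + 1]
        print(', '.join(row))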

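This avoids the trouble you would run into with input such as

jero_team, kor@team.com, 44d448e4d, team, 0, one_more, team, 5, 2

where team also appears inside other fields and as a field more than once.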
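
To make the difference concrete, here is a small sketch that runs both a plain substring search and the csv-based field search on that tricky line. The variable names are illustrative only.

```python
import csv

line = "jero_team, kor@team.com, 44d448e4d, team, 0, one_more, team, 5, 2"

# Substring search stops at the first "team", which sits inside "jero_team".
head, sep, _ = line.partition("team")
substring_result = head + sep

# Field-based search only matches a field that is exactly "team".
row = next(csv.reader([line], skipinitialspace=True))
if "team" in row:
    row = row[:row.index("team") + 1]
field_result = ", ".join(row)

print(substring_result)  # jero_team
print(field_result)      # jero_team, kor@team.com, 44d448e4d, team
```

csv.reader accepts any iterable of strings, so a one-element list is enough to parse a single line.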

Answer 3

You don't need a regular expression. Use str.partition:

>>> line = 'jero, kor@gmail.com, 44d448e4d, team, 0, 6, 5, 2, s, s, s, none, none'
>>> a, sep, _ = line.partition('team')
>>> a
'jero, kor@gmail.com, 44d448e4d, '
>>> sep
'team'
>>> a + sep
'jero, kor@gmail.com, 44d448e4d, team'

with open('file.txt') as f:
    for line in f:
        a, sep, _ = line.partition('team')
        line = a + sep
        # Do something with line
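
As one possible way to fill in the "do something" step, the sketch below writes the truncated lines to a second stream. It relies on the fact that str.partition returns the whole string plus two empty strings when the separator is missing, so such lines are left alone. The function name truncate_lines is hypothetical, and io.StringIO stands in for real files here so the example runs on its own.

```python
import io


def truncate_lines(lines, word="team"):
    """Yield each line cut off after the first occurrence of `word`."""
    for line in lines:
        head, sep, _ = line.partition(word)
        if sep:
            # The newline was in the discarded tail, so add it back.
            yield head + sep + "\n"
        else:
            # `word` is absent: partition returned the whole line in `head`.
            yield line


# io.StringIO stands in for open('file.txt') and open('out.txt', 'w').
source = io.StringIO(
    "jero, kor@gmail.com, 44d448e4d, team, 0, 6, 5, 2\n"
    "no such word here\n"
)
target = io.StringIO()
target.writelines(truncate_lines(source))
print(target.getvalue())
```

With real files, the same generator can be passed an open input file, and its output handed to writelines on an output file.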

Update

To address the issue @DSM mentioned (other fields that contain team), partition on the delimited field instead:

with open('file.txt') as f:
    for line in f:
        a, sep, _ = line.partition(', team,')
        # Drop the separator's trailing comma so the line ends at 'team'
        line = a + sep.rstrip(',')
        # Do something with line
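
Partitioning on a delimited separator such as ', team,' has a few edge cases: the separator itself carries a trailing comma, and it does not match when team is the first field of a line. The sketch below handles those cases, assuming fields are always separated by a comma followed by exactly one space; the helper name cut_after_field is hypothetical.

```python
def cut_after_field(line, word="team"):
    """Truncate `line` after the first field that is exactly `word`."""
    text = line.rstrip("\n")
    # Case 1: `word` is the very first field.
    if text.startswith(word + ","):
        return word
    # Case 2: `word` is a middle field, delimited on both sides.
    head, sep, _ = text.partition(", " + word + ",")
    if sep:
        return head + ", " + word
    # Case 3: `word` is the last field (nothing to cut) or is absent.
    return text


print(cut_after_field("jero_team, kor@team.com, 44d448e4d, team, 0, one_more, team, 5, 2"))
print(cut_after_field("team, a, team, b"))
print(cut_after_field("a, team\n"))
```

For input with irregular spacing or quoted fields, the csv module is likely the safer choice.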

Python - parsing text - cutting off everything after a word