Question

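I have a very large text document (with no file extension) that contains information about a different file on each line, in the following format:

VariableOne|VariableTwo|VariableThree

Pipes separate the different variables. However, some 'VariableTwo' values may themselves contain pipes.

I need to extract this information from the text document so that I can manipulate it. For example: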

Name = VariableOne From The Text Document
Middle Name(s) = VariableTwo From The Text Document
Last Name = VariableThree From The Text Document

This needs to be done in Python 3, with six variables, of which only the second can contain pipes.

Thanks for any help!

Answer 1

See the Python string methods documentation. Specifically, index and rindex can give you what you need:

line = 'first|middle|||stuff|end'

first_pipe = line.index('|')
last_pipe = line.rindex('|')

first = line[:first_pipe]
middle = line[first_pipe+1:last_pipe]
last = line[last_pipe+1:]
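
As a rough sketch of the same idea wrapped in a function: str.index and str.rindex raise ValueError when the separator is missing, so a real file with malformed lines probably needs a guard. The name split_three and the choice to return None for bad lines are my own assumptions, not part of the answer above.

```python
def split_three(line):
    """Split a line into (first, middle, last); the middle may contain pipes."""
    try:
        first_pipe = line.index('|')
        last_pipe = line.rindex('|')
    except ValueError:
        # str.index and str.rindex raise ValueError when '|' is absent.
        return None
    if first_pipe == last_pipe:
        # Only one pipe on the line, so there is no middle field.
        return None
    return (line[:first_pipe],
            line[first_pipe + 1:last_pipe],
            line[last_pipe + 1:])


print(split_three('first|middle|||stuff|end'))
# ('first', 'middle|||stuff', 'end')
```

Note that this only covers the three-field case; with six fields you would need to locate the last four pipes, which is where the split-based answers are more convenient.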

Answer 2

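str.split takes an optional argument giving the maximum number of splits to perform. There is also str.rsplit, which does the same thing but splits from the back (this only makes a difference when you limit the number of splits).

We have 6 values, and the second may contain the delimiter; so we want to split off 1 from the front and 4 from the back.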

a, rest = data.split('|', 1)
b, c, d, e, f = rest.rsplit('|', 4)
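
For illustration, here is a minimal runnable sketch of the two-step split. The function name split_record, the n_fields parameter, and the sample line are assumptions added for the example; the split logic itself is the one shown above.

```python
def split_record(line, n_fields=6, sep='|'):
    """Split a line into n_fields parts; only the second part may contain sep."""
    # One split from the left isolates the first field.
    first, rest = line.split(sep, 1)
    # n_fields - 2 splits from the right isolate the trailing fields,
    # leaving everything in between as the second field.
    second, *tail = rest.rsplit(sep, n_fields - 2)
    return [first, second, *tail]


# Hypothetical sample line, not taken from the question.
data = 'a|b|more b|yet more b|c|d|e|f'
print(split_record(data))
# ['a', 'b|more b|yet more b', 'c', 'd', 'e', 'f']
```

A line with no pipe at all makes the first unpacking fail with ValueError, so malformed input would still need handling.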

Answer 3

How about:

>>> s = 'a|b|more b|yet more b|c|d|e|f'
>>> a, *b, c, d, e, f = s.split('|')
>>> b = '|'.join(b)
>>> 
>>> a,b,c,d,e,f
('a', 'b|more b|yet more b', 'c', 'd', 'e', 'f')

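You can replace the explicit naming by slicing the result of the split, which may be more general. As for reading the file, the usual pattern is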

with open('somefile') as fp:
    for line in fp:
        a, *b, c, d, e, f = line.strip().split('|')
        b = '|'.join(b)
        # do something

This does not read the whole file into memory at once, which is handy for large files.
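
The slicing variant mentioned above might look roughly like this. The name split_fields and the n_tail parameter (the number of fields after the pipe-containing one, assumed to be at least 1) are hypothetical, chosen just for this sketch.

```python
def split_fields(line, n_tail=4, sep='|'):
    """Return [first, middle, *last n_tail fields]; the middle may contain sep."""
    parts = line.rstrip('\n').split(sep)
    if len(parts) < n_tail + 2:
        raise ValueError('too few fields: %r' % line)
    # Everything between the first field and the last n_tail fields
    # belongs to the second field, so glue it back together.
    middle = sep.join(parts[1:-n_tail])
    return [parts[0], middle, *parts[-n_tail:]]


print(split_fields('a|b|more b|yet more b|c|d|e|f\n'))
# ['a', 'b|more b|yet more b', 'c', 'd', 'e', 'f']
```

This avoids hard-coding the variable names, at the cost of working with list indices afterwards.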

Update: if you have to iterate over all the lines but only process one of them, you can use

with open('somefile') as fp:
    for i, line in enumerate(fp):
        if i == some_number:
            a, *b, c, d, e, f = line.strip().split('|')
            b = '|'.join(b)

On the other hand, if you only need to extract and process a single line, you can use the linecache module:

import linecache

def proc(filename, lineno):
    line = linecache.getline(filename, lineno)
    a, *b, c, d, e, f = line.strip().split('|')
    b = '|'.join(b)
    # do something
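
A self-contained sketch of the linecache approach, in case it helps. Two things to be aware of: linecache.getline counts lines from 1 and returns an empty string when the line does not exist, and it caches the file's lines in memory, so it may not suit a very large file. The temporary file, its contents, and the name fields_at are made up for the example.

```python
import linecache
import os
import tempfile

# Write a small hypothetical file so the sketch can run on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a|b|c|d|e|f\n')
    tmp.write('g|h|more h|i|j|k|l\n')
    path = tmp.name


def fields_at(filename, lineno):
    """Return the six fields of line number lineno (1-based), or None."""
    line = linecache.getline(filename, lineno)
    if not line:
        # getline returns '' for a missing line instead of raising.
        return None
    a, *b, c, d, e, f = line.strip().split('|')
    return a, '|'.join(b), c, d, e, f


result = fields_at(path, 2)
missing = fields_at(path, 99)
print(result)   # ('g', 'h|more h', 'i', 'j', 'k', 'l')
print(missing)  # None

# Drop the cached lines and remove the temporary file.
linecache.clearcache()
os.remove(path)
```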

Though to be honest, I think I prefer Karl Knechtel's two-argument split approach, because it is more general.

Answer 4

You can also use the str.partition() method:

var1, pipe, _ = line.partition('|')
_, pipe, var3 = line.rpartition('|')
var2 = line[len(var1+pipe):-len(var3+pipe)]


Or use a regular expression:

import re

m = re.match(r'^([^|]*)\|(.*)\|([^|]*)$', line)
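
The pattern above captures three fields. For the six-field case in the question, one possible extension (my own sketch, with a hypothetical sample line) is to add a pipe-free group for each trailing field, so the greedy middle group absorbs any extra pipes:

```python
import re

# First field, greedy middle field, then four pipe-free trailing fields.
PATTERN = re.compile(
    r'^([^|]*)\|(.*)\|([^|]*)\|([^|]*)\|([^|]*)\|([^|]*)$'
)

# Hypothetical sample line; strip the newline first when reading from a file,
# because [^|] also matches '\n' and it would end up in the last group.
line = 'a|b|more b|yet more b|c|d|e|f\n'.rstrip('\n')

m = PATTERN.match(line)
fields = m.groups() if m else None
print(fields)
# ('a', 'b|more b|yet more b', 'c', 'd', 'e', 'f')
```

Lines with fewer than five pipes do not match, so m is None for them and can be checked before unpacking.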


Splitting a line from a text document into different strings
