Question

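This isn't correct code, but I'd like to know whether there is a way to search for just one word without using .split(), since it builds a list and I don't want that, with this snippet: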

f=(i for i in fin.xreadlines())
for i in f:
    try:
        match=re.search(r"([A-Z]+\b) | ([A-Z\'w]+\b) | (\b[A-Z]+\b) | (\b[A-Z\'w]+\b) | (.\w+\b)", i) # | r"[A-Z\'w]+\b" | r"\b[A-Z]+\b" | r"\b[A-Z\'w]+\b" | r".\w+\b"

Also, can I make a reusable class module like this?

class LineReader: #Intended only to be used with for loop
    def __init__(self,filename):
        self.fin=open(filename,'r')
    def __getitem__(self,index):
        line=self.fin.xreadline()
        return line.split()

where, say, f = LineReader(filepath) and for i in f.__getitem__(index=line number 25) makes the loop start from there? I don't know how to do that. Any tips?

Answer 1

To get the first word of a line:

line[:max(line.find(' '), 0) or None]

line.find(' ') searches for the first space and returns its index. If no space is found, it returns -1.

max(..., 0) makes sure the result is never negative by turning -1 into 0. This is useful because bool(-1) is True and bool(0) is False.

x or None evaluates to x if x != 0, and to None otherwise.

Finally, line[:None] is equivalent to line[:], which returns a string identical to line.
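
To see how the pieces fit together, here is a small sketch that wraps the expression in a helper and tries it on three inputs. The name first_word is made up for this illustration and is not part of the answer's code.

```python
def first_word(line):
    # Hypothetical helper; the name is an assumption for this example.
    idx = line.find(' ')        # index of the first space, or -1 if there is none
    end = max(idx, 0) or None   # -1 becomes 0, and 0 becomes None
    return line[:end]           # line[:None] is the whole string

print(first_word("hello world"))    # first space is at index 5
print(first_word("hello"))          # no space: the whole string comes back
print(repr(first_word(" hello")))   # leading space: find() returns 0, so the whole string comes back
```

The last call shows the edge case where the line starts with a space: the index 0 is treated the same as "not found".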

First sample:

with open('file') as f:
    for line in f:
        word = line[:max(line.find(' '), 0) or None]
        if condition(word):
            do_something(word)

The class (implemented here as a generator):

def words(stream):
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]

You could use it like this:

gen = words(f)
for word in gen:
    if condition(word):
        print(word)

Or:

gen = words(f)
while True:
    try:
        word = next(gen)  # next(gen) works on Python 2.6+ and Python 3
    except StopIteration:
        break  # we reached the end
    if condition(word):
        print(word)

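But you also wanted to start reading from a certain line number. This can't be done very efficiently if you don't know the lengths of the lines. The only way is to read lines and discard them until you reach the right line number.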

def words(stream, start=-1): # you could replace the -1 with 0 and remove the +1
    stream = iter(stream)    # make sure next() works on lists as well as files
    for i in range(start+1): # it depends on whether you start counting with 0 or 1
        try:
            next(stream)     # read one line and throw it away
        except StopIteration:
            break
    for line in stream:
        yield line[:max(line.find(' '), 0) or None]
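
If using the standard library is acceptable, itertools.islice can do the skipping. This is a sketch of the same idea rather than the code above: words_from and its zero-based start parameter are names invented here, and the skipped lines are still read and discarded one by one.

```python
from itertools import islice

def words_from(stream, start=0):
    # Hypothetical variant: `start` is the zero-based index of the first
    # line to yield. islice consumes and drops the lines before it.
    for line in islice(stream, start, None):
        yield line[:max(line.find(' '), 0) or None]

# A list of lines stands in for an open file in this example.
sample = ["alpha one\n", "beta two\n", "gamma three\n"]
result = list(words_from(sample, start=1))
print(result)
```

An open file object can be passed in place of the list, since islice accepts any iterable.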

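Be aware that you might get strange results if a line starts with a space: find() then returns 0, which is treated like "not found", so the whole line comes back. To prevent that, you can insert line = line.strip() at the beginning of the loop; this removes the leading spaces and also the trailing newline of one-word lines.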
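
A minimal sketch of stripping whitespace before slicing, assuming it is acceptable to drop surrounding whitespace from each line. The name words_stripped is hypothetical, and strip() is used so that both leading blanks and the trailing newline are removed.

```python
def words_stripped(stream):
    # Hypothetical variant of the generator with whitespace stripped first.
    for line in stream:
        line = line.strip()   # drop leading blanks and the trailing newline
        yield line[:max(line.find(' '), 0) or None]

# A list of lines stands in for an open file in this example.
lines = ["  indented line\n", "single\n", "two words\n"]
print(list(words_stripped(lines)))
```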

Disclaimer: none of this code has been tested.
