Question

Here is what one file looks like:

BEGIN_META
    stuff
    to
    discard
END_META
BEGIN_DB
    header
    to
    discard

    data I
    wish to
    extract
 END_DB

I'd like to be able to parse an unbounded stream of these files, all cat'ed together, which rules out anything like re.findall('something useful', '\n'.join(sys.stdin), re.M).

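Below is my attempt, but I have to force the generator returned by get_raw_table() with list(), so it doesn't quite meet the requirement. Removing the forcing means you can't test whether the returned generator is empty, so you can't tell whether sys.stdin is exhausted.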

import sys

def get_raw_table(it):
    state = 'begin'
    for line in it:
        stripped = line.strip()  # markers may carry leading whitespace
        if stripped.startswith('BEGIN_DB'):
            state = 'discard'
        elif stripped.startswith('END_DB'):
            return
        elif state == 'discard' and not stripped:
            # the first blank line after BEGIN_DB ends the header
            state = 'take'
        elif state == 'take' and stripped:
            yield stripped.strip('#').split()

# raw_tables is a list (per file) of lists (per row) of lists (per column)
raw_tables = []
while True:
    result = list(get_raw_table(sys.stdin))
    if result:
        raw_tables.append(result)
    else:
        break

Answer 1

Something like this might work:

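import itertools

def chunks(it):
    it = iter(it)
    while True:
        # skip everything before the next BEGIN_DB line
        body = itertools.dropwhile(lambda x: 'BEGIN_DB' not in x, it)
        # skip BEGIN_DB and the header, stopping at the first blank line
        body = itertools.dropwhile(lambda x: x.strip(), body)
        try:
            next(body)  # consume the blank line that ends the header
        except StopIteration:
            return  # input exhausted; a bare next() raises RuntimeError on Python 3.7+
        yield itertools.takewhile(lambda x: 'END_DB' not in x, body)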

For example:

src = """
BEGIN_META
    stuff
    to
    discard
END_META
BEGIN_DB
    header
    to
    discard

    1data I
    1wish to
    1extract
 END_DB


BEGIN_META
    stuff
    to
    discard
END_META
BEGIN_DB
    header
    to
    discard

    2data I
    2wish to
    2extract
 END_DB
"""


src = iter(src.splitlines())
for chunk in chunks(src):
    for line in chunk:
        print(line.strip())
    print()
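To get the list-per-file, list-per-row, list-per-column shape the question asks for, each chunk can be turned into a table as soon as it arrives. The following is only a sketch: tables() and the shortened sample are hypothetical names introduced here, and chunks() is repeated (with a try/except around next(), an adaptation for Python 3) so the block runs on its own.

```python
import itertools

def chunks(it):
    it = iter(it)
    while True:
        # skip everything before the next BEGIN_DB line
        body = itertools.dropwhile(lambda x: 'BEGIN_DB' not in x, it)
        # skip BEGIN_DB and the header, stopping at the first blank line
        body = itertools.dropwhile(lambda x: x.strip(), body)
        try:
            next(body)  # consume the blank line that ends the header
        except StopIteration:
            return  # no further BEGIN_DB block in the input
        yield itertools.takewhile(lambda x: 'END_DB' not in x, body)

def tables(lines):
    """Yield one table per BEGIN_DB/END_DB block: a list of rows, each a list of columns."""
    for chunk in chunks(lines):
        # The chunk is consumed fully here, before the next one is requested.
        yield [line.split() for line in chunk if line.strip()]

# Shortened, hypothetical sample with two concatenated files.
sample = """\
BEGIN_DB
    header

    1data I
    1wish to
 END_DB
BEGIN_DB
    header

    2data I
 END_DB
""".splitlines()

for table in tables(sample):
    print(table)
```

Only one table is held in memory at a time, and the outer loop simply ends when the input runs out, so no emptiness test on the generator is needed.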

Answer 2

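You can split your logic into separate functions, so that the program flow makes more sense and your code becomes more modular and flexible. Try to stay away from things like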

state = "some string"

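because what happens if you want to add something to this module in the future? You would need to know which values your "state" variable can take and what happens when it changes value. You are not guaranteed to remember that, and it can get you into trouble. Writing functions that mimic this behavior is cleaner and easier to implement.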

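import sys

def read_stdin():
    # Lazily yield lines from standard input.
    for line in sys.stdin:
        yield line

def search_line_for_start_db(line):
    # Hand off to the table reader once a BEGIN_DB marker is seen.
    if "BEGIN_DB" in line:
        search_db_for_info()

def skip_header():
    # Discard lines up to and including the first blank line.
    while next(read_line).strip():
        pass

def search_db_for_info():
    skip_header()
    new_line = next(read_line)
    while "END_DB" not in new_line:
        if new_line.strip():
            # Put your information somewhere
            raw_tables.append(new_line.strip().split())
        new_line = next(read_line)

read_line = read_stdin()
raw_tables = []
while True:
    try:
        search_line_for_start_db(next(read_line))
    except StopIteration:  # Your stdin stream has finished being read
        break  # end your program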
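The same one-function-per-phase idea can also be written without module-level globals, by passing a shared iterator of lines between the functions. This is a rough sketch, not a drop-in replacement: seek_begin_db(), skip_header(), read_rows() and read_tables() are hypothetical names, and the marker handling assumes the file layout shown in the question.

```python
def seek_begin_db(lines):
    """Advance past the next BEGIN_DB line; return False if the input ends first."""
    for line in lines:
        if "BEGIN_DB" in line:
            return True
    return False

def skip_header(lines):
    """Discard header lines up to and including the first blank line."""
    for line in lines:
        if not line.strip():
            return

def read_rows(lines):
    """Yield each non-blank row as a list of columns until END_DB is reached."""
    for line in lines:
        if "END_DB" in line:
            return
        if line.strip():
            yield line.split()

def read_tables(lines):
    """Yield one table (a list of rows) per BEGIN_DB/END_DB block."""
    lines = iter(lines)  # one shared iterator, advanced by every helper
    while seek_begin_db(lines):
        skip_header(lines)
        yield list(read_rows(lines))

# Shortened, hypothetical sample with two concatenated files.
demo = """\
BEGIN_META
    stuff
END_META
BEGIN_DB
    header

    1data I
    1extract
 END_DB
BEGIN_DB
    header

    2data I
 END_DB
""".splitlines()

for table in read_tables(demo):
    print(table)
```

Calling read_tables(sys.stdin) would feed it from standard input. End of input is detected by seek_begin_db() returning False, so there is no need to check whether a table generator came back empty.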

Lazily parse a stateful, multi-line-per-record data stream in Python?

2 answers: