Here is what a single file looks like:
BEGIN_META
stuff
to
discard
END_META
BEGIN_DB
header
to
discard

data I
wish to
extract
END_DB
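I want to be able to parse an infinite stream of these files, all cat'ed together, which rules out something like re.findall('something useful', '\n'.join(sys.stdin), re.M).
Here is my attempt, but I have to force the generator returned by get_raw_table() with list(), so it does not quite meet the requirement. Removing the forcing means you can no longer test whether the returned generator is empty, and so you cannot tell whether sys.stdin is exhausted.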
import sys

def get_raw_table(it):
    state = 'begin'
    for line in it:
        if line.startswith('BEGIN_DB'):
            state = 'discard'
        elif line.startswith('END_DB'):
            return
        elif state == 'discard' and not line.strip():  # blank line ends the header
            state = 'take'
        elif state == 'take' and line:
            yield line.strip().strip('#').split()

# raw_tables is a list (per file) of lists (per row) of lists (per column)
raw_tables = []
while True:
    # list() forces the generator so that an empty result can be detected
    result = list(get_raw_table(sys.stdin))
    if result:
        raw_tables.append(result)
    else:
        break
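The emptiness test described above does not strictly require forcing the whole generator: it is enough to pull the first item and chain it back in front of the rest. A minimal sketch, assuming a hypothetical helper named peek (it is not part of the original code):

```python
import itertools

_EMPTY = object()  # sentinel, so a falsy first item is not mistaken for "empty"

def peek(gen):
    """Return None if gen is exhausted, else an iterator equivalent to gen."""
    first = next(gen, _EMPTY)
    if first is _EMPTY:
        return None
    # Put the consumed item back in front of the remaining ones.
    return itertools.chain([first], gen)
```

Only one row is read ahead, so the rest of the table stays lazy. The returned iterator still has to be consumed completely before the next table is requested from the same stream.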
Answer 0 (score: 4)
Something like this might work:
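import itertools

def chunks(it):
    it = iter(it)
    while True:
        # Skip ahead to BEGIN_DB, then past the header up to the blank line.
        body = itertools.dropwhile(lambda x: 'BEGIN_DB' not in x, it)
        body = itertools.dropwhile(lambda x: x.strip(), body)
        try:
            next(body)  # consume the blank line itself
        except StopIteration:
            return  # input exhausted; Python 3.7+ forbids leaking StopIteration
        yield itertools.takewhile(lambda x: 'END_DB' not in x, body)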
For example:
src = """
BEGIN_META
stuff
to
discard
END_META
BEGIN_DB
header
to
discard

1data I
1wish to
1extract
END_DB
BEGIN_META
stuff
to
discard
END_META
BEGIN_DB
header
to
discard

2data I
2wish to
2extract
END_DB
"""
src = iter(src.splitlines())
for chunk in chunks(src):
    # Each chunk must be consumed fully before the next one is requested.
    for line in chunk:
        print(line.strip())
    print()
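To get back the raw_tables structure from the question (a list per file of lists per row of lists per column), each chunk can be turned into a list of split rows as it arrives. A minimal sketch, assuming Python 3; the helper name tables and the sample lines are hypothetical, and chunks is repeated here with a guard for the end of input so that the block runs on its own:

```python
import itertools

def chunks(lines):
    lines = iter(lines)
    while True:
        # Skip to BEGIN_DB, then past the header, which ends at a blank line.
        body = itertools.dropwhile(lambda x: 'BEGIN_DB' not in x, lines)
        body = itertools.dropwhile(lambda x: x.strip(), body)
        try:
            next(body)  # consume the blank line
        except StopIteration:
            return  # no further BEGIN_DB section: the input is exhausted
        yield itertools.takewhile(lambda x: 'END_DB' not in x, body)

def tables(lines):
    """Yield one table at a time as a list of rows, each a list of columns."""
    for chunk in chunks(lines):
        # The comprehension consumes the chunk fully before the next one starts.
        yield [line.strip().strip('#').split() for line in chunk if line.strip()]

# Hypothetical sample input: two files concatenated together.
sample = """BEGIN_META
stuff
END_META
BEGIN_DB
header

1data I
1wish to
END_DB
BEGIN_DB
header

2data I
END_DB
""".splitlines()

result = list(tables(sample))
print(result)
```

With real input, a loop such as for table in tables(sys.stdin) handles one file at a time, and the end of the stream shows up as the loop finishing rather than as an empty table.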
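Answer 1 (score: 1)
You can split your code into separate functions so that the program logic is easier to follow and the code is more modular and flexible. Try to stay away from things like
state = "some string"
because if you later want to add something to this module, you will need to know which values the variable state can take and what happens each time it changes. You are not guaranteed to remember that, and it can get you into trouble. Writing functions that model the same behavior is cleaner and easier to implement.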
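import sys

def read_stdin():
    # Yield lines lazily so the whole stream is never held in memory.
    for line in sys.stdin:
        yield line

def search_line_for_start_db(line):
    if "BEGIN_DB" in line:
        search_db_for_info()

def search_db_for_info():
    # Skip the header, which ends at the first blank line.
    for new_line in read_line:
        if not new_line.strip():
            break
    # Collect rows until END_DB; put your information somewhere.
    table = []
    for new_line in read_line:
        if "END_DB" in new_line:
            break
        table.append(new_line.strip().split())
    raw_tables.append(table)

read_line = read_stdin()
raw_tables = []
while True:
    try:
        search_line_for_start_db(next(read_line))
    except StopIteration:  # your stdin stream has finished being read
        break  # end your program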