import re
import sys

def process_dialect_translation_rules():
    # Read in lines from the text file specified in sys.argv[1], stripping away
    # excess whitespace and discarding comments (lines that start with '##').
    f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()]
    f_lines = filter(lambda line: not re.match(r'##', line), f_lines)
    # Remove any occurrences of the pattern '\s*<=>\s*'. This leaves us with a
    # list of lists. Each 2nd level list has two elements: the value to be
    # translated from and the value to be translated to. Use the sub function
    # from the re module to get rid of those pesky quotation marks.
    f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines]
    f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines]
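This function should read lines from a file and perform some operations on them, such as removing any line that begins with ##. Another operation I want to perform is removing the quotation marks around the words in each line. However, when the last line of this script runs, f_lines becomes an empty list. What is going on?
The requested lines from the original file: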
## English-Geek Reversible Translation File #1
## (Moderate Geek)
## Created by Todd WAreham, October 2009
"TV show" <=> "STAR TREK"
"food" <=> "pizza"
"drink" <=> "Red Bull"
"computer" <=> "TRS 80"
"girlfriend" <=> "significant other"
Answer 0 (score: 2)
In Python, multiple for loops in a list comprehension are processed from left to right, not from right to left, so your last expression should be:
[re.sub(r'"', '', elem) for line in f_lines for elem in line]
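The original expression doesn't raise an error because, in Python 2, list comprehensions leak their loop variable, so line is still in scope from the previous expression. If line happens to be an empty string at that point, you get an empty list as the result. (In Python 3 the loop variable no longer leaks, so the original expression would typically fail with a NameError instead.)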
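To make the ordering concrete, here is a small sketch showing that a comprehension with two for clauses behaves like nested loops written in the same left-to-right order. The `pairs` sample data and the `strip_quotes_*` helper names are made up for illustration; `pairs` only mimics what f_lines holds after the re.split step.

```python
import re

# Hypothetical sample data: what f_lines looks like after the re.split step.
pairs = [['"food"', '"pizza"'], ['"drink"', '"Red Bull"']]

def strip_quotes_nested(rows):
    # Explicit nested loops: the outer loop is written first, the inner loop second.
    result = []
    for line in rows:
        for elem in line:
            result.append(re.sub(r'"', '', elem))
    return result

def strip_quotes_comprehension(rows):
    # The same two loops, in the same order, as a single comprehension.
    return [re.sub(r'"', '', elem) for line in rows for elem in line]

flat = strip_quotes_comprehension(pairs)
print(flat)  # ['food', 'pizza', 'drink', 'Red Bull']

# If the list-of-lists shape should be kept, nest one comprehension inside another.
paired = [[re.sub(r'"', '', elem) for elem in line] for line in pairs]
print(paired)  # [['food', 'pizza'], ['drink', 'Red Bull']]
```

Note that the corrected single comprehension flattens the pairs into one list. The nested form at the end is probably closer to what the question intended, since it keeps each from/to pair together.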
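Answer 1 (score: 0)
Your basic problem is that you chose an overly complicated way of doing things and then got it wrong. Use the simplest tool that does the job. You don't need filter, map, lambda, readlines, and all those list comprehensions (one will do). Using re.match instead of startswith is overkill. So is using re.sub where str.replace would do the job.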
import sys

with open(sys.argv[1]) as f:
    d = {}
    for line in f:
        line = line.strip()
        if not line: continue  # empty line
        if line.startswith('##'): continue  # comment line
        parts = line.split('<=>')
        assert len(parts) == 2  # or print an error message ...
        key, value = [part.strip('" ') for part in parts]
        assert key not in d  # or print an error message ...
        d[key] = value
Bonus: you get to check for dodgy lines and duplicate keys.
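If you would rather report problems than stop on an assert, one possible variation looks like the sketch below. The function name `parse_rules`, the error message wording, and the `sample` lines are all assumptions made for illustration, not part of the original script.

```python
def parse_rules(lines):
    """Parse 'key <=> value' lines into a dict, collecting problems instead of asserting."""
    rules = {}
    errors = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith('##'):
            continue  # skip blank lines and comment lines
        parts = line.split('<=>')
        if len(parts) != 2:
            errors.append('line %d: expected exactly one <=>: %r' % (lineno, line))
            continue
        # Strip surrounding quotes and spaces from both halves.
        key, value = [part.strip('" ') for part in parts]
        if key in rules:
            errors.append('line %d: duplicate key %r' % (lineno, key))
            continue
        rules[key] = value
    return rules, errors

# Hypothetical input, including one malformed line and one duplicate key.
sample = [
    '## comment',
    '"food" <=> "pizza"',
    '"drink" <=> "Red Bull"',
    'no separator here',
    '"food" <=> "burger"',
    '',
]
rules, errors = parse_rules(sample)
print(rules)
for message in errors:
    print(message)
```

Because the function accepts any iterable of strings, it can be fed an open file object directly, for example `parse_rules(open(sys.argv[1]))`.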