re.sub empties the list

Time: 2011-11-24 23:56:35

Tags: python regex

def process_dialect_translation_rules():

    # Read in lines from the text file specified in sys.argv[1], stripping away
    # excess whitespace and discarding comments (lines that start with '##').
    f_lines = [line.strip() for line in open(sys.argv[1], 'r').readlines()]
    f_lines = filter(lambda line: not re.match(r'##', line), f_lines)

    # Remove any occurrences of the pattern '\s*<=>\s*'. This leaves us with a
    # list of lists. Each 2nd level list has two elements: the value to be
    # translated from and the value to be translated to. Use the sub function
    # from the re module to get rid of those pesky quotation marks.
    f_lines = [re.split(r'\s*<=>\s*', line) for line in f_lines]
    f_lines = [re.sub(r'"', '', elem) for elem in line for line in f_lines]

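This function is supposed to read lines from a file and perform some operations on them, such as removing any line that starts with ##. Another operation I want to perform is removing the quotation marks around the words in each line. However, when the final line of this script runs, f_lines becomes an empty list. What happened?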

Lines from the original file, as requested:

##  English-Geek Reversible Translation File #1
##   (Moderate Geek)
##  Created by Todd WAreham, October 2009

"TV show"    <=> "STAR TREK"
"food"       <=> "pizza"
"drink"      <=> "Red Bull"
"computer"   <=> "TRS 80"
"girlfriend" <=> "significant other"

2 answers:

Answer 0 (score: 2)

In Python, multiple for loops in a list comprehension are processed from left to right, not from right to left, so your last expression should read:

[re.sub(r'"', '', elem) for line in f_lines for elem in line]

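Your version does not raise an error because list comprehensions in Python 2 leak their loop variable, so line is still in scope from the previous expression and holds the last line that was processed. If that line happens to be an empty string, iterating over it yields nothing, and you get an empty list as the result.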
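To make the ordering concrete, here is a small sketch showing that the for clauses of a comprehension nest in the order they are written, just like the equivalent nested loops. The sample list pairs is made up for illustration and stands in for f_lines after the re.split step. Note also that Python 3 comprehensions no longer leak their loop variable, so there the original ordering would typically raise a NameError instead of silently returning an empty list.

```python
import re

# Hypothetical sample data: roughly what f_lines holds after re.split.
pairs = [['"food"', '"pizza"'], ['"drink"', '"Red Bull"']]

# The for clauses are read left to right: the outer loop comes first.
flat = [re.sub(r'"', '', elem) for line in pairs for elem in line]

# The equivalent nested loops, written out explicitly.
expected = []
for line in pairs:
    for elem in line:
        expected.append(re.sub(r'"', '', elem))

# A nested comprehension keeps the two-element sublists intact.
nested = [[re.sub(r'"', '', elem) for elem in line] for line in pairs]

print(flat == expected)
print(flat)
print(nested)
```

Be aware that the corrected single comprehension flattens the result into one list of strings. If you want to keep the list of two-element lists described in the question's comments, the nested form at the end of the sketch is probably closer to what you need.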

Answer 1 (score: 0)

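Your basic problem is that you chose an overly complicated way of doing things, and then it failed. Use the simplest tool that gets the job done. You don't need filter, map, lambda, readlines, and all those list comprehensions (one would do). Using re.match instead of startswith is overkill. So is using re.sub where str.replace would do the job.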

import sys

with open(sys.argv[1]) as f:
    d = {}
    for line in f:
        line = line.strip()
        if not line: continue # empty line
        if line.startswith('##'): continue # comment line
        parts = line.split('<=>')
        assert len(parts) == 2 # or print an error message ...
        key, value = [part.strip('" ') for part in parts]
        assert key not in d # or print an error message ...
        d[key] = value

Extra bonus: you get to check for dodgy lines and duplicate keys.
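If you would rather report problems than rely on assert, the same loop could be wrapped in a function along these lines. This is only a sketch: the name parse_rules, the choice of ValueError, and the error messages are assumptions for illustration and not part of the original answer. It takes any iterable of lines, so it works with an open file as well as a list.

```python
def parse_rules(lines):
    """Build a dict from lines of the form '"key" <=> "value"'."""
    rules = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith('##'):
            continue  # skip blank lines and comment lines
        parts = line.split('<=>')
        if len(parts) != 2:
            raise ValueError('line %d: expected exactly one "<=>": %r'
                             % (lineno, line))
        # Strip surrounding spaces and double quotes from both halves.
        key, value = [part.strip('" ') for part in parts]
        if key in rules:
            raise ValueError('line %d: duplicate key %r' % (lineno, key))
        rules[key] = value
    return rules


# A few lines taken from the sample file in the question.
sample = [
    '##  English-Geek Reversible Translation File #1',
    '',
    '"TV show"    <=> "STAR TREK"',
    '"food"       <=> "pizza"',
]
rules = parse_rules(sample)
print(rules)
```

With a real file you would call it as parse_rules(f) inside the with block, since iterating over a file object yields its lines.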