Remove \n and the letters after it from the end of words in a list

Time: 2009-12-27 17:37:34

Tags: python string

How can I remove the \n and the letters that follow it? Thanks a lot.

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
for x in wordlist:
    ...?

4 answers:

Answer 0 (score: 4)

>>> import re
>>> wordlist = ['Schreiben\nEs', 'Schreiben', \
    'Schreiben\nEventuell', 'Schreiben\nHaruki']
>>> [ re.sub("\n.*", "", word) for word in wordlist ]
['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']

This is done with re.sub:
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

Answer 1 (score: 3)

[w[:w.find('\n')] for w in wordlist]

Some timings:

$ python -m timeit -s "wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[w[:w.find('\n')] for w in wordlist]"
100000 loops, best of 3: 2.03 usec per loop
$ python -m timeit -s "import re; wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[re.sub('\n.*', '', w) for w in wordlist]"
10000 loops, best of 3: 17.5 usec per loop
$ python -m timeit -s "import re; RE = re.compile('\n.*'); wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']" "[RE.sub('', w) for w in wordlist]"
100000 loops, best of 3: 6.76 usec per loop

Edit

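The solution above is wrong (see Peter Hansen's comment): when a word contains no '\n', find() returns -1, and the slice w[:-1] drops the word's last character. Here is a corrected version: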

def truncate(words, s):
    for w in words:
        i = w.find(s)
        yield w[:i] if i != -1 else w
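
To make the difference concrete, the following sketch runs the slice-based one-liner and the corrected generator side by side on the word list from the question. Nothing beyond the code already shown in this answer is assumed; the variable names `naive` and `fixed` are only for illustration.

```python
def truncate(words, s):
    """Yield each word cut at the first occurrence of s, or unchanged if s is absent."""
    for w in words:
        i = w.find(s)
        yield w[:i] if i != -1 else w

wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']

# Slice-based version: the second word has no newline, so it loses its final letter.
naive = [w[:w.find('\n')] for w in wordlist]

# Corrected version: words without a newline pass through untouched.
fixed = list(truncate(wordlist, '\n'))

print(naive)  # ['Schreiben', 'Schreibe', 'Schreiben', 'Schreiben']
print(fixed)  # ['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']
```

Because truncate is a generator, it has to be wrapped in list() (or iterated over) to get the results.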

Answer 2 (score: 1)

You can use a regular expression to do this:

import re
wordlist = [re.sub("\n.*", "", word) for word in wordlist]

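The regular expression \n.* matches the first \n together with anything that may follow it (.*), and the match is replaced with an empty string.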
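
One detail worth checking is what happens when a word contains more than one newline. By default `.` does not match a newline, so each match covers one `\n` plus the rest of that line; re.sub still removes everything because it replaces every match. The sketch below illustrates this with a made-up three-line string (the sample text and variable names are assumptions for illustration, not from the question).

```python
import re

text = "Schreiben\nEs\nwar"  # hypothetical input with two newlines

# Default: "." stops at a newline, so there are two matches ("\nEs" and "\nwar");
# sub() replaces both of them.
all_matches = re.sub(r"\n.*", "", text)

# With count=1 only the first match is replaced, so the last line is left behind.
first_only = re.sub(r"\n.*", "", text, count=1)

# With re.DOTALL, "." also matches newlines, so a single match covers the rest.
dotall = re.sub(r"\n.*", "", text, count=1, flags=re.DOTALL)

# A word without a newline has no match and comes back unchanged.
unchanged = re.sub(r"\n.*", "", "Schreiben")

print(repr(all_matches))  # 'Schreiben'
print(repr(first_only))   # 'Schreiben\nwar'
print(repr(dotall))       # 'Schreiben'
print(repr(unchanged))    # 'Schreiben'
```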

Answer 3 (score: 0)

>>> wordlist = ['Schreiben\nEs', 'Schreiben', 'Schreiben\nEventuell', 'Schreiben\nHaruki']
>>> [ i.split("\n")[0] for i in wordlist ]
['Schreiben', 'Schreiben', 'Schreiben', 'Schreiben']