Splitting on multiple words in Python

Date: 2013-12-24 14:26:43

Tags: python split

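How can I write a Python program that splits a string on multiple words or characters? For example, I have these sentences: Hi, This is a test. Are you surprised? In this example, I need my program to split the sentences on ',', '!', '?' and '.'. I know about split in the str library and in NLTK, but I would like to know whether there is a built-in, pythonic way to do this, similar to split.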

4 answers:

Answer 0 (score: 3)

Use re.split:

import re

string = 'Hi, This is a test. Are you surprised?'
words = re.split(r'[,!?.]', string)
print(words)
['Hi', ' This is a test', ' Are you surprised', '']
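
The result above keeps the leading spaces and ends with an empty string, because the text finishes with a delimiter. Here is a minimal sketch of one way to clean that up, assuming you want trimmed, non-empty pieces. The helper name split_sentences and its delimiters parameter are made up for illustration and are not part of the re module:

```python
import re

def split_sentences(text, delimiters=",!?."):
    """Split text on any single delimiter character and drop empty pieces."""
    # re.escape keeps characters such as '.' and '?' literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]"
    parts = re.split(pattern, text)
    # Trim surrounding whitespace and skip pieces that are empty after trimming.
    return [part.strip() for part in parts if part.strip()]

print(split_sentences("Hi, This is a test. Are you surprised?"))
# ['Hi', 'This is a test', 'Are you surprised']
```

Building the character class from a string makes it easy to add or remove delimiters without editing the pattern by hand.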

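Answer 1 (score: 1)

You are looking for the tokenize functionality of the NLTK package. NLTK stands for Natural Language Toolkit.

Or try re.split from the re module.

From the re docs: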

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']

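Answer 2 (score: 1)

I think I found a tricky way to solve my problem. I don't need any module. I can use the replace method of str to replace characters such as ! and ? with ., and then use the split method to split the text on '.'.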
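
The replace-then-split idea described above could look roughly like this. It is only a sketch: the function name split_by_replace and its parameters are hypothetical, and trimming whitespace and dropping empty pieces is an added assumption about the desired output:

```python
def split_by_replace(text, delimiters=",!?", target="."):
    """Normalize every delimiter to `target`, then split once on `target`."""
    for ch in delimiters:
        text = text.replace(ch, target)
    # str.split keeps empty strings, so filter them out after trimming.
    return [piece.strip() for piece in text.split(target) if piece.strip()]

print(split_by_replace("Hi, This is a test. Are you surprised?"))
# ['Hi', 'This is a test', 'Are you surprised']
```

This makes one pass over the text per delimiter, which is fine for a handful of punctuation marks; with many delimiters a single re.split call is probably simpler.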

Answer 3 (score: 0)

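def get_words(s):
    """Split s into words on any of the delimiter characters."""
    words = []
    word = ''
    for c in s:
        if c in '-!?,. ':
            # A delimiter ends the current word; skip empty words.
            if word != '':
                words.append(word)
            word = ''
        else:
            word = word + c
    # Flush the last word if the string does not end with a delimiter.
    if word != '':
        words.append(word)
    return words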



>>> s = "Hi, This is a test. Are you surprised?"
>>> print(get_words(s))
['Hi', 'This', 'is', 'a', 'test', 'Are', 'you', 'surprised']


If you change '-!?,. ' to '-!?,.' (that is, remove the space from the delimiters), the output will be:
['Hi', ' This is a test', ' Are you surprised']
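
For comparison, the same lists can be produced with a single re.findall call that collects runs of non-delimiter characters. This is a sketch of an equivalent approach, not part of the answer above; the name get_words_re and its delimiters parameter are hypothetical:

```python
import re

def get_words_re(s, delimiters="-!?,. "):
    """Return maximal runs of characters that are not delimiters."""
    # re.escape keeps '-', '.', '?' and the space literal inside the class.
    pattern = "[^" + re.escape(delimiters) + "]+"
    return re.findall(pattern, s)

s = "Hi, This is a test. Are you surprised?"
print(get_words_re(s))
# ['Hi', 'This', 'is', 'a', 'test', 'Are', 'you', 'surprised']
print(get_words_re(s, delimiters="-!?,."))
# ['Hi', ' This is a test', ' Are you surprised']
```

Because findall only returns non-empty matches, no extra check for empty words is needed.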