Question

I am writing a script in Python that contains the following string:

a = "write This is mango. write This is orange."

I want to break this string into sentences and add each sentence as a list item, so that it becomes:

list = ['write This is mango.', 'write This is orange.']

I tried using TextBlob, but it did not parse the string correctly (it read the whole string as a single sentence).

Is there a simple way to do this?

Answer 1

One approach is re.split with a positive lookbehind assertion:

>>> import re
>>> a = "write This is mango. write This is orange."
>>> re.split(r'(?<=\w\.)\s', a)
['write This is mango.', 'write This is orange.']

If you want to split on more than one delimiter, say . and ,, use a character class in the assertion:

>>> a = "write This is mango. write This is orange. This is guava, and not pear."
>>> re.split(r'(?<=\w[,\.])\s', a)
['write This is mango.', 'write This is orange.', 'This is guava,', 'and not pear.']
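Both patterns above split on exactly one whitespace character, so two spaces after a full stop would leave the next sentence with a leading space. Here is a minimal sketch of a variation, assuming a run of whitespace should count as a single separator. The function name split_sentences and its delimiters parameter are hypothetical and not part of the answer:

```python
import re


def split_sentences(text, delimiters="."):
    """Split text after any delimiter that directly follows a word character.

    Hypothetical helper: `delimiters` is a string of single characters,
    for example "." or ".,".
    """
    # The lookbehind keeps the delimiter attached to the preceding sentence;
    # \s+ consumes one or more whitespace characters between sentences.
    pattern = r"(?<=\w[" + re.escape(delimiters) + r"])\s+"
    return re.split(pattern, text.strip())


print(split_sentences("write This is mango.   write This is orange."))
print(split_sentences("This is guava, and not pear.", delimiters=".,"))
```

The lookbehind has to stay fixed-width (one word character plus one delimiter), which is why the whitespace run is matched outside of it.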

Also, you should not use list as a variable name, because it shadows the built-in list.
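To illustrate the shadowing problem, here is a small sketch. The function name shadowed_list_demo is made up for this example; it keeps the shadowing local so the built-in is untouched elsewhere:

```python
def shadowed_list_demo():
    # Assigning to the name `list` hides the built-in inside this function.
    list = ['write This is mango.', 'write This is orange.']
    try:
        # `list` now refers to a list object, which cannot be called.
        list("abc")
    except TypeError as exc:
        return str(exc)
    return None


print(shadowed_list_demo())
```

Choosing a name such as sentences avoids the problem entirely.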

Answer 2

You should take a look at NLTK for Python. Here is a sample from NLTK.org:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]

For your case, you can do:

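import nltk
a = "write This is mango. write This is orange."
# word_tokenize returns word-level tokens, not sentences.
# For sentence-level splitting NLTK also provides nltk.sent_tokenize
# (it needs NLTK's Punkt tokenizer data to be downloaded first).
tokens = nltk.word_tokenize(a)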

Answer 3

Do you know about the string split method? It can take a multi-character separator:

>>> "wer. wef. rgo.".split(". ")
['wer', 'wef', 'rgo.']

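But it is not very flexible about the amount of whitespace. If you cannot control how many spaces follow the full stop, I recommend a regular expression ("import re"). For that matter, you could just split on "." and then clean up the leading whitespace of each sentence and the empty list item after the last ".".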
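The last suggestion (split on ".", strip each piece, drop the empty item) could look roughly like this. The function name split_on_full_stop is hypothetical, and re-appending the full stop to each sentence is an assumption about the desired output, since split removes the separator:

```python
def split_on_full_stop(text):
    # Split on every full stop, regardless of how many spaces follow it.
    pieces = text.split(".")
    # Strip surrounding whitespace and drop empty pieces, such as the one
    # produced after the final full stop.
    cleaned = [piece.strip() for piece in pieces]
    # Assumption: the caller wants the full stop back on each sentence.
    return [piece + "." for piece in cleaned if piece]


print(split_on_full_stop("wer.  wef.   rgo."))
```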

Answer 4

This should work. Take a look at the .split() function here: http://www.tutorialspoint.com/python/string_split.htm

 a = "write This is mango. write This is orange."

 print(a.split('.', 1))
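It may help to see what the second argument does. In str.split it is maxsplit, the maximum number of splits to perform. The sketch below uses an illustrative three-sentence string (not from the answer) to show that a maxsplit of 1 splits at the first full stop only:

```python
a = "write This is mango. write This is orange. write This is guava."

# maxsplit=1: split once, at the first '.', and leave the rest untouched.
head, rest = a.split('.', 1)

print(repr(head))  # the first sentence, without its full stop
print(repr(rest))  # everything after the first full stop, still unsplit
```

With exactly two sentences this happens to give two items, but the separator is dropped from the first item and the second keeps its leading space.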

Answer 5

a.split()

a.split() looks like a simple approach, but you will eventually run into problems.

For example, suppose you have

a = 'What is the price of the orange? \
It costs $1.39. \
Thank you! \
See you soon Mr. Meowgi.'

a.split('.') would return (leading spaces and the trailing empty string omitted):

a[0] = 'What is the price of the orange? It costs $1'
a[1] = '39'
a[2] = 'Thank you! See you soon Mr'
a[3] = 'Meowgi'

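I have also not considered:

Code snippets
- e.g. 'There was a problem when I ran the ./sen_split function. "a.str(" has no closing parenthesis.'
Possible company names
- e.g. "I work for the Node.js company"
etc.

This ultimately comes down to English grammar. I suggest looking into the nltk module, as Mike Tung pointed out.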
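To make the difficulty concrete, here is a deliberately naive sketch that handles the price and the "Mr." case from the example above, and nothing more. The names ABBREVIATIONS and naive_sentences are hypothetical, the abbreviation set is a tiny illustrative sample, and this is not a substitute for a real sentence tokenizer:

```python
import re

# Assumption: a small, hand-picked set of abbreviations; real text needs far more.
ABBREVIATIONS = {"Mr.", "Mrs.", "Ms.", "Dr."}


def naive_sentences(text):
    # Candidate boundaries: '.', '?' or '!' followed by whitespace.
    # The '.' in "$1.39" is followed by a digit, so it is not a boundary.
    pieces = re.split(r"(?<=[.?!])\s+", text.strip())
    sentences = []
    for piece in pieces:
        # If the previous piece ended in a known abbreviation, the split was
        # a false boundary, so glue the two pieces back together.
        if sentences and sentences[-1].split()[-1] in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


a = ("What is the price of the orange? It costs $1.39. "
     "Thank you! See you soon Mr. Meowgi.")
for sentence in naive_sentences(a):
    print(sentence)
```

Cases such as "Node.js" followed by a space-separated capitalised word, or quoted code fragments, would still need their own rules, which is the point the answer makes about English grammar.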

How to identify sentences in Python based on the full stop '.'?
