Extracting sentences from text that contains no periods

Time: 2019-02-05 16:16:31

Tags: python machine-learning deep-learning nlp natural-language-processing

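I am new to NLP and need help extracting sentences from text that does not contain any periods.

For example (the text below contains no periods):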

global warming is the term used to describe a gradual increase in the average
temperature of the earths atmosphere and its oceans a change that is believed to 
be permanently changing the earths climate there is great debate among many 
people and sometimes in the news on whether global warming is real (some call it 
a hoax) but climate scientists looking at the data and facts agree the planet is 
warming while many view the effects of global warming to be more substantial and 
more rapidly occurring than others do the scientific consensus on climatic 
changes related to global warming is that the average temperature of the earth 
has risen between 04 and 08 °c over the past 100 years the increased volumes of 
carbon dioxide and other greenhouse gases released by the burning of fossil 
fuels land clearing agriculture and other human activities are believed to be 
the primary sources of the global warming that has occurred over the past 50 
years scientists from the intergovernmental panel on climate carrying out global 
warming research have recently predicted that average global temperatures could 
increase between 14 and 58 °c by the year 2100 changes resulting from global 
warming may include rising sea levels due to the melting of the polar ice caps 
as well as an increase in occurrence and severity of storms and other severe 
weather events

Is there an NLP library that I can use to extract the sentences from the text above?

Expected output:

global warming is the term used to describe a gradual increase in the average temperature of the earths atmosphere and its oceans a change that is believed to be permanently changing the earths climate
there is great debate among many people and sometimes in the news on whether global warming is real (some call it a hoax)
but climate scientists looking at the data and facts agree the planet is warming
while many view the effects of global warming to be more substantial and more rapidly occurring than others do the scientific consensus on climatic changes related to global warming is that the average temperature of the earth has risen between 4 and 8 °c over the past 100 years
the increased volumes of carbon dioxide and other greenhouse gases released by the burning of fossil fuels land clearing agriculture and other human activities are believed to be the primary sources of the global warming that has occurred over the past 50 years
scientists from the intergovernmental panel on climate carrying out global warming research have recently predicted that average global temperatures could increase between 4 and 8 °c by the year 2100
changes resulting from global warming may include rising sea levels due to the melting of the polar ice caps as well as an increase in occurrence and severity of storms and other severe weather events

Thanks.

1 answer:

Answer 0 (score: 0)

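Working out where a sentence starts and ends is, in general, a relatively expensive operation. For your case, you can make a list of capitalized words that do not start a sentence, such as "Earth" and "Intergovernmental", split the text into words, detect the remaining capitalized words (these start sentences), and then join each sublist of words back into a string. Note that this relies on the text keeping its capitalization, as in the sample below. Here is how I would do it: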

a = """
Global warming is the term used to describe a gradual increase in the average 
temperature of the Earths atmosphere and its oceans a change that is believed
to be permanently changing the Earths climate There is great debate among many
people and sometimes in the news on whether global warming is real (some call 
it a hoax) But climate scientists looking at the data and facts agree the 
planet is warming While many view the effects of global warming to be more 
substantial and more rapidly occurring than others do the scientific consensus 
on climatic changes related to global warming is that the average temperature 
of the Earth has risen between 4 and 8 °C over the past 100 years The 
increased volumes of carbon dioxide and other greenhouse gases released by the 
burning of fossil fuels land clearing agriculture and other human activities 
are believed to be the primary sources of the global warming that has occurred
over the past 50 years Scientists from the Intergovernmental Panel on Climate 
carrying out global warming research have recently predicted that average 
global temperatures could increase between 4 and 8 °C by the year 2100 
Changes resulting from global warming may include rising sea levels due to the 
melting of the polar ice caps as well as an increase in occurrence and severity 
of storms and other severe weather events
"""

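# Split on any whitespace (spaces and newlines) so that words at line
# breaks are not glued together, e.g. "believed" + "to" -> "believedto"
preProc = a.split()

# Capitalized tokens that do not start a sentence (proper nouns, units)
capitalizedWords = ['Intergovernmental', 'Panel', 'Climate', 'Earths', 'Earth', '°C']
results = []
previousIndex = 0
for idx, word in enumerate(preProc):
  # A title-cased word outside the exception list starts a new sentence;
  # idx > previousIndex avoids emitting an empty first sentence
  if idx > previousIndex and word.istitle() and word not in capitalizedWords:
    results.append(preProc[previousIndex:idx])
    previousIndex = idx
# Append the last sentence, which no later capitalized word closes
results.append(preProc[previousIndex:])
results = [" ".join(x) for x in results]

print(results)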
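
The same heuristic can be wrapped in a small reusable function, which makes it easier to try on other inputs. This is only a sketch of the approach above, not a general sentence segmenter: the function name `split_on_capitals` and the parameter `non_starters` are hypothetical names chosen for illustration, and the short sample string is made up for the demo.

```python
def split_on_capitals(text, non_starters=()):
    """Split text into sentences at title-cased words.

    non_starters: capitalized words that must NOT be treated as the
    start of a sentence (proper nouns, units such as '°C', ...).
    """
    words = text.split()           # split on spaces and newlines alike
    skip = set(non_starters)
    sentences = []
    start = 0
    for idx, word in enumerate(words):
        # A title-cased word that is not an exception opens a new sentence
        if idx > start and word.istitle() and word not in skip:
            sentences.append(" ".join(words[start:idx]))
            start = idx
    if words:
        # The last sentence is not followed by another capitalized word
        sentences.append(" ".join(words[start:]))
    return sentences


sample = "The Earth is warming Sea levels may rise"
print(split_on_capitals(sample, non_starters=["Earth"]))
# prints ['The Earth is warming', 'Sea levels may rise']
```

Because the only signal used is capitalization, fully lowercase input such as the text in the question comes back as a single sentence. The casing has to be present in the input for this approach to find any boundaries.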