Question

Which of these algorithms is best suited for text summarization: Cosine, Dice, or Jaccard?

Answer 1

These are similarity measures that can be applied to a pair of texts.
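To make that concrete, here is a minimal sketch of the three measures applied to two short strings. It assumes a plain whitespace tokenizer, with Jaccard and Dice computed over sets of distinct tokens and cosine over raw term counts. The helper names (`tokenize`, `jaccard`, `dice`, `cosine`) are mine, not from any library.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def jaccard(a, b):
    """Size of the token intersection divided by the size of the union."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def dice(a, b):
    """Twice the intersection size divided by the sum of the two set sizes."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine(a, b):
    """Cosine of the angle between the two term-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


s1 = "the cat sat on the mat"
s2 = "the cat lay on the rug"
print(round(jaccard(s1, s2), 3), round(dice(s1, s2), 3), round(cosine(s1, s2), 3))
```

For these two strings the scores come out at roughly 0.43 (Jaccard), 0.6 (Dice) and 0.75 (cosine). All three only tell you how alike two texts are; on their own they do not produce a summary.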

Answer 2

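seq2seq deep learning models are typically used for this. This blog series covers the topic in detail, from how seq2seq works all the way to the latest research approaches.

This repo also collects several implementations for building text summarization models. It runs the models on Google Colab and hosts the data on Google Drive, so regardless of how powerful your own computer is, you can use Google Colab, a free service, to train your deep models.

If you would like to see text summarization in action, you can use this free API.

I really hope this helps.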

Answer 3

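Extractive Summarization

Extractive summarization means identifying the important parts of a text and reproducing them verbatim, so the summary is a subset of the sentences in the original. Abstractive summarization, by contrast, interprets and examines the text with advanced natural language techniques and then restates the important material in a new way, producing a new, shorter text that conveys the most critical information from the original.

In the extractive case, the model identifies the important sentences and phrases in the original text and outputs only those.

Abstractive Summarization

Abstractive summarization is more advanced and closer to how a human would paraphrase. Although it has more potential (and is generally of more interest to researchers and developers), the more traditional methods have so far proved to yield better results.

Here the model generates an entirely different text that is shorter than the original, producing new sentences in a new form, much as a human would. In this tutorial we will use transformers for this approach.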
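As a rough illustration of "identify the important sentences and output only those", here is a minimal sketch that scores each sentence by the average frequency of its words and keeps the top `k`. This is a simple frequency heuristic of my own choosing, not the method used by any particular library; `STOP_WORDS`, `split_sentences` and `extractive_summary` are hypothetical names, and the stop-word list and sentence splitter are deliberately naive.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use a much longer one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it", "on"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, verbatim, in original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in chosen]


toy_text = (
    "Summarization shortens a text. "
    "Extractive summarization copies sentences from the text. "
    "My lunch was cold. "
    "Abstractive summarization writes new sentences about the text."
)
print(extractive_summary(toy_text, k=2))
```

Every sentence in the result is copied unchanged from the input, which is what makes the approach extractive. The off-topic sentence about lunch scores lowest because none of its words recur elsewhere.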

reference_text = """Artificial intelligence (AI, also machine intelligence, MI) is intelligence demonstrated by machines, in contrast to the natural intelligence (NI) displayed by humans and other animals. In computer science AI research is defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving". See glossary of artificial intelligence. The scope of AI is disputed: as machines become increasingly capable, tasks considered as requiring "intelligence" are often removed from the definition, a phenomenon known as the AI effect, leading to the quip "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from "artificial intelligence", having become a routine technology. Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go), autonomous cars, intelligent routing in content delivery networks, military simulations, and interpreting complex data, including images and videos. Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success and renewed funding. For most of its history, AI research has been divided into subfields that often fail to communicate with each other. These sub-fields are based on technical considerations, such as particular goals (e.g. "robotics" or "machine learning"), the use of particular tools ("logic" or "neural networks"), or deep philosophical differences. 
Subfields have also been based on social factors (particular institutions or the work of particular researchers). The traditional problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing, perception and the ability to move and manipulate objects. General intelligence is among the field's long-term goals. Approaches include statistical methods, computational intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, neural networks and methods based on statistics, probability and economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy and many others. The field was founded on the claim that human intelligence "can be so precisely described that a machine can be made to simulate it". This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity. Some people also consider AI to be a danger to humanity if it progresses unabatedly. Others believe that AI, unlike previous technological revolutions, will create a risk of mass unemployment. In the twenty-first century, AI techniques have experienced a resurgence following concurrent advances in computer power, large amounts of data, and theoretical understanding; and AI techniques have become an essential part of the technology industry, helping to solve many challenging problems in computer science."""

Abstractive Summarization

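from transformers import pipeline
# Word count of the source text, for comparison with the summary
print(len(reference_text.split()))
# Default summarization pipeline; downloads a pretrained model on first use
summarization = pipeline("summarization")
abstractive_summarization = summarization(reference_text)[0]["summary_text"]
print(abstractive_summarization)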

Output

In computer science AI research is defined as the study of "intelligent agents" Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" Capabilities generally classified as AI as of 2017 include successfully understanding human speech, competing at a high level in strategic game systems (such as chess and Go)

Extractive Summarization

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer
# Parse the raw string into a sumy document with the English tokenizer
# (this relies on NLTK's punkt data being installed)
parser = PlaintextParser.from_string(reference_text, Tokenizer("english"))
summarizer = LexRankSummarizer()
# Ask LexRank for the 2 highest-ranked sentences
extractive_summarization = summarizer(parser.document, 2)
extractive_summarization = " ".join(str(s) for s in extractive_summarization)

Extractive output

Colloquially, the term "artificial intelligence" is often used to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving". As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. Sub-fields have also been based on social factors (particular institutions or the work of particular researchers). The traditional problems (or goals) of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects.
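To tie this back to the original question: LexRank is a graph-based method that builds on cosine similarity between sentences, so a measure like cosine is a building block of a summarizer rather than a summarizer in itself. The sketch below is a simplified version of that idea, assuming raw term counts and a PageRank-style power iteration. It is not sumy's implementation, which differs in details such as TF-IDF weighting and similarity thresholding, and `centrality_scores` is a name I made up.

```python
import math
from collections import Counter


def cosine(ca, cb):
    """Cosine similarity between two term-count Counters."""
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def centrality_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style centrality on a cosine-similarity graph."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Pairwise similarity; the diagonal is zeroed so a sentence cannot vote for itself.
    sim = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Row-normalise into transition probabilities; isolated sentences spread evenly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


toy_sentences = [
    "cats chase mice",
    "cats chase birds",
    "dogs chase cats",
    "stocks fell sharply today",
]
scores = centrality_scores(toy_sentences)
for sentence, value in zip(toy_sentences, scores):
    print(f"{value:.3f}  {sentence}")
```

The sentence that shares no words with the others ends up with the lowest score, so an extractive summarizer built on this ranking would drop it first. Dice or Jaccard could be swapped in for `cosine` in the same graph; which works best depends on the data.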
