
Question

First, I want to enumerate a document that contains more than two sentences, like this:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
list = []
for i in enumerate(doc1):
     list.append(i)

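For each sentence I compute a sentiment score, and then I want to combine the enumerated document back into its original format by taking the average score.

Any answer would be highly appreciated.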

doc2 = """I like movie. But I don't like the cast. The story is very nice"""

Answer 1

I'm not sure I really understand your question.

Note that your code is equivalent to:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
result = list(enumerate(doc1))

(I used result because naming the variable list would shadow the built-in list, which I need here to build the list.)

If you take

doc = """I like movie. But I don't like the cast. The story is very nice"""

as input, you get the following value for result:

result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice")]

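Note the spaces at the start of the strings. They may or may not be what you want.

Basic answer

If your question is "How do I recreate the initial string from the result?", here is some sample code: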

recreated_doc = ".".join(value for index, value in result)
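
As a quick check, here is a minimal sketch showing that joining with the same separator used for splitting gives back exactly the input. It only uses the names from the snippets above.

```python
doc = """I like movie. But I don't like the cast. The story is very nice"""
result = list(enumerate(doc.split('.')))

# Joining the values with the separator used by split() restores the input,
# leading spaces included.
recreated_doc = ".".join(value for index, value in result)
print(recreated_doc == doc)  # True
```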

Advanced answer

Note that if you provide

doc = """I like movie. But I don't like the cast. The story is very nice."""

with a trailing period, you get:

result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice"),(3,"")]

But what if you want to get the following instead?

result = [(0,"I like movie"),(1,"But I don't like the cast"),(2,"The story is very nice")]

(Note that there are no spaces at the start of the strings, and no empty string.)

Here is the code:

doc = """I like movie. But I don't like the cast. The story is very nice."""
doc1 = doc.split('.')
doc2 = (part.strip(' ') for part in doc1)
doc3 = (part for part in doc2 if len(part) > 0)
result = list(enumerate(doc3))
# result = [(0, 'I like movie'), (1, "But I don't like the cast"), (2, 'The story is very nice')]

And to recreate the original string:

recreated_doc = " ".join(value+"." for index, value in result)
# recreated_doc = """I like movie. But I don't like the cast. The story is very nice."""
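
The two steps can also be wrapped in small helpers. This is only a sketch: the names split_sentences and join_sentences are made up for illustration and are not part of any library.

```python
def split_sentences(doc):
    """Return [(index, sentence), ...] without surrounding spaces or empty parts."""
    parts = (part.strip(' ') for part in doc.split('.'))
    return list(enumerate(part for part in parts if part))


def join_sentences(result):
    """Rebuild a document, ending every sentence with a period."""
    return " ".join(value + "." for index, value in result)


doc = """I like movie. But I don't like the cast. The story is very nice."""
result = split_sentences(doc)
print(result)
print(join_sentences(result) == doc)  # True for this input
```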

Warning: the advanced solution does not always recreate the exact original document, so it may not be suitable for you.

Example:

doc = """This a document with a lot of spaces.   .   Too much spaces here.       And also here     .   ."""
# [...]
# recreated_doc = """This a document with a lot of spaces. Too much spaces here. And also here."""
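
If you need both the average score and an exact copy of the original, one option is to keep the raw parts in the enumeration and strip them only when scoring. The sketch below assumes this is what you are after. toy_score is a hypothetical placeholder for whatever sentiment scorer you actually use, and average_score is a made-up helper name.

```python
def toy_score(sentence):
    """Hypothetical stand-in for a real sentiment scorer (word counting only)."""
    positive = {"like", "nice"}
    negative = {"don't"}
    words = sentence.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def average_score(result, score=toy_score):
    """Average the scores of the non-blank parts of an enumerated document."""
    scores = [score(value) for index, value in result if value.strip()]
    return sum(scores) / len(scores) if scores else 0.0


doc = """This a document with a lot of spaces.   .   Too much spaces here.       And also here     .   ."""
result = list(enumerate(doc.split('.')))  # raw parts, spaces and empty parts kept

average = average_score(result)  # blank parts are skipped when scoring
recreated_doc = ".".join(value for index, value in result)
print(recreated_doc == doc)  # True
```

Because nothing is stripped from result itself, the join restores every space and every empty part. The cleanup happens only inside the scoring step.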

How do I combine an enumerated document with the original document using Python?
