How do I combine an enumerated document back into the original document using Python?

Time: 2018-02-04 09:20:56

Tags: python text

First, I want to enumerate a document that contains more than two sentences, as shown below:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
list = []
for i in enumerate(doc1):
    list.append(i)

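For each sentence I compute a sentiment score, and then I want to combine the enumerated document back into its original format by taking the average score.

Any answer would be highly appreciated.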

doc2 = """I like movie. But I don't like the cast. The story is very nice"""

1 Answer:

Answer 0: (score: 2)

I'm not sure I really understand your question.

Note that your code is equivalent to:

doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
result = list(enumerate(doc1))

(I used result because a variable named list would shadow the built-in name list, which I use to build the list.)
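To illustrate why the name matters, here is a minimal sketch of what happens when list is rebound the way the question does it. The helper name enumerate_with_shadowed_list is hypothetical and only there to keep the rebinding local.

```python
doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')

def enumerate_with_shadowed_list(parts):
    # Rebinding the name `list` hides the built-in type inside this function.
    list = []
    try:
        # This now calls the empty list object, not the built-in type.
        return list(enumerate(parts))
    except TypeError as exc:
        return "TypeError: " + str(exc)

shadowed_outcome = enumerate_with_shadowed_list(doc1)

# At module level the built-in is untouched, so this works as expected.
result = list(enumerate(doc1))
print(shadowed_outcome)
print(result)
```

Picking a different variable name, such as result, avoids the problem entirely.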

If you take

doc = """I like movie. But I don't like the cast. The story is very nice"""

as input, you will get the following result:

result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice")]
  

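Note the spaces at the start of the strings. That may or may not be what you want.

Basic answer

If your question is "how do I recreate the initial string from result?", here is some sample code: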

recreated_doc = ".".join(value for index, value in result)
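As a quick sanity check, the sketch below runs the split and the join together and compares the outcome with the input. The variable round_trip_ok is just an illustrative name.

```python
doc = """I like movie. But I don't like the cast. The story is very nice"""
result = list(enumerate(doc.split('.')))

# Joining with the same separator that was used for splitting
# gives back the original string, leading spaces included.
recreated_doc = ".".join(value for index, value in result)
round_trip_ok = (recreated_doc == doc)
print(round_trip_ok)
```

This works because nothing was stripped or filtered between the split and the join.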

Advanced answer

Note that if you provide

doc = """I like movie. But I don't like the cast. The story is very nice."""

with a trailing period, you will get:

result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice"),(3,"")]

But what if you want to get the following instead?

result = [(0,"I like movie"),(1,"But I don't like the cast"),(2,"The story is very nice")]

(Note that there are no spaces at the start of the strings, and no empty string.)

Here is the code:

doc = """I like movie. But I don't like the cast. The story is very nice."""
doc1 = doc.split('.')
doc2 = (part.strip(' ') for part in doc1)
doc3 = (part for part in doc2 if len(part) > 0)
result = list(enumerate(doc3))
# result = [(0, 'I like movie'), (1, "But I don't like the cast"), (2, 'The story is very nice')]

And to recreate the original string:

recreated_doc = " ".join(value+"." for index, value in result)
# recreated_doc = """I like movie. But I don't like the cast. The story is very nice."""
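If the goal is also to carry a score per sentence and average it, as the question describes, one possible sketch follows. The scores dictionary holds made-up values for illustration only; it stands in for whatever sentiment scoring is actually used, which the question does not specify.

```python
doc = """I like movie. But I don't like the cast. The story is very nice."""
parts = (part.strip(' ') for part in doc.split('.'))
result = list(enumerate(part for part in parts if len(part) > 0))

# Made-up scores keyed by sentence index (assumption, not real output
# of any sentiment library); replace with real per-sentence scores.
scores = {0: 0.5, 1: -0.4, 2: 0.8}

# Attach each score to its enumerated sentence.
scored = [(index, value, scores[index]) for index, value in result]

# Average the scores and rebuild the text from the same entries.
average_score = sum(score for _, _, score in scored) / len(scored)
recreated_doc = " ".join(value + "." for _, value, _ in scored)
print(recreated_doc, average_score)
```

Keeping the index next to each sentence means the scores can be computed in any order and still be matched back to the right sentence.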
  

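Warning: the advanced solution will not always recreate exactly the same original document, so it may not be suitable for your case.

Example: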

doc = """This a document with a lot of spaces.   .   Too much spaces here.       And also here     .   ."""
# [...]
# recreated_doc = """This a document with a lot of spaces. Too much spaces here. And also here."""
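The difference can be checked directly. In the sketch below, split_sentences and join_sentences are hypothetical helper names that simply wrap the advanced code shown earlier.

```python
def split_sentences(doc):
    # Split on periods, strip surrounding spaces, drop empty pieces.
    parts = (part.strip(' ') for part in doc.split('.'))
    return list(enumerate(part for part in parts if len(part) > 0))

def join_sentences(result):
    # Put a period after each sentence and a single space between them.
    return " ".join(value + "." for index, value in result)

clean = """I like movie. But I don't like the cast. The story is very nice."""
messy = """This a document with a lot of spaces.   .   Too much spaces here.       And also here     .   ."""

clean_round_trips = join_sentences(split_sentences(clean)) == clean
messy_round_trips = join_sentences(split_sentences(messy)) == messy
print(clean_round_trips, messy_round_trips)
```

The first document survives the round trip, while the second does not, because the stripped spaces and the empty sentences are discarded and cannot be restored.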