First, I want to enumerate a document that contains more than two sentences, like this:
doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
list = []
for i in enumerate(doc1):
    list.append(i)
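For each sentence I compute a sentiment score, and then I want to combine the enumerated document back into its original format, taking the average of the scores.
Any answer would be highly appreciated.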
doc2 = """I like movie. But I don't like the cast. The story is very nice"""
Answer (score: 2)
I'm not sure I really understand your question.
Note that your code is equivalent to:
doc = """I like movie. But I don't like the cast. The story is very nice"""
doc1 = doc.split('.')
result = list(enumerate(doc1))
(I used the name result because calling the variable list would shadow the built-in list, which I use to build the list.)
If you take
doc = """I like movie. But I don't like the cast. The story is very nice"""
as input, you get this value for result:
result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice")]
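Note the spaces at the start of the strings. They may or may not be what you want.
If your question is "how do I recreate the original string from result?", here is some sample code: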
recreated_doc = ".".join(value for index, value in result)
Note that if you provide
doc = """I like movie. But I don't like the cast. The story is very nice."""
with a trailing period, you get:
result = [(0,"I like movie"),(1," But I don't like the cast"),(2," The story is very nice"),(3,"")]
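In both cases the split/join round trip is exact: the final empty string is what lets join put the trailing period back. A quick check, sketched with two hypothetical helper names (enumerate_sentences and recreate) wrapping the code above:

```python
def enumerate_sentences(doc: str) -> list[tuple[int, str]]:
    # Split on every '.' and number the pieces, as in the code above.
    return list(enumerate(doc.split('.')))


def recreate(result: list[tuple[int, str]]) -> str:
    # Put the '.' separators back between the pieces.
    return ".".join(value for index, value in result)


for doc in (
    "I like movie. But I don't like the cast. The story is very nice",
    "I like movie. But I don't like the cast. The story is very nice.",
):
    # Holds with and without the trailing period.
    assert recreate(enumerate_sentences(doc)) == doc
```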
But what if you want to get the following instead?
result = [(0,"I like movie"),(1,"But I don't like the cast"),(2,"The story is very nice")]
(Note that there are no spaces at the start of the strings and no empty string.)
Here is the code:
doc = """I like movie. But I don't like the cast. The story is very nice."""
doc1 = doc.split('.')
doc2 = (part.strip(' ') for part in doc1)
doc3 = (part for part in doc2 if len(part) > 0)
result = list(enumerate(doc3))
# result = [(0, 'I like movie'), (1, "But I don't like the cast"), (2, 'The story is very nice')]
And to recreate the original string:
recreated_doc = " ".join(value+"." for index, value in result)
# recreated_doc = """I like movie. But I don't like the cast. The story is very nice."""
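The question also asks about scoring each sentence and averaging the scores. Here is a minimal sketch that reuses the cleaned split above, assuming a hypothetical score_sentence function in place of whatever sentiment analyser you actually use. The tiny word lists are made up for illustration and are not a real sentiment model.

```python
# Hypothetical toy lexicon: replace score_sentence with your real sentiment analyser.
POSITIVE = {"like", "nice", "good"}
NEGATIVE = {"bad", "boring"}


def score_sentence(sentence: str) -> float:
    """Toy score: +1 per positive word, -1 per negative word, sign flipped by "don't"."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return float(-score if "don't" in words else score)


def score_document(doc: str) -> tuple[str, list[tuple[int, str, float]], float]:
    # Clean split, as above: strip spaces and drop empty pieces.
    parts = (part.strip(' ') for part in doc.split('.'))
    result = list(enumerate(part for part in parts if part))
    # Score each enumerated sentence.
    scored = [(index, value, score_sentence(value)) for index, value in result]
    # Recombine the sentences into one string and average their scores.
    recreated = " ".join(value + "." for _, value, _ in scored)
    average = sum(s for _, _, s in scored) / len(scored) if scored else 0.0
    return recreated, scored, average
```

Because the index stays attached to each sentence, you can keep the per-sentence scores alongside the recombined text instead of only the average.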
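Warning: this more advanced solution does not always recreate exactly the same original document, so it may not be what you need.
Example: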
doc = """This a document with a lot of spaces. . Too much spaces here. And also here . ."""
# [...]
# recreated_doc = """This a document with a lot of spaces. Too much spaces here. And also here."""
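If you need the exact original text back, one option (a different technique from the answer above, sketched under the assumption that '.' is the only sentence separator) is to keep the raw pieces: re.split with a capturing group also returns the separators, so you can clean the sentences for analysis while still storing the untouched text. The helper names below are hypothetical.

```python
import re


def split_with_separators(doc: str) -> list[tuple[str, str]]:
    # With a capturing group, re.split returns [text0, '.', text1, '.', ..., textN].
    parts = re.split(r"(\.)", doc)
    # Pair each text piece with the separator that followed it ('' for the last one).
    return [(parts[i], parts[i + 1] if i + 1 < len(parts) else "")
            for i in range(0, len(parts), 2)]


def clean_sentences(pieces: list[tuple[str, str]]) -> list[tuple[int, str]]:
    # Same cleaning as before: strip spaces, drop empty pieces, then enumerate.
    stripped = (text.strip(' ') for text, _ in pieces)
    return list(enumerate(s for s in stripped if s))


def recreate_exact(pieces: list[tuple[str, str]]) -> str:
    # Concatenating the raw pieces with their separators gives back the input unchanged.
    return "".join(text + sep for text, sep in pieces)
```

Swapping the pattern for r"([.!?])" would keep question and exclamation marks as separators too.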