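I have a list of 'key' and 'paragraph' pairs. Each 'key' is associated with one 'paragraph'.
My goal is to split each paragraph into individual sentences, with each sentence assigned the 'key' of the paragraph it originally came from. For example: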
(['2925729', 'Patrick came outside and greeted us promptly.'], ['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.'], ['2925728', 'Patrick sucks. He farted politely, but it seemed routine.'])
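So far I have been able to write code that splits the paragraphs into sentences and counts the hits for each sentence against a dictionary of terms. I now want to associate the ID with each sentence.
Below is the code that handles sentences without any 'key'. I have omitted steps 1 and 2 to save space: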
dictionary = ['book', 'should have', 'open']
#### Step 3 ####
# Create a blank list to collect the final output
final_out = []
# Find matches: count dictionary-term hits in each sentence
# (sentences comes from the omitted steps 1 and 2)
for sent in sentences:
    final_out.append((sent, sum(sent.count(col) for col in dictionary)))
# Keep only distinct (sentence, hits) pairs, stored as a dict
final_out = dict(sorted(set(final_out)))
# Rank sentences by hit count, highest first
# (items() works on Python 2 and 3; iteritems() is Python 2 only)
import operator
sorted_final_out = sorted(final_out.items(), key=operator.itemgetter(1), reverse=True)
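The output of this is: (['johny ate the antelope', 80], ['sally has a friend', 20]) and so on. I then pick the top X by magnitude. What I now want to achieve is something like this: (['12222', 'johny ate the antelope', 80], [22332, 'sally has a friend', 20]). So basically I want to make sure that every sentence parsed out stays assigned to its 'key'. Sorry, this is complicated, which is why John's earlier solution only worked for the simpler cases.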
Answer 0 (score: 2)
from itertools import chain
list(chain(*[[[y[0],z] for z in y[1].split('. ')] for y in x]))
produces
[['2925729', 'Patrick came outside and greeted us promptly.'],
['2925729', 'Patrick did not shake our hands nor ask our names'],
['2925729', 'He greeted us promptly and politely, but it seemed routine.'],
['2925728', 'Patrick sucks'],
['2925728', 'He farted politely, but it seemed routine.']]
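list(chain(*...)) flattens the nested list generated by [[[y[0],z] for z in y[1].split('. ')] for y in x]. Note that splitting on '. ' drops the period from every sentence except the last one in each paragraph, as the output above shows.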
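As a minimal sketch of the same flattening step, the snippet below uses chain.from_iterable instead of unpacking the outer list with *, and splits with a regular-expression lookbehind so that each sentence keeps its closing period. The names split_sentences and pairs are illustrative, and the regex is only an assumption about what counts as a sentence boundary (it will mis-split abbreviations such as "Dr. Smith"); it is not part of the original answer.

```python
import re
from itertools import chain

# Sample data from the question: (key, paragraph) pairs.
x = (
    ['2925729', 'Patrick came outside and greeted us promptly.'],
    ['2925729', 'Patrick did not shake our hands nor ask our names. '
                'He greeted us promptly and politely, but it seemed routine.'],
    ['2925728', 'Patrick sucks. He farted politely, but it seemed routine.'],
)

def split_sentences(paragraph):
    # Hypothetical helper: split on whitespace that follows '.', '!' or '?',
    # so the punctuation stays attached to the sentence.
    return re.split(r'(?<=[.!?])\s+', paragraph.strip())

# One [key, sentence] pair per sentence, flattened into a single list.
pairs = list(chain.from_iterable(
    [[key, sentence] for sentence in split_sentences(paragraph)]
    for key, paragraph in x
))

for pair in pairs:
    print(pair)
```

Because chain.from_iterable consumes the outer generator lazily, the intermediate nested list is never built, which may help on larger inputs.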
If you want to make the change "in place", you can use
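xl = list(x)  # you gave us a tuple
for i, y in enumerate(xl):
    xx = xl[i]
    # Replace the single [key, paragraph] item with one [key, sentence] item per sentence.
    # The list grows during iteration; already-split sentences are re-split harmlessly.
    xx = [[xx[0], s] for s in xx[1].split('. ')]
    xl[i:i+1] = xx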
I am not sure which of the two would be faster or better when the data set is very large.
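To connect this back to the hit counts in the question, here is a rough sketch that carries the key through the scoring step and produces [key, sentence, hits] rows sorted by hit count. It assumes the [key, sentence] pairs have already been built by one of the approaches above. The function name rank_sentences and the sample pairs are hypothetical; the substring counting with str.count mirrors the question's own code.

```python
def rank_sentences(pairs, dictionary):
    """Return distinct [key, sentence, hits] rows, highest hit count first."""
    scored = set()
    for key, sentence in pairs:
        # Count substring occurrences of every dictionary term, as in the question.
        hits = sum(sentence.count(term) for term in dictionary)
        scored.add((key, sentence, hits))
    # Sort by hits descending; ties fall back to key, then sentence.
    ordered = sorted(scored, key=lambda row: (-row[2], row[0], row[1]))
    return [list(row) for row in ordered]

# Hypothetical sample data, not taken from the question.
dictionary = ['book', 'should have', 'open']
pairs = [
    ['12222', 'You should have an open book.'],
    ['22332', 'The book is closed.'],
    ['22332', 'Nothing matches here.'],
]

ranked = rank_sentences(pairs, dictionary)
for row in ranked:
    print(row)
```

Slicing the result, for example ranked[:2], then gives the top sentences together with their keys.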