Question

I have a list with the following structure:

[('0','927','928'),('2','693','694'),('2','742','743'),('2','776','777'),('2','804','805'),
('2','987','988'),('2','997','998'),('2','1019','1020'),
('2','1038','1039'),('2','1047','1048'),('2','1083','1084'),('2','659','660'),
('2','677','678'),('2','743','744'),('2','777','778'),('2','805','806'),('2','830','831')]

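The first number is an id, the second is the position of a word, and the third is the position of the word that follows it. What I need to do is find the words that are adjacent to each other.

These results come from a search over 3 words, so the list holds the positions of word 1 paired with word 2, and the positions of word 2 paired with word 3. For example:

I run the phrase query "women in science" and get the values in the list above, so ('2','776','777') is the result for 'women in' and ('2','777','778') is the result for 'in science'.

I need a way to match these results up so that, for each document, the words are grouped together according to the number of words in the query. (So if the query has 4 words, 3 results need to be matched together.)

Is this possible?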

Answer 1

You need a quick way to look up word information by position. Create a dictionary keyed by word position:

# from your example; I wonder why you use strings and not numbers.
positions = [('0','927','928'),('2','693','694'),('2','742','743'),('2','776','777'),('2','804','805'),
('2','987','988'),('2','997','998'),('2','1019','1020'),
('2','1038','1039'),('2','1047','1048'),('2','1083','1084'),('2','659','660'),
('2','677','678'),('2','743','744'),('2','777','778'),('2','805','806'),('2','830','831')]

# create the dictionary
dict_by_position = {w_pos:(w_id, w_next) for (w_id, w_pos, w_next) in positions}

Now following the chain is a piece of cake:

>>> dict_by_position['776']
('2', '777')
>>> dict_by_position['777']
('2', '778')

Or programmatically:

def followChain(start, position_dict):
  result = []
  scanner = start
  while scanner in position_dict:
    next_item = position_dict[scanner]
    result.append(next_item)
    unused_id, scanner = next_item  # unpack the (id, next_position)
  return result

>>> followChain('776', dict_by_position)
[('2', '777'), ('2', '778')]
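
One caveat with keying on the position alone: if two documents contain a hit at the same position, the later entry silently replaces the earlier one. A minimal sketch of one way around that, assuming positions can be converted to integers; `build_index` is a hypothetical helper name, not part of the code above:

```python
def build_index(pairs):
    """Map (doc_id, position) -> next position, with positions as ints."""
    return {(doc_id, int(pos)): int(nxt) for doc_id, pos, nxt in pairs}

pairs = [('0', '927', '928'), ('2', '776', '777'), ('2', '777', '778')]
index = build_index(pairs)

print(index[('2', 776)])     # 777
print(('0', 776) in index)   # False: document '0' has no hit at 776
```

Including the id in the key keeps each document's chains separate, and integer positions sort numerically rather than as text.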

To find all chains that are not sub-chains of one another:

seen_items = set()
for start in dict_by_position:
  if start not in seen_items:
    chain = followChain(start, dict_by_position)
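    # mark every position the chain reached as seen; the chain holds
    # (id, next_position) tuples, so store the positions, not the tuples
    seen_items.update(w_next for unused_id, w_next in chain)
    print(chain)  # or do something reasonable instead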
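
To tie this back to the question (N query words means N-1 pair results matched together), here is a rough sketch that groups the pairs per document and keeps only runs covering at least `num_words` positions. It assumes every pair points forward (the next position is greater than the current one), so a run always terminates. The name `phrase_matches` and the trimmed sample data are made up for illustration:

```python
from collections import defaultdict

def phrase_matches(pairs, num_words):
    """Return {doc_id: [run, ...]} where each run has at least num_words positions."""
    next_pos = defaultdict(dict)  # doc_id -> {position: next position}
    for doc_id, pos, nxt in pairs:
        next_pos[doc_id][int(pos)] = int(nxt)

    matches = {}
    for doc_id, links in next_pos.items():
        targets = set(links.values())
        runs = []
        # Start only at positions that no other pair points to,
        # so a sub-chain is never reported on its own.
        for head in sorted(p for p in links if p not in targets):
            run = [head]
            while run[-1] in links:  # assumes next > current, so no cycles
                run.append(links[run[-1]])
            if len(run) >= num_words:
                runs.append(run)
        if runs:
            matches[doc_id] = runs
    return matches

sample = [('0', '927', '928'), ('2', '742', '743'), ('2', '776', '777'),
          ('2', '743', '744'), ('2', '777', '778'), ('2', '830', '831')]
print(phrase_matches(sample, 3))
# {'2': [[742, 743, 744], [776, 777, 778]]}
```

Unlike the loop above, this does not depend on the order in which positions are visited, because a run can only begin at a position that nothing else links to.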

Answer 2

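The following will do what you're asking, as I understand it. It isn't the prettiest output in the world, and I think you should use numbers where possible, if numbers are what you're actually working with. There are probably more elegant solutions, and this could be simplified: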

list.size() == 1

The output of the above is:

{'0': ['927', '928'], '2': [['1019', '1020'], ['1038', '1039'], ['1047', '1048'], ['1083', '1084'], ['659', '660'], ['677', '678'], ['693', '694'], ['742', '743', '744'], ['776', '777', '778'], ['804', '805', '806'], ['830', '831'], ['987', '988']]}

Matching results in a list
