Question

I have the following list (truncated here):

[
  [['write', 1, 1], ['bob', 1, 1], ['econom', 1, 1], ['hate', 1, 1], ['articl', 1, 1], ['mcgwier', 1, 1], ['howev', 1, 1], ['terror', 1, 1]],
  [['polit', 2, 1], ['approach', 2, 1], ['correct', 2, 1], ['hate', 2, 1], ['effect', 2, 1], ['polici', 2, 1], ['stop', 2, 1], ['wors', 2, 1]], 
  [['support', 3, 1], ['organiz', 3, 1], ['directli', 3, 1], ['donat', 3, 1], ['right', 3, 1], ['indirectli', 3, 1], ['gay', 3, 1], ['issu', 3, 1]], 
  [['boycott', 4, 1], ['somebodi', 4, 1], ['appar', 4, 1], ['contradict', 4, 1], ['fund', 4, 1], ['end', 4, 1], ['reconcil', 4, 1], ['scout', 4, 1]], 
  [['road', 5, 1], ['saw', 5, 1], ['river', 5, 1], ['strom', 5, 1], ['research', 5, 1], ['mill', 5, 1], ['rob', 5, 1], ['ibm', 5, 1]],
  [['height', 6, 1], ['yorktown', 6, 1], ['p', 6, 1], ['box', 6, 1], ['ny', 6, 1]]
]

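In an entry such as ['write', 1, 1], the first element 'write' is the term, the second element 1 is the position, and the third element 1 is the value of that term at that position.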

I also have a second list of unique terms:

[(1093, 'scout', 1), (661, 'issu', 1), (379, 'econom', 1), (1154, 'somebodi', 1), (395, 'end', 1), (57, 'appar', 1), (921, 'polici', 1), (247, 'contradict', 1), (1066, 'rob', 1), (62, 'approach', 1), (259, 'correct', 1), (1061, 'right', 1), (1377, 'write', 1), (1023, 'reconcil', 1), (1232, 'terror', 1), (1208, 'support', 1), (334, 'directli', 1), (75, 'articl', 1), (381, 'effect', 1), (624, 'indirectli', 1), (140, 'bob', 1), (502, 'fund', 1), (578, 'howev', 1), (1084, 'saw', 1), (1064, 'river', 1), (1383, 'yorktown', 1), (554, 'hate', 2), (864, 'organiz', 1), (839, 'ny', 1), (356, 'donat', 1), (560, 'height', 1), (874, 'p', 1), (1192, 'stop', 1), (1195, 'strom', 1), (145, 'boycott', 1), (1051, 'research', 1), (1372, 'wors', 1), (144, 'box', 1), (922, 'polit', 1), (1065, 'road', 1), (781, 'mill', 1), (586, 'ibm', 1), (513, 'gay', 1), (757, 'mcgwier', 1)]

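The second list holds each unique term together with its ID and a value. For example, in (1093, 'scout', 1), 1093 is the ID, 'scout' is the unique term, and 1 is the value.

This second list is only there as a helper.

I want an output list that contains every unique term and its ID from the second list, along with the term's values from the first list, each placed at the position given in the first list: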

[[1377,'write',[1,0,0,0,0,0]],[554, 'hate',[1,1,0,0,0,0]], ......]

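In [1377, 'write', [1,0,0,0,0,0]], 1377 is the ID from the second list, 'write' is the term from the second list, and [1,0,0,0,0,0] holds the values at the corresponding positions from the first list.

For 'write', the first inner list contains ['write', 1, 1], meaning 'write' has the value 1 at position 1. 'write' does not appear at any other position, so its vector is [1,0,0,0,0,0] and its entry in the final output is [1377, 'write', [1,0,0,0,0,0]].

The same goes for 'hate': the first inner list contains ['hate', 1, 1], so 'hate' has the value 1 at position 1, and the second inner list contains ['hate', 2, 1], so 'hate' has the value 1 at position 2. 'hate' does not appear anywhere else, so its vector is [1,1,0,0,0,0] and its entry in the final output is [554, 'hate', [1,1,0,0,0,0]].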
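
To make the expected result concrete, here is a minimal sketch that builds the position vector for a single term. It uses a trimmed-down copy of the first list, and the names `first_list`, `n_positions` and `position_vector` are made up for this illustration, not part of the original data.

```python
# Trimmed-down copy of the first list: only the entries needed for the example.
first_list = [
    [['write', 1, 1], ['hate', 1, 1]],
    [['polit', 2, 1], ['hate', 2, 1]],
]
n_positions = 6  # assumption: six positions, as in the full list


def position_vector(term, nested, n):
    """Return a list of n values with the term's value at each position it occupies."""
    vector = [0] * n
    for group in nested:
        for name, pos, val in group:
            if name == term:
                vector[pos - 1] = val  # positions in the data are 1-based
    return vector


print(position_vector('hate', first_list, n_positions))   # [1, 1, 0, 0, 0, 0]
print(position_vector('write', first_list, n_positions))  # [1, 0, 0, 0, 0, 0]
```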

Please suggest a solution.

Answer 1

Do you need the last value in this "helper list"?

If not, the following code may be useful.

import pandas as pd

n_pos = 6

lst1 = [
  [['write', 1, 1], ['bob', 1, 1], ['econom', 1, 1], ['hate', 1, 1], ['articl', 1, 1], ['mcgwier', 1, 1], ['howev', 1, 1], ['terror', 1, 1]],
  [['polit', 2, 1], ['approach', 2, 1], ['correct', 2, 1], ['hate', 2, 1], ['effect', 2, 1], ['polici', 2, 1], ['stop', 2, 1], ['wors', 2, 1]],
  [['support', 3, 1], ['organiz', 3, 1], ['directli', 3, 1], ['donat', 3, 1], ['right', 3, 1], ['indirectli', 3, 1], ['gay', 3, 1], ['issu', 3, 1]],
  [['boycott', 4, 1], ['somebodi', 4, 1], ['appar', 4, 1], ['contradict', 4, 1], ['fund', 4, 1], ['end', 4, 1], ['reconcil', 4, 1], ['scout', 4, 1]],
  [['road', 5, 1], ['saw', 5, 1], ['river', 5, 1], ['strom', 5, 1], ['research', 5, 1], ['mill', 5, 1], ['rob', 5, 1], ['ibm', 5, 1]],
  [['height', 6, 1], ['yorktown', 6, 1], ['p', 6, 1], ['box', 6, 1], ['ny', 6, 1]]
]

lst2 = [(1093, 'scout', 1), (661, 'issu', 1), (379, 'econom', 1), (1154, 'somebodi', 1), (395, 'end', 1), (57, 'appar', 1), (921, 'polici', 1), (247, 'contradict', 1), (1066, 'rob', 1), (62, 'approach', 1), (259, 'correct', 1), (1061, 'right', 1), (1377, 'write', 1), (1023, 'reconcil', 1), (1232, 'terror', 1), (1208, 'support', 1), (334, 'directli', 1), (75, 'articl', 1), (381, 'effect', 1), (624, 'indirectli', 1), (140, 'bob', 1), (502, 'fund', 1), (578, 'howev', 1), (1084, 'saw', 1), (1064, 'river', 1), (1383, 'yorktown', 1), (554, 'hate', 2), (864, 'organiz', 1), (839, 'ny', 1), (356, 'donat', 1), (560, 'height', 1), (874, 'p', 1), (1192, 'stop', 1), (1195, 'strom', 1), (145, 'boycott', 1), (1051, 'research', 1), (1372, 'wors', 1), (144, 'box', 1), (922, 'polit', 1), (1065, 'road', 1), (781, 'mill', 1), (586, 'ibm', 1), (513, 'gay', 1), (757, 'mcgwier', 1)]

# we know how many columns we expect
cols = list(range(1, n_pos+1))
# we create an index from the second list
index = pd.MultiIndex.from_tuples(lst2, names=['id', 'name', 'temp'])
# we create a dataframe filled with zeroes (integer dtype, no NaN round trip)
df = pd.DataFrame(0, index=index, columns=cols)
# we drop the unneeded last element of each tuple from the second list
df.index = df.index.droplevel(2)


for lst in lst1:
  for el in lst:
    name, col, val = el
    # we don't know the id, so slice(None) matches any id; the second index level is the name
    # that's where we set ( = val) or add ( += val) to the existing value
    df.loc[(slice(None), name), col] += val

indexes = df.index.values.tolist()
values = df.values.tolist()

# we concatenate indexes and values to your desired output
desired_output = [[*idx, vals] for idx, vals in zip(indexes, values)]

which yields

[[1093, 'scout', [0, 0, 0, 1, 0, 0]], [661, 'issu', [0, 0, 1, 0, 0, 0]], [379, 'econom', [1, 0, 0, 0, 0, 0]], [1154, 'somebodi', [0, 0, 0, 1, 0, 0]], [395, 'end', [0, 0, 0, 1, 0, 0]], [57, 'appar', [0, 0, 0, 1, 0, 0]], [921, 'polici', [0, 1, 0, 0, 0, 0]], [247, 'contradict', [0, 0, 0, 1, 0, 0]], [1066, 'rob', [0, 0, 0, 0, 1, 0]], [62, 'approach', [0, 1, 0, 0, 0, 0]], [259, 'correct', [0, 1, 0, 0, 0, 0]], [1061, 'right', [0, 0, 1, 0, 0, 0]], [1377, 'write', [1, 0, 0, 0, 0, 0]], [1023, 'reconcil', [0, 0, 0, 1, 0, 0]], [1232, 'terror', [1, 0, 0, 0, 0, 0]], [1208, 'support', [0, 0, 1, 0, 0, 0]], [334, 'directli', [0, 0, 1, 0, 0, 0]], [75, 'articl', [1, 0, 0, 0, 0, 0]], [381, 'effect', [0, 1, 0, 0, 0, 0]], [624, 'indirectli', [0, 0, 1, 0, 0, 0]], [140, 'bob', [1, 0, 0, 0, 0, 0]], [502, 'fund', [0, 0, 0, 1, 0, 0]], [578, 'howev', [1, 0, 0, 0, 0, 0]], [1084, 'saw', [0, 0, 0, 0, 1, 0]], [1064, 'river', [0, 0, 0, 0, 1, 0]], [1383, 'yorktown', [0, 0, 0, 0, 0, 1]], [554, 'hate', [1, 1, 0, 0, 0, 0]], [864, 'organiz', [0, 0, 1, 0, 0, 0]], [839, 'ny', [0, 0, 0, 0, 0, 1]], [356, 'donat', [0, 0, 1, 0, 0, 0]], [560, 'height', [0, 0, 0, 0, 0, 1]], [874, 'p', [0, 0, 0, 0, 0, 1]], [1192, 'stop', [0, 1, 0, 0, 0, 0]], [1195, 'strom', [0, 0, 0, 0, 1, 0]], [145, 'boycott', [0, 0, 0, 1, 0, 0]], [1051, 'research', [0, 0, 0, 0, 1, 0]], [1372, 'wors', [0, 1, 0, 0, 0, 0]], [144, 'box', [0, 0, 0, 0, 0, 1]], [922, 'polit', [0, 1, 0, 0, 0, 0]], [1065, 'road', [0, 0, 0, 0, 1, 0]], [781, 'mill', [0, 0, 0, 0, 1, 0]], [586, 'ibm', [0, 0, 0, 0, 1, 0]], [513, 'gay', [0, 0, 1, 0, 0, 0]], [757, 'mcgwier', [1, 0, 0, 0, 0, 0]]]
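
The comment in the loop mentions a choice between setting (`= val`) and adding (`+= val`). With the sample data both give the same result, because no term occurs twice at the same position. The following pandas-free sketch shows where they would differ. The duplicate entry is hypothetical and does not come from the question's data.

```python
# Hypothetical entries: 'hate' occurs twice at position 1.
entries = [['hate', 1, 1], ['hate', 1, 1], ['hate', 2, 1]]
n_pos = 3

set_vec = [0] * n_pos  # overwrite: the last value seen wins
add_vec = [0] * n_pos  # accumulate: values at the same position are summed

for _, pos, val in entries:
    set_vec[pos - 1] = val
    add_vec[pos - 1] += val

print(set_vec)  # [1, 1, 0]
print(add_vec)  # [2, 1, 0]
```

If the value is meant to be a presence flag, setting is probably what you want. If it is meant to be a count, adding is the safer choice.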

Answer 2

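First, convert the list of unique terms l2 into a dictionary for lookups, then use a defaultdict d to process the truncated list l1. Finally, convert d into a list of lists: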

from collections import defaultdict
from itertools import chain

# l1 is your first truncated list
# l2 is your second list of unique terms

# create lookup dict
lookup = {term: {'id': id_, 'val': val} for id_, term, val in l2}

# create defaultdict with list of zeros
d = defaultdict(lambda: len(l1)*[0])

for term, pos, val in chain.from_iterable(l1):
    list_of_vals = d[(lookup[term]['id'], term)]
    list_of_vals[pos - 1] = val

# convert to list of lists
result = [[id_, term, list_] for (id_, term), list_ in d.items()]
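
Note that a defaultdict only creates entries for terms that actually occur in l1, so a term that is listed in l2 but missing from l1 would be absent from the result. If every term from l2 should appear, one possible variant is to pre-fill a zero vector per term. This is a sketch with shortened sample inputs. The term 'ghost' and its ID 9999 are made up to show the missing-term case, and `len(l1)` is assumed to equal the number of positions, as in the question's data.

```python
from itertools import chain

# Shortened sample inputs; 'ghost' (id 9999) is a made-up term absent from l1.
l1 = [
    [['write', 1, 1], ['hate', 1, 1]],
    [['polit', 2, 1], ['hate', 2, 1]],
]
l2 = [(1377, 'write', 1), (554, 'hate', 2), (922, 'polit', 1), (9999, 'ghost', 1)]

n_pos = len(l1)  # assumption: one inner list per position

# Pre-fill one zero vector per term in l2, so every term ends up in the result.
vectors = {term: [0] * n_pos for _, term, _ in l2}

# Write each value at its (1-based) position.
for term, pos, val in chain.from_iterable(l1):
    vectors[term][pos - 1] = val

# Emit entries in the order of l2, as [id, term, vector].
result = [[id_, term, vectors[term]] for id_, term, _ in l2]
print(result)
# [[1377, 'write', [1, 0]], [554, 'hate', [1, 1]], [922, 'polit', [0, 1]], [9999, 'ghost', [0, 0]]]
```

Like the original, this raises a KeyError if l1 contains a term that is not listed in l2.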
