How to rebuild a list with the nested structure of the original list of lists

Time: 2019-09-27 18:23:01

Tags: python

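I have an original nested list of strings. I flattened it into single words so that I could label-encode every item in the original data. After label encoding, I zipped the labels back onto the words as a flat list of tuples. Now I want to convert this list of tuples back into the nested list-of-lists structure of the original data. Example below: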

original_data = [[['hey how are you?'], ['I am fine, and you?'], ['I am fine, too.']], [["My name is Jason, what's your name?"], ['My name is Tina.'], ['Nice to meet you.'], ['Nice to meet you, too,']]]

flat_words = ['hey', 'how', 'are', 'you?', 'I', 'am', 'fine,', 'and', 'you?', 'I', 'am', 'fine,', 'too.', 'My', 'name', 'is', 'Jason,', "what's", 'your', 'name?', 'My', 'name', 'is', 'Tina.', 'Nice', 'to', 'meet', 'you.', 'Nice', 'to', 'meet', 'you,', 'too,']

labels = [9, 10, 7, 21, 0, 5, 8, 6, 21, 0, 5, 8, 17, 2, 13, 11, 1, 18, 22, 14, 2, 13, 11, 4, 3, 15, 12, 20, 3, 15, 12, 19, 16]

flat_words_with_labels = [('hey', 9), ('how', 10), ('are', 7), ('you?', 21), ('I', 0), ('am', 5), ('fine,', 8), ('and', 6), ('you?', 21), ('I', 0), ('am', 5), ('fine,', 8), ('too.', 17), ('My', 2), ('name', 13), ('is', 11), ('Jason,', 1), ("what's", 18), ('your', 22), ('name?', 14), ('My', 2), ('name', 13), ('is', 11), ('Tina.', 4), ('Nice', 3), ('to', 15), ('meet', 12), ('you.', 20), ('Nice', 3), ('to', 15), ('meet', 12), ('you,', 19), ('too,', 16)]
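For reference, the flattening and encoding steps described above can be reproduced in plain Python. This is only a sketch: it assumes each label is the position of the word in the sorted set of unique words (this matches the numbers shown above, and is how an encoder such as scikit-learn's LabelEncoder orders its classes). The names `vocab` and `word_to_label` are made up for illustration.

```python
original_data = [
    [['hey how are you?'], ['I am fine, and you?'], ['I am fine, too.']],
    [["My name is Jason, what's your name?"], ['My name is Tina.'],
     ['Nice to meet you.'], ['Nice to meet you, too,']],
]

# Flatten: every sentence is a one-element list holding a single string.
flat_words = [
    word
    for paragraph in original_data
    for sentence in paragraph
    for word in sentence[0].split()
]

# Assumption: a label is the word's index in the sorted unique vocabulary.
vocab = sorted(set(flat_words))
word_to_label = {word: index for index, word in enumerate(vocab)}
labels = [word_to_label[word] for word in flat_words]

# Zip the labels back onto the words as a flat list of tuples.
flat_words_with_labels = list(zip(flat_words, labels))
print(flat_words_with_labels[:4])
# [('hey', 9), ('how', 10), ('are', 7), ('you?', 21)]
```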

What I want is:

final = [[[('hey', 9), ('how', 10), ('are', 7), ('you?', 21)], [('I', 0), ('am', 5), ('fine,', 8), ('and', 6), ('you?', 21)], [('I', 0), ('am', 5), ('fine,', 8), ('too.', 17)]], [[('My', 2), ('name', 13), ('is', 11), ('Jason,', 1), ("what's", 18), ('your', 22), ('name?', 14)], [('My', 2), ('name', 13), ('is', 11), ('Tina.', 4)], [('Nice', 3), ('to', 15), ('meet', 12), ('you.', 20)], [('Nice', 3), ('to', 15), ('meet', 12), ('you,', 19), ('too,', 16)]]]

3 answers:

Answer 0 (score: 1)

A single nested comprehension will do, after turning the tuples into a word-to-label dictionary:

d = dict(flat_words_with_labels)
final = [[[(word, d[word]) for word in sentence[0].split()] for sentence in paragraph] for paragraph in original_data]
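The dictionary lookup relies on every word always carrying the same label, which holds for label-encoded data like the example. If that is in doubt, a small check while building the dictionary can make a violation visible. A minimal sketch; `build_mapping` is a hypothetical helper, not part of the answer above:

```python
def build_mapping(pairs):
    """Build a word -> label dict, failing if one word has two different labels."""
    mapping = {}
    for word, label in pairs:
        # setdefault stores the label on first sight and returns the stored one after.
        if mapping.setdefault(word, label) != label:
            raise ValueError(f"conflicting labels for {word!r}")
    return mapping


# Shortened sample data in the same shape as flat_words_with_labels.
pairs = [('hey', 9), ('how', 10), ('are', 7), ('you?', 21), ('I', 0), ('you?', 21)]
d = build_mapping(pairs)
print(d['you?'])  # 21
```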

Answer 1 (score: 1)

Here is an approach that looks fairly clean and handles any level of nesting.

original_data = [[['hey how are you?'], ['I am fine, and you?'], ['I am fine, too.']], [["My name is Jason, what's your name?"], ['My name is Tina.'], ['Nice to meet you.'], ['Nice to meet you, too,']]]

flat_words = ['hey', 'how', 'are', 'you?', 'I', 'am', 'fine,', 'and', 'you?', 'I', 'am', 'fine,', 'too.', 'My', 'name', 'is', 'Jason,', "what's", 'your', 'name?', 'My', 'name', 'is', 'Tina.', 'Nice', 'to', 'meet', 'you.', 'Nice', 'to', 'meet', 'you,', 'too,']

labels = [9, 10, 7, 21, 0, 5, 8, 6, 21, 0, 5, 8, 17, 2, 13, 11, 1, 18, 22, 14, 2, 13, 11, 4, 3, 15, 12, 20, 3, 15, 12, 19, 16]

mapping = {word: label for word, label in zip(flat_words, labels)}

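def replace(lst, mapping):
    """
    Recursively walk lst and replace every sentence string with a list of
    (word, mapping[word]) tuples. Note that lst is modified in place.
    """
    for index, ele in enumerate(lst):
        if isinstance(ele, str):
            # Leaf level: lst is the list wrapping a sentence string, so
            # overwrite its whole contents with the (word, label) tuples.
            result = [(word, mapping[word]) for word in ele.split()]
            lst[:] = result
            break
        else:
            # Still a nested list: recurse one level deeper.
            lst[index] = replace(ele, mapping)
    return lst

r = replace(original_data, mapping)
print(r)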

Result:

[[[('hey', 9), ('how', 10), ('are', 7), ('you?', 21)], [('I', 0), ('am', 5), ('fine,', 8), ('and', 6), ('you?', 21)], [('I', 0), ('am', 5), ('fine,', 8), ('too.', 17)]], [[('My', 2), ('name', 13), ('is', 11), ('Jason,', 1), ("what's", 18), ('your', 22), ('name?', 14)], [('My', 2), ('name', 13), ('is', 11), ('Tina.', 4)], [('Nice', 3), ('to', 15), ('meet', 12), ('you.', 20)], [('Nice', 3), ('to', 15), ('meet', 12), ('you,', 19), ('too,', 16)]]]
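Because `replace` assigns into the lists it is given, `original_data` no longer holds the original strings afterwards. If the input should stay untouched, one option is a variant that builds a new structure instead. This is a sketch, not the answer's own code: `annotate` is a hypothetical name, and it assumes that any list made up only of strings is a sentence wrapper, as in the question.

```python
def annotate(node, mapping):
    """Return a new nested list of (word, label) tuples; node is not modified."""
    # A non-empty list whose items are all strings is treated as a sentence.
    if node and all(isinstance(item, str) for item in node):
        return [(word, mapping[word]) for item in node for word in item.split()]
    # Otherwise keep the nesting level and recurse into each child list.
    return [annotate(child, mapping) for child in node]


data = [[['a b'], ['b c']], [['c']]]
mapping = {'a': 0, 'b': 1, 'c': 2}
result = annotate(data, mapping)
print(result)
# [[[('a', 0), ('b', 1)], [('b', 1), ('c', 2)]], [[('c', 2)]]]
```

The recursion depth follows the nesting depth of the data, so this handles arbitrarily nested input in the same way as the in-place version.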

Answer 2 (score: 0)

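You can reuse the structure of original_data and turn labels into an iterator to build final. I'm sure there is a more elegant solution out there, but something like this might work: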

labels_iter = iter(labels)

final = []

for convo in original_data:
    final.append([])
    for sent in convo:
        final[-1].append([])
        for word in sent[0].split(' '):
            final[-1][-1].append((word, next(labels_iter)))

final

Out:

[[[('hey', 9), ('how', 10), ('are', 7), ('you?', 21)],
  [('I', 0), ('am', 5), ('fine,', 8), ('and', 6), ('you?', 21)],
  [('I', 0), ('am', 5), ('fine,', 8), ('too.', 17)]],
 [[('My', 2),
   ('name', 13),
   ('is', 11),
   ('Jason,', 1),
   ("what's", 18),
   ('your', 22),
   ('name?', 14)],
  [('My', 2), ('name', 13), ('is', 11), ('Tina.', 4)],
  [('Nice', 3), ('to', 15), ('meet', 12), ('you.', 20)],
  [('Nice', 3), ('to', 15), ('meet', 12), ('you,', 19), ('too,', 16)]]]
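The same iterator idea can be applied to the zipped tuples directly with `itertools.islice`, which avoids any dictionary lookup and therefore works purely by position. It may also be worth checking that the pairs line up with the words, since a length mismatch would otherwise go unnoticed. A sketch under the same assumption as above (one string per sentence list); `rebuild` is a hypothetical name:

```python
from itertools import islice


def rebuild(original, pairs):
    """Regroup a flat list of (word, label) pairs to mirror original's nesting."""
    it = iter(pairs)
    result = []
    for paragraph in original:
        rebuilt = []
        for sentence in paragraph:
            n = len(sentence[0].split())
            # Take the next n pairs; islice simply stops early if the iterator runs dry.
            chunk = list(islice(it, n))
            if len(chunk) != n:
                raise ValueError("ran out of (word, label) pairs")
            rebuilt.append(chunk)
        result.append(rebuilt)
    # Pairs are tuples, so None is a safe "nothing left" sentinel here.
    if next(it, None) is not None:
        raise ValueError("unused (word, label) pairs left over")
    return result


data = [[['hey how'], ['are you?']], [['I am fine,']]]
pairs = [('hey', 3), ('how', 4), ('are', 1), ('you?', 5),
         ('I', 0), ('am', 6), ('fine,', 2)]
print(rebuild(data, pairs))
# [[[('hey', 3), ('how', 4)], [('are', 1), ('you?', 5)]], [[('I', 0), ('am', 6), ('fine,', 2)]]]
```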