Convert a list of lists of tuples to a tuple of lists of lists in Python

Time: 2014-01-03 14:55:45

Tags: python list tuples list-comprehension

I am writing an NER tagger with nltk and Mallet. I have to convert the input data between two formats, neither of which I can change.

The data consists of words with associated tags for supervised learning, but it is subdivided into sentences, hence the lists of lists.

The first format is

tuple(list(list(word)),list(list(tag)))

and the second format is

list(list(tuple(word,tag)))

Currently I convert it (format 2 => format 1) like this:

([[tup[0] for tup in sent] for sent in train_set],
 [[tup[1] for tup in sent] for sent in train_set])

Sample data:

 [[('Steve','PERSON'),('runs','NONE'),('Apple','ORGANIZATION')],[('Today','NONE'),('is','NONE'),('June','DATETIME'),('27th','DATETIME')]]

and the expected output:

 ([['Steve', 'runs', 'Apple' ],['Today','is','June','27th']],
  [['PERSON','NONE','ORGANIZATION'],['NONE','NONE','DATETIME','DATETIME']])

I need to convert in both directions.

Edit: I don't necessarily want it shorter. Please suggest a better (and more readable) approach in Python 2.7, with code examples.

3 answers:

Answer 0 (score: 2)

Converting from tuple(list(list(word)),list(list(tag))) to list(list(tuple(word,tag))) is fairly simple:

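def convert(data_structure):
    # data_structure is (sentences, tags): two parallel lists of lists
    sentences, tags = data_structure
    container = []
    for i in xrange(len(sentences)):
        # pair each word with its tag (zip returns a list in Python 2.7)
        container.append(zip(sentences[i], tags[i]))

    return container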
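The same pairing (two parallel lists of lists into one list of lists of (word, tag) tuples) can also be written without index arithmetic by zipping at both levels. This is only a sketch: the name pair_words_with_tags is made up for illustration, and list(zip(...)) is used so that it behaves the same on Python 2.7 and Python 3.

```python
def pair_words_with_tags(data_structure):
    """Format 1 -> format 2. The function name is hypothetical."""
    sentences, tags = data_structure
    # The outer zip walks sentences and tag lists in parallel;
    # the inner zip pairs each word with its tag.
    return [list(zip(sentence_words, sentence_tags))
            for sentence_words, sentence_tags in zip(sentences, tags)]


parallel = ([['Steve', 'runs', 'Apple'], ['Today', 'is', 'June', '27th']],
            [['PERSON', 'NONE', 'ORGANIZATION'],
             ['NONE', 'NONE', 'DATETIME', 'DATETIME']])
tagged = pair_words_with_tags(parallel)
```

Note that zip stops at the shorter input, so a sentence whose word list and tag list differ in length would be silently truncated.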

Converting in the other direction takes slightly longer code, but it is not complicated if you just use nested for loops:

def convert(data_structure):
    sentences = []
    tags = []

    for sentence in data_structure:
        sentence_words = []
        sentence_tags = []

        for word, tag in sentence:
            sentence_words.append(word)
            sentence_tags.append(tag)

        sentences.append(sentence_words)
        tags.append(sentence_tags)

    return (sentences, tags)

The code could probably be shortened, but hopefully the general principle is clear.
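One possible shortening of the nested-loop version, as a sketch: zip(*sentence) transposes a list of (word, tag) pairs into one tuple of words and one tuple of tags. The name split_words_and_tags is made up for illustration, and the sketch assumes every sentence contains at least one pair, because unpacking the result of zip(*[]) fails.

```python
def split_words_and_tags(tagged_sentences):
    """Format 2 -> format 1, assuming no sentence is empty (hypothetical name)."""
    words, tags = [], []
    for sentence in tagged_sentences:
        # [(w1, t1), (w2, t2)] becomes (w1, w2) and (t1, t2)
        sentence_words, sentence_tags = zip(*sentence)
        words.append(list(sentence_words))
        tags.append(list(sentence_tags))
    return (words, tags)


train_set = [[('Steve', 'PERSON'), ('runs', 'NONE'), ('Apple', 'ORGANIZATION')],
             [('Today', 'NONE'), ('is', 'NONE'), ('June', 'DATETIME'),
              ('27th', 'DATETIME')]]
converted = split_words_and_tags(train_set)
```

On the sample data from the question, converted is the expected tuple of two lists of lists.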

Answer 1 (score: 1)

You can turn the inner tuples into iterators (using iter) and then call next on them inside a nested list comprehension:

lis = [[('Steve','PERSON'),('runs','NONE'),('Apple','ORGANIZATION')],
       [('Today','NONE'),('is','NONE'),('June','DATETIME'),('27th','DATETIME')]]

it = [[iter(y) for y in x] for x in lis]
n = len(lis[0][0])  #Number of iterations required.
print [[[next(x) for x in i] for i in it] for _ in range(n)]

Output:

[[['Steve', 'runs', 'Apple'], ['Today', 'is', 'June', '27th']],
 [['PERSON', 'NONE', 'ORGANIZATION'], ['NONE', 'NONE', 'DATETIME', 'DATETIME']]]
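To make the mechanics more explicit: each pass of the outer loop pulls exactly one more element from every tuple iterator, so the first pass collects the words and the second pass collects the tags. The sketch below is a variant of the snippet above that wraps the result in tuple(...) to match the format asked for in the question and avoids the Python 2 print statement so it also runs on Python 3. The variable names are illustrative only.

```python
lis = [[('Steve', 'PERSON'), ('runs', 'NONE'), ('Apple', 'ORGANIZATION')],
       [('Today', 'NONE'), ('is', 'NONE'), ('June', 'DATETIME'),
        ('27th', 'DATETIME')]]

# One iterator per (word, tag) tuple, keeping the sentence grouping.
iterators = [[iter(pair) for pair in sentence] for sentence in lis]
width = len(lis[0][0])  # assumes every tuple has the same length (2 here)

# Pass 0 takes the first element of every tuple (the words),
# pass 1 takes the second element (the tags).
result = tuple([[next(it) for it in sentence] for sentence in iterators]
               for _ in range(width))
```

The iterators are exhausted afterwards, so they have to be rebuilt before the conversion can be run again.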

Answer 2 (score: 0)

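I think the right solution would be this one (note that the result is grouped per sentence, as a [words, tags] pair for each sentence, rather than as one list of word lists and one list of tag lists):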

>>> data = [[('Steve','PERSON'),('runs','NONE'),('Apple','ORGANIZATION')],[('Today','NONE'),('is','NONE'),('June','DATETIME'),('27th','DATETIME')]]
>>> tuple([ map(list, (zip(*x))) for x in data ])
([['Steve', 'runs', 'Apple'], ['PERSON', 'NONE', 'ORGANIZATION']], [['Today', 'is', 'June', '27th'], ['NONE', 'NONE', 'DATETIME', 'DATETIME']])
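The result above holds a [words, tags] pair for each sentence, which is not quite the layout requested in the question. As a sketch, a second transposition with zip(*...) regroups it into one list of word lists and one list of tag lists. The variable names are illustrative, and list(map(...)) is used so the code behaves the same on Python 2.7 and Python 3.

```python
data = [[('Steve', 'PERSON'), ('runs', 'NONE'), ('Apple', 'ORGANIZATION')],
        [('Today', 'NONE'), ('is', 'NONE'), ('June', 'DATETIME'),
         ('27th', 'DATETIME')]]

# First transposition: per_sentence[i] is [words_i, tags_i].
per_sentence = [list(map(list, zip(*sentence))) for sentence in data]

# Second transposition: collect all word lists, then all tag lists.
words, tags = (list(column) for column in zip(*per_sentence))
result = (words, tags)
```

This assumes data contains at least one sentence and that no sentence is empty; otherwise the unpacking into words and tags fails.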