Creating an array of permutations from a list of lists

Time: 2014-02-12 02:18:57

Tags: python permutation itertools pydot

I have a variable 'actorslist' that outputs 100 lines (one per movie):

[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler']
[u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine']
etc.

Then I have:

pairslist = list(itertools.permutations(actorslist, 2))

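This gives me pairs of actors, but only within one specific movie; then, after a new line, it moves on to the next movie. How do I get it to output all actors from all movies in one big array? The idea is that two actors who appeared in a movie together should get a pydot edge.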

I typed in the following, which writes to the dot file successfully but does not output the correct data.

graph = pydot.Dot(graph_type='graph', charset="utf8")
for i in pairslist:
  edge = pydot.Edge(i[0], i[1])
  graph.add_edge(edge)
  graph.write('dotfile.dot')

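My expected output should be a dot file like the following. (A, B) is the same as (B, A), so the reversed pair does not appear in the output: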

"Tim Robbins" -- "Morgan Freeman";
"Tim Robbins" -- "Bob Gunton";
"Tim Robbins" -- "William Sadler";
"Morgan Freeman" -- "Bob Gunton";
"Morgan Freeman" -- "William Sadler";
"Bob Gunton" -- "William Sadler";
"Christian Bale" -- "Heath Ledger";
"Christian Bale" -- "Aaron Eckhart";
"Christian Bale" -- "Michael Caine";
"Heath Ledger" -- "Aaron Eckhart";
"Heath Ledger" -- "Michael Caine";
"Aaron Eckhart" -- "Michael Caine";

Additional information:

Some people were interested in how the variable actorslist is created:

file = open('input.txt','rU') ###input is JSON data on each line{"Title":"Shawshank...
nfile = codecs.open('output.txt','w','utf-8')
movie_actors = []
for line in file:
  line = line.rstrip()
  movie = json.loads(line)
  l = []
  title = movie['Title']
  actors = movie['Actors']
  tempactorslist = actors.split(',')
  actorslist = []
  for actor in tempactorslist:
    actor = actor.strip()
    actorslist.append(actor)
  l.append(title)
  l.append(actorslist)
  row = l[0] + '\t' + json.dumps(l[1]) + '\n'
  nfile.writelines(row)
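
Note that the loop above rebinds actorslist on every iteration and never appends to movie_actors, so after the loop only the last movie's cast is left. A minimal sketch of collecting every cast into one list of lists, assuming each input line is a JSON object with an 'Actors' key as in the snippet above; the helper name read_actor_lists and the in-memory sample lines are hypothetical stand-ins for input.txt:

```python
import json

# Stand-in for the lines of input.txt; these two records are illustrative only.
lines = [
    '{"Title": "Movie A", "Actors": "Tim Robbins, Morgan Freeman"}',
    '{"Title": "Movie B", "Actors": "Christian Bale, Heath Ledger, Michael Caine"}',
]

def read_actor_lists(lines):
    """Return one list of actor names per movie, in input order."""
    movie_actors = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        movie = json.loads(line)
        # Split the comma-separated string and trim whitespace around each name.
        actors = [name.strip() for name in movie['Actors'].split(',')]
        movie_actors.append(actors)
    return movie_actors

movie_actors = read_actor_lists(lines)
print(movie_actors)
```

The resulting list of lists has the same shape as the inputs used in the answers.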

3 answers:

Answer 0: (score: 1)

from collections import Counter
from itertools import combinations
import pydot

actorslists = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine'],
    [u'Tim Robbins', u'Heath Ledger', u'Michael Caine']
]

# Counter tracks how often each pair of actors has occurred (-> link weight)
actorpairs = Counter(pair for actorslist in actorslists for pair in combinations(sorted(actorslist), 2))

graph = pydot.Dot(graph_type='graph', charset="utf8")
for actors, weight in actorpairs.items():   # use .iteritems() on Python 2
    a,b = list(actors)
    edge = pydot.Edge(a, b, weight=str(weight))
    graph.add_edge(edge)
graph.write('dotfile.dot')
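
The counting step can be checked on its own, without pydot. A minimal sketch using the same three cast lists; the name pair_counts is only illustrative. Sorting each cast before calling combinations means a given pair always comes out in the same order, so the Counter treats (A, B) and (B, A) as one key:

```python
from collections import Counter
from itertools import combinations

actorslists = [
    ['Tim Robbins', 'Morgan Freeman', 'Bob Gunton', 'William Sadler'],
    ['Christian Bale', 'Heath Ledger', 'Aaron Eckhart', 'Michael Caine'],
    ['Tim Robbins', 'Heath Ledger', 'Michael Caine'],
]

# Count how many movies each unordered pair of actors shares.
pair_counts = Counter(
    pair
    for actorslist in actorslists
    for pair in combinations(sorted(actorslist), 2)
)

for (a, b), weight in sorted(pair_counts.items()):
    print(a, '--', b, weight)
```

With this input, Heath Ledger and Michael Caine share two of the cast lists, so that pair gets weight 2 and every other pair gets weight 1.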

Result:

[image of the resulting graph]

Answer 1: (score: 0)

You'll want something like this:

import itertools

actorslist = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine']
    ]

for movie in actorslist:
    for actor1, actor2 in itertools.permutations(movie, 2):
        print(actor1, actor2)
        # make edge, etc.

Output:

Tim Robbins Morgan Freeman
Tim Robbins Bob Gunton
Tim Robbins William Sadler
Morgan Freeman Tim Robbins
Morgan Freeman Bob Gunton
Morgan Freeman William Sadler
Bob Gunton Tim Robbins
Bob Gunton Morgan Freeman
Bob Gunton William Sadler
William Sadler Tim Robbins
William Sadler Morgan Freeman
William Sadler Bob Gunton
Christian Bale Heath Ledger
Christian Bale Aaron Eckhart
Christian Bale Michael Caine
Heath Ledger Christian Bale
Heath Ledger Aaron Eckhart
Heath Ledger Michael Caine
Aaron Eckhart Christian Bale
Aaron Eckhart Heath Ledger
Aaron Eckhart Michael Caine
Michael Caine Christian Bale
Michael Caine Heath Ledger
Michael Caine Aaron Eckhart

What you have now permutes the list of movies, not the list of actors within each movie.
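
The difference can be seen by comparing how many pairs each approach produces. This is a sketch for illustration, with variable names of my own choosing. It also shows itertools.combinations, which yields each unordered pair only once and so may be closer to what the question asks for, since (A, B) and (B, A) should count as the same edge:

```python
from itertools import combinations, permutations

actorslist = [
    ['Tim Robbins', 'Morgan Freeman', 'Bob Gunton', 'William Sadler'],
    ['Christian Bale', 'Heath Ledger', 'Aaron Eckhart', 'Michael Caine'],
]

# Permuting the outer list pairs up whole movies (each element is a cast list).
movie_pairs = list(permutations(actorslist, 2))

# Iterating per movie pairs up actors; permutations yields both (A, B) and (B, A).
ordered_pairs = [p for movie in actorslist for p in permutations(movie, 2)]

# combinations yields each unordered pair once, in list order.
unordered_pairs = [p for movie in actorslist for p in combinations(movie, 2)]

print(len(movie_pairs), len(ordered_pairs), len(unordered_pairs))  # 2 24 12
```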

Answer 2: (score: 0)

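I'm not sure how complicated this needs to be, but this seems to produce your output. I only changed your pairs line... (I took the liberty of putting Tim Robbins in Batman, just to give it a more realistic overlap.)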

actorslist = [[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
  [u'Christian Bale', u'Heath Ledger', u'Tim Robbins', u'Michael Caine']]

import itertools
import pydot
graph = pydot.Dot(graph_type='graph', charset="utf8")

# generate a list of all unique actors, if you want that
# allactors = list(set(actor for movie in actorslist for actor in movie))

# this is the key line -- you have to iterate through the list 
# and not try to permute the whole thing
pairs = [list(itertools.permutations(k, 2)) for k in actorslist]


for pair in pairs:
    for a, b in pair:
        edge = pydot.Edge(a, b)
        graph.add_edge(edge)

# write the file once, after all edges have been added
graph.write('dotfile.dot')

Output file (remember that I changed the input to include Tim Robbins)...

graph G {
charset=utf8;
"Tim Robbins" -- "Morgan Freeman";
"Tim Robbins" -- "Bob Gunton";
"Tim Robbins" -- "William Sadler";
"Morgan Freeman" -- "Tim Robbins";
"Morgan Freeman" -- "Bob Gunton";
"Morgan Freeman" -- "William Sadler";
"Bob Gunton" -- "Tim Robbins";
"Bob Gunton" -- "Morgan Freeman";
"Bob Gunton" -- "William Sadler";
"William Sadler" -- "Tim Robbins";
"William Sadler" -- "Morgan Freeman";
"William Sadler" -- "Bob Gunton";
"Christian Bale" -- "Heath Ledger";
"Christian Bale" -- "Tim Robbins";
"Christian Bale" -- "Michael Caine";
"Heath Ledger" -- "Christian Bale";
"Heath Ledger" -- "Tim Robbins";
"Heath Ledger" -- "Michael Caine";
"Tim Robbins" -- "Christian Bale";
"Tim Robbins" -- "Heath Ledger";
"Tim Robbins" -- "Michael Caine";
"Michael Caine" -- "Christian Bale";
"Michael Caine" -- "Heath Ledger";
"Michael Caine" -- "Tim Robbins";
}
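
The output above lists every pair in both directions, while the question asks for each pair only once. A minimal sketch of one way to get that, building the DOT lines as plain strings so it runs without pydot. The helper name dot_edges is hypothetical, and the sketch assumes actor names contain no double quotes, since no escaping is done:

```python
from itertools import combinations

actorslist = [
    ['Tim Robbins', 'Morgan Freeman', 'Bob Gunton', 'William Sadler'],
    ['Christian Bale', 'Heath Ledger', 'Tim Robbins', 'Michael Caine'],
]

def dot_edges(casts):
    """Return one DOT edge line per unordered actor pair, in first-seen order."""
    seen = set()
    lines = []
    for cast in casts:
        for a, b in combinations(cast, 2):
            key = frozenset((a, b))  # same key for (A, B) and (B, A)
            if key in seen:
                continue  # pair already linked by an earlier movie
            seen.add(key)
            lines.append('"%s" -- "%s";' % (a, b))
    return lines

edge_lines = dot_edges(actorslist)
dot_text = 'graph G {\n' + '\n'.join(edge_lines) + '\n}\n'
print(dot_text)
```

The seen set also drops pairs that recur across movies; if repeated collaborations should be kept as edge weights instead, a Counter keyed on the pair would be the alternative.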