Graphviz: visualizing data from a dataframe

Time: 2016-08-01 14:38:30

Tags: python pandas graphviz

I have a dataframe:

ID   domain        search_term
111  vk.com        вконтакте
111  twitter.com   фэйсбук
111  facebook.com  твиттер
222  avito.ru      купить машину
222  vk.com        вконтакте
333  twitter.com   твиттер
333  apple.com     купить айфон
333  rbk.ru        новости

I need to print 3 graphs. I use:

domains = df['domain'].values.tolist()
search_terms = df['search_term'].values.tolist()
ids = df['ID'].values.tolist()
for i, (id, domain, search_term) in enumerate(zip(ids, domains, search_terms)):
    if ids[i] == ids[i - 1]:
        f = Digraph('finite_state_machine', filename='fsm.gv', encoding='utf-8')
        f.body.extend(['rankdir=LR', 'size="5,5"'])
        f.attr('node', shape='circle')
        f.edge(domains[i - 1], domains[i], label=search_terms[i])
    else:
        continue
f.view()

But it only draws the graph for the last row, and I get only one file with a graph. How can I get 3 graphs?

1 answer:

Answer 0: (score: 0)

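You are creating a new graph on every iteration, so each one replaces the previous one. Move the creation out of the loop and only add edges inside it: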

from graphviz import Digraph

# Create the graph once, so every edge is added to the same graph.
f = Digraph('finite_state_machine', filename='fsm.gv', encoding='utf-8')
f.attr(rankdir='LR', size='5,5')
f.attr('node', shape='circle')
for i, (id, domain, search_term) in enumerate(zip(ids, domains, search_terms)):
    # Skip i == 0: ids[i - 1] would wrap around to the last row.
    if i > 0 and ids[i] == ids[i - 1]:
        f.edge(domains[i - 1], domains[i], label=search_terms[i])
f.view()

If you want every iteration to produce a new graph, use:

for i, (id, domain, search_term) in enumerate(zip(ids, domains, search_terms)):
    # Skip i == 0: ids[i - 1] would wrap around to the last row.
    if i > 0 and ids[i] == ids[i - 1]:
        f = Digraph('finite_state_machine', filename='fsm.gv', encoding='utf-8')
        f.attr(rankdir='LR', size='5,5')
        f.attr('node', shape='circle')
        f.edge(domains[i - 1], domains[i], label=search_terms[i])
        # Rows that share an ID render to the same filename, so later ones overwrite earlier ones.
        f.render(filename=str(id))

By the way, I removed the else: continue, since it is redundant.
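
Since the question asks for three graphs, one per ID, another option is to group the rows by ID first and build one graph per group. The following is only a minimal sketch, not part of the original answer: the helper names `edges_by_id` and `render_all` are hypothetical, the rows are copied from the question's table, and the rows are assumed to be sorted by ID so that rows sharing an ID are adjacent.

```python
from itertools import groupby

def edges_by_id(rows):
    """Group adjacent (ID, domain, search_term) rows by ID and link neighbours.

    Assumes rows with the same ID are adjacent (e.g. the frame is sorted by ID).
    Returns {ID: [(source_domain, target_domain, label), ...]}.
    """
    result = {}
    for key, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        # Pair each row with the one before it, within the same ID only.
        result[key] = [(prev[1], cur[1], cur[2])
                       for prev, cur in zip(group, group[1:])]
    return result

def render_all(edge_map):
    """Build and render one Digraph per ID (hypothetical helper)."""
    # Imported here so the grouping step can run without Graphviz installed.
    from graphviz import Digraph
    for key, edges in edge_map.items():
        g = Digraph(f'id_{key}', filename=f'{key}.gv', encoding='utf-8')
        g.attr(rankdir='LR', size='5,5')
        g.attr('node', shape='circle')
        for source, target, label in edges:
            g.edge(source, target, label=label)
        g.render()  # writes one source file and one rendered file per ID

# Sample rows taken from the question's dataframe.
rows = [
    (111, 'vk.com', 'вконтакте'),
    (111, 'twitter.com', 'фэйсбук'),
    (111, 'facebook.com', 'твиттер'),
    (222, 'avito.ru', 'купить машину'),
    (222, 'vk.com', 'вконтакте'),
    (333, 'twitter.com', 'твиттер'),
    (333, 'apple.com', 'купить айфон'),
    (333, 'rbk.ru', 'новости'),
]

graphs = edges_by_id(rows)
```

Calling `render_all(graphs)` should then produce one file per ID, each containing all of that ID's edges. With the real dataframe, the rows could be obtained with something like `list(df[['ID', 'domain', 'search_term']].itertuples(index=False, name=None))`.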