Tree structure from a CSV file

Date: 2019-02-04 05:56:50

Tags: python csv tree

I have a CSV file and want to build a tree by reading its contents:

id  | screen_name |    reply_status_id |    tweet
1   |      a      |        null        |     dahgfsjhg
2   |      b      |         1          |     fcjgvujhgjhk
3   |      c      |         2          |     ououoijoskjfpokpo
4   |      d      |         1          |     giuyhewikuhieuhi
5   |      e      |         3          |     hkjhkjlkjljlkjlj

I want to build the tree from the id, reply_status_id and tweet columns.

Like this:

      a [root]
     / \
    b   d  [children]
   /
  c
 /
e

My code so far:

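import csv
from anytree import Node  # assumption: Node is anytree's Node(name, parent)

visited = '0'   # becomes '1' once the root has been found
id_root = None  # id of the root row

with open(file_path) as inp:
    csv_reader = csv.reader(inp)
    for row in csv_reader:
        if row[2] == 'null':
            # The first row without a parent becomes the root
            if visited == '0':
                root = Node(row[3])
                id_root = row[0]
                visited = '1'
        if row[2] == id_root:
            # Only direct replies to the root are handled so far
            child = Node(row[3], root)
            child_id = row[0]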

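If reply_status_id == null, that row's screen_name is the root. Then, for each following row, if its reply_status_id equals an existing id, the row becomes a child of that id. Repeating this process should build the complete tree for the file.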
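The two-pass idea described above can be sketched in plain Python without any tree library. This is a minimal sketch, assuming the real file is comma-separated with a header row (the pipe table above is only a display) and that exactly one row has reply_status_id 'null'; the names build_tree and SAMPLE are hypothetical.

```python
import csv
import io

# Sample data mirroring the question's table (assumed comma-separated).
SAMPLE = """id,screen_name,reply_status_id,tweet
1,a,null,dahgfsjhg
2,b,1,fcjgvujhgjhk
3,c,2,ououoijoskjfpokpo
4,d,1,giuyhewikuhieuhi
5,e,3,hkjhkjlkjljlkjlj
"""


def build_tree(rows):
    """Return (root_id, children) where children maps id -> list of child ids.

    First pass: create an empty child list for every id and remember the
    root (reply_status_id == 'null'; if several rows match, the last wins).
    Second pass: append each reply to its parent's child list.
    """
    children = {}
    root_id = None
    for row in rows:
        children[row["id"]] = []
        if row["reply_status_id"] == "null":
            root_id = row["id"]
    for row in rows:
        parent = row["reply_status_id"]
        if parent != "null" and parent in children:
            children[parent].append(row["id"])
    return root_id, children


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
names = {row["id"]: row["screen_name"] for row in rows}
root_id, children = build_tree(rows)
print(names[root_id])  # a
print({names[k]: [names[c] for c in v] for k, v in children.items()})
# {'a': ['b', 'd'], 'b': ['c'], 'c': ['e'], 'd': [], 'e': []}
```

Collecting every id before linking means a reply that appears in the file before its parent is still attached correctly, which a single top-to-bottom pass would miss.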

2 answers:

Answer 0 (score: 1)

You can use the anytree library to build the tree and render it as a graph:

import csv
from anytree import Node
from anytree.exporter import DotExporter


def clean(text):
    # Replace escaped quotes and drop bare ones so the label is Graphviz-safe
    return text.replace('\\"', "'").replace('"', '')


def find_subnodes(rows, parent_node, parent_id, nodes):
    """Recursively attach every row whose reply_status_id == parent_id."""
    for row in rows:
        node_id, parent_node_id = row[0], row[2]
        if parent_node_id == parent_id:
            node = Node(clean(row[3]), parent=parent_node)
            nodes[node_id] = node
            find_subnodes(rows, node, node_id, nodes)
    return nodes


with open('rumour1.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    lst = list(reader)

# Assumes the first data row is the root (reply_status_id == 'null')
r_node = Node(clean(lst[0][3]))
n = find_subnodes(lst, r_node, lst[0][0], {lst[0][0]: r_node})
DotExporter(r_node).to_picture('tree.png')  # Graphviz required

Based on that CSV, you get:

(image: the resulting tree rendered to tree.png)
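The PNG export needs Graphviz. To check the same hierarchy in a terminal, a minimal pure-Python sketch can print it as indented text. The children map and labels below are hard-coded for the sample CSV (an assumption for illustration), render is a hypothetical name, and an explicit stack replaces recursion so very long reply chains won't hit Python's recursion limit.

```python
def render(root_id, children, label):
    """Return the tree as indented lines, depth-first, without recursion."""
    lines = []
    stack = [(root_id, 0)]
    while stack:
        node_id, depth = stack.pop()
        lines.append("    " * depth + label[node_id])
        # Push children in reverse so they are popped in their original order
        for child in reversed(children.get(node_id, [])):
            stack.append((child, depth + 1))
    return lines


# Hard-coded for the sample CSV: id -> ids of its replies, and id -> screen_name
children = {"1": ["2", "4"], "2": ["3"], "3": ["5"]}
label = {"1": "a", "2": "b", "3": "c", "4": "d", "5": "e"}
print("\n".join(render("1", children, label)))
# a
#     b
#         c
#             e
#     d
```

If you are already using anytree, its RenderTree class gives a similar text view of a Node tree.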

Answer 1 (score: -1)

You can use recursion with a simple class:

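import csv

with open('filename.csv') as f:
    _, *data = csv.reader(f)  # drop the header row
# Convert numeric reply_status_id values to int; keep 'null' as a string
new_data = [[a, b, int(c) if c.isdigit() else c, *d] for a, b, c, *d in data]

class Tree:
    def __init__(self, _d, _start='null'):
        # Rows whose reply_status_id equals _start form this level
        self.head = [i for i in _d if i[2] == _start]
        # The next level looks for reply_status_id 1, then 2, 3, ...
        _next = 1 if _start == 'null' else _start + 1
        next_rows = [i for i in _d if i[2] == _next]
        self.children = Tree(_d, _next) if next_rows else None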

Now Tree builds a structure that stores the tweets by "level", as determined by reply_status_id:

d = Tree(new_data)
print(d.head)
print(d.children.head)
print(d.children.children.head)
print(d.children.children.children.head)

Output:

[['1', 'a', 'null', 'dahgfsjhg']]
[['2', 'b', 1, 'fcjgvujhgjhk'], ['4', 'd', 1, 'giuyhewikuhieuhi']]
[['3', 'c', 2, 'ououoijoskjfpokpo']]
[['5', 'e', 3, 'hkjhkjlkjljlkjlj']]
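Tree moves from one level to the next by looking for reply_status_id equal to the previous value plus one (1, 2, 3, ...), so its levels match the real reply depth only when the ids line up as they do in this sample; a reply to d (id 4) would be reported at level 4 instead of 2. A minimal breadth-first sketch that groups rows by their actual depth is below. The levels function and the extra row for f are hypothetical, and reply_status_id is kept as a string here.

```python
from collections import deque


def levels(rows):
    """Group rows by their real depth in the reply tree (root = level 0).

    rows: lists shaped like [id, screen_name, reply_status_id, tweet], where
    reply_status_id is 'null' or the id of the parent tweet (all strings).
    """
    children = {}
    roots = []
    for row in rows:
        if row[2] == "null":
            roots.append(row)
        else:
            children.setdefault(row[2], []).append(row)

    result = []
    queue = deque((root, 0) for root in roots)
    while queue:
        row, depth = queue.popleft()
        if depth == len(result):  # BFS reaches depths in order
            result.append([])
        result[depth].append(row)
        for child in children.get(row[0], []):
            queue.append((child, depth + 1))
    return result


data = [
    ["1", "a", "null", "dahgfsjhg"],
    ["2", "b", "1", "fcjgvujhgjhk"],
    ["3", "c", "2", "ououoijoskjfpokpo"],
    ["4", "d", "1", "giuyhewikuhieuhi"],
    ["5", "e", "3", "hkjhkjlkjljlkjlj"],
    ["6", "f", "4", "hypothetical reply to d"],
]
for depth, level in enumerate(levels(data)):
    print(depth, [row[1] for row in level])
# 0 ['a']
# 1 ['b', 'd']
# 2 ['c', 'f']
# 3 ['e']
```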