Python - Convert a folder tree into an organized dictionary

Date: 2017-03-23 16:48:11

Tags: python dictionary

I have a list of dictionaries that represents a folder tree:

folders = [{
    "NAME": " Folder 1",
    "ID": "869276"
}, {
    "ID": "869277",
    "NAME": "- Sub-folder 1"
}, {
    "ID": "869279",
    "NAME": "-- Sub-sub-folder 1"
}, {
    "NAME": "--- Sub-sub-folder 1 2",
    "ID": "869285"
}, {
    "NAME": "--- Sub-sub-folder 1 3",
    "ID": "869286"
}, {
    "NAME": "-- Sub-sub-folder 2",
    "ID": "869280"
}, {
    "ID": "869281",
    "NAME": " Folder 2"
}, {
    "ID": "869282",
    "NAME": "- Sub-folder 2"
}, {
    "NAME": "- Sub-folder 2 1",
    "ID": "869283"
}, {
    "NAME": "-- Sub-Sub-folder 2 1",
    "ID": "869284"
}]

A clearer representation:

 Folder 1
- Sub-folder 1
-- Sub-sub-folder 1
--- Sub-sub-folder 1 2
--- Sub-sub-folder 1 3
-- Sub-sub-folder 2
 Folder 2
- Sub-folder 2
- Sub-folder 2 1
-- Sub-Sub-folder 2 1

I need to reorganize this into a new list in which each folder also records the ID of its parent folder, for example:

[{
    "NAME": " Folder 1",
    "ID": "869276",
    "PARENT": "0"
}, {
    "ID": "869277",
    "NAME": "- Sub-folder 1",
    "PARENT": "869276"
}, 
...
]

So my idea is to count the '-' characters in front of the folder name to track the folder's depth:

for folder in folders:
    # Folders in root have a whitespace before the name
    depth = folder['NAME'].split(' ')[0].count('-')
    if depth == 0:
        parent = '0'
    else:
        #for each previous_folder:
            previous_depth = previous_folder['NAME'].split(' ')[0].count('-')
            if previous_depth < depth:
                 parent = previous_folder['ID']
            else:
                 #keep looking...

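The problem is turning the commented lines into actual working code. Starting from the current folder, how do I go through each of the previous folders in the list, and how do I keep the loop going?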
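Filling in the commented lines literally means scanning backwards from the current folder until a shallower one turns up. The sketch below assumes top-level folders should get PARENT "0" as in the desired output; `folder_depth`, `assign_parents` and `root_id` are hypothetical names chosen for illustration. Depth is taken from the prefix before the first space only, because names such as "Sub-folder" contain hyphens of their own.

```python
def folder_depth(name):
    """Count the leading '-' characters; root folders start with a space."""
    # Only the prefix before the first space counts, because names such as
    # "Sub-folder" contain hyphens of their own.
    return name.split(' ')[0].count('-')


def assign_parents(folders, root_id='0'):
    """Return copies of the folders with a PARENT key added."""
    result = []
    for i, folder in enumerate(folders):
        depth = folder_depth(folder['NAME'])
        parent = root_id  # used when no shallower folder precedes this one
        # Walk backwards from the entry just before the current one; the
        # first folder that is shallower is the nearest enclosing folder.
        for previous_folder in reversed(folders[:i]):
            if folder_depth(previous_folder['NAME']) < depth:
                parent = previous_folder['ID']
                break  # stop looking once the parent is found
        result.append(dict(folder, PARENT=parent))
    return result


sample = [
    {"NAME": " Folder 2", "ID": "869281"},
    {"NAME": "- Sub-folder 2", "ID": "869282"},
    {"NAME": "- Sub-folder 2 1", "ID": "869283"},
    {"NAME": "-- Sub-Sub-folder 2 1", "ID": "869284"},
]
print([f['PARENT'] for f in assign_parents(sample)])
# ['0', '869281', '869281', '869283']
```

The `break` plays the role of the "keep looking..." branch: the inner loop simply continues until a shallower folder is found. Since each folder may rescan all earlier entries, this is quadratic in the worst case; keeping a stack of open ancestors avoids the rescan.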

2 answers:

Answer 0 (score: 1)

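I think the trick is to keep the ancestors (current parent, grandparent, and so on) in a list. When you reach a shallower folder, you pop entries off that list to move back up the tree. I left in some debug prints that you can remove, but they helped me see how the algorithm progresses. I created a virtual root named "<root>" to handle the top-level folders. You can rename it to anything you like if you don't want "<root>" to show up in the output.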


folders = [{
    "NAME": " Folder 1",
    "ID": "869276"
}, {
    "ID": "869277",
    "NAME": "- Sub-folder 1"
}, {
    "ID": "869279",
    "NAME": "-- Sub-sub-folder 1"
}, {
    "NAME": "--- Sub-sub-folder 1 2",
    "ID": "869285"
}, {
    "NAME": "--- Sub-sub-folder 1 3",
    "ID": "869286"
}, {
    "NAME": "-- Sub-sub-folder 2",
    "ID": "869280"
}, {
    "ID": "869281",
    "NAME": " Folder 2"
}, {
    "ID": "869282",
    "NAME": "- Sub-folder 2"
}, {
    "NAME": "- Sub-folder 2 1",
    "ID": "869283"
}, {
    "NAME": "-- Sub-Sub-folder 2 1",
    "ID": "869284"
}]

# id to folder index (with virtual root) for printing
folders_by_id = {folder['ID']:folder for folder in folders}
folders_by_id['<root>'] = {'NAME':'<root>', 'ID':-1}

# current ancestors stack
parents = ['<root>']

for folder in folders:
    depth = folder['NAME'].split(' ')[0].count('-') + 1 # w/ virtual root
    print('state', 'parents', [folders_by_id[_id] for _id in parents], 'name', folder['NAME'], 'depth', depth)
    while depth < len(parents):
        old = parents.pop()
        print('removing', old)
    folder['PARENT'] = parents[-1]
    parents.append(folder['ID'])

print()
print('++++++++++++++++++++++++++++++ showing parents +++++++++++++++++++++++++++++++')
for folder in folders:
    parent = folders_by_id[folder['PARENT']]
    print('{padding}{parent} ({p_id}) --> {child} ({c_id})'.format(
        padding='  ' * parent['NAME'].split(' ')[0].count('-'), parent=parent['NAME'],
        p_id= parent['ID'], child=folder['NAME'], c_id=folder['ID']))
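The same stack algorithm, condensed: a sketch without the debug prints, with the virtual root named "0" so that top-level folders get PARENT "0" as the question requested. `assign_parents_stack` and `root_id` are hypothetical names, and unlike the answer's code this returns new dicts instead of modifying `folders` in place.

```python
def assign_parents_stack(folders, root_id='0'):
    """Return copies of the folders with a PARENT key, in a single pass."""
    # parents[k] is the currently open folder with k - 1 leading dashes;
    # parents[0] is the virtual root.
    parents = [root_id]
    result = []
    for folder in folders:
        depth = folder['NAME'].split(' ')[0].count('-') + 1  # +1 for the virtual root
        # Close every open folder at this depth or deeper
        # (same effect as the answer's while/pop loop).
        del parents[depth:]
        result.append(dict(folder, PARENT=parents[-1]))
        parents.append(folder['ID'])
    return result


sample = [
    {"NAME": " Folder 1", "ID": "869276"},
    {"NAME": "- Sub-folder 1", "ID": "869277"},
    {"NAME": "-- Sub-sub-folder 1", "ID": "869279"},
    {"NAME": "--- Sub-sub-folder 1 2", "ID": "869285"},
    {"NAME": " Folder 2", "ID": "869281"},
]
print([f['PARENT'] for f in assign_parents_stack(sample)])
# ['0', '869276', '869277', '869279', '0']
```

Truncating with `del parents[depth:]` pops all closed ancestors at once, so the last entry is always the parent; each folder is pushed and popped at most once, which keeps the whole pass linear.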

Answer 1 (score: 0)

If you store the parent ID for each depth level in a dict, this takes only a few lines of code.
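A minimal sketch of that idea, assuming well-formed input in which each entry is at most one level deeper than the entry before it, and that top-level folders get PARENT "0" as in the question. `assign_parents_by_depth`, `last_id_at_depth` and `root_id` are hypothetical names chosen for illustration.

```python
def assign_parents_by_depth(folders, root_id='0'):
    """Return copies of the folders with a PARENT key added."""
    # Maps a depth to the ID of the most recent folder seen at that depth.
    last_id_at_depth = {}
    result = []
    for folder in folders:
        depth = folder['NAME'].split(' ')[0].count('-')
        # The parent is the latest folder one level up (root for depth 0).
        # Assumes no level is skipped; malformed input may raise KeyError
        # or pick up a stale parent.
        parent = root_id if depth == 0 else last_id_at_depth[depth - 1]
        last_id_at_depth[depth] = folder['ID']
        result.append(dict(folder, PARENT=parent))
    return result


sample = [
    {"NAME": " Folder 1", "ID": "869276"},
    {"NAME": "- Sub-folder 1", "ID": "869277"},
    {"NAME": "-- Sub-sub-folder 1", "ID": "869279"},
    {"NAME": "-- Sub-sub-folder 2", "ID": "869280"},
    {"NAME": " Folder 2", "ID": "869281"},
    {"NAME": "- Sub-folder 2", "ID": "869282"},
]
print([f['PARENT'] for f in assign_parents_by_depth(sample)])
# ['0', '869276', '869277', '869277', '0', '869281']
```

Entries for deeper levels are never cleared, but that is harmless for well-formed input: a deeper level is always re-entered through the level above it, which overwrites the stale ID before it is read.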
