Python: how to get directory paths from JSON?

Asked: 2014-08-14 22:49:09

Tags: python json parsing

I am getting a JSON response from an API:

    "files":[
    {
      "name":"main",
      "node_type":"directory",
      "files":[
        {
          "name":"source1",
          "node_type":"directory",
          "files":[
            {
              "name":"letters",
              "node_type":"directory",
              "files":[
                {
                  "name":"messages.po",
                  "node_type":"file",
                  "created":"2014-08-14 08:51:41",
                  "last_updated":"2014-08-14 08:51:42",
                  "last_accessed":"0000-00-00 00:00:00"
                }
              ]
            }
          ]
        },
        {
          "name":"source2",
          "node_type":"directory",
          "files":[

          ]
        }
      ]
    },
    {
      "name":"New Directory",
      "node_type":"directory",
      "files":[
        {
          "name":"prefs.js",
          "node_type":"file",
          "created":"2014-08-14 08:11:53",
          "last_updated":"2014-08-14 08:11:53",
          "last_accessed":"0000-00-00 00:00:00"
        }
      ]
    },
    {
      "name":"111",
      "node_type":"directory",
      "files":[
        {
          "name":"222",
          "node_type":"directory",
          "files":[
            {
              "name":"333",
              "node_type":"directory",
              "files":[
                {
                  "name":"cli.mo",
                  "node_type":"file",
                  "created":"2014-08-14 08:51:30",
                  "last_updated":"2014-08-14 08:51:30",
                  "last_accessed":"0000-00-00 00:00:00"
                }
              ]
            }
          ]
        }
      ]
    }
  ],

The project structure is:

├──111──222──333───cli.mo
├──main──source1──letters───messages.po
         └──source2
├──New Directory──prefs.js

How can I parse the JSON so that I get back something like this:

/111/222/333/cli.mo
/main/source1/letters/messages.po
/main/source2/
/New Directory/prefs.js

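I tried to write some code in Python, but I'm a beginner and my attempts failed.

3 Answers:

Answer 0: (score: 3)

If you are looking for the actual path strings, I'd suggest using a generator: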

def parse(data, parent=''):
    if data is None or not len(data):
        yield parent
    else:
        for node in data:
            for result in parse(
                    node.get('files'), parent + '/' + node.get('name')):
                yield result

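You could also use a variant of the yield parent statement so that an empty directory comes back with a trailing slash (/main/source2/ instead of /main/source2), though I find it too verbose: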

        yield parent + ('/' if data is not None and not len(data) else '')

Pass the list parsed from your JSON to the parse function above and you will get an iterator that yields each path string it finds in the data:

import json

# shamelessly ignoring PEP8 for the sake of space
data = '''
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42",
"name": "messages.po", "created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory",
"name": "source1"}, {"files": [], "node_type": "directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files":
[{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created":
"2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [{"files": [{"node_type": "file",
"last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}],
"node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
'''

for item in parse(json.loads(data)):
    print(item)  # call syntax works on both Python 2 and Python 3

Running the above gives you

/main/source1/letters/messages.po
/main/source2
/New Directory/prefs.js
/111/222/333/cli.mo

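as output. There is a very interesting read on generators here: What does the "yield" keyword do in Python? - I recommend reading through all the answers.

Answer 1: (score: 1)

What you need is a recursive descent parser. The json module takes care of the heavy lifting of parsing the JSON syntax, but you still have to walk the resulting data structure and interpret it. Recursion is called for because you don't know in advance how many levels of directory structure you will encounter.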
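For Python 3 readers, here is a minimal sketch of the same generator idea written with yield from, with the trailing slash for empty directories built in. The name iter_paths is made up for this illustration, and the sample text assumes the API response is an object with a top-level "files" key, as the excerpt in the question suggests; the real response may be shaped differently.

```python
import json


def iter_paths(nodes, parent=""):
    """Yield one path per file and per empty directory in `nodes`."""
    for node in nodes:
        path = parent + "/" + node["name"]
        if node.get("node_type") == "directory":
            children = node.get("files") or []
            if children:
                # Non-empty directory: recurse; its leaves carry the path.
                yield from iter_paths(children, path)
            else:
                # Empty directory: mark it with a trailing slash.
                yield path + "/"
        else:
            yield path


# Hypothetical response body (assumption): a trimmed version of the
# question's excerpt, wrapped in an object with a "files" key.
text = '''
{"files": [
  {"name": "main", "node_type": "directory", "files": [
    {"name": "source1", "node_type": "directory", "files": [
      {"name": "messages.po", "node_type": "file"}]},
    {"name": "source2", "node_type": "directory", "files": []}]},
  {"name": "New Directory", "node_type": "directory", "files": [
    {"name": "prefs.js", "node_type": "file"}]}
]}
'''

paths = sorted(iter_paths(json.loads(text)["files"]))
for p in paths:
    print(p)
```

Checking node_type explicitly, rather than testing whether "files" is missing, keeps a file entry from being mistaken for an empty directory if the API ever attaches an empty "files" list to it.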

jdata = """
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42", "name": "messages.po",
"created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory", "name": "source1"}, {"files": [], "node_type":
"directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created": "2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [
{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}], "node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
"""

import json
import os
import sys

if sys.version_info[0] > 2:
    unicode = str

class Filepaths(object):

    def __init__(self, data):
        """
        Discover file paths in the given data. If the data is JSON string,
        decode it. If already decoded into Python structures, use it directly.
        """
        self.paths = []
        if isinstance(data, (str, unicode)):
            data = json.loads(data)
        self.traverse(data)
        self.paths = reversed(self.paths)

    def traverse(self, n, prefix="/"):
        """
        Traverse the data tree. On terminal nodes, add files and directories
        found to self.paths
        """
        if isinstance(n, list):
            for item in n:
                self.traverse(item, prefix)
        elif isinstance(n, dict):
            nodetype = n['node_type']
            nodename = n['name']
            if nodetype == 'directory':
                files = n['files']
                if files:
                    for f in files:
                        self.traverse(f, os.path.join(prefix, nodename))
                else:
                    self.paths.append(os.path.join(prefix, nodename) + '/')
            elif nodetype == 'file':
                self.paths.append(os.path.join(prefix, nodename))
            else:
                raise ValueError("didn't understand node named {0!r}, type {1!r}".format(nodename, nodetype))
        else:
            raise ValueError("didn't understand node {0!r}".format(n))

p = Filepaths(jdata)
for path in p.paths:
    print(path)  # call syntax works on both Python 2 and Python 3

This results in:

/111/222/333/cli.mo
/New Directory/prefs.js
/main/source2/
/main/source1/letters/messages.po

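Note that I used a class rather than a bare recursive function to get around Python's tedious rules for global variables. I could of course declare a global variable paths and mark it as global inside the function, but that is clumsy. Objects are the standard Python way to package routines together with the data they need to access. Recursive traversals are often a better fit for an object in Python.

Answer 2: (score: 0)

I think the best way to approach this is the same way ls -R in Unix and os.walk() in Python do it: recursion. For example, to list all files, directories included, you could do the following: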
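For comparison, a third option besides a class or a global is to thread the result list through the recursive calls as a parameter. The sketch below is only an illustration of that idea, not part of the code above; collect_paths and the small tree are made-up names and data that follow the node layout from the question.

```python
def collect_paths(nodes, prefix="", paths=None):
    """Return a list of paths, passing the result list down the recursion."""
    if paths is None:
        paths = []  # created once, by the top-level call
    for node in nodes:
        path = prefix + "/" + node["name"]
        if node["node_type"] == "file":
            paths.append(path)
        elif node["files"]:
            # Non-empty directory: descend, sharing the same list.
            collect_paths(node["files"], path, paths)
        else:
            # Empty directory: record it with a trailing slash.
            paths.append(path + "/")
    return paths


# Hypothetical sample data in the question's format.
tree = [
    {"name": "main", "node_type": "directory", "files": [
        {"name": "source2", "node_type": "directory", "files": []},
    ]},
    {"name": "New Directory", "node_type": "directory", "files": [
        {"name": "prefs.js", "node_type": "file"},
    ]},
]
result = collect_paths(tree)
print(result)
```

The paths=None default avoids the shared-mutable-default pitfall, so each top-level call starts with a fresh list.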

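def walk(tree, path):
    # Remember each directory's own name next to its children, so that
    # every subtree is later walked under the correct parent path.
    dirs = []
    for f in tree:
        print(path + '/' + f['name'])
        if f['node_type'] == 'directory':
            dirs.append((f['name'], f['files']))

    # Descend only after the current level has been listed, like ls -R.
    for name, subtree in dirs:
        walk(subtree, path + '/' + name)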
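To make the listing order of this level-by-level approach concrete, here is a small sketch that collects the entries into a list instead of printing them. The function name list_tree and the sample tree (including notes.txt) are hypothetical and only follow the node layout from the question.

```python
def list_tree(tree, path=""):
    """Return every entry under `tree`, directories included."""
    found = []
    subdirs = []
    for entry in tree:
        full = path + "/" + entry["name"]
        found.append(full)
        if entry["node_type"] == "directory":
            # Keep the directory's full path together with its children.
            subdirs.append((full, entry["files"]))
    # Descend only after the current level is fully listed, like `ls -R`.
    for full, children in subdirs:
        found.extend(list_tree(children, full))
    return found


# Hypothetical sample data in the question's format.
tree = [
    {"name": "111", "node_type": "directory", "files": [
        {"name": "222", "node_type": "directory", "files": [
            {"name": "cli.mo", "node_type": "file"},
        ]},
    ]},
    {"name": "notes.txt", "node_type": "file"},
]
listing = list_tree(tree)
for line in listing:
    print(line)
```

Unlike the output the question asks for, this lists intermediate directories such as /111 as well, so it would need filtering if only files and empty directories are wanted.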