I am getting the following response from a JSON API:
"files":[
  {
    "name":"main",
    "node_type":"directory",
    "files":[
      {
        "name":"source1",
        "node_type":"directory",
        "files":[
          {
            "name":"letters",
            "node_type":"directory",
            "files":[
              {
                "name":"messages.po",
                "node_type":"file",
                "created":"2014-08-14 08:51:41",
                "last_updated":"2014-08-14 08:51:42",
                "last_accessed":"0000-00-00 00:00:00"
              }
            ]
          }
        ]
      },
      {
        "name":"source2",
        "node_type":"directory",
        "files":[
        ]
      }
    ]
  },
  {
    "name":"New Directory",
    "node_type":"directory",
    "files":[
      {
        "name":"prefs.js",
        "node_type":"file",
        "created":"2014-08-14 08:11:53",
        "last_updated":"2014-08-14 08:11:53",
        "last_accessed":"0000-00-00 00:00:00"
      }
    ]
  },
  {
    "name":"111",
    "node_type":"directory",
    "files":[
      {
        "name":"222",
        "node_type":"directory",
        "files":[
          {
            "name":"333",
            "node_type":"directory",
            "files":[
              {
                "name":"cli.mo",
                "node_type":"file",
                "created":"2014-08-14 08:51:30",
                "last_updated":"2014-08-14 08:51:30",
                "last_accessed":"0000-00-00 00:00:00"
              }
            ]
          }
        ]
      }
    ]
  }
],
The project structure is:
├──111──222──333───cli.mo
├──main──source1──letters───messages.po
│      └──source2
├──New Directory──prefs.js
How can I parse the JSON so that I get back something like this:
/111/222/333/cli.mo
/main/source1/letters/messages.po
/main/source2/
/New Directory/prefs.js
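I tried writing some code in Python, but I am a beginner and my attempts failed.
Answer 0 (score: 3)
If you want to get the actual path strings back, I suggest using a generator: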
def parse(data, parent=''):
    if data is None or not len(data):
        yield parent
    else:
        for node in data:
            for result in parse(
                    node.get('files'), parent + '/' + node.get('name')):
                yield result
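You could also use a variant of the `yield parent` statement so that the empty directory `/main/source2` is returned with a trailing slash (`/main/source2/`), although I find it too verbose: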
yield parent + ('/' if data is not None and not len(data) else '')
Pass your parsed JSON list to the `parse` function above and you get back an iterator that yields the strings it finds in the data:
import json
# shamelessly ignoring PEP8 for the sake of space
data = '''
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42",
"name": "messages.po", "created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory",
"name": "source1"}, {"files": [], "node_type": "directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files":
[{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created":
"2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [{"files": [{"node_type": "file",
"last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}],
"node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
'''
for item in parse(json.loads(data)):
    print(item)
Running the above gives you
/main/source1/letters/messages.po
/main/source2
/New Directory/prefs.js
/111/222/333/cli.mo
as output. There is a very interesting read about generators here: What does the "yield" keyword do in Python? I recommend reading all of the answers carefully.
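As a hedged sketch of the same generator idea in Python 3 syntax, the version below uses `yield from` and checks `node_type` so that only empty directories get a trailing slash. The name `iter_paths` and the small `sample` tree are illustrative assumptions; they are not part of the API response in the question.

```python
def iter_paths(nodes, parent=''):
    """Yield one path per file and one per empty directory."""
    for node in nodes:
        path = parent + '/' + node['name']
        if node.get('node_type') == 'directory':
            children = node.get('files') or []
            if children:
                # Non-empty directory: descend and re-yield the child paths.
                yield from iter_paths(children, path)
            else:
                # Empty directory: mark it with a trailing slash.
                yield path + '/'
        else:
            yield path


# Hypothetical sample data shaped like the JSON in the question.
sample = [
    {"name": "main", "node_type": "directory", "files": [
        {"name": "source2", "node_type": "directory", "files": []},
    ]},
    {"name": "New Directory", "node_type": "directory", "files": [
        {"name": "prefs.js", "node_type": "file"},
    ]},
]

for p in iter_paths(sample):
    print(p)
```

Because the check is on `node_type` rather than on whether `files` is missing, a file entry never receives a trailing slash, even if the API were to attach an empty `files` list to it.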
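Answer 1 (score: 1)
What you need is a recursive descent parser. The `json` module does the heavy lifting of parsing the JSON syntax, but you still need to traverse the resulting data structure and interpret it. Recursion is called for because you do not know in advance how many levels of directory structure you will encounter.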
jdata = """
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42", "name": "messages.po",
"created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory", "name": "source1"}, {"files": [], "node_type":
"directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created": "2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [
{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}], "node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
"""
import json
import os
import sys

if sys.version_info[0] > 2:
    unicode = str  # Python 3 has no separate unicode type

class Filepaths(object):

    def __init__(self, data):
        """
        Discover file paths in the given data. If the data is a JSON string,
        decode it. If already decoded into Python structures, use it directly.
        """
        self.paths = []
        if isinstance(data, (str, unicode)):
            data = json.loads(data)
        self.traverse(data)
        # Keep a list (not a one-shot iterator) so paths can be reused.
        self.paths = list(reversed(self.paths))

    def traverse(self, n, prefix="/"):
        """
        Traverse the data tree. On terminal nodes, add files and directories
        found to self.paths
        """
        if isinstance(n, list):
            for item in n:
                self.traverse(item, prefix)
        elif isinstance(n, dict):
            nodetype = n['node_type']
            nodename = n['name']
            if nodetype == 'directory':
                files = n['files']
                if files:
                    for f in files:
                        self.traverse(f, os.path.join(prefix, nodename))
                else:
                    self.paths.append(os.path.join(prefix, nodename) + '/')
            elif nodetype == 'file':
                self.paths.append(os.path.join(prefix, nodename))
            else:
                raise ValueError("didn't understand node named {0!r}, type {1!r}".format(nodename, nodetype))
        else:
            raise ValueError("didn't understand node {0!r}".format(n))

p = Filepaths(jdata)
for path in p.paths:
    print(path)
This results in:
/111/222/333/cli.mo
/New Directory/prefs.js
/main/source2/
/main/source1/letters/messages.po
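Note that I used a class rather than just a recursive function to get around Python's cumbersome rules for global variables. Of course, I could declare a global variable `paths` and mark it as `global` inside the function, but that is a hassle. Objects are the standard Python way of "packaging" routines together with the data they need to access. Recursive traversals are often better expressed as objects in Python.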
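For comparison only, a plain recursive function can also avoid a global by receiving the result list as an argument. The sketch below assumes the same node layout as the question; `collect_paths` and `sample` are hypothetical names, and this is not the approach used in the class above.

```python
def collect_paths(nodes, prefix='', paths=None):
    """Append file paths and empty-directory paths to `paths` and return it."""
    if paths is None:
        paths = []  # created once, by the outermost call
    for node in nodes:
        path = prefix + '/' + node['name']
        if node['node_type'] == 'directory':
            if node['files']:
                # Pass the same list down so every level appends to it.
                collect_paths(node['files'], path, paths)
            else:
                paths.append(path + '/')
        elif node['node_type'] == 'file':
            paths.append(path)
        else:
            raise ValueError("unknown node type {0!r}".format(node['node_type']))
    return paths


# Hypothetical sample data shaped like the JSON in the question.
sample = [
    {"name": "111", "node_type": "directory", "files": [
        {"name": "222", "node_type": "directory", "files": [
            {"name": "cli.mo", "node_type": "file"},
        ]},
    ]},
    {"name": "empty", "node_type": "directory", "files": []},
]

print(collect_paths(sample))
```

The trade-off is that the accumulator shows up in the function signature, whereas the class keeps it as instance state alongside the traversal method.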
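Answer 2 (score: 0)
I think the best way to approach this is the same way `ls -R` in Unix and `os.walk()` in Python do it: recursion. For example, to list all files, including directories, you could do something like this: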
def walk(tree, path):
    dirs = []
    for f in tree:
        print(path + '/' + f['name'])
        if f['node_type'] == 'directory':
            # Remember each directory's own name alongside its children.
            dirs.append((f['name'], f['files']))
    for name, subtree in dirs:
        walk(subtree, path + '/' + name)
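To make the `os.walk()` analogy concrete, here is a minimal sketch of a generator that yields `(dirpath, dirnames, filenames)` tuples top-down over the decoded JSON tree. The name `walk_tree` and the `sample` data are assumptions made for illustration; the function only mimics the shape of the tuples that `os.walk()` produces and does not touch the file system.

```python
def walk_tree(nodes, dirpath='/'):
    """Yield (dirpath, dirnames, filenames) for each directory, top-down."""
    dirs = [n for n in nodes if n['node_type'] == 'directory']
    filenames = [n['name'] for n in nodes if n['node_type'] == 'file']
    yield dirpath, [d['name'] for d in dirs], filenames
    for d in dirs:
        # rstrip avoids a double slash when the current directory is the root.
        yield from walk_tree(d['files'], dirpath.rstrip('/') + '/' + d['name'])


# Hypothetical sample data shaped like the JSON in the question.
sample = [
    {"name": "main", "node_type": "directory", "files": [
        {"name": "source2", "node_type": "directory", "files": []},
        {"name": "a.txt", "node_type": "file"},
    ]},
    {"name": "top.txt", "node_type": "file"},
]

for dirpath, dirnames, filenames in walk_tree(sample):
    print(dirpath, dirnames, filenames)
```

Yielding tuples instead of printing leaves it to the caller to decide whether to list directories, files, or both.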