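I am new to Python. I am trying to read all files in a folder that are larger than a given size and export the data (file path and size) to a .json file.
What I have so far: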
import os
import json
import sys
import io
testPath = str(sys.argv[1])
testSize = int(sys.argv[2])
try:
    to_unicode = unicode
except NameError:
    to_unicode = str
filesList = []
x = 1
j = "1"
data = {}
for path, subdirs, files in os.walk(testPath):
    for name in files:
        filesList.append(os.path.join(path, name))
for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['unit'] = 'B'
        data['path' + j] = str(i)
        data['size' + j] = str(fileSize)
        x = x + 1
        j = str(x)
with io.open('Files.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(data,
                      indent=4, sort_keys=True,
                      separators=(',', ': '), ensure_ascii=False)
    outfile.write(to_unicode(str_))
The problem is that the output is:
{
    "path1": "C:\\Folder\\diager.xml",
    "path2": "C:\\Folder\\diag.xml",
    "path3": "C:\\Folder\\setup.log",
    "path4": "C:\\Folder\\ESD\\log.txt",
    "size1": "1908",
    "size2": "4071",
    "size3": "5822",
    "size4": "788",
    "unit": "B"
}
But it has to look like this:
{
    "unit": "B",
    "files": [{"path": "C:\\Folder\\file1.txt", "size": "10"}, {"path": "C:\\Folder\\file2.bin", "size": "400"}]
}
I added the j variable because otherwise each iteration just overwrote the first value, and I ended up with something like this:
{
    "path": "C:\\Folder\\diager.xml",
    "size": "1908",
    "unit": "B"
}
I don't know how to continue... any help?
Answer 0 (score: 2)
You can do it like this:
files = []
for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        files.append({'path': str(i), 'size': fileSize})
data['unit'] = 'B'
data['files'] = files
This way, you build a list with one entry (path and size) per matching file and add it to the data dict afterwards.
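The approach above can be sketched end to end as follows. This is a minimal sketch, assuming Python 3; the function names collect_large_files and write_report are hypothetical and not part of the original code, and the directory walk is merged with the size check instead of going through an intermediate filesList.

```python
import json
import os


def collect_large_files(root, min_size):
    """Return {'unit': 'B', 'files': [...]} for files of at least min_size bytes.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    files = []
    for path, _subdirs, names in os.walk(root):
        for name in names:
            full_path = os.path.join(path, name)
            size = os.path.getsize(full_path)
            if size >= min_size:
                # One dict per file, collected in a list.
                files.append({"path": full_path, "size": size})
    return {"unit": "B", "files": files}


def write_report(data, out_path="Files.json"):
    """Write the collected data as indented JSON (hypothetical helper)."""
    with open(out_path, "w", encoding="utf8") as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
```

It could be called as write_report(collect_large_files(sys.argv[1], int(sys.argv[2]))) after importing sys. Note that sizes are stored as integers here, as in the answer above; wrap size in str() if the output must contain strings.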
Answer 1 (score: 0)
Initialize the data dictionary with:
data = {"unit": "B", "files": []}
Then you can replace the main loop:
for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['unit'] = 'B'
        data['path' + j] = str(i)
        data['size' + j] = str(fileSize)
        x = x + 1
        j = str(x)
with
for i in filesList:
    fileSize = os.path.getsize(str(i))
    if int(fileSize) >= int(testSize):
        data['files'].append({"path": str(i), "size": str(fileSize)})
Note that you no longer need the x and j variables.
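To see why the counter becomes unnecessary, here is a small sketch contrasting the two behaviours. The (path, size) pairs are made-up sample data standing in for the os.walk results.

```python
# Hypothetical (path, size) pairs standing in for the files found on disk.
samples = [("a.txt", 10), ("b.bin", 400)]

# Assigning to a fixed key overwrites the previous value on every pass,
# so only the last file survives.
flat = {}
for path, size in samples:
    flat["path"] = path
    flat["size"] = size

# Appending to a list keeps one entry per file, with no numbered keys needed.
data = {"unit": "B", "files": []}
for path, size in samples:
    data["files"].append({"path": path, "size": str(size)})

print(flat)                # only the last file is left
print(len(data["files"]))  # one entry per file
```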
Edit: to control the order of the fields, see this question. In particular, according to this nice answer, if you are using Python 3.6 or later, you can import OrderedDict (from collections import OrderedDict) and replace data = {"unit": "B", "files": []} with data = OrderedDict(unit="B", files=[])
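One caveat worth checking: the json.dumps call in the question passes sort_keys=True, which sorts keys alphabetically and so puts "files" before "unit" regardless of insertion order. A minimal sketch of the difference, using a made-up file entry:

```python
import json
from collections import OrderedDict

# Keyword-argument order is preserved on Python 3.6 and later.
data = OrderedDict(unit="B", files=[])
data["files"].append({"path": "example.txt", "size": "10"})  # made-up entry

# sort_keys=True reorders the keys alphabetically: "files" then "unit".
sorted_text = json.dumps(data, sort_keys=True)

# Without sort_keys, the insertion order is kept: "unit" then "files".
ordered_text = json.dumps(data)

print(sorted_text)
print(ordered_text)
```

Dropping sort_keys=True from the original call should therefore be enough to get "unit" first.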