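I have a flat CSV file with 50 columns (let's call them FirstName, LastName, Address, etc.). It is tab-delimited, and every field is quoted.
I need to convert it to a JSON file. The tricky part is that some of the CSV columns must become nested fields, and each nested field must contain certain generic fields and values in addition to the column's row value (this is for an API with required fields). I know in advance which columns need to become nested fields.
So, for simplicity, let's assume this is the structure of the first 3 columns and the first row of the CSV file: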
FirstName LastName Address
John Doe 21 Python Street
This is the desired JSON output:
{
    "FirstName": "John",
    "LastName": "Doe",
    "Shipping Details": [
        {
            "Generic Field 1": "Generic Value 1",
            "Generic Field 2": "Generic Value 2",
            "Address": "21 Python Street"
        }
    ]
}
In the full 50-column CSV, several more columns need to be turned into nested fields with these extra generic values.
How can I do this?
Answer 0 (score: 3)
Use DictReader and manipulate each row by adding Shipping Details and deleting Address.
import csv
import json

j = []
with open("/tmp/so.csv", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        # Add 'Shipping Details' to row.
        # Note that something like this will have to be done
        # for *every* column you want to change.
        row["Shipping Details"] = {
            "Generic Field 1": "Generic Value 1",
            "Generic Field 2": "Generic Value 2",
            "Address": row["Address"]}
        # We don't need the 'Address' anymore.
        del row["Address"]
        # Collect the changed row in the list of rows.
        j.append(row)
print(json.dumps(j))
Output (after linting):
[{
    "LastName": "Doe",
    "Shipping Details": {
        "Address": "21 Python Street",
        "Generic Field 1": "Generic Value 1",
        "Generic Field 2": "Generic Value 2"
    },
    "FirstName": "John"
}]
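Note that the question asked for "Shipping Details" as a list holding one object, whereas the output above nests a plain dict. A minimal sketch of that variant follows. It assumes made-up in-memory sample data (via io.StringIO) instead of a real file, and `rows_to_json` is a hypothetical helper name, not part of the answer above.

```python
import csv
import io
import json

# Made-up sample mirroring the question: tab-delimited, every field quoted.
SAMPLE = '"FirstName"\t"LastName"\t"Address"\n"John"\t"Doe"\t"21 Python Street"\n'


def rows_to_json(fh):
    """Read tab-delimited rows and nest 'Address' under 'Shipping Details'."""
    reader = csv.DictReader(fh, delimiter="\t", quotechar='"')
    result = []
    for row in reader:
        row = dict(row)
        # pop() removes 'Address' from the top level and returns its value.
        address = row.pop("Address")
        # Wrap the nested dict in a list, as in the question's desired JSON.
        row["Shipping Details"] = [{
            "Generic Field 1": "Generic Value 1",
            "Generic Field 2": "Generic Value 2",
            "Address": address,
        }]
        result.append(row)
    return result


rows = rows_to_json(io.StringIO(SAMPLE))
print(json.dumps(rows, indent=2))
```

With a real file, the same function should work on a handle opened with `open(path, newline="")`.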
Answer 1 (score: 1)
Continuing from @FullName's answer, perhaps you could have a function that creates the new key:
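def nested_key(row, key_to_swap, pre_filled_dict):
    # Copy the template so the generic values are not shared between rows.
    nested = dict(pre_filled_dict)
    nested[key_to_swap] = row[key_to_swap]
    # Replace the flat value with the nested dict.
    row[key_to_swap] = nested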
Then you only need to create the pre-filled dict, e.g.:
pre_filled_address = {
    "Generic Field 1": "Generic Value 1",
    "Generic Field 2": "Generic Value 2"}
And in the for loop:
for row in reader:
    nested_key(row, "Address", pre_filled_address)
    nested_key(row, "2nd_nested_key", second_dict)
I'm not sure whether this is needed, since I don't know how many such values you have.
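To see what such a helper does to a row, here is a small self-contained sketch. It uses a variant of nested_key that copies the template dict rather than mutating it (an assumption about the intended behaviour), and the two rows are made-up stand-ins for what csv.DictReader would yield.

```python
def nested_key(row, key_to_swap, pre_filled_dict):
    """Replace row[key_to_swap] with a copy of pre_filled_dict plus the original value."""
    nested = dict(pre_filled_dict)  # shallow copy so the template is not mutated
    nested[key_to_swap] = row[key_to_swap]
    row[key_to_swap] = nested


pre_filled_address = {
    "Generic Field 1": "Generic Value 1",
    "Generic Field 2": "Generic Value 2",
}

# Two made-up rows standing in for what csv.DictReader would yield.
rows = [
    {"FirstName": "John", "LastName": "Doe", "Address": "21 Python Street"},
    {"FirstName": "Elvis", "LastName": "Presley", "Address": "Elvis Presley Blvd"},
]
for row in rows:
    nested_key(row, "Address", pre_filled_address)

print(rows[0]["Address"])
```

The nested dict stays under the original column name ("Address"). Renaming it to something like "Shipping Details" would need an extra argument for the new key.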
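Answer 2 (score: 1)
You can create a dictionary that defines which columns become nested dicts and which values to populate them with. Keeping your customizations in a single consolidated place makes them easier to read and maintain, and easier to port to other CSV formats.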
import copy

CSV_CONFIG = {
    2: {
        # Column 3 (zero-based index 2)
        "name": "Shipping Details",
        "Generic Field 1": "Generic Value 1",
        "Generic Field 2": "Generic Value 2",
    },
    3: {
        # Column 4 (zero-based index 3)
        "name": "Personage",
        "Generic Field 3": "Generic Value 3",
        "Generic Field 4": "Generic Value 4",
    },
}
Now you build data based on CSV_CONFIG:
data = []
# `file` is the path to the CSV file.
with open(file, "r") as fh:
    # Naive comma split: it does not handle quoted fields, and the
    # delimiter must be changed for tab-separated input.
    col_names = fh.readline().strip().split(",")
    for line in fh:
        line_data = {}
        cols = line.strip().split(",")
        for i in range(len(cols)):
            if i not in CSV_CONFIG:
                # this is not a nested column
                line_data[col_names[i]] = cols[i]
            else:
                # this column is nested
                nested_dict = copy.deepcopy(CSV_CONFIG[i])
                nested_dict[col_names[i]] = cols[i]
                del nested_dict["name"]
                line_data[CSV_CONFIG[i]["name"]] = nested_dict
        data.append(line_data)
Given your data, with a "personage" column added to show multiple nested columns, data is now:
[{
    'FirstName': 'John',
    'LastName': 'Doe',
    'Personage': {
        'Generic Field 3': 'Generic Value 3',
        'Generic Field 4': 'Generic Value 4',
        'Vitality': 'Alive'
    },
    'Shipping Details': {
        'Address': '21 Python Street',
        'Generic Field 1': 'Generic Value 1',
        'Generic Field 2': 'Generic Value 2'
    }
}, {
    'FirstName': 'Elvis',
    'LastName': 'Presley',
    'Personage': {
        'Generic Field 3': 'Generic Value 3',
        'Generic Field 4': 'Generic Value 4',
        'Vitality': 'Deceased'
    },
    'Shipping Details': {
        'Address': 'Elvis Presley Blvd',
        'Generic Field 1': 'Generic Value 1',
        'Generic Field 2': 'Generic Value 2'
    }
}]
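Since the question's file is tab-delimited with quoted fields, a config keyed by column name and read through csv.DictReader may be more robust than splitting lines by hand. The following is only a sketch under stated assumptions: `NESTED_COLUMNS`, `convert`, and the sample data (including the "Vitality" column) are hypothetical names chosen for illustration.

```python
import csv
import io
import json

# Hypothetical config: source column -> (nested field name, generic values).
NESTED_COLUMNS = {
    "Address": ("Shipping Details", {
        "Generic Field 1": "Generic Value 1",
        "Generic Field 2": "Generic Value 2",
    }),
    "Vitality": ("Personage", {
        "Generic Field 3": "Generic Value 3",
        "Generic Field 4": "Generic Value 4",
    }),
}

# Made-up tab-delimited, fully quoted sample data.
SAMPLE = (
    '"FirstName"\t"LastName"\t"Address"\t"Vitality"\n'
    '"John"\t"Doe"\t"21 Python Street"\t"Alive"\n'
    '"Elvis"\t"Presley"\t"Elvis Presley Blvd"\t"Deceased"\n'
)


def convert(fh, nested_columns):
    """Return a list of dicts, nesting the configured columns."""
    data = []
    for row in csv.DictReader(fh, delimiter="\t"):
        line_data = {}
        for column, value in row.items():
            if column in nested_columns:
                name, generic = nested_columns[column]
                # Build a fresh dict per row: generic values plus the cell value.
                line_data[name] = {**generic, column: value}
            else:
                line_data[column] = value
        data.append(line_data)
    return data


data = convert(io.StringIO(SAMPLE), NESTED_COLUMNS)
print(json.dumps(data, indent=2))
```

Keying the config by column name rather than index means reordering the CSV columns should not require touching the config.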