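I receive many CSV files containing orders for different products. These CSV files need to be "converted" into a specific JSON structure.
Each row of a CSV file represents one ordered product. This means that if I order two products, the CSV will contain two rows.
A simplified version of the CSV file might look like this (note the orderId "111" in the first and third rows):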
orderId,itemNumber,itemName,name,street
111,123,testitem,john doe,samplestreet 1
222,345,anothertestitem,jane doe,samplestreet 1
111,345,anothertestitem,john doe,samplestreet 1
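My current solution works, but I think it is more complicated than it needs to be.
Currently I iterate over every CSV row and build the JSON structure. For this I use a helper function that either adds a new order or appends the ordered item to the item list of an existing order, as shown below: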
import csv

def read_csv(path):
    """ Assumed implementation of the helper (not shown in the original post) """
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def add_orderitem(orderitem, order, all_orders):
    """ Adds an ordered product to the order or "create" a new order if it doesn't exist """
    for row in all_orders:
        # Order already exists
        if any(order["orderNumber"] == value for field, value in row.items()):
            print(f"Order '{order['orderNumber']}' already exists, adding product #{orderitem['sku']}")
            row["orderItems"].append(orderitem)
            return all_orders
    # New order
    print(f"New Order found, creating order '{order['orderNumber']}' and adding product #{orderitem['sku']}")
    all_orders.append(order)
    order["orderItems"].append(orderitem)
    return all_orders

def parse_orders():
    """ Converts CSV-orders into JSON """
    results = []
    orders = read_csv("testorder.csv")  # helper-function returns CSV-dictreader (list of dicts)
    for order in orders:
        # Create basic structure
        orderdata = {
            "orderNumber": order["orderId"],
            "address": {
                "name": order["orderId"],  # as posted; order["name"] would give the customer name
                "street": order["street"]
            },
            "orderItems": []  # <-- this will be filled later
        }
        # Extract product-information that will be inserted in above 'orderItems' list
        product = {
            "sku": order["itemNumber"],
            "name": order["itemName"]
        }
        # Add order to final list or add item if order already exists
        results = add_orderitem(product, orderdata, results)
    return results

def main():
    from pprint import pprint
    parsed_orders = parse_orders()
    pprint(parsed_orders)

if __name__ == "__main__":
    main()
The script works, and the output below is what I expect:
New Order found, creating order '111' and adding product #123
New Order found, creating order '222' and adding product #345
Order '111' already exists, adding product #345
[{'address': {'name': '111', 'street': 'samplestreet 1'},
  'orderItems': [{'name': 'testitem', 'sku': '123'},
                 {'name': 'anothertestitem', 'sku': '345'}],
  'orderNumber': '111'},
 {'address': {'name': '222', 'street': 'samplestreet 1'},
  'orderItems': [{'name': 'anothertestitem', 'sku': '345'}],
  'orderNumber': '222'}]
Is there a "smarter" way to do this?
Answer 0 (score: 1)
Using namedtuple and groupby can make your code clearer:
from collections import namedtuple
from itertools import groupby

# csv data or file
data = """orderId,itemNumber,itemName,name,street
111,123,testitem,john doe,samplestreet 1
222,345,anothertestitem,jane doe,samplestreet 1
111,345,anothertestitem,john doe,samplestreet 1
"""

# the Order tuple
Order = namedtuple('Order', 'orderId itemNumber itemName name street')

# load the csv into orders (skip the header line and empty lines)
orders = [Order(*line.split(",")) for line in data.split("\n")[1:] if line]

# sort by orderId first: groupby only groups consecutive items
orders = sorted(orders, key=lambda order: order.orderId)

# group it by orderId
output = list()
for key, values in groupby(orders, key=lambda order: order.orderId):
    items = list(values)
    dct = {"address": {"name": items[0].name, "street": items[0].street},
           "orderItems": [{"name": item.itemName, "sku": item.itemNumber} for item in items]}
    output.append(dct)

print(output)
This produces:
[{'address': {'name': 'john doe', 'street': 'samplestreet 1'}, 'orderItems': [{'name': 'testitem', 'sku': '123'}, {'name': 'anothertestitem', 'sku': '345'}]},
{'address': {'name': 'jane doe', 'street': 'samplestreet 1'}, 'orderItems': [{'name': 'anothertestitem', 'sku': '345'}]}]
You could even write the whole thing as one big comprehension, but that would not make it more readable.
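Another option, not taken from the code above, is a plain dict keyed by orderId. What follows is only a minimal sketch under a few assumptions: that the address name should come from the name column, that the result should keep the orderNumber field from the question, and that the CSV has the five columns shown in the sample. The names group_orders and CSV_TEXT are made up for this illustration. It needs no sorting and keeps the orders in the order they first appear in the file:

```python
import csv
import io
import json

# Sample data from the question; a real script would open the CSV file instead.
CSV_TEXT = """orderId,itemNumber,itemName,name,street
111,123,testitem,john doe,samplestreet 1
222,345,anothertestitem,jane doe,samplestreet 1
111,345,anothertestitem,john doe,samplestreet 1
"""


def group_orders(rows):
    """Hypothetical helper: build one order dict per orderId, in first-seen order."""
    orders = {}  # orderId -> order dict; dicts keep insertion order (Python 3.7+)
    for row in rows:
        # setdefault returns the existing order, or stores and returns a new one
        order = orders.setdefault(row["orderId"], {
            "orderNumber": row["orderId"],
            "address": {"name": row["name"], "street": row["street"]},
            "orderItems": [],
        })
        order["orderItems"].append({"sku": row["itemNumber"], "name": row["itemName"]})
    return list(orders.values())


reader = csv.DictReader(io.StringIO(CSV_TEXT))
result = group_orders(reader)
print(json.dumps(result, indent=2))
```

Because the lookup goes through a dict, each row is handled in constant time instead of scanning all existing orders, and json.dumps turns the resulting list directly into the JSON text the question asks for.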