Question

I have three Python lists that I want to turn into dictionaries and then merge into a single one based on key values.

My Python lists are built like this:

with open ('full_product_shipments.xml','r') as file2:
    full_product_shipments = list([line.strip().replace('{"','').replace('}','').replace('"','').replace(':',',').split(',') for line in file2])

They look like this:

List 1

[['transaction_id', '224847627', 'product_amount', '2.73', 'user_invoice_date', '2018-12-21'],
['transaction_id', '67919397', 'product_amount', '2.73', 'user_invoice_date', '2017-10-26']]

List 2

[['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '172238850', 'shipping_label_created', '2018-09-25 18', '40', '52'],
['tracking_code', '22105784', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '111423825', 'shipping_label_created', '2018-04-13 11', '22', '44']]

List 3

[['tracking_code', '21703238', 'from_country', 'FR', 'to_country', 'FR', 'amount', '3.23'],
['tracking_code', '41545695', 'from_country', 'FR', 'to_country', 'FR', 'amount', '2.9']]

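list1 and list2 both have a transaction_id, and once they are converted to dictionaries I need to join them on it.

The newly joined list (list1 and list2) and list3 both have a tracking ID (the tracking_code field), and I want to join them on it once list3 is converted to a dictionary.

I have already tried this: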

result=[x.update(amount=y['amount']) for x in full_product_shipments for y in full_provider_invoices if x['transaction_id'] == y['transaction_id']]

But this raises a TypeError:

TypeError: list indices must be integers or slices, not str

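Maybe it isn't necessary to convert everything to a dict. I'm new to Python, so if there is a better way to merge the information by key, I'd appreciate hearing about it.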

Answer 1

The sample data doesn't seem to contain any matching entries. Assuming your full dataset does have matches, you can do the following.

l1 = [['transaction_id', '224847627', 'product_amount', '2.73', 'user_invoice_date', '2018-12-21'], ['transaction_id', '67919397', 'product_amount', '2.73', 'user_invoice_date', '2017-10-26']]
l2 = [['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '172238850', 'shipping_label_created', '2018-09-25 18', '40', '52'], ['tracking_code', '22105784', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '111423825', 'shipping_label_created', '2018-04-13 11', '22', '44']]
l3 = [['tracking_code', '21703238', 'from_country', 'FR', 'to_country', 'FR', 'amount', '3.23'], ['tracking_code', '41545695', 'from_country', 'FR', 'to_country', 'FR', 'amount', '2.9']]

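# Convert every row to a dict (even positions are keys, odd positions are values)
# and index each list's dicts by its join key.
result = {y['transaction_id']: y for y in [dict(zip(x[::2], x[1::2])) for x in l1]}
d2 = {y['transaction_id']: y for y in [dict(zip(x[::2], x[1::2])) for x in l2]}
d3 = {y['tracking_code']: y for y in [dict(zip(x[::2], x[1::2])) for x in l3]}

# Update result dict with data from the other lists.
# .get() skips entries that have no match (the sample data has none) instead of raising KeyError.
for entry in result.values():
    entry.update(d2.get(entry['transaction_id'], {}))
    entry.update(d3.get(entry.get('tracking_code'), {}))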
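
To make the dict(zip(x[::2], x[1::2])) idiom above more concrete, here is a minimal sketch on one row of list 2. It also shows a side effect of the replace(':', ',') step in the question: the timestamp gets split into three items, so a stray '40': '52' pair ends up in the dict. The helper rejoin_timestamp is hypothetical and assumes the timestamp is always the last field of the row, as it is in the sample.

```python
row = ['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR',
       'package_type_id', '10', 'transaction_id', '172238850',
       'shipping_label_created', '2018-09-25 18', '40', '52']

# Even positions are keys, odd positions are values.
record = dict(zip(row[::2], row[1::2]))


def rejoin_timestamp(row, field='shipping_label_created'):
    """Hypothetical helper: glue the split hour/minute/second pieces back together.

    Assumes `field` is the last key in the row, as in the sample data.
    """
    if field not in row:
        return list(row)
    start = row.index(field) + 1
    return row[:start] + [':'.join(row[start:])]


fixed_row = rejoin_timestamp(row)
fixed = dict(zip(fixed_row[::2], fixed_row[1::2]))

print(record['shipping_label_created'], record.get('40'))
print(fixed['shipping_label_created'])
```

If the timestamp can appear in the middle of a row, this helper would not be enough and parsing the original lines properly would be the safer route.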

Answer 2

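Joining would be easier if the raw data were in JSON format rather than XML. If you are downloading the data from a REST API, try passing '&$format=json' at the end of the URL and see whether the result comes back as a JSON string. This works with the SAP REST API, for example, and I believe it is a standard parameter with many API providers.

To share my experience from work: I was given an SAP API whose default response is XML... I tried to make sense of it with the Python XML parsing libraries (which annoyed me to no end) until I realized I could pass a parameter in the original URL string and get the response back as JSON. Based on that experience, this is my suggestion for your problem.

Below is an example of a public API along with its syntax... try a similar combination for your API.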

https://vpic.nhtsa.dot.gov/api/

https://vpic.nhtsa.dot.gov/api/Home/Index/LanguageExamples

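Now, if you can download a JSON string, converting it to a Python dictionary is easy... there are plenty of resources online on how to do it. Converting a Python dictionary to a pandas DataFrame is then straightforward, and so is joining multiple DataFrames together; both are well covered online.

If you can't get a JSON string, there are some (more complex) resources online on converting XML to JSON. Here are a few links: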
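
As a rough sketch of that pipeline using only the standard library: json.loads turns a JSON string into Python objects, and a small join function plays the role that a DataFrame merge would play in pandas. The two payloads are made up for illustration (field names follow the question, but the payload shape and the matching IDs are assumptions), and inner_join is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical API responses; only the field names come from the question.
invoices_json = (
    '[{"transaction_id": "224847627", "product_amount": "2.73"},'
    ' {"transaction_id": "67919397", "product_amount": "2.73"}]'
)
shipments_json = (
    '[{"tracking_code": "29285908", "transaction_id": "224847627",'
    ' "to_country": "FR"}]'
)

invoices = json.loads(invoices_json)    # list of dicts
shipments = json.loads(shipments_json)  # list of dicts


def inner_join(left, right, key):
    """Merge rows whose `key` value appears on both sides.

    Assumes `key` is unique within `right`; later duplicates would win.
    """
    right_by_key = {row[key]: row for row in right if key in row}
    return [{**row, **right_by_key[row[key]]}
            for row in left
            if row.get(key) in right_by_key]


joined = inner_join(invoices, shipments, 'transaction_id')
print(joined)
```

Only the invoice with a matching shipment survives the join; with pandas installed, the DataFrame route described above would be the more common choice for larger data.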

How to convert an xml string to a dictionary?

https://ericscrivner.me/2015/07/python-tip-convert-xml-tree-to-a-dictionary/

http://code.activestate.com/recipes/573463-converting-xml-to-dictionary-and-back/

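You will find dictionaries much easier to work with than lists. Lists are meant for storing ordered items, but yours hold a bunch of key-value pairs (which is exactly what dictionaries are for).

Hope that helps!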

Answer 3

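Although the file name says xml, your source appears to be JSON (as mentioned in another answer), and generating the dictionaries directly from JSON would probably be easier.

Assuming that isn't possible, the program below iterates over your lists and tries to find a transaction ID in each row. That ID is used as the key of our main defaultdict, which creates an empty dict if the ID doesn't exist yet, or adds new entries to the existing dict otherwise.

Here is the complete code. Note that I modified the second list so that one of its IDs matches the first list, to show how fields from separate lists are merged into the same dictionary. It assumes the fields don't overlap.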
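
A minimal sketch of that idea, assuming each line of the file is one JSON object. That line format is a guess based on the replace calls in the question, not something the question confirms. io.StringIO stands in for the real file here.

```python
import io
import json

# Stand-in for open('full_product_shipments.xml'); the line format is assumed.
fake_file = io.StringIO(
    '{"tracking_code":"29285908","transaction_id":"172238850",'
    '"shipping_label_created":"2018-09-25 18:40:52"}\n'
    '{"tracking_code":"22105784","transaction_id":"111423825",'
    '"shipping_label_created":"2018-04-13 11:22:44"}\n'
)

# One dict per non-empty line; values keep their colons and commas intact.
shipments = [json.loads(line) for line in fake_file if line.strip()]

# Index the records by the join key for quick lookups.
by_transaction = {rec['transaction_id']: rec for rec in shipments}

print(by_transaction['172238850']['shipping_label_created'])
```

Parsing this way also keeps the timestamp in one piece, which the replace-and-split approach does not.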

from collections import defaultdict

list1 = [['transaction_id', '224847627', 'product_amount', '2.73', 'user_invoice_date', '2018-12-21'],
['transaction_id', '67919397', 'product_amount', '2.73', 'user_invoice_date', '2017-10-26']]

# list2 = [['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '172238850', 'shipping_label_created', '2018-09-25 18', '40', '52'],
list2 = [['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '224847627', 'shipping_label_created', '2018-09-25 18', '40', '52'],
['tracking_code', '22105784', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '111423825', 'shipping_label_created', '2018-04-13 11', '22', '44']]

list3 = [['tracking_code', '21703238', 'from_country', 'FR', 'to_country', 'FR', 'amount', '3.23'],
['tracking_code', '41545695', 'from_country', 'FR', 'to_country', 'FR', 'amount', '2.9']]




def aggregate_lists(*lists):
    # Maps transaction_id -> dict of all fields found for that transaction.
    transactions = defaultdict(dict)

    for rows in lists:  # renamed from `list` to avoid shadowing the builtin
        for row in rows:
            try:
                id_col = row.index('transaction_id')
                transaction_id = row[id_col + 1]
            except ValueError:
                continue # Rows without a transaction_id are skipped. Better error handling to be added.

            # Keys sit at even positions, values right after them.
            for col in range(0, len(row), 2):
                if col != id_col:
                    transactions[transaction_id][row[col]] = row[col + 1]

    return transactions

def main():
    transactions = aggregate_lists(list1, list2, list3)
    for transaction_id, props in transactions.items():
        print(f'Transaction: {transaction_id}')
        for name, value in props.items():
            print(f'\t{name}: {value}')

if __name__ == '__main__':
    main()

Here is the output:

Transaction: 224847627
    product_amount: 2.73
    user_invoice_date: 2018-12-21
    tracking_code: 29285908
    from_country: FR
    to_country: FR
    package_type_id: 10
    shipping_label_created: 2018-09-25 18
    40: 52
Transaction: 67919397
    product_amount: 2.73
    user_invoice_date: 2017-10-26
Transaction: 111423825
    tracking_code: 22105784
    from_country: FR
    to_country: FR
    package_type_id: 10
    shipping_label_created: 2018-04-13 11
    22: 44

I just realized that list3 has no transaction ID, so it is ignored. In any case, this should give you the idea.
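
One possible way to bring list3 in is a two-step join: first on transaction_id, then on tracking_code. This is only a sketch. The rows are shortened from the question and their IDs were edited so that they match, and to_dict is a hypothetical helper name.

```python
def to_dict(row):
    """Pair even-indexed keys with odd-indexed values."""
    return dict(zip(row[::2], row[1::2]))


# Sample rows shortened from the question; IDs edited so that they match.
list1 = [['transaction_id', '224847627', 'product_amount', '2.73']]
list2 = [['tracking_code', '29285908', 'to_country', 'FR',
          'transaction_id', '224847627']]
list3 = [['tracking_code', '29285908', 'from_country', 'FR', 'amount', '3.23']]

by_transaction = {d['transaction_id']: d for d in map(to_dict, list1)}
by_tracking = {d['tracking_code']: d for d in map(to_dict, list3)}

merged = []
for shipment in map(to_dict, list2):
    # Step 1: start from the list1 fields that share the transaction_id.
    record = dict(by_transaction.get(shipment['transaction_id'], {}))
    record.update(shipment)
    # Step 2: add the list3 fields that share the tracking_code.
    record.update(by_tracking.get(shipment['tracking_code'], {}))
    merged.append(record)

print(merged)
```

list2 drives the loop because it is the only list that carries both keys. Rows without a match simply keep the fields they already have.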

Convert lists to dictionaries, then merge multiple dictionaries into one by key value
