How can I match stats from two lists of dictionaries into a new dictionary, grouped by a key?

Time: 2017-07-14 11:05:37

Tags: python python-2.7 numpy dictionary

I have two lists of dictionaries:

this_week = [
        {
          "Stat": {
            "clicks": "1822",
            "affiliate_id": "1568",
            "advertiser_id": "1892",
            "offer_id": "2423847"
          },
          "Offer": {
            "name": "app2"
          }
        },
        {
          "Stat": {
            "clicks": "11",
            "affiliate_id": "1616",
            "advertiser_id": "2171",
            "offer_id": "2402467"
          },
          "Offer": {
            "name": "two"
          }
        }
]

last_week = [
        {
          "Stat": {
            "clicks": "1977",
            "affiliate_id": "1796",
            "advertiser_id": "1892",
            "offer_id": "2423847"
          },
          "Offer": {
            "name": "app2"
          }
        },
        {
          "Stat": {
            "clicks": "1248",
            "affiliate_id": "1781",
            "advertiser_id": "2171",
            "offer_id": "2402467"
          },
          "Offer": {
            "name": "two"
          }
        }
]

I want to build a dictionary like this:
items = {
    "1892": {  # the advertiser_id
        "this_week": {
            "Stat": {
                "clicks": "1822",
                "affiliate_id": "1568",
                "advertiser_id": "1892",
                "offer_id": "2423847"
            },
            "Offer": {
                "name": "app2"
            }
        },
        "last_week": {
            "Stat": {
                "clicks": "1977",
                "affiliate_id": "1796",
                "advertiser_id": "1892",
                "offer_id": "2423847"
            },
            "Offer": {
                "name": "app2"
            }
        },
        "difference": {
            # this week's clicks minus last week's clicks
            "clicks_difference": int("1822") - int("1977")
        }
    }
}

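The grouping key should be advertiser_id, offer_id, or affiliate_id, depending on what the user chooses. That is the problem: the items in the two lists may not be in the same order, so is there a way to group these entries by advertiser_id or any other key?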

Given that the grouping ID can change, how can I group the data this way? What is the simplest approach?

1 Answer:

Answer 0: (score: 0)

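You can flatten one list into a map keyed on the chosen field, then loop through the other list and match its entries on that same field. When an entry appears in both lists, compute the click difference at the end, e.g.: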

def join_on(this, last, field):
    # first turn the first list into a [field_value] => [matched_stat] map:
    result = {stat["Stat"].get(field, None): {"this_week": stat} for stat in this}
    for stat in last:  # loop through the second list
        field_value = stat["Stat"].get(field, None)  # get the value of the selected field
        if field_value in result:  # check if it was already parsed in the `this` list
            result[field_value]["last_week"] = stat  # store it as last week
            # get the click value for this week:
            clicks = int(result[field_value]["this_week"]["Stat"].get("clicks", 0))
            clicks -= int(stat["Stat"].get("clicks", 0))  # subtract the last week clicks
            # store the click difference in the `difference` key
            result[field_value]["difference"] = {"clicks_difference": clicks}
        else:  # no match in the `this` list, create a new entry holding only last week
            result[field_value] = {"last_week": stat}
    return result

You can then test it with your data:

parsed_data = join_on(this_week, last_week, "advertiser_id")

Which gives:

{'1892': {'difference': {'clicks_difference': -155},
          'last_week': {'Offer': {'name': 'app2'},
                        'Stat': {'advertiser_id': '1892',
                                 'affiliate_id': '1796',
                                 'clicks': '1977',
                                 'offer_id': '2423847'}},
          'this_week': {'Offer': {'name': 'app2'},
                        'Stat': {'advertiser_id': '1892',
                                 'affiliate_id': '1568',
                                 'clicks': '1822',
                                 'offer_id': '2423847'}}},
 '2171': {'difference': {'clicks_difference': -1237},
          'last_week': {'Offer': {'name': 'two'},
                        'Stat': {'advertiser_id': '2171',
                                 'affiliate_id': '1781',
                                 'clicks': '1248',
                                 'offer_id': '2402467'}},
          'this_week': {'Offer': {'name': 'two'},
                        'Stat': {'advertiser_id': '2171',
                                 'affiliate_id': '1616',
                                 'clicks': '11',
                                 'offer_id': '2402467'}}}}
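
Note that join_on keeps one entry per field value for each week, so if several entries in the same list share the chosen field (say, two offers from one advertiser), later ones overwrite earlier ones. If that can happen in your data, a possible variant is to collect the entries into lists and sum the clicks per group. The following is only a minimal sketch under that assumption; group_and_diff and the sample lists are hypothetical names made up for illustration:

```python
from collections import defaultdict


def group_and_diff(this, last, field):
    """Group both weeks by `field`, keeping every entry, and diff summed clicks."""
    # each group holds a list of entries per week (hypothetical layout)
    result = defaultdict(lambda: {"this_week": [], "last_week": []})
    for week, stats in (("this_week", this), ("last_week", last)):
        for stat in stats:
            result[stat["Stat"].get(field)][week].append(stat)
    for group in result.values():
        # clicks are stored as strings, so convert before summing
        this_clicks = sum(int(s["Stat"].get("clicks", 0)) for s in group["this_week"])
        last_clicks = sum(int(s["Stat"].get("clicks", 0)) for s in group["last_week"])
        group["difference"] = {"clicks_difference": this_clicks - last_clicks}
    return dict(result)


# Made-up sample data: two entries share advertiser_id "1892" this week,
# and "2171" only appears last week.
sample_this = [
    {"Stat": {"clicks": "10", "advertiser_id": "1892"}},
    {"Stat": {"clicks": "5", "advertiser_id": "1892"}},
]
sample_last = [
    {"Stat": {"clicks": "12", "advertiser_id": "1892"}},
    {"Stat": {"clicks": "7", "advertiser_id": "2171"}},
]
grouped = group_and_diff(sample_this, sample_last, "advertiser_id")
```

Unlike join_on, this variant also stores a difference for keys that appear in only one week, treating the missing week as zero clicks. Whether that is the right behavior depends on how you want to report such entries.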