How can I compare a list of objects for matching values?

Asked: 2017-01-19 02:09:56

Tags: python list csv object

I have a list of products made up of many objects with attributes such as id, image_url and so on. It looks like this:

total_products

[{u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQCG1ObwtCgqxZIk&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000000.png&cfs=1&_nc_hash=AQAPdo31zo9WJk8j', u'id': u'1539966686030963', u'retailer_id': u'product-1000000'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQDyc-Yyic5QLOqH&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F0.png&cfs=1&_nc_hash=AQDhmhPJxFZEpMFX', u'id': u'993388404100117', u'retailer_id': u'product-0'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQAwTzrzAjdKFjmB&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000.png&cfs=1&_nc_hash=AQCMMJRJ_r7QB06I', u'id': u'642820939176165', u'retailer_id': u'product-1000'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3', u'id': u'1411912028843607', u'retailer_id': u'product-1'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB7aSPmk_j21umz&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F100000.png&cfs=1&_nc_hash=AQAPV5oe_ymaAcXr', u'id': u'942522339181104', u'retailer_id': u'product-100000'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB69V2cgASUIci1&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F100.png&cfs=1&_nc_hash=AQAk3eZ4vqWYbOW4', u'id': u'1347112758661660', u'retailer_id': u'product-100'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQD44rjEUMk6Yp2H&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000001.png&cfs=1&_nc_hash=AQBT_0iB417B08ux', u'id': u'1354204821311003', u'retailer_id': u'product-1000001'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB4ucqXEbo2DyC7&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000002.png&cfs=1&_nc_hash=AQAQ2vuj0WmuXSqw', u'id': u'1776841739008769', u'retailer_id': u'product-1000002'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBM75VZTNuxqaoq&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F10.png&cfs=1&_nc_hash=AQAUdkc6II5eu47D', u'id': u'1358784964179738', u'retailer_id': u'product-10'}, {u'image_url': 
u'https://external.xx.fbcdn.net/safe_image.php?d=AQAY0kmVnHXBbhHe&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F10000.png&cfs=1&l&_nc_hash=AQCT1PHl5h1Rhc5r', u'id': u'1337513966312571', u'retailer_id': u'product-10000'}]

I am reading a CSV file that contains the following data:

csv_file_data: [image of the CSV data, not preserved]

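As you can see, the id in the CSV file and the retailer_id in total_products are the same for some products, so I want to change image_link in the CSV file whenever its id matches a retailer_id.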

To do this, I read the CSV file row by row and loop over every product in total_products; if I find a match, I change image_link.

Code:

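import csv

def update_csv(file):
    # Python 2 code; total_products is the list shown above.
    print file
    out_file_name = str(file).replace(".csv", "")
    # Open both files in a with block so the output is flushed and closed.
    with open(file, "rb") as in_f, open(out_file_name + "_updated.csv", "wb") as out_f:
        reader = csv.DictReader(in_f)
        writer = csv.DictWriter(out_f, fieldnames=reader.fieldnames)
        writer.writeheader()
        for current_row in reader:
            # Linear scan of total_products for every CSV row.
            for product in total_products:
                retailer_id = product['retailer_id']
                if current_row['id'] == retailer_id:
                    current_row['image_link'] = "RajSharma"
                    print "Match = " + str(retailer_id) + " in " + file
                    break
            writer.writerow(current_row)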

The problem with this approach is that it takes far too long to run once total_products contains more than 1,000 to 10,000 items.

Is there a way to find a retailer_id in total_products and, if it is present, change image_link?

1 Answer:

Answer 0 (score: 3)

First, create a set of IDs from total_products:
product_ids = {product['retailer_id'] for product in total_products}

Then check whether current_row['id'] is in the set:

for current_row in reader:
    if current_row['id'] in product_ids:
        current_row['image_link'] = 'RajSharma'
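
For completeness, here is a rough sketch of how the set and the membership check might fit into the whole function. It is written for Python 3 (the question's code is Python 2), and the explicit total_products parameter, the new_link parameter and the returned output path are assumptions made for illustration, not part of the original code.

```python
import csv

def update_csv(path, total_products, new_link="RajSharma"):
    """Write <name>_updated.csv with image_link replaced on matching rows."""
    # Build the lookup set once, before reading any CSV rows.
    product_ids = {product["retailer_id"] for product in total_products}

    out_path = path.replace(".csv", "") + "_updated.csv"
    # newline="" is what the csv module expects for files in Python 3.
    with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["id"] in product_ids:  # hash lookup, no inner loop
                row["image_link"] = new_link
            writer.writerow(row)  # every row is written, matched or not
    return out_path
```

The set is built a single time, so the work per CSV row no longer grows with the number of products.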

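Set lookups are much faster, and all we need to check against is a unique collection of product IDs. The test if current_row['id'] in product_ids is a hash lookup carried out in the interpreter's underlying C code, so each check takes roughly constant time on average instead of looping over every product.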
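
If you want to see the difference on your own machine, a small timing sketch along these lines may help. It assumes Python 3, and the IDs are synthetic values shaped like the retailer_id strings in the question, made up purely for illustration; actual timings will vary.

```python
import timeit

# Synthetic IDs (hypothetical data, not taken from the question).
ids_list = ["product-%d" % i for i in range(10000)]
ids_set = set(ids_list)
# 200 probe IDs: half are present in the collection, half are not.
probes = ["product-%d" % i for i in range(9900, 10100)]

def count_matches(container):
    # `in` walks a list item by item, but uses a hash lookup on a set.
    return sum(1 for p in probes if p in container)

if __name__ == "__main__":
    print("list:", timeit.timeit(lambda: count_matches(ids_list), number=5))
    print("set: ", timeit.timeit(lambda: count_matches(ids_set), number=5))
```

Both calls find the same matches; only the time spent per lookup differs.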