Optimizing file reading

Posted: 2021-06-02 18:38:04

Tags: python data-science

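I have 2 files, and for each line of one file I have to search the other file to find the corresponding information. The first file (category 1) contains newline-delimited JSON objects, each of which holds the reviewer, the item ASIN, the rating, and the timestamp (as shown below):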

{"overall": 5.0, "verified": true, "reviewTime": "01 5, 2016", "reviewerID": "A2V0JXLJ9VCNNX", "asin": "B00570RQ0A", "reviewerName": "Amazon Customer", "reviewText": "washer washing", "summary": "Five Stars", "unixReviewTime": 1451952000}
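Because each line of the review file is a standalone JSON object, it can be parsed with `json.loads` instead of being split on commas. The following is a minimal sketch, assuming the file is valid JSON Lines; `iter_json_lines` and `unique_asins` are hypothetical helper names. It also collects the unique ASINs with a set, whose membership test is O(1) on average, rather than by searching a list.

```python
import io
import json


def iter_json_lines(fp):
    """Yield one parsed dict per non-empty line of a JSON Lines stream."""
    for line in fp:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def unique_asins(fp):
    """Return ASINs in first-seen order, skipping duplicates.

    `seen` is a set, so each duplicate check is an average O(1) hash lookup.
    """
    seen = set()
    ordered = []
    for review in iter_json_lines(fp):
        asin = review.get("asin")
        if asin is not None and asin not in seen:
            seen.add(asin)
            ordered.append(asin)
    return ordered


sample = io.StringIO(
    '{"asin": "B00570RQ0A", "overall": 5.0}\n'
    "\n"
    '{"asin": "B000WQZFFW", "overall": 4.0}\n'
    '{"asin": "B00570RQ0A", "overall": 3.0}\n'
)
print(unique_asins(sample))  # ['B00570RQ0A', 'B000WQZFFW']
```

With a real file, pass the object returned by `open(path, encoding="utf-8")` instead of the `io.StringIO` sample.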

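The other file is the metadata. It also contains newline-delimited JSON objects, each of which holds the product description, image links, and other product information (as shown below):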

{"category": ["Appliances", "Parts & Accessories", "Refrigerator Parts & Accessories"], "description": ["Little Giant ACS-2 Auxiliary Condensate Drain Pan Overflow Shut-off Switch, 48 VAC/VDC, 18\" Leads (599122)Auxiliary condensate switch installs in the condensate drain pan of an air conditioning or refrigeration unit, to turn off the unit if the drain pan approaches overflowLittle Giant ACS-2 Auxiliary Condensate Drain Pan Overflow Shut-off Switch, 48 VAC/VDC, 18\" Leads (599122) Features: ABS housing Polyethylene float 48VAC/VDC max 5 amps max Low voltage 18\" lead wiresLittle Giant ACS-2 Auxiliary Condensate Drain Pan Overflow Shut-off Switch, 48 VAC/VDC, 18\" Leads (599122) Specification: Cord Length: 18\" Shut Off: 0 Voltage: 48 VAC/DC Amps: 5 Weight: 0.21 Height: 3 Width: 1.34 Length: 4.55.", "Little Giant ACS-2 Auxiliary Condensate Drain Pan Overflow Shut-off Switch, 48 VAC/VDC, 18\" Leads (599122)Auxiliary condensate switch installs in the condensate drain pan of an air conditioning or refrigeration unit, to turn off the unit if the drain pan approaches overflowLittle Giant ACS-2 Auxiliary Condensate Drain Pan Overflow Shut-off Switch, 48 VAC/VDC, 18\" Leads (599122) Features: ABS housing Polyethylene float 48VAC/VDC max 5 amps max Low voltage 18\" lead wiresLittle Giant ACS-2 Auxiliary Condensate Drain Pan Overflow Shut-off Switch, 48 VAC/VDC, 18\" Leads (599122) Specification: Cord Length: 18\" Shut Off: 0 Voltage: 48 VAC/DC Amps: 5 Weight: 0.21 Height: 3 Width: 1.34 Length: 4.55"], "fit": "", "title": "Little Giant 599122 ACS-2 Float Switch with 18-Inch Lead, 1-Pack", "also_buy": [], "image": ["https://images-na.ssl-images-amazon.com/images/I/413p5bagSJL._SS40_.jpg"], "tech2": "", "brand": "Little Giant", "feature": ["Little Giant", "ABS housing, polyethylene float", "72\" lead wires"], "rank": [">#128,783 in Tools & Home Improvement (See top 100)", ">#2,384 in Tools & Home Improvement > Appliances > Large Appliance Accessories > Refrigerator Parts & Accessories"], "also_view": ["B000JGH2TM", "B0026WSD4A", "B000SM342Q", "B003QK4KUM", "B005D4RFEM", "B00DK85P9A", "B000FK9W0E", "B013K33QQI", "B079NQ1532", "B004496WNW", "B01N19NQLN", "B000AHT78O"], "details": {}, "main_cat": "Tools & Home Improvement", "similar_item": "", "date": "October 4, 2007", "price": "$19.65", "asin": "B000WQZFFW"}
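In the metadata, `price` is a string such as "$19.65" and `description` is a list of strings, so both need normalizing before use. Below is a minimal sketch of two hypothetical helpers, `parse_price` and `describe`, that mirror the question's logic: a placeholder price of 1.0 when the field is missing, and 'N/A' for an absent or empty description. Unlike the pattern `\d+\.\d+` used in the question, this regex also accepts whole-dollar prices and ignores thousands separators; that is a design choice, not something the data format guarantees.

```python
import re

# Matches "19.65" or "20"; commas are stripped before matching.
PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(raw, default=1.0):
    """Convert a price string such as "$19.65" to a float.

    Returns `default` when the field is missing, empty, or contains no number.
    """
    if not raw:
        return default
    match = PRICE_RE.search(raw.replace(",", ""))
    return float(match.group()) if match else default


def describe(record):
    """Join the `description` list into one string, or return 'N/A'."""
    parts = record.get("description") or []
    return "".join(parts) if parts else "N/A"


print(parse_price("$19.65"))                 # 19.65
print(describe({"description": ["a", "b"]}))  # ab
print(describe({}))                          # N/A
```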

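Basically, for the item on each line of the first file, I search the metadata to retrieve that item's price and product description. Currently I do this with nested for loops: for every new ASIN, the inner loop scans the metadata list from the start until it finds a matching `asin`, as shown below. How can I optimize this code?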
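The nested loop costs up to one full pass over the metadata per unique ASIN, roughly O(N×M) comparisons. A common remedy is to index the metadata once in a dict keyed by `asin`: each lookup then becomes an average O(1) hash probe and the whole job is roughly O(N+M). This is a minimal sketch, assuming the metadata is already loaded as a list of dicts (which is what `read_metadata` in the question appears to return); `build_index` is a hypothetical name.

```python
def build_index(meta_records):
    """Map each ASIN to its metadata record; the first occurrence wins."""
    index = {}
    for rec in meta_records:
        asin = rec.get("asin")
        if asin is not None and asin not in index:
            index[asin] = rec
    return index


meta = [
    {"asin": "B000WQZFFW", "price": "$19.65"},
    {"asin": "B00570RQ0A"},
]
index = build_index(meta)  # built once, reused for every review line

# Each lookup is a hash probe instead of a scan over all of `meta`.
print(index.get("B000WQZFFW", {}).get("price"))  # $19.65
print(index.get("B0MISSING"))                    # None
```

Keeping the first record per ASIN mirrors the `break` on the first match in the question's inner loop.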

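import re

# Not shown in the question, assumed to exist at module level:
#   items = {}            ASIN -> 0
#   ecoList = []          one description (or 'N/A') per unique ASIN
#   prices = [0] * n      pre-filled with 0; unused slots stay 0
#   read_metadata(path)   returns a list of metadata dicts
def get_ecoList(category1, metaCat):
    global meta, prices
    meta = read_metadata(metaCat)
    with open(category1, 'r') as y:
        data = y.readlines()
        tempArr = []
        idx = 0
        for line in data:
            metaFlag = 0
            # The first comma-separated field is treated as the ASIN
            # (this fits a CSV-style line, not the raw JSON sample above).
            currLine = line.split(',')
            i = currLine[0]
            if i in tempArr:  # already processed this ASIN
                continue
            tempArr.append(i)
            items[i] = 0
            prices[idx] = '$1.0' #change to prices[idx] = avg_prices[i]
            # Scans meta from the start for every new ASIN: this is the slow part.
            for k in meta:
                if 'asin' in k and k['asin'] == i:
                    metaFlag = 1
                    if 'price' in k:
                        if len(k['price']) > 0:
                            prices[idx] = k['price']
                    if 'description' in k:
                        if len(k['description']) > 1:
                            k['description'] = ''.join(k['description'])
                            ecoList.append(k['description'])
                        elif len(k['description']) == 0:
                            ecoList.append('N/A')
                        else:
                            ecoList.append(k['description'][0])
                    else:
                        ecoList.append('N/A')
                    break
            idx += 1
            if metaFlag == 0:
                ecoList.append('N/A')
    # Keep only the filled slots and turn "$19.65"-style strings into floats.
    prices = [float(re.findall(r"\d+\.\d+", x)[0]) for x in prices if x != 0]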
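One possible restructuring of `get_ecoList` reads each file exactly once and returns its results instead of mutating globals. This is a sketch under stated assumptions: both files are valid JSON Lines like the samples above, the review's ASIN is taken from its `asin` key (not from `line.split(',')`), and `get_eco_list` is a hypothetical name. To save memory, it keeps only the metadata records whose ASIN actually appears in the reviews.

```python
import json
import re

# Matches "19.65" or "20"; commas are stripped before matching.
PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def get_eco_list(reviews_path, meta_path, default_price=1.0):
    """Return (prices, descriptions) for each unique ASIN in reviews_path."""
    # Pass 1: unique ASINs in first-seen order (set for O(1) dedup).
    order, wanted = [], set()
    with open(reviews_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                asin = json.loads(line).get("asin")
                if asin and asin not in wanted:
                    wanted.add(asin)
                    order.append(asin)

    # Pass 2: stream the metadata once, keeping only wanted ASINs
    # (first match wins, like the `break` in the original inner loop).
    found = {}
    with open(meta_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            asin = rec.get("asin")
            if asin in wanted and asin not in found:
                found[asin] = rec

    # Assemble outputs aligned with `order`; missing ASINs get defaults.
    prices, descriptions = [], []
    for asin in order:
        rec = found.get(asin, {})
        match = PRICE_RE.search((rec.get("price") or "").replace(",", ""))
        prices.append(float(match.group()) if match else default_price)
        desc = rec.get("description") or []
        descriptions.append("".join(desc) if desc else "N/A")
    return prices, descriptions
```

Called as `prices, descriptions = get_eco_list("reviews.json", "meta.json")` (hypothetical file names), the two lists line up index by index with the unique ASINs in first-seen order. One behavioural difference: a price with no parsable number falls back to `default_price` instead of raising `IndexError` as `re.findall(...)[0]` would.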

0 answers:

No answers yet