Question

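It has been a long day and night, and I can't think straight about why this isn't working, so I could use some help. This script opens a .csv file and, for each row, checks whether certain words appear in the title in column [4]. It iterates over a dictionary of lists, and if one of the words is in the title, it takes the dictionary key as the category to write to a new csv file.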

Every row comes out as "Uncategorized", which can't be right, because the titles do contain words from the lists.

Thanks for your help.

CSV example:

BrandsMart USA,http://www.BrandsMartUSA.com,BrandsMart Product Catalog,01:30.8,Mini Bass King Jr Bluetooth Portable Outdoor Speaker,2Boom BX320K,2Boom BX320K Mini Bass King Jr Bluetooth Portable Outdoor Speaker,2BOOMBX320K,2Boom,BX320K,7.08192E+11,,USD,,19.88,19.88,,http://www.example.com,http://www.ftjcfx.com/image-8299186-11133500,https://www.brandsmartusa.com/images/product/large/20225377.jpg,Bluetooth & Docking Speakers,,,,,,,,,,,,,,,,,,,,,,
BrandsMart USA,http://www.BrandsMartUSA.com,BrandsMart Product Catalog,01:30.8,Mini Bass King Jr Bluetooth Portable Outdoor Speaker,2Boom BX320R,2Boom BX320R Mini Bass King Jr Bluetooth Portable Outdoor Speaker,2BOOMBX320R,2Boom,BX320R,7.08192E+11,,USD,,19.88,19.88,,http://www.example.com,http://www.ftjcfx.com/image-8299186-11133500,https://www.brandsmartusa.com/images/product/large/20225393.jpg,Bluetooth & Docking Speakers,,,,,,,,,,,,,,,,,,,,,,
BrandsMart USA,http://www.BrandsMartUSA.com,BrandsMart Product Catalog,01:30.8,Bluetooth Noise Cancelling Earbuds,2Boom EPBT690B,2Boom EPBT690B Bluetooth Noise Cancelling Earbuds,2BOOMEPBT690B,2Boom,EPBT690B,7.44751E+11,,USD,,9.88,9.88,,http://www.example.com,http://www.ftjcfx.com/image-8299186-11133500,https://www.brandsmartusa.com/images/product/large/20225398.jpg,Earbuds,,,,,,,,,,,,,,,,yes,,,

Using Python 3

import os, csv, time

csv_path = os.path.dirname(os.path.abspath(__file__))

row_list = []

# Appliances
category_list = {"Appliance Accessories": ['Air Conditioner Accessories', 'Air Purifier Accessories', "Coffee Maker Accessories",
                           'Dishwasher Accessories', 'Food Processor Accessories', 'Heater Accessories',
                           'Humidifier Accessories', 'Humidifier Accessories', 'Mixer Accessories',
                           'Range & Oven Accessories', 'Range Hood Accessories', 'Refrigerator Accessories',
                           'Vacuum Accessories', 'Washer & Dryer Accessories'],
 "Electronics Accessories": ['cables & adapters', 'audio accessories', 'video accessories', 'camcorder accessories',
                             'cell phone accessories', 'clock radios', 'Digital Book Reader Accessories',
                             'Digital Picture Frames', 'Electronics Cases & Bags', 'GPS Accessories',
                             'Projector Accessories', 'Telephone Accessories', 'batteries', 'battery'],
 "Photography": ['Camcorders', 'Cameras', 'Digital Camera Accessories', 'Digital Cameras', 'Camera', 'Digital Camera',
                 'Photography', 'Darkroom']}

category = ""

with open(csv_path + "/pre.csv") as f:
    reader = csv.reader(f)
    for row in reader:
        for k, val in category_list.items():
            for v in val:
                if v.lower() in row[4].lower():
                    category = k
                else:
                    category = "Uncategorized"


        new_row = [str(row[0]),  # company
                   str(row[1]),  # company url
                   str(row[4]),  # product name
                   str(row[5]),  # keywords
                   str(row[6]),  # description
                   str(row[7]),  # sku
                   str(row[8]),  # manufacturer
                   str(row[13]),  # saleprice
                   str(row[14]),  # price
                   str(row[15]),  # retailprice
                   str(row[17]),  # buy_link
                   str(row[19]),  # product_image_url
                   str(row[31]),  # promotional_text
                   str(row[36]),  # stock
                   str(row[37]),  # condition
                   str(row[38]),  # warranty
                   str(row[39]),  # shipping_cost
                   category,
                   ]
        row_list.append(new_row)

    f.close()

with open(csv_path + "/final.csv", 'w') as ff:
    writer = csv.writer(ff)
    writer.writerows(row_list)
    ff.close()

Answer 1

Well, this block only keeps the result for the last value in val:

for v in val:
    if v.lower() in row[4].lower():
        category = k
    else:
        category = "Uncategorized"

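In the end only the comparison against val[-1] counts, because category is overwritten on every iteration. The outer loop over the dictionary repeats the same overwrite, so the final value reflects only the last entry of the last list.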
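
To see the overwrite in isolation, here is a minimal sketch. The shortened category_list, the made-up title, and the history list are assumptions added for illustration and are not part of the question's data.

```python
# Shortened, hypothetical version of category_list from the question.
category_list = {
    "Photography": ["Cameras", "Digital Camera", "Darkroom"],
}

title = "Acme Digital Camera 12MP"  # made-up product title

category = ""
history = []  # records what category holds after each comparison
for k, val in category_list.items():
    for v in val:
        if v.lower() in title.lower():
            category = k
        else:
            category = "Uncategorized"
        history.append((v, category))

print(history)   # the middle entry matches, but the match does not survive
print(category)  # decided by the last keyword, "Darkroom"
```

The match on "Digital Camera" is found, then thrown away by the next iteration's else branch.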

You probably want to break out of the loop once a category is found, or do something with the value on each iteration?
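
One way to do the first of those, as a sketch rather than the asker's final code: move the lookup into a helper that returns as soon as a keyword matches and only falls back to "Uncategorized" after every list has been checked. The name find_category, the shortened dictionary, and the sample titles are made up for this example.

```python
def find_category(title, category_list, default="Uncategorized"):
    """Return the first key whose keyword list has an entry found in title."""
    title = title.lower()
    for key, keywords in category_list.items():
        for keyword in keywords:
            if keyword.lower() in title:
                return key  # stop at the first match so it is never overwritten
    return default  # reached only when no keyword matched


# Shortened, hypothetical category_list and titles for illustration.
categories = {
    "Electronics Accessories": ["audio accessories", "battery"],
    "Photography": ["Cameras", "Digital Camera"],
}
print(find_category("Acme Digital Camera 12MP", categories))
print(find_category("Mini Bass King Jr Bluetooth Speaker", categories))
```

Because this is a substring test, the first keyword that appears anywhere in the title wins, in dictionary order.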

Answer 2

OK, so I took a little bit from everyone above to fit my needs. Thanks to everyone who helped.

csv_row_map =  [0,  # company
                1,  # company url
                4,  # product name
                5,  # keywords
                6,  # description
                7,  # sku
                8,  # manufacturer
                13,  # saleprice
                14,  # price
                15,  # retailprice
                17,  # buy_link
                19,  # product_image_url
                31,  # promotional_text
                36,  # stock
                37,  # condition
                38,  # warranty
                39,  # shipping_cost
                ]

# Assumes csv, csv_path, category_list and row_list are defined as in the question.
# Flatten {category: [keywords]} into {lowercased keyword: category}.
product_to_category_index = {}
for category, products in category_list.items():
    product_to_category_index.update((product.lower(), category) for product in products)

with open(csv_path + '/pre.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        title = row[4].lower()
        category = "Uncategorized"  # default when no keyword matches
        for k, v in product_to_category_index.items():
            if k in title:
                category = v
                break  # stop at the first matching keyword
        new_row = [row[i] for i in csv_row_map]
        new_row.append(category)
        row_list.append(new_row)

# newline='' keeps the csv module from writing blank lines on Windows.
with open(csv_path + "/final.csv", 'w', newline='') as ff:
    writer = csv.writer(ff)
    writer.writerows(row_list)
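
To try this approach without touching any files, here is a self-contained sketch that runs the same steps on an in-memory CSV. The sample rows, the shortened column map, the variable names, and the use of io.StringIO are all assumptions made for illustration; the real script reads pre.csv and writes final.csv.

```python
import csv
import io

# Hypothetical, shortened inputs (the real file has many more columns).
category_list = {"Photography": ["Cameras", "Digital Camera"]}
csv_row_map = [0, 2]  # keep company and product name only
sample = io.StringIO(
    "Acme,http://www.example.com,Acme Digital Camera 12MP\r\n"
    "Acme,http://www.example.com,Acme Bluetooth Speaker\r\n",
    newline="",
)

# Flatten {category: [keywords]} into {lowercased keyword: category}.
keyword_to_category = {
    keyword.lower(): category
    for category, keywords in category_list.items()
    for keyword in keywords
}

row_list = []
for row in csv.reader(sample):
    title = row[2].lower()  # the title is column 2 in this toy data
    category = "Uncategorized"  # default when no keyword matches
    for keyword, candidate in keyword_to_category.items():
        if keyword in title:
            category = candidate
            break
    row_list.append([row[i] for i in csv_row_map] + [category])

out = io.StringIO(newline="")
csv.writer(out).writerows(row_list)
print(out.getvalue())
```

Swapping the two StringIO objects for open(..., newline='') calls gives the file-based version.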

Looping over a dictionary returns an unexpected result
