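I have two .csv files, each with several thousand rows of data (product inventory from vendors). I need to find the duplicate items and remove the one with the higher price.
The problem is that the prices contain decimals. The code below is the closest I have come to what I need: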
with open('vendor1.csv', 'r') as venOne, open('vendor2.csv', 'r') as venTwo, open('filtered.csv', 'w') as outFile:
    z = csv.reader(venOne, delimiter = ',')
    m = csv.reader(venTwo, delimiter = ',')
    w = csv.writer(outFile, delimiter = ',')
    zupc = {row[5] for row in z} #UPC is in column 5
    mupc = {row[5] for row in m}
    zprice = {row[9] for row in z} #Price is in column 9
    mprice = {row[7] for row in m} #Price is in column 7
    for row in z:
        if row[5] in mupc and row[9] < mprice:
            w.writerow(row)
        else:
            if row[5] not in mupc:
                w.writerow(row)
    #Do the same for m
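I am using Python 2.x.
Eventually this will be run as a cron job. All of the data lives on a remote shared server.
One caveat is that I cannot use pandas (which would save a lot of time, as it has with various other scripts I have written). The only modules available for import are Python's standard library modules, and installing additional modules is not an option (that is, not without spending more money to upgrade to a dedicated server).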
Answer 0 (score: 0)
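First, you should use a dict rather than a set, so that each UPC can be mapped to its price. As for the prices, you can try converting them to Decimal so that they compare numerically rather than as strings.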
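To show why the conversion matters, here is a small sketch. The two prices are made-up sample values, not taken from your files. The csv module returns every field as a string, and strings are compared character by character:

```python
from decimal import Decimal

# Made-up sample prices, as the csv module would return them (strings).
a, b = '9.99', '10.50'

# String comparison goes character by character: '9' sorts after '1',
# so '9.99' is NOT considered smaller than '10.50'.
string_says_a_is_cheaper = a < b

# Decimal compares the numeric values exactly, without float rounding.
decimal_says_a_is_cheaper = Decimal(a) < Decimal(b)

print(string_says_a_is_cheaper)   # False
print(decimal_says_a_is_cheaper)  # True
```

Decimal is part of the standard library, so it should be available on your shared server.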
Try the following code and let me know if it helps:
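import csv
from decimal import Decimal

def write_cheaper_items(output, rows, this_prices, other_prices, keep_ties=False):
    for row in rows:
        upc = row[5]
        if upc not in other_prices:
            output.writerow(row)  # only this vendor carries the item
        elif this_prices[upc] < other_prices[upc]:
            output.writerow(row)  # this vendor is cheaper
        elif keep_ties and this_prices[upc] == other_prices[upc]:
            output.writerow(row)  # same price at both vendors: keep one copy

with open('vendor1.csv', 'r') as venOne, open('vendor2.csv', 'r') as venTwo, open('filtered.csv', 'w') as outFile:
    # a csv.reader can only be iterated once, so load the rows into lists first
    z = list(csv.reader(venOne, delimiter = ','))
    m = list(csv.reader(venTwo, delimiter = ','))
    w = csv.writer(outFile, delimiter = ',')
    # these dicts will have the UPC as keys and their prices as values
    # (this assumes the files have no header row, since Decimal() rejects non-numeric text)
    z_prices = {
        row[5]: Decimal(row[9])
        for row in z}
    m_prices = {
        row[5]: Decimal(row[7])
        for row in m}
    # on a price tie, keep the row from vendor1 only
    write_cheaper_items(w, z, z_prices, m_prices, keep_ties=True)
    write_cheaper_items(w, m, m_prices, z_prices)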
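One thing to watch for with this approach: a csv.reader object is an iterator and can be consumed only once, so any rows that are needed for more than one pass have to be stored in a list first. A small illustration, using made-up in-memory lines instead of your real files:

```python
import csv

# Made-up sample data (UPC, price); csv.reader accepts any iterable of lines.
lines = ['111,4.50', '222,10.00']

reader = csv.reader(lines)
first_pass = [row for row in reader]
second_pass = [row for row in reader]  # the reader is already exhausted here

# Storing the rows in a list allows as many passes as needed.
rows = list(csv.reader(lines))
upcs = [row[0] for row in rows]
prices = [row[1] for row in rows]

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```

This is also why looping over z a second time in the question's code writes nothing: the set comprehension on the first line has already read every row.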