I'm new to Python. I need to read a csv file and, where an item is duplicated, keep only the entry with the lowest price. For example:
Input file:
name, link, price, category
item1, http://example.com/item1, 29.30, cat1
item2, http://example.com/item2, 22, cat2
item1, http://example.com/item1, 19.90, cat1
Output file:
name, link, price, category
item2, http://example.com/item2, 22, cat2
item1, http://example.com/item1, 19.90, cat1
Here is my code so far:
import csv
f1 = csv.reader(open('input.csv', 'rb'), delimiter=',')
writer = csv.writer(open("output.csv", "wb"))
name = set()
for row in f1:
    if row[0].lower() not in (i.lower() for i in name):
        writer.writerow(row)
        name.add(row[0])
I can remove the duplicates with this code, but I need help keeping the lowest-priced entry for each item.
Thanks!
Answer 0 (score: 1)
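You can keep a dict keyed on the name and look up the row stored so far with dict.get: if the name has not been seen yet, or the stored price is greater than the price on the current row, store the current row. Finally, write the rows returned by dict.values with writerow. If needed, collections.OrderedDict also keeps the items in the order their names first appear in the file.
import csv
from collections import OrderedDict
d = OrderedDict()  # name -> row, kept in first-seen order
with open('in.csv', 'r') as f1, open("output.csv", "w") as out:
    r = csv.reader(f1, delimiter=",")
    header = next(r)  # store header
    writer = csv.writer(out, delimiter=",")
    for row in r:
        price = float(row[2])
        best = d.get(row[0])  # row stored for this name, or None if unseen
        # store the row if the name is new or this price is lower
        if best is None or float(best[2]) > price:
            d[row[0]] = row
    writer.writerow(header)  # write header
    for tup in d.values():  # write updated items
        writer.writerow(tup)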
Output:
name, link, price, category
item1, http://example.com/item1, 19.90, cat1
item2, http://example.com/item2, 22, cat2
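The same idea can be tried without touching any files. The following is only a sketch under a few assumptions: the function name keep_lowest is made up for illustration, the input is the sample from the question held in a string, and skipinitialspace=True is used to drop the blank after each comma. It relies on plain dicts keeping insertion order, which holds on Python 3.7 and later.

```python
import csv
import io


def keep_lowest(text):
    """Return the header and, for each name, the row with the lowest price."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    best = {}  # name -> row; insertion order is preserved on Python 3.7+
    for row in reader:
        name, price = row[0], float(row[2])
        # store the row if the name is new or this price is lower
        if name not in best or price < float(best[name][2]):
            best[name] = row
    return header, list(best.values())


# sample data copied from the question
sample = """name, link, price, category
item1, http://example.com/item1, 29.30, cat1
item2, http://example.com/item2, 22, cat2
item1, http://example.com/item1, 19.90, cat1
"""

header, rows = keep_lowest(sample)
for row in rows:
    print(row)
```

Because the rows are stored whole, the link and category written out always belong to the row that had the lowest price.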
If the order doesn't matter, use defaultdict and min:
import csv
from collections import defaultdict
d = defaultdict(list)  # name -> list of all rows with that name
with open('in.csv', 'r') as f1, open("output.csv", "w") as out:
    r = csv.reader(f1, delimiter=",")
    header = next(r)  # store header
    writer = csv.writer(out, delimiter=",")
    for row in r:
        d[row[0]].append(row)
    writer.writerow(header)  # write header
    for k, v in d.items():  # write the cheapest row of each group
        writer.writerow(min(v, key=lambda x: float(x[2])))
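The key=float conversion in min matters because the csv module returns every field as a string. A small sketch of the difference, using made-up price strings rather than data from the question:

```python
prices = ["9", "19.90", "100"]  # hypothetical values, as csv would return them

as_text = min(prices)               # compares character by character
as_number = min(prices, key=float)  # compares the numeric values

print(as_text, as_number)
```

Compared as text, "100" sorts before "19.90" and "9", so without the conversion the wrong row could be picked.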
Answer 1 (score: 0)
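You can shorten the for loop in the solution provided by mu with dict.setdefault. If a key does not exist, dict.setdefault sets a value for it; otherwise it leaves the stored value unchanged. Either way, it returns the value now stored for the key.
names = {}  # name -> lowest price seen so far
for row in f1:
    price = float(row[2])  # assumes the question's layout, where price is the third column
    a = names.setdefault(row[0], price)
    if price < a:
        names[row[0]] = price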
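To make the return value of dict.setdefault concrete, here is a short sketch that replays the two item1 prices from the question by hand. The variable names first and second are only for illustration.

```python
names = {}

# key absent: 29.30 is stored and returned
first = names.setdefault("item1", 29.30)

# key present: the stored 29.30 is returned and the dict is left unchanged
second = names.setdefault("item1", 19.90)

# so the lower price still has to be written explicitly
if 19.90 < second:
    names["item1"] = 19.90

print(first, second, names)
```

This is why the loop needs the comparison after setdefault: the call alone never overwrites an existing entry.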
Answer 2 (score: 0)
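This is trivial in pandas:
import pandas as pd
df = pd.read_csv('in_csv')
# min() is taken column by column, so this gives the wanted rows only when
# link and category are identical for all rows that share a name
df.groupby('name').min()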
Answer 3 (score: 0)
The csv file columns are: name, link, price, category
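import itertools, operator
data = list()
new_data = list()
name = operator.itemgetter(0)
def name_price(row):
    # sort key: name first, then price as a number (as strings, "100" would sort before "19.90")
    return (row[0], float(row[2]))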
Separate the header from the data.
with open('data.txt') as f:
    header = next(f)  # first line is the header; next(f) works on Python 2 and 3
    for line in f:
        data.append(line.strip().split(','))
data is a list of lists: [[name, link, price, category], ...]
Sort data on name first, then on price.
data.sort(key=name_price)
Use itertools.groupby to group by name, take the first item from each group, format it, and save it to a new list.
for key, group in itertools.groupby(data, name):
    # the first item in the group has the lowest price
    lowest_price = list(group)[0]
    lowest_price = ','.join(lowest_price) + '\n'
    new_data.append(lowest_price)
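The sort has to come first because itertools.groupby only groups adjacent items with equal keys. A minimal sketch of that behaviour, using shortened two-column rows (name, price) that are made up for illustration:

```python
import itertools
import operator

name = operator.itemgetter(0)
rows = [["item1", "29.30"], ["item2", "22"], ["item1", "19.90"]]

# without sorting, item1 shows up as two separate groups
unsorted_keys = [key for key, _ in itertools.groupby(rows, name)]

# sort on name, then on price as a number, and take the first row of each group
rows.sort(key=lambda row: (row[0], float(row[1])))
lowest = [next(group) for _, group in itertools.groupby(rows, name)]

print(unsorted_keys)
print(lowest)
```

Once the rows are sorted this way, the first row of every group is the cheapest one for that name.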
Write header and new_data to a file.
with open('new_data.txt', 'w') as f:  # text mode, since header and new_data are strings
    f.write(header)
    f.writelines(new_data)
Edited to account for more fields.