I have a CSV like the following:
SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3
I am trying to produce a CSV with the following content as output:
SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
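I want to keep only the lowest-priced record for each SKU.
I am completely lost with pandas, so how should I do this? With a classic if? With a list or a set?
Any ideas?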
Answer 0 (score: 2)
In pandas, you can do the following:
import pandas as pd
df = pd.read_csv('your file', sep=';')  # the file is semicolon-delimited
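As Andy points out below, this returns only the SKU and price columns. Because price is read as text (for example "21,50 €"), min() compares strings here, not numbers: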
df_reduced= df.groupby('SKU')['price'].min()
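To keep all columns, you can change the groupby to a list of all the columns you want to keep. Be aware that this yields one row per distinct combination of those columns, so it only collapses rows that match on every listed column: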
df_reduced= df.groupby(['SKU', 'availability', 'Title', 'Supplier'])['price'].min()
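A minimal sketch of one way to keep every column while comparing prices as numbers, using the sample data from the question. It swaps in sort_values plus drop_duplicates instead of groupby(...).min(), and the helper column price_num is a name made up for illustration, so treat it as one possible approach and not the answer's own method.

```python
import io
import pandas as pd

# Sample data copied from the question; in practice pass a file path instead.
CSV_TEXT = """SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';')

# 'price' is read as text such as "21,50 €", so build a numeric helper column.
# price_num is a hypothetical name used only in this sketch.
df['price_num'] = (
    df['price']
    .str.replace('€', '', regex=False)
    .str.strip()
    .str.replace(',', '.', regex=False)
    .astype(float)
)

# Sort by numeric price, keep the first (cheapest) row per SKU, drop the helper.
cheapest = (
    df.sort_values('price_num')
      .drop_duplicates('SKU', keep='first')
      .drop(columns='price_num')
)

print(cheapest.to_csv(sep=';', index=False))
```

Because the rows are sorted by price before duplicates are dropped, the output is ordered by price, not by the original row order.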
Answer 1 (score: 1)
There is no real need for pandas here. This may not be the optimal solution, but here is mine:
import csv
class Product:
    def __init__(self, sku, price, availability, title, supplier):
        self.sku = sku
        self.price = float(price.replace(',', '.')[:-2]) # allows sorting
        self.availability = availability
        self.title = title
        self.supplier = supplier
unparsed_products = []
with open('name_of_csv.csv', 'r', encoding='utf-8') as csvfile:  # assumes the file is UTF-8 encoded (it contains "€")
    csv_reader = csv.reader(csvfile, delimiter=';')
    next(csv_reader)  # skip the header line when parsing
    for row in csv_reader:
        p = Product(*row)
        unparsed_products.append(p)
suv500_products = [i for i in unparsed_products if i.sku == 'SUV500']
lowest_priced_suv500_product = sorted(suv500_products, key=lambda x: x.price)[0] # ascending sort, so the first entry is the cheapest
print(lowest_priced_suv500_product.price)
>>> 21.5
By changing the value of X in if i.sku == X, you can easily extend this to other products.
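Extending this to every SKU at once could look like the sketch below, which keeps a dictionary of the cheapest row seen so far per SKU. It is a hedged illustration and not part of the original answer: parse_price and cheapest_by_sku are names invented here, and the data is the sample from the question, read from a string instead of a file.

```python
import csv
import io

# Sample data copied from the question; in practice open the CSV file instead.
CSV_TEXT = """SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3
"""


def parse_price(text):
    """Turn a string such as '21,50 €' into the float 21.5 (hypothetical helper)."""
    return float(text.replace('€', '').strip().replace(',', '.'))


cheapest_by_sku = {}  # maps each SKU to the cheapest row seen so far
reader = csv.DictReader(io.StringIO(CSV_TEXT), delimiter=';')
for row in reader:
    best = cheapest_by_sku.get(row['SKU'])
    if best is None or parse_price(row['price']) < parse_price(best['price']):
        cheapest_by_sku[row['SKU']] = row

for sku, row in cheapest_by_sku.items():
    print(sku, row['price'], row['Supplier'])
```

A single pass over the rows avoids building and sorting one list per SKU.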
Answer 2 (score: 1)
A non-pandas solution that produces the desired output.
Edit: added a csv writer to the solution
Edit: only accept records that have '1' at row[2]
from collections import defaultdict
import re
from operator import itemgetter
import csv
fin = open('SKU_csv.csv', 'r', encoding="utf8")
csv_reader = csv.reader(fin, delimiter=';')
fout = open('test_out.csv', 'w', newline='', encoding="utf8")  # same encoding as the input, so "€" is written correctly
csv_writer = csv.writer(fout, delimiter=';')
csv_writer.writerow(next(csv_reader)) # print header
d = defaultdict(list)
for row in csv_reader:
    if int(row[2]) != 1:  # skip records that are not available
        continue
    key = row[0]
    val = row[1].replace(',', '.')
    price = float(re.search(r'\d+\.\d+', val).group(0))  # raw string avoids invalid-escape warnings
    d[key].append([row, price])
fin.close()
for arr in d.values():
    minimum, _ = min(arr, key=itemgetter(1)) # minimum price (at arr idx 1)
    csv_writer.writerow(minimum)
fout.close()
'''
*** test_out.csv contents
SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
'''
Answer 3 (score: 1)
Edited: removed a previous, mistaken assumption
After reading from the csv file:
In [8]: df = pd.read_csv(filename, delimiter=';', encoding='utf-8')
In [9]: df
Out[9]:
SKU price availability Title Supplier
0 SUV500 21,50 € 1 27-03-2019 14:46 supplier1
1 MZ-76E 5,50 € 1 27-03-2019 14:46 supplier1
2 SUV500 49,95 € 0 27-03-2019 14:46 supplier2
3 MZ-76E 71,25 € 0 27-03-2019 14:46 supplier2
4 SUV500 32,60 € 1 27-03-2019 14:46 supplier3
Add a new column to hold the float value of price:
In [12]: df['f_price'] = df['price'].str.extract(r'([+-]?\d+\,\d+)', expand=False).str.replace(',', '.').astype(float)
#Note: astype(float) ignores the system locale and only accepts `.` as the decimal point, so the `str.replace` step is needed even where `,` is the customary decimal separator.
In [13]: df
Out[13]:
SKU price availability Title Supplier f_price
0 SUV500 21,50 € 1 27-03-2019 14:46 supplier1 21.50
1 MZ-76E 5,50 € 1 27-03-2019 14:46 supplier1 5.50
2 SUV500 49,95 € 0 27-03-2019 14:46 supplier2 49.95
3 MZ-76E 71,25 € 0 27-03-2019 14:46 supplier2 71.25
4 SUV500 32,60 € 1 27-03-2019 14:46 supplier3 32.60
From the groupby, get the list of row indices holding the lowest price (f_price) in each group:
In [28]: idxmin_list = df.groupby('SKU')['f_price'].idxmin().tolist()
In [29]: idxmin_list
Out[29]: [1, 0]
Finally, pass idxmin_list to df.loc and drop the f_price column to get the final result:
In [33]: df_final = df.loc[idxmin_list].drop(columns='f_price')
In [34]: df_final
Out[34]:
SKU price availability Title Supplier
1 MZ-76E 5,50 € 1 27-03-2019 14:46 supplier1
0 SUV500 21,50 € 1 27-03-2019 14:46 supplier1
Write to a csv file:
In [65]: df_final.to_csv('Sku_min.csv', sep=';', index=False)
The file Sku_min.csv is created in your working folder, with the following contents:
SKU;price;availability;Title;Supplier
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
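If only available records should count, as in the non-pandas answer that filters on availability, the same idxmin approach can be applied to a filtered frame. The following is a sketch under that assumption, using the sample data from the question. The names available and df_available_min are made up for illustration. On this sample the result happens to match the unfiltered one.

```python
import io
import pandas as pd

# Sample data copied from the question; in practice pass a file path instead.
CSV_TEXT = """SKU;price;availability;Title;Supplier
SUV500;21,50 €;1;27-03-2019 14:46;supplier1
MZ-76E;5,50 €;1;27-03-2019 14:46;supplier1
SUV500;49,95 €;0;27-03-2019 14:46;supplier2
MZ-76E;71,25 €;0;27-03-2019 14:46;supplier2
SUV500;32,60 €;1;27-03-2019 14:46;supplier3
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=';')

# Numeric copy of the price, built the same way as f_price above.
df['f_price'] = (
    df['price']
    .str.extract(r'(\d+,\d+)', expand=False)
    .str.replace(',', '.', regex=False)
    .astype(float)
)

# Keep only rows marked as available, then find the cheapest row per SKU.
available = df[df['availability'] == 1]
idxmin_list = available.groupby('SKU')['f_price'].idxmin().tolist()

# The index labels still refer to the original frame, so df.loc works here.
df_available_min = df.loc[idxmin_list].drop(columns='f_price')
print(df_available_min.to_csv(sep=';', index=False))
```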