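I'm analyzing some transaction data from a colleague's car exchange platform. I have written a program that performs some operations on the data, but I'm wondering whether I can improve its performance (time and space requirements), since he will be looking at large amounts of data.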
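The program reads specification data (size, color, etc.) for cars submitted for exchange (cars.csv), transaction data (trans.csv), and a JSON file containing the exchange rules. The rules cover physical car specifications (a car must have a certain size and must not be a specific color) and transactions (transactions on cars that fail the spec test are ignored, and each transaction's value is capped at a maximum).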
In general, the program should do the following:
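Below is a simple outline of the program, which works fine. However, I'm wondering whether you see any performance bottlenecks (for example, with large amounts of car spec or transaction data).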
Thanks
import pandas as pd
# read in car data and exchange rules
cars = pd.read_csv("cars.csv")
rules = pd.read_json("rules.json")
# combine car rules into test
test = (
    (cars["Size"] == rules["CarSpecs"][0]["AllowedSize"][0])
    | (cars["Size"] == rules["CarSpecs"][0]["AllowedSize"][1])
) & (cars["Color"] != rules["CarSpecs"][1]["ForbiddenColor"][0])
# filter cars with test
carsAllowed = cars[test]
# read exchange transaction data
transactions = pd.read_csv("transac.csv")
# apply exchange transaction rules (i.e. the transaction value is capped at a maximum,
# and the CarId of each transaction must appear in carsAllowed)
transactions = transactions[transactions["CarId"].isin(carsAllowed["CarId"])]
transactions["transVal"] = transactions["transVal"].apply(lambda transVal: rules["MaxTransVal"][0] if transVal>=rules["MaxTransVal"][0] else transVal)
# finally, group transactions by car colors and calculate total transaction value for each
# first take car color information from carsAllowed as it is not included in transaction data
transactions["Color"] = transactions["CarId"].apply(lambda carId: carsAllowed[carsAllowed["CarId"] == carId]["Color"].item())
transactions.groupby("Color")["transVal"].sum()
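
For illustration, here is a minimal sketch of how the row-wise `apply` calls above might be replaced with vectorized pandas operations (`isin`, `clip`, `map`). It is not a measured benchmark. The small in-memory frames, the rule values, and the helper name `total_by_color` are hypothetical stand-ins for the real files, and the sketch assumes each `CarId` appears only once in the car data.

```python
import pandas as pd

# Hypothetical stand-ins for cars.csv and transac.csv (made-up sample rows).
cars = pd.DataFrame({
    "CarId": [1, 2, 3, 4],
    "Size": ["S", "M", "L", "M"],
    "Color": ["blue", "red", "blue", "green"],
})
transactions = pd.DataFrame({
    "CarId": [1, 1, 2, 3, 4],
    "transVal": [50, 500, 70, 20, 30],
})

# Assumed rule values, pulled out of the rules structure once up front.
allowed_sizes = ["S", "M"]   # stands in for rules["CarSpecs"][0]["AllowedSize"]
forbidden_color = "red"      # stands in for rules["CarSpecs"][1]["ForbiddenColor"][0]
max_trans_val = 100          # stands in for rules["MaxTransVal"][0]


def total_by_color(cars, transactions, allowed_sizes, forbidden_color, max_trans_val):
    """Sum capped transaction values per car color, for allowed cars only."""
    # Spec test: isin handles any number of allowed sizes in one pass.
    ok = cars["Size"].isin(allowed_sizes) & (cars["Color"] != forbidden_color)
    # CarId -> Color lookup for allowed cars (assumes CarId is unique).
    color_by_id = cars.loc[ok].set_index("CarId")["Color"]
    # Keep only transactions that refer to an allowed car.
    kept = transactions[transactions["CarId"].isin(color_by_id.index)]
    # Cap values at the maximum without a Python-level lambda.
    capped = kept["transVal"].clip(upper=max_trans_val)
    # Look up each transaction's color via the index instead of filtering
    # the car frame once per row.
    colors = kept["CarId"].map(color_by_id).rename("Color")
    return capped.groupby(colors).sum()


totals = total_by_color(cars, transactions, allowed_sizes, forbidden_color, max_trans_val)
print(totals)
```

The color lookup is the step most likely to matter here: the original filters `carsAllowed` once per transaction, so its cost grows with the product of the two table sizes, whereas `map` against an indexed Series does a single lookup per row.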