I am building an aggregation with the following code:
import numpy
import pandas

orders = pandas.read_csv(
    "orders.csv",
    dtype={
        "order_id": numpy.int32,
        "user_id": numpy.int32,
        "eval_set": "category",
        "order_number": numpy.int8,
        "order_dow": numpy.int8,
        "order_hour_of_day": numpy.int8,
        "days_since_prior_order": numpy.float64
    }
)
orders.set_index('order_id', inplace=True, drop=False)

prior_order_products = pandas.read_csv(
    "order_products__prior.csv",
    dtype={
        "order_id": numpy.int32,
        "product_id": numpy.int32,
        "add_to_cart_order": numpy.int16,
        "reordered": numpy.int8
    }
)
prior_order_products.set_index(['order_id', 'product_id'], inplace=True, drop=False)
prior_order_products = prior_order_products.join(orders, how="inner", on='order_id', rsuffix='_')
prior_order_products.drop('order_id_', inplace=True, axis=1)
del orders

prior_order_products['user_product_id'] =\
    100000 * prior_order_products["user_id"].astype(numpy.int64) + prior_order_products["product_id"]

user_products = prior_order_products.\
    groupby('user_product_id', sort=False).\
    agg({'order_id': ['size', 'last'], 'add_to_cart_order': 'sum'})
It raises the following error:
Traceback (most recent call last):
File "C:/Users/Strategy/PycharmProjects/Test/Main.py", line 52, in <module>
agg({'order_id': ['size', 'last'], 'add_to_cart_order': 'sum'})
...
TypeError: '<' not supported between instances of 'numpy.ndarray' and 'str'
If I comment out the line
prior_order_products.set_index(['order_id', 'product_id'], inplace=True, drop=False)
the error goes away.
The error also goes away if I limit the number of rows read into prior_order_products. The files are well formed, with no missing data or formatting errors.
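For reference, here is a minimal sketch of one way to apply such a row limit, assuming the `nrows` parameter of `pandas.read_csv`. The inline CSV text and the helper name `read_prior` are hypothetical stand-ins for the real file and are not part of my actual script.

```python
import io

import numpy
import pandas

# Hypothetical sample standing in for order_products__prior.csv.
csv_text = """order_id,product_id,add_to_cart_order,reordered
2,33120,1,1
2,28985,2,1
3,33754,1,1
3,24838,2,1
"""


def read_prior(source, max_rows=None):
    """Read the prior order products, optionally keeping only the first max_rows rows."""
    return pandas.read_csv(
        source,
        dtype={
            "order_id": numpy.int32,
            "product_id": numpy.int32,
            "add_to_cart_order": numpy.int16,
            "reordered": numpy.int8,
        },
        nrows=max_rows,  # None reads the whole file
    )


# Only the first two data rows are loaded.
limited = read_prior(io.StringIO(csv_text), max_rows=2)
print(len(limited))
```

Passing a path such as "order_products__prior.csv" instead of the `io.StringIO` object would apply the same limit to the real file.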
What exactly does the error mean? How is it related to the index on prior_order_products? How is it related to the number of rows in prior_order_products?