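I am trying to implement the Apriori association rule mining algorithm. I switched to using a generator to create the candidate itemset pairs. When I try to create the combinations, I get "TypeError: 'int' object is not subscriptable".
Here is a sample of the orders DataFrame: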
https://puu.sh/DdJSj/0b6401efac.png
from collections import Counter
from itertools import groupby, combinations
import pandas

#now we will use a generator instead of dicts to save memory
def generate_pairs(orders, k):
    #generate item list for order
    for id, order in groupby(orders, lambda x: x[0]):
        items = [item[0] for item in order]
        #generate pairs for each itemlist
        for pair in combinations(items, k):
            yield pair

def itemcount(iterable):
    if type(iterable) == pandas.core.series.Series:
        return iterable.value_counts().rename("count")
    else:
        return pandas.Series(Counter(iterable)).rename("count")

pair_generator = generate_pairs(orders, 2)
print(pair_generator)
pairs = itemcount(pair_generator).to_frame("count(AB)")
This produces:
Traceback (most recent call last):
  File "C:/Users/Cosco/PycharmProjects/untitled/finalp/final.py", line 183, in <module>
    rules = generate_rules(transactions, supp_percent)
  File "C:/Users/Cosco/PycharmProjects/untitled/finalp/final.py", line 80, in generate_rules
    pairs = itemcount(pair_generator).to_frame("count(AB)")
  File "C:/Users/Cosco/PycharmProjects/untitled/finalp/final.py", line 33, in itemcount
    print(type(pandas.Series(Counter(iterable)).rename("count")))
  File "C:\Users\Cosco\Miniconda3\lib\collections\__init__.py", line 534, in __init__
    self.update(*args, **kwds)
  File "C:\Users\Cosco\Miniconda3\lib\collections\__init__.py", line 621, in update
    _count_elements(self, iterable)
  File "C:/Users/Cosco/PycharmProjects/untitled/finalp/final.py", line 22, in generate_pairs
    for id, order in groupby(orders, lambda x: x[0]):
  File "C:/Users/Cosco/PycharmProjects/untitled/finalp/final.py", line 22, in <lambda>
    for id, order in groupby(orders, lambda x: x[0]):
TypeError: 'int' object is not subscriptable
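What am I doing wrong? I understand that x should be subscriptable, but when I debug, x is a single item_id.
Edit: the generator works (incorrectly) when generate_pairs() is changed as follows: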
def generate_pairs(orders, k):
    orders = orders.reset_index().values
    #generate item list for order
    for id, order in groupby(orders, lambda x: x[0]):
        itemlist = [item[1] for item in order]
        #generate pairs for each itemlist
        for pair in combinations(itemlist, k):
            yield pair
Answer 0 (score: 0)
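You are assuming that pandas DataFrames work like lists of rows, but they do not. Iterating over a Series yields its scalar values one at a time, and iterating over a DataFrame yields its column labels. Neither yields rows, which is why the key function passed to groupby receives a single integer and x[0] raises the TypeError.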
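To make the difference concrete, here is a minimal sketch of what iteration actually yields. The sample data is hypothetical: it assumes orders is a Series of item_id values indexed by order_id, which is what the reset_index() call in the question's edit suggests but the question does not confirm.

```python
import pandas as pd

# Hypothetical sample: item_id values indexed by order_id (an assumption).
orders = pd.Series(
    [10, 20, 30, 10],
    index=pd.Index([1, 1, 2, 2], name="order_id"),
    name="item_id",
)

# Iterating a Series yields its scalar values, so x[0] fails on each one.
series_items = list(orders)

# Iterating a DataFrame yields its column labels, not its rows.
frame_items = list(orders.reset_index())

# Converting to a list of rows gives one [order_id, item_id] entry per row.
rows = orders.reset_index().values.tolist()

print(series_items)
print(frame_items)
print(rows)
```

Only the last form gives groupby something it can index with x[0].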
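You can modify your program like this (if the order IDs live in the index, call reset_index() first, as in the question's edit, so that each row carries both the order ID and the item ID):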
def generate_pairs(orders, k):
    orders = orders.values.tolist()
    ...
Keep in mind, however, that you will no longer have access to the labels or formatting inside generate_pairs.
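Once the data is a plain list of rows, the rest of the generator might look like the sketch below. It is an illustration only and assumes each row is an [order_id, item_id] entry; the sample_rows data is made up. Note that itertools.groupby only merges consecutive equal keys, so the sketch sorts by order ID first.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter


def generate_pairs(rows, k):
    """Yield every k-item combination within each order.

    Assumes rows is a list of [order_id, item_id] entries.
    """
    # groupby only groups consecutive equal keys, so sort by order_id first.
    rows = sorted(rows, key=itemgetter(0))
    for _order_id, order in groupby(rows, key=itemgetter(0)):
        items = [row[1] for row in order]
        yield from combinations(items, k)


# Hypothetical rows, e.g. the result of orders.reset_index().values.tolist()
sample_rows = [[1, 10], [1, 20], [2, 10], [1, 30], [2, 20]]
pair_counts = Counter(generate_pairs(sample_rows, 2))
print(pair_counts)
```

The sort materializes all rows in memory. If the rows are already ordered by order ID, it can be dropped to keep the generator fully lazy.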
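Note: you can also get away with just orders = orders.values. This avoids the O(n) copy of the data (from a NumPy array to a Python list), but it will be a problem if you expect the type of orders to be exactly a list.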
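The trade-off in the note can be checked directly. This is a small sketch on hypothetical data (the same assumed Series layout as in the question's edit): .values returns a NumPy array, and .tolist() copies it into nested Python lists.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: item_id values indexed by order_id (an assumption).
orders = pd.Series(
    [10, 20, 30],
    index=pd.Index([1, 1, 2], name="order_id"),
    name="item_id",
)

as_array = orders.reset_index().values  # NumPy array of rows
as_list = as_array.tolist()             # nested Python lists (copies the data)

print(type(as_array).__name__, type(as_list).__name__)
print(as_list[0])
```

Both forms support row[0] and row[1], so groupby works with either. Code that checks isinstance(orders, list) or relies on list-only methods will only work with the second.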