associationRules.csv = #I'm only displaying some lines here for my case
,antecedents,consequents,confidence
19,"(LM = 20, SMOK = y)",(DIAB = n),0.5
20,(LM = 20),"(DIAB = n, SMOK = y)",0.5
21,"(DIAB = n, RCA = 85, LM = 15)",(SMOK = y),1.0
175,(RCA = 85),(LAD = 40),0.6666666666666667
176,(LAD = 40),(RCA = 85),1.0
177,"(DIAB = y, CHOL = 200, SMOK = y)",(LAD = 90),0.6666666666666667
178,"(DIAB = y, CHOL = 200, LAD = 90)",(SMOK = y),1.0
200,(LM = 20),"(RCA = 75, DIAB = n)",0.5
203,"(SEX = F, DIAB = y, SMOK = y)",(LM = 20),1.0
239,(CHOL = 200),"(DIAB = y, SMOK = y)",1.0
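I am iterating over the association rule rows and want to extract a row only when the itemset in its "antecedents" column belongs to g1 or g2 only, and does not belong to y. That means only rows (175, 176, 203) should be extracted.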
y = ['CHOL = 200', 'LM = 20', 'LM = 25', 'LM = 30', 'LM = 15', 'LM = 35' ]
# g1 and g2 hold the remaining antecedent values, such as DIAB, RCA, LAD, etc.
My code only works when len(antecedents) == 1 and fails when len(antecedents) > 1.
antecedents_list = []
for i, row in associationRules.iterrows():
    antecedents = row.iloc[0]
    flag1 = False
    flag2 = False
    single_antecedent = False
    for j, v in enumerate(antecedents):
        if len(antecedents) == 1 and (v not in y): #print single items
            single_antecedent = True
        elif len(antecedents) > 1 and (v not in y):
            if v in g1:
                flag1 = True
            if v in g2:
                flag2 = True
    if single_antecedent or (flag1 and flag2):
        antecedents_list.append(antecedents)
rules['antecedents'] = antecedents_list
What am I doing wrong? Can anyone help?
Answer 0 (score: 1)
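If you mean "belongs to g1 or g2 only" and "DOES NOT belong to y", and g1 and g2 are all the remaining values other than those in y, then I think you can simply check whether any element of the antecedents belongs to y. If none does, the row is one you want, for example (175, 176, 203).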
Also, I don't think the len(antecedents) == 1 condition is needed here. You can try the following:
antecedents_list = []
for i, row in associationRules.iterrows():
    antecedents = row.iloc[0]
    flag = True
    for v in antecedents:
        # belong to y, break out
        if v in y:
            flag = False
            break
    # or more pythonic way
    # flag = all(v not in y for v in antecedents)
    if flag:
        antecedents_list.append(antecedents)
rules['antecedents'] = antecedents_list
I can't debug this myself, so please give it a try.
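For anyone who wants to run the check without the original file, here is a minimal sketch in plain Python. It assumes the antecedents arrive as strings exactly as they appear in the CSV excerpt, which the question does not confirm; in that case, iterating over the value directly would yield single characters, so the string is split into items first. The rules list and the helpers parse_items and keep_rule are hypothetical names introduced only for this illustration. If the column already holds sets or tuples of items, the parsing step can be skipped.

```python
# Hypothetical stand-in for the rows shown in the question: (rule id, antecedents string).
rules = [
    (19, "(LM = 20, SMOK = y)"),
    (20, "(LM = 20)"),
    (21, "(DIAB = n, RCA = 85, LM = 15)"),
    (175, "(RCA = 85)"),
    (176, "(LAD = 40)"),
    (177, "(DIAB = y, CHOL = 200, SMOK = y)"),
    (178, "(DIAB = y, CHOL = 200, LAD = 90)"),
    (200, "(LM = 20)"),
    (203, "(SEX = F, DIAB = y, SMOK = y)"),
    (239, "(CHOL = 200)"),
]

# The excluded values, copied from the question.
y = ['CHOL = 200', 'LM = 20', 'LM = 25', 'LM = 30', 'LM = 15', 'LM = 35']


def parse_items(text):
    """Split a string such as "(LM = 20, SMOK = y)" into a set of item strings."""
    return {item.strip() for item in text.strip("()").split(",")}


def keep_rule(antecedents, excluded):
    """Return True when no antecedent item appears in the excluded list."""
    return parse_items(antecedents).isdisjoint(excluded)


kept_ids = [rule_id for rule_id, antecedents in rules if keep_rule(antecedents, y)]
print(kept_ids)  # [175, 176, 203]
```

The isdisjoint call does the same job as all(v not in y for v in antecedents): a row is kept only if it shares no item with y.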
If you insist on keeping your own version of the code, I can point out where the problem is:
if single_antecedent or (flag1 and flag2):
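Here the condition should be changed to flag1 or flag2, because an antecedent whose items all come from one group never sets the other flag.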
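To see the effect of that condition, here is a small sketch that replays the question's flag logic on two rows. The contents of g1 and g2 are not given in the question, so the lists below are purely hypothetical, and flags is a helper name made up for this example. Under these assumptions, row 203 fails with "and" but passes with "or". Note, however, that row 19 also passes with "or", even though it contains LM = 20 from y, so the per-item check against y still seems to be the safer filter.

```python
y = ['CHOL = 200', 'LM = 20', 'LM = 25', 'LM = 30', 'LM = 15', 'LM = 35']

# Hypothetical groups: the question does not list the contents of g1 and g2.
g1 = ['DIAB = y', 'DIAB = n', 'SMOK = y', 'SEX = F']
g2 = ['RCA = 85', 'RCA = 75', 'LAD = 40', 'LAD = 90']


def flags(antecedents):
    """Replay the question's flag logic for a multi-item antecedent."""
    flag1 = any(v in g1 for v in antecedents if v not in y)
    flag2 = any(v in g2 for v in antecedents if v not in y)
    return flag1, flag2


row_203 = ['SEX = F', 'DIAB = y', 'SMOK = y']  # wanted: no item is in y
row_19 = ['LM = 20', 'SMOK = y']               # not wanted: LM = 20 is in y

f1, f2 = flags(row_203)
row_203_and = f1 and f2   # False: no item of row 203 is in g2
row_203_or = f1 or f2     # True

f1, f2 = flags(row_19)
row_19_or = f1 or f2      # True: SMOK = y alone sets flag1

# The check against y rejects row 19 and accepts row 203.
row_19_clean = all(v not in y for v in row_19)
row_203_clean = all(v not in y for v in row_203)

print(row_203_and, row_203_or, row_19_or, row_19_clean, row_203_clean)
```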
Hope this helps. If you have any other questions, please leave a comment. :)