I have a DataFrame named test that looks like this:
+-------+---------+---------+---------+------------+
| | Term 1 | Term 2 | Term 3 | Final Exam |
+-------+---------+---------+---------+------------+
| 1288 | 0 | 0 | 1 | 1 |
| 1290 | 1 | 1 | 1 | 1 |
| 1294 | 0 | 0 | 1 | 1 |
| 1296 | 1 | 1 | 1 | 1 |
| 1297 | 1 | 1 | 1 | 1 |
| 1304 | 0 | 1 | 1 | 1 |
| 1308 | 0 | 0 | 1 | 1 |
| 1324 | 1 | 1 | 1 | 1 |
| 1325 | 1 | 1 | 1 | 1 |
| 1332 | 1 | 1 | 1 | 1 |
+-------+---------+---------+---------+------------+
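I want a summary table that lists every unique combination of columns equal to 1 in a row, together with how many rows have exactly that combination: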
+-----------------------------------+-----------+
| Combination | Frequency |
+-----------------------------------+-----------+
| Term 3, Final Exam | 3 |
| Term 2, Term 3, Final Exam | 1 |
| Term 1, Term 2, Term 3, Final Exam | 6 |
+-----------------------------------+-----------+
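Put differently, the desired table counts each row's exact 0/1 pattern and spells the pattern out as column names. A minimal sketch of that reading, assuming pandas 1.1 or later for `DataFrame.value_counts`; the `label` helper and the `summary` name are hypothetical and not part of the original question:

```python
import pandas as pd

# Reconstruct the question's `test` frame (IDs from the first table as the index)
test = pd.DataFrame(
    {
        "Term 1": [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
        "Term 2": [0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        "Term 3": [1] * 10,
        "Final Exam": [1] * 10,
    },
    index=[1288, 1290, 1294, 1296, 1297, 1304, 1308, 1324, 1325, 1332],
)

# One count per distinct row; the result's MultiIndex holds each 0/1 pattern
pattern_counts = test.value_counts()


def label(pattern):
    """Hypothetical helper: turn a 0/1 pattern into 'Col A, Col B, ...' for the 1s."""
    return ", ".join(col for col, flag in zip(test.columns, pattern) if flag == 1)


summary = pd.DataFrame(
    {
        "Combination": [label(p) for p in pattern_counts.index],
        "Frequency": pattern_counts.to_numpy(),
    }
)
print(summary)
```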
I tried mlxtend's apriori, but it returns every combination of the columns (every frequent itemset), not just the exact combinations:
from mlxtend.frequent_patterns import apriori

# apriori expects one-hot data; recent mlxtend versions warn on non-bool columns
results = apriori(test.astype(bool), min_support=0.00001, use_colnames=True)
results['length'] = results['itemsets'].apply(len)
numberofcases = test.shape[0]
# support is a fraction of rows, so scale it back to a row count
results['Frequency'] = results['support'] * numberofcases
# join the frozenset members directly instead of regex-stripping its repr
results['Terms'] = results['itemsets'].apply(', '.join)
results[results['length'] > 1][['Terms', 'Frequency']]
Result:
+-----+-------------------------------------+-----------+
| | Terms | Frequency |
+-----+-------------------------------------+-----------+
| 4 | Term 2, Term 1 | 6.0 |
| 5 | Term 3, Term 1 | 6.0 |
| 6 | Final Exam, Term 1 | 6.0 |
| 7 | Term 2, Term 3 | 7.0 |
| 8 | Term 2, Final Exam | 7.0 |
| 9 | Term 3, Final Exam | 10.0 |
| 10 | Term 2, Term 3, Term 1 | 6.0 |
| 11 | Term 2, Final Exam, Term 1 | 6.0 |
| 12 | Term 3, Final Exam, Term 1 | 6.0 |
| 13 | Term 2, Term 3, Final Exam | 7.0 |
| 14 | Term 2, Term 3, Final Exam, Term 1 | 6.0 |
+-----+-------------------------------------+-----------+
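The gap between the 10.0 reported for "Term 3, Final Exam" and the 3 wanted comes from what apriori counts: an itemset's support is the fraction of rows that contain all of its items, whatever the other columns hold. A minimal plain-pandas sketch (not mlxtend code) contrasting that "contains" count with an exact-match count on the question's data; the variable names are illustrative only:

```python
import pandas as pd

# Reconstruct the question's `test` frame
test = pd.DataFrame(
    {
        "Term 1": [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
        "Term 2": [0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        "Term 3": [1] * 10,
        "Final Exam": [1] * 10,
    },
    index=[1288, 1290, 1294, 1296, 1297, 1304, 1308, 1324, 1325, 1332],
)

itemset = ["Term 3", "Final Exam"]
has_items = test[itemset].eq(1).all(axis=1)

# Support-style count: every item in the itemset is 1, other columns ignored
contains = int(has_items.sum())

# Exact count: the itemset columns are 1 AND every other column is 0
others = test.columns.difference(itemset)
exact = int((has_items & test[others].eq(0).all(axis=1)).sum())

print(contains, exact)  # 10 3
```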
Is there a parameter in apriori that produces the desired result?
Answer 0 (score: 2)
Use dot and value_counts:
df.dot(df.columns+',').str[:-1].value_counts()
Out[419]:
Term1,Term2,Term3,FinalExam 6
Term3,FinalExam 3
Term2,Term3,FinalExam 1
dtype: int64
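The trick works because DataFrame.dot multiplies each 0/1 value by a column-name string (1 * "Term 1," keeps the name, 0 * "Term 1," gives an empty string) and then sums, i.e. concatenates, the pieces for each row. A self-contained sketch on the question's own column names; the ", " separator is an assumption chosen to match the desired table, so the slice drops two trailing characters instead of one:

```python
import pandas as pd

# Reconstruct the question's `test` frame
test = pd.DataFrame(
    {
        "Term 1": [0, 1, 0, 1, 1, 0, 0, 1, 1, 1],
        "Term 2": [0, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        "Term 3": [1] * 10,
        "Final Exam": [1] * 10,
    },
    index=[1288, 1290, 1294, 1296, 1297, 1304, 1308, 1324, 1325, 1332],
)

# Row-by-row "dot product" of 0/1 flags with "name, " strings builds a label,
# then [:-2] strips the trailing ", "
labels = test.dot(test.columns + ", ").str[:-2]
combo_counts = labels.value_counts()
print(combo_counts)
```

A row with no 1s at all would come out as an empty string, which value_counts then counts as its own combination.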