I have a DataFrame:
atm_id cardreader_error_by_day hours_since_last_cdm_error avg_cdm_dispense_count_distinct_day status_count avg_cdm_delivery_status_null related_to_status avg_cardreader_error avg_reply_count_distinct_day avg_session_error_count ... status_to_transactions_gr5 avg_bna_accept_count_gr5 avg_receipt_confirm_count_distinct_day_gr5 days_since_last_cardreader_error_gr5 avg_cdm_delivery_status_not_taken_gr5 distinct_days_gr5 last_error_bna_relative_gr5 avg_mcrw_error_gr5 target period
4481 249 0.142857 336.000000 0.728737 20 0.271263 0.100000 0.1 1.082145 0.050185 ... 0.458297 -1.0 -0.023840 -0.142857 -1.000000 -0.238076 -1.0 -1.0 1 13
27 1 0.000000 336.000000 0.701198 15 0.298802 0.000000 0.0 1.284708 0.069767 ... 0.096547 -1.0 -0.113377 0.000000 -1.000000 -0.088046 -1.0 -1.0 1 15
3338 185 0.000000 498.138333 1.231075 36 0.385390 0.111111 0.0 1.746520 0.039507 ... 0.028924 -1.0 0.200206 0.000000 0.943524 0.029059 -1.0 -1.0 1 20
I need to compute the correlation between every combination of columns in df and df['target'].
I tried using:
import itertools

for i in range(1, len(df.columns) + 1):
    for subset in itertools.combinations(df.columns, i):
        if (df[list(subset)]).corr(df['target']) > 0.4:
            print('%s correlation: %s' % (list(subset), df[list(subset)].corr(df['target'])))
But it raises an error:
TypeError: invalid type comparison
How can I fix this?
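For reference, here is a minimal sketch of one possible direction, not a definitive fix. `DataFrame.corr` takes a correlation method as its first positional argument rather than another Series, which is likely where the TypeError comes from. `DataFrame.corrwith` instead correlates each column with a given Series. The toy frame and the column names `feature_a` and `feature_b` are hypothetical stand-ins for the real data, and the sketch only covers per-column correlation, since the correlation of a multi-column subset with a single Series is not a single number.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real frame (assumption).
rng = np.random.default_rng(0)
n = 200
target = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "feature_a": target + rng.normal(0, 0.5, size=n),  # built to track target
    "feature_b": rng.normal(0, 1, size=n),             # independent noise
    "target": target,
})

# Correlate every non-target column with the target column.
features = df.drop(columns="target")
corr_with_target = features.corrwith(df["target"])

# Keep only the columns above the 0.4 threshold used in the question.
strong = corr_with_target[corr_with_target > 0.4]
print(strong)
```

If strongly negative correlations matter as well, comparing `corr_with_target.abs()` against the threshold would be one option.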