Hi, my code is as follows:
def check_for_leakage(df1, df2, patient_col):
    df1_patients_unique = set(df1.patient_col.unique())
    df2_patients_unique = set(df2.patient_col.unique())
    patients_in_both_groups = list(df1_patients_unique.intersection(df2_patients_unique))
    leakage = len(patients_in_both_groups) > 0 # boolean (true if there is at least 1 patient in both groups)
    return leakage
When I run:
# test
print("test case 1")
df1 = pd.DataFrame({'patient_id': [0, 1, 2]})
df2 = pd.DataFrame({'patient_id': [2, 3, 4]})
print("df1")
print(df1)
print("df2")
print(df2)
print(f"leakage output: {check_for_leakage(df1, df2, 'patient_id')}")
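I get the following error:
AttributeError: 'DataFrame' object has no attribute 'patient_col'
I have tried several things, but I don't understand how to fix this, and I couldn't find a suitable answer to my question.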
Answer 0: (score: 0)
You have to pass the column name in square brackets:
df1_patients_unique = set(df1[patient_col].unique())
df2_patients_unique = set(df2[patient_col].unique())
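The df1.column notation only works with the literal column name. You cannot use a variable there: df1.patient_col looks for a column that is actually called "patient_col", not for the column whose name is stored in the variable patient_col.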
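To make the difference concrete, here is a minimal sketch. The small DataFrame and the variable names `by_attribute`, `by_bracket`, `same_values`, and `attribute_lookup_failed` are only illustrative assumptions, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"patient_id": [0, 1, 2]})
patient_col = "patient_id"  # the column name is held in a variable

# Attribute access uses the literal name written after the dot.
by_attribute = df.patient_id

# Bracket access evaluates the variable and uses its value as the column name.
by_bracket = df[patient_col]

# Both forms return the same column here.
same_values = by_attribute.equals(by_bracket)

# df.patient_col looks for a column literally named "patient_col",
# which does not exist in this DataFrame, so pandas raises AttributeError.
try:
    df.patient_col
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

print(same_values, attribute_lookup_failed)
```

Bracket access is also the only option when a column name contains spaces or clashes with an existing DataFrame attribute or method name.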
Answer 1: (score: 0)
Make the following changes to the function:
import pandas as pd

def check_for_leakage(df1, df2, patient_col):
    df1_patients_unique = set(df1[patient_col].unique())
    df2_patients_unique = set(df2[patient_col].unique())
    patients_in_both_groups = list(df1_patients_unique.intersection(df2_patients_unique))
    leakage = len(patients_in_both_groups) > 0 # boolean (true if there is at least 1 patient in both groups)
    return leakage
print("test case 1")
df1 = pd.DataFrame({'patient_id': [0, 1, 2]})
df2 = pd.DataFrame({'patient_id': [2, 3, 4]})
print("df1")
print(df1)
print("df2")
print(df2)
print(f"leakage output: {check_for_leakage(df1, df2, 'patient_id')}")
Output:
test case 1
df1
   patient_id
0           0
1           1
2           2
df2
   patient_id
0           2
1           3
2           4
leakage output: True
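As a possible variant, the same check can be written a little more compactly with the built-in `set.isdisjoint`. This is only a sketch: the function name `has_patient_overlap` is hypothetical and is not taken from the question or the answers.

```python
import pandas as pd

def has_patient_overlap(df1, df2, patient_col):
    """Return True if any value of patient_col appears in both DataFrames."""
    ids1 = set(df1[patient_col].unique())
    ids2 = set(df2[patient_col].unique())
    # isdisjoint is True when the two sets share no element,
    # so its negation signals at least one shared patient.
    return not ids1.isdisjoint(ids2)

df1 = pd.DataFrame({"patient_id": [0, 1, 2]})
df2 = pd.DataFrame({"patient_id": [2, 3, 4]})
df3 = pd.DataFrame({"patient_id": [3, 4, 5]})

print(has_patient_overlap(df1, df2, "patient_id"))  # patient 2 is in both
print(has_patient_overlap(df1, df3, "patient_id"))  # no shared patients
```

This avoids building the intermediate list of shared patients when only a yes/no answer is needed.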