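I merged 4 datasets, and I can see duplicate rows in the resulting DataFrame. However, when I ask pandas to show me the duplicate rows, it reports none, so my code for dropping duplicates has no effect. Any help would be appreciated.
Sample of the DataFrame: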
end_time_x start_time_x duration deviceuuid time_offset_x exercise_type max_speed calorie mean_speed distance ... time_offset create_time weekday month startsleep wakeup sleep_duration duration_mins powernaps weekend
0 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-07 10:15:59.770000-04:00 6 1 7 10 02:40:00 160.0 False True
1 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-07 05:12:34.278000-04:00 6 1 0 4 04:12:00 252.0 False True
2 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-08 07:45:13.936000-04:00 6 1 22 7 09:11:00 551.0 False True
3 2018-01-07 10:01:00-04:00 2018-01-07 07:21:00-04:00 831210 F/D7+hL5E5 UTC-0300 1001 1.750000 54.340 1.376099 905.360 ... UTC-0400 2018-01-07 10:15:59.770000-04:00 6 1 7 10 02:40:00 160.0 False True
I tried the code below, but I get the same result even if I leave out the drop_duplicates lines.
Code to check for duplicates:
df_merged.duplicated().sum()
df_merged.loc[df_merged.duplicated(),:]
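One possible explanation is that duplicated() compares every column by default, including the ones hidden behind the "..." in the printout, so rows that look identical on screen may still differ somewhere. The sketch below uses a made-up toy frame (the column "hidden" and all values are hypothetical, not taken from the real data) to show how restricting the check with subset= and comparing two suspect rows column by column can reveal this.

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 agree on the columns a truncated
# printout would show, but differ slightly in the "hidden" column.
df = pd.DataFrame({
    "date": ["2018-01-07", "2018-01-07", "2018-01-07"],
    "duration": [831210, 831210, 831210],
    "sleep_duration": ["02:40:00", "04:12:00", "02:40:00"],
    "hidden": [1.0, 1.0, 1.0000001],
})

# Default check: all columns must match, so nothing is flagged.
full_dupes = int(df.duplicated().sum())

# Restrict the comparison to the columns that matter;
# keep=False marks every member of a duplicate group, not just the later ones.
subset_cols = ["date", "duration", "sleep_duration"]
subset_mask = df.duplicated(subset=subset_cols, keep=False)
subset_dupes = int(subset_mask.sum())

# Compare two suspect rows column by column to find where they differ.
# Note: NaN != NaN, so columns holding NaN in both rows would also be listed.
row_a, row_b = df.loc[0], df.loc[2]
differing = [col for col in df.columns if row_a[col] != row_b[col]]

print(full_dupes, subset_dupes, differing)
```

If the differing columns turn out to be irrelevant, df.drop_duplicates(subset=subset_cols) would drop such rows; which columns belong in the subset depends on the real data.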
Code that merges the DataFrames, after first dropping duplicates in 2 of the 4:
df_exercise_cleaned=df_exercise.drop_duplicates()
df_HR_cleaned=df_HR.drop_duplicates()
df_merged=df_exercise_cleaned.merge(df_HR_cleaned,on='date',how='inner').merge(df_FC, on='date',how='inner').merge(df_sleep,on='date',how='inner')
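A likely cause of the repeated-looking rows is the merge itself: when the key 'date' appears more than once in an input frame, an inner merge pairs every matching row on the left with every matching row on the right. The resulting rows repeat the left-hand values but differ on the right-hand columns, so they are not full duplicates. The sketch below illustrates this with two small made-up frames (exercise and sleep are hypothetical stand-ins, not the real df_exercise or df_sleep) and uses the validate= argument of merge to check key uniqueness.

```python
import pandas as pd

# Hypothetical stand-ins: one exercise record and three sleep records
# that all share the same date.
exercise = pd.DataFrame({"date": ["2018-01-07"], "duration": [831210]})
sleep = pd.DataFrame({
    "date": ["2018-01-07", "2018-01-07", "2018-01-07"],
    "sleep_duration": ["02:40:00", "04:12:00", "09:11:00"],
})

# The single exercise row is repeated once per matching sleep row.
merged = exercise.merge(sleep, on="date", how="inner")
n_rows = len(merged)

# The rows share the exercise columns but differ in sleep_duration,
# so duplicated() over all columns flags none of them.
n_full_dupes = int(merged.duplicated().sum())

# Count how often each key occurs to spot non-unique keys before merging.
keys_per_date = sleep["date"].value_counts()

# validate= makes merge raise if the keys are not unique as expected.
try:
    exercise.merge(sleep, on="date", how="inner", validate="one_to_one")
    one_to_one_ok = True
except pd.errors.MergeError:
    one_to_one_ok = False

print(n_rows, n_full_dupes, one_to_one_ok)
```

If one row per date is the goal, each input would need to be reduced to one row per 'date' (for example by aggregating, or with drop_duplicates(subset='date')) before merging; which approach is appropriate depends on what the repeated records mean.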