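I have a pandas DataFrame from which I extract certain rows based on the values of multiple columns.
This is the code that extracts the rows where the "folder" column is True and the "depth" column is 1: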
folders = df[(df["folder"] == True) & (df['depth'] == 1)]
The folders DataFrame:
    id                      path                mtime                ctime  folder  num_files  depth
17   2  \\fileserver\bckup\admin  2020-07-10 16:36:58  2020-07-10 16:17:33    True       16.0      1
19  20   \\fileserver\bckup\test  2020-07-10 16:19:33  2020-07-10 16:17:46    True        1.0      1
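To make the question reproducible, here is a minimal sketch that rebuilds the two sample rows shown above and applies the same folder/depth filter. It assumes mtime and ctime are datetime columns (the question does not say how the data was loaded). The filter is written with `.eq(1)` and ends with `.copy()`, so later assignments to `folders` work on an independent frame instead of triggering pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Sample data copied from the printed output above; index labels 17 and 19 kept.
df = pd.DataFrame(
    {
        "id": [2, 20],
        "path": [r"\\fileserver\bckup\admin", r"\\fileserver\bckup\test"],
        "mtime": pd.to_datetime(["2020-07-10 16:36:58", "2020-07-10 16:19:33"]),
        "ctime": pd.to_datetime(["2020-07-10 16:17:33", "2020-07-10 16:17:46"]),
        "folder": [True, True],
        "num_files": [16.0, 1.0],
        "depth": [1, 1],
    },
    index=[17, 19],
)

# Same condition as the question: boolean "folder" column AND depth equal to 1.
# .copy() returns an independent DataFrame that is safe to modify afterwards.
folders = df[df["folder"] & df["depth"].eq(1)].copy()
print(folders)
```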
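For the folders DataFrame, I want to take the path and ctime values of each row, compare ctime against the current date, and delete the path if it is older than X days. I'm having trouble iterating over the path and ctime values in the DataFrame. Could you suggest an approach?
Thanks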
Answer (score: 1)
Assuming the df below is your folders DataFrame, you can do this:
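import pandas as pd

# today's date
today = pd.Timestamp('today')
# no. of days
x = 6
# make sure ctime is a datetime column (no-op if it already is)
df['ctime'] = pd.to_datetime(df['ctime'])
# age of each folder in whole days
df['days_diff'] = (today - df['ctime']).dt.days
# set path to None where days_diff > x
m = df['days_diff'].gt(x)
df.loc[m, 'path'] = None
cols = ['path', 'ctime', 'days_diff']
print(df[cols])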
                       path                ctime  days_diff
0  \\fileserver\bckup\admin  2020-07-10 16:17:33          5
1   \\fileserver\bckup\test  2020-07-10 16:17:46          5
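If "delete the path" means removing the directory from disk rather than blanking the path column, the stale rows can still be found with one vectorized comparison, and only those rows need a Python loop. The sketch below assumes that interpretation. `remove_stale_folders` and its `dry_run` flag are hypothetical names, not part of pandas. Because `shutil.rmtree` deletes the whole directory tree, the helper defaults to a dry run that only reports what it would remove.

```python
import shutil
from pathlib import Path

import pandas as pd


def remove_stale_folders(folders: pd.DataFrame, max_age_days: int, dry_run: bool = True) -> list:
    """Hypothetical helper: collect (and optionally delete) folders older than max_age_days.

    Assumes 'path' holds directory paths and 'ctime' holds datetimes or
    strings that pd.to_datetime can parse.
    """
    cutoff = pd.Timestamp("today") - pd.Timedelta(days=max_age_days)
    ctime = pd.to_datetime(folders["ctime"])
    # Vectorized age test; only the stale rows are looped over below.
    stale = folders.loc[ctime < cutoff, "path"]
    removed = []
    for path in stale:
        if not dry_run:
            # Deletes the directory and everything inside it.
            shutil.rmtree(Path(path))
        removed.append(path)
    return removed


# Dry run on the two sample rows from the question: nothing is deleted,
# the paths that would be removed are returned instead.
folders = pd.DataFrame(
    {
        "path": [r"\\fileserver\bckup\admin", r"\\fileserver\bckup\test"],
        "ctime": ["2020-07-10 16:17:33", "2020-07-10 16:17:46"],
    }
)
print(remove_stale_folders(folders, max_age_days=6))
```

Comparing ctime against a cutoff timestamp counts partial days. It can therefore flag a folder slightly earlier than `.dt.days > x`, which floors the age to whole days before comparing.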