I have a Python pandas DataFrame with a dueDate column of type datetime64. The DataFrame also has a boolean column that records whether each task has been completed.
import pandas as pd
df = pd.DataFrame(data = [pd.date_range('1/1/2017', periods = 6), [i % 2 == 0 for i in range(6)]]).T
df.columns = ['dueDate', 'completed']
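If a task is not completed and its due date has passed (is greater than today), I would like to fill that row of a column called daysLate with how late it is. Otherwise, I would like the row to be NaN.
Other than iterating over every row and applying multiple if statements, is there an elegant or best-practice way to solve a problem like this?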
Answer 0: (score: 1)
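import numpy as np
import pandas as pd

# Build the columns directly so dueDate keeps its datetime64 dtype
df = pd.DataFrame({'dueDate': pd.date_range('1/1/2017', periods=10),
                   'completed': [i % 2 == 0 for i in range(10)]})
# Midnight today, so that differences come out as whole days
today = pd.Timestamp('today').normalize()
# Rows that are not completed and whose due date is later than today
mask = (df.dueDate > today) & ~df.completed
# Keep the difference where the mask holds; np.nan elsewhere is stored as NaT
df['daysLate'] = (df.dueDate - today).where(mask, np.nan)
print(df)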
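Is this what you are looking for? The output below depends on the day the code is run; the values shown correspond to a run on 2017-01-06.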
               dueDate completed daysLate
0  2017-01-01 00:00:00      True      NaT
1  2017-01-02 00:00:00     False      NaT
2  2017-01-03 00:00:00      True      NaT
3  2017-01-04 00:00:00     False      NaT
4  2017-01-05 00:00:00      True      NaT
5  2017-01-06 00:00:00     False      NaT
6  2017-01-07 00:00:00      True      NaT
7  2017-01-08 00:00:00     False   2 days
8  2017-01-09 00:00:00      True      NaT
9  2017-01-10 00:00:00     False   4 days
In fact, if you are happy with NaT, you can skip importing numpy and replace np.nan with pd.NaT (numpy itself has no np.NaT attribute).
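The question's wording ("past due") can also be read the other way round: a task is late when its due date lies before today, and daysLate should hold a plain number of days with NaN elsewhere. The following is only a minimal sketch under that assumption, not part of the original answer. The helper name add_days_late and the fixed today value are hypothetical and chosen just to make the example reproducible.

```python
import pandas as pd


def add_days_late(df, today):
    """Return a copy of df with a float daysLate column (NaN when not late).

    Hypothetical helper: a row counts as late when it is not completed
    and its dueDate is strictly before `today`.
    """
    today = pd.Timestamp(today).normalize()
    out = df.copy()
    # Vectorized condition, no row-by-row loop or if statements
    overdue = (out['dueDate'] < today) & ~out['completed']
    # Whole days between today and the due date
    days = (today - out['dueDate']).dt.days
    # Keep the day count only for overdue rows; other rows become NaN
    out['daysLate'] = days.where(overdue)
    return out


df = pd.DataFrame({
    'dueDate': pd.date_range('2017-01-01', periods=6),
    'completed': [i % 2 == 0 for i in range(6)],
})
# A fixed reference date (assumption) keeps the result independent of the run date
result = add_days_late(df, today='2017-01-05')
print(result)
```

Because Series.where keeps values only where the condition holds, the integer day counts are upcast to float so that the remaining rows can hold NaN, which matches the NaN requirement in the question. In real use you would pass pd.Timestamp('today') instead of the fixed date.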