Question

I have a pandas DataFrame:

id       age
001      1 hour
002      2 hours
003      2 days
004      4 days

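The age is how long an item has been in the database. What I want to do is print the date on which the item was added to the database.

So if the age column contains the string "hour" or "hours", I want to print the current date; otherwise, I want to subtract the number of days from the current date.

The desired output should look like this: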

id       age          insertion_date
001      1 hour       2018-09-18
002      2 hours      2018-09-18
003      2 days       2018-09-16
004      4 days       2018-09-14

I am using Python 2.7, and this is what I have so far.

import pandas as pd
from datetime import date, timedelta

for index, row in df.iterrows():
    age = row["age"]
    if "days" in age:
        # Remove "days" and convert the data type of the age column
        df["age"] = df["age"].astype("str").str.replace('[^\d\.]', '')
        # Subtract the number of days from the current date
        df["insertion_date"] = df["age"].astype("int64").apply(lambda x: date.today() - timedelta(x))
    else:
        # Use the current date
        df["insertion_date"] = date.today()

The output of the code above is:

id       age          insertion_date
001      1            2018-09-17
002      2            2018-09-16
003      2            2018-09-16
004      4            2018-09-14

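The problem with this code is that the current date is not added to the insertion_date column even when the age column contains the string "hour" or "hours".

Could someone point out where my code goes wrong so that I can fix it and get the desired output? That is, if the string "hour" or "hours" appears in the age column, the current date should be added to the insertion_date column; otherwise, the number of days in the age column should be subtracted from the current date and the resulting date added to the insertion_date column.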

Answer 1

Let's do a little time arithmetic:

df['insertion_date'] = (
    pd.to_datetime('today') - pd.to_timedelta(df.age).dt.floor('D')).dt.date

df
   id      age insertion_date
0   1   1 hour     2018-09-18
1   2  2 hours     2018-09-18
2   3   2 days     2018-09-16
3   4   4 days     2018-09-14
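
To make the arithmetic above easier to check, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and, as an assumption made only for reproducibility, uses a fixed reference date (2018-09-18, taken from the expected output) in place of today's date. The names `reference` and `floored` are introduced here for illustration.

```python
import datetime

import pandas as pd

# Sample data from the question; ids are kept as strings to preserve leading zeros.
df = pd.DataFrame({
    "id": ["001", "002", "003", "004"],
    "age": ["1 hour", "2 hours", "2 days", "4 days"],
})

# Assumption: a fixed reference date stands in for pd.to_datetime('today')
# so that the result does not depend on when the code is run.
reference = pd.Timestamp("2018-09-18")

# Parse strings such as "2 hours" or "4 days" into timedeltas, then round
# down to whole days so that anything shorter than a day becomes 0 days.
floored = pd.to_timedelta(df["age"]).dt.floor("D")

# Subtract the whole-day offsets and keep only the calendar date.
df["insertion_date"] = (reference - floored).dt.date
print(df)
```

Because the hour-based ages floor to zero days, no explicit check for "hour" or "hours" is needed; both cases go through the same subtraction.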

Answer 2

You can use Timestamp.floor and subtract the timedeltas created by to_timedelta and TimedeltaIndex.floor.
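
The approach described in this answer might look like the following minimal sketch. It is not the answerer's original code: the sample frame is rebuilt from the question, the fixed timestamp is an assumption standing in for `pd.Timestamp.today()`, and the variable name `today` is illustrative. On a Series, the floor operation is reached through the `.dt` accessor.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": ["001", "002", "003", "004"],
    "age": ["1 hour", "2 hours", "2 days", "4 days"],
})

# Assumption: a fixed timestamp replaces pd.Timestamp.today() for
# reproducibility; floor('D') truncates it to midnight of that day.
today = pd.Timestamp("2018-09-18 15:30:00").floor("D")

# Convert the age strings to timedeltas and round each down to whole days,
# then subtract them from the floored timestamp.
df["insertion_date"] = today - pd.to_timedelta(df["age"]).dt.floor("D")
print(df)
```

Unlike the first answer, this version leaves insertion_date as a datetime64 column (midnight timestamps) rather than Python date objects, which keeps the `.dt` accessor available for further date arithmetic.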

Get a date based on a string column in pandas
