Question

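I have a duration column, but the values come in different formats. Some durations are plain times, and some are mixed with a date. I want the duration column expressed in total seconds. I tried converting the column with the to_datetime and parse_date methods, but it does not work properly. How can I do this in pandas? Here is the column: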

Answer 1

One approach is to use pd.Series.apply with a try / except clause that tries each conversion in turn.

The benefit of this method is that it accepts a wide range of potential inputs, both timedelta-like and datetime-like.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Mixed': ['03:59:49', '1904-01-01 04:06:08']})

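def return_seconds(x):
    # First try to read the value as a plain duration, e.g. '03:59:49'
    try:
        return pd.to_timedelta(x).total_seconds()
    except (ValueError, TypeError):
        # Otherwise treat it as a datetime and keep only the time of day
        try:
            dt = pd.to_datetime(x)
            return (dt - dt.normalize()).total_seconds()
        except (ValueError, TypeError):
            return np.nan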

df['TotalSeconds'] = df['Mixed'].apply(return_seconds).astype(int)

print(df)

#                  Mixed  TotalSeconds
# 0             03:59:49         14389
# 1  1904-01-01 04:06:08         14768
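
One caveat: the function returns np.nan when neither parser succeeds, and astype(int) cannot hold NaN, so a single bad value would make the final cast raise. A minimal sketch of one way around that, assuming a pandas version with the nullable 'Int64' dtype; the 'unknown' entry is a made-up value used only to trigger the fallback:

```python
import numpy as np
import pandas as pd

def return_seconds(x):
    # Same logic as above; only ValueError/TypeError are caught here.
    try:
        return pd.to_timedelta(x).total_seconds()
    except (ValueError, TypeError):
        try:
            dt = pd.to_datetime(x)
            return (dt - dt.normalize()).total_seconds()
        except (ValueError, TypeError):
            return np.nan

# 'unknown' is a hypothetical bad value, not taken from the question.
mixed = pd.Series(['03:59:49', '1904-01-01 04:06:08', 'unknown'])

seconds = mixed.apply(return_seconds)
# The nullable integer dtype keeps the missing value as <NA> instead of raising.
total_seconds = seconds.round().astype('Int64')
print(total_seconds)
```

If every value in the column is known to parse, the plain astype(int) from the answer is enough.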

Answer 2

Take the last 8 characters of each value, convert them with to_timedelta, then use total_seconds:

df = pd.DataFrame({'col':['03:59:49', '1904-01-01 04:06:08']})

df['new'] = pd.to_timedelta(df['col'].str[-8:]).dt.total_seconds().astype(int)
print(df)
                   col    new
0             03:59:49  14389
1  1904-01-01 04:06:08  14768

Edit:

df['new'] = pd.to_timedelta(pd.to_datetime(df['col']).dt.strftime('%H:%M:%S')).dt.total_seconds().astype(int)
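
The str[-8:] slice relies on the time always being the final eight characters. If that is not guaranteed, one possible alternative (not part of the answer above, only a sketch) is to split on whitespace and keep the last token. This assumes the time of day is always the last space-separated piece of the string:

```python
import pandas as pd

df = pd.DataFrame({'col': ['03:59:49', '1904-01-01 04:06:08']})

# Assumption: the time of day is the last whitespace-separated token.
time_part = df['col'].str.split().str[-1]
df['new'] = pd.to_timedelta(time_part).dt.total_seconds().astype(int)
print(df)
```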

Answer 3

Using a regular expression:

import pandas as pd
df = pd.DataFrame({"a": ["03:59:49", "04:59:49", "1904-01-01 05:59:49", "1904-01-01 06:59:49"]})
# expand=False returns a Series, which is what pd.to_timedelta expects
df["TotalSeconds"] = pd.to_timedelta(df["a"].str.extract(r'(\d{2}:\d{2}:\d{2})', expand=False)).dt.total_seconds()
print(df)

Output:

                     a  TotalSeconds
0             03:59:49       14389.0
1             04:59:49       17989.0
2  1904-01-01 05:59:49       21589.0
3  1904-01-01 06:59:49       25189.0
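
To make the arithmetic behind those numbers explicit, here is a rough sketch that captures hours, minutes and seconds as separate groups and combines them by hand. It is a variation on the regex above rather than the answer's own method, and it assumes every value contains a two-digit HH:MM:SS part:

```python
import pandas as pd

df = pd.DataFrame({"a": ["03:59:49", "1904-01-01 05:59:49"]})

# Three capture groups give a DataFrame with columns 0, 1, 2
# (hours, minutes, seconds), converted here from strings to integers.
parts = df["a"].str.extract(r'(\d{2}):(\d{2}):(\d{2})').astype(int)

# total seconds = hours * 3600 + minutes * 60 + seconds
df["TotalSeconds"] = parts[0] * 3600 + parts[1] * 60 + parts[2]
print(df)
```

For the first row this works out to 3 * 3600 + 59 * 60 + 49 = 14389, matching the output above.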

How to convert irregular datetimes to total seconds in Python pandas
