Strange behavior of pandas datetime objects

Time: 2021-06-07 15:36:39

Tags: python pandas numpy datetime data-processing

I am trying to demonstrate my problem. I really do not understand why the native Python <class 'datetime.datetime'> object gets replaced by the pandas-specific <class 'pandas._libs.tslibs.timestamps.Timestamp'> object.

import datetime
import typing

import numpy as np
import pandas as pd
from dateutil.parser import parse

# The column-rename `mapping` is not shown in the question; this empty dict is a placeholder.
mapping: typing.Dict[str, str] = {}


def _normalize_users_dataframe(row: pd.Series) -> pd.Series:
    last_seen: typing.Optional[typing.Union[str, datetime.datetime]] = row.get('last_seen', '')
    if last_seen:
        last_seen = parse(last_seen)
        row['last_seen'] = last_seen
        # This shows that the value is a <class 'datetime.datetime'> object, i.e. a native Python datetime.
        print(type(row['last_seen']).__mro__)
    return row


def process_users_dataframe(filepath: str) -> pd.DataFrame:
    df: pd.DataFrame = pd.read_csv(filepath, sep='\t')
    df.rename(columns=mapping, inplace=True)
    df.replace({np.nan: None}, inplace=True)
    df = df.apply(_normalize_users_dataframe, axis=1)
    # This shows that the value is a <class 'pandas._libs.tslibs.timestamps.Timestamp'>, a pandas-specific object.
    print(type(df['last_seen'].iloc[0]).__mro__)
    return df


def main() -> None:
    process_users_dataframe('<dir>')

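Inside the _normalize_users_dataframe() function, when I print the type of the last_seen value, it shows <class 'datetime.datetime'>, which is fine. But after apply() runs on the DataFrame and returns a new DataFrame object, the values in the last_seen column have become <class 'pandas._libs.tslibs.timestamps.Timestamp'>.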
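A minimal, self-contained sketch of the behavior described above, assuming a small hypothetical DataFrame in place of the TSV file (normalize and seen_inside are illustrative names, not from the question). Inside the row function the value is a native datetime.datetime; after apply(axis=1) the column comes back with a datetime64 dtype and its elements are Timestamp objects.

```python
import datetime

import pandas as pd
from dateutil.parser import parse

# Hypothetical sample data; dtype=object keeps each row an object-dtype Series.
df = pd.DataFrame(
    {"last_seen": ["2021-06-07 15:36:39", "2021-06-08 09:00:00"]}, dtype=object
)

seen_inside = []  # records the type observed inside the row function


def normalize(row: pd.Series) -> pd.Series:
    row = row.copy()  # avoid mutating the object pandas passes in
    row["last_seen"] = parse(row["last_seen"])
    seen_inside.append(type(row["last_seen"]))
    return row


out = df.apply(normalize, axis=1)

print(seen_inside)                     # datetime.datetime for each row
print(out["last_seen"].dtype)          # a datetime64 dtype inferred by apply()
print(type(out["last_seen"].iloc[0]))  # pandas Timestamp
```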

How does this happen? Is it some deep implementation detail?
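One likely explanation, based on how pandas appears to handle this internally: when apply(axis=1) gets a Series back from every row, it combines them into a new DataFrame and then runs dtype inference on the result, the same kind of conversion that DataFrame.infer_objects() performs. An object column that holds only datetime objects is inferred as datetime64, and pandas boxes each element of such a column as a Timestamp, which is a subclass of datetime.datetime. The sketch below reproduces that inference step directly; the exact datetime64 unit may differ between pandas versions.

```python
import datetime

import pandas as pd

native = datetime.datetime(2021, 6, 7, 15, 36, 39)

# An object-dtype column holding a native datetime, like the rows apply() collects.
obj_col = pd.Series([native], dtype=object)
print(type(obj_col.iloc[0]))  # <class 'datetime.datetime'>

# Dtype inference turns the object column into a datetime64 column.
inferred = obj_col.infer_objects()
print(inferred.dtype)

# Elements of a datetime64 column are returned as Timestamp objects.
ts = inferred.iloc[0]
print(type(ts))

# Timestamp subclasses datetime.datetime, so most datetime code keeps working.
print(isinstance(ts, datetime.datetime))  # True
print(ts == native)  # True

# Convert a single value back to a native datetime when one is truly required.
back = ts.to_pydatetime()
print(type(back) is datetime.datetime)  # True
```

In many cases no conversion is needed at all, since isinstance(ts, datetime.datetime) already holds; Timestamp.to_pydatetime() is there for code that checks the exact type.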

0 answers