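Pandas: Fill in missing dates in a Pandas DataFrame

Time: 2021-03-20 10:05:47

Tags: python pandas datetime strftime

How can I fill the Date column so that each date that appears is copied into the rows below it, until a new date shows up and that one is used instead?

Reproducible example:

Input: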


                Date                                           Headline
0   Mar-20-21 04:03AM  Apple CEO Cook, executives on tentative list o...
1             03:43AM  Apple CEO Cook, execs on tentative list of wit...
2   Mar-19-21 10:19PM  Dow Jones Futures: Why This Market Rally Is So...
3             06:13PM  Zuckerberg: Apples Privacy Move Could Spur Mor...
4             05:45PM  Apple (AAPL) Dips More Than Broader Markets: W...
5             04:17PM  Facebook Stock Jumps As Zuckerberg Changes Tun...
6             04:03PM  Best Dow Jones Stocks To Buy And Watch In Marc...
7             01:02PM  The Nasdaq's on the Rise Friday, and These 2 S...

Desired output:


                 Date                                           Headline
0   Mar-20-21 04:03AM  Apple CEO Cook, executives on tentative list o...
1   Mar-20-21 03:43AM  Apple CEO Cook, execs on tentative list of wit...
2   Mar-19-21 10:19PM  Dow Jones Futures: Why This Market Rally Is So...
3   Mar-19-21 06:13PM  Zuckerberg: Apples Privacy Move Could Spur Mor...
4   Mar-19-21 05:45PM  Apple (AAPL) Dips More Than Broader Markets: W...
5   Mar-19-21 04:17PM  Facebook Stock Jumps As Zuckerberg Changes Tun...
6   Mar-19-21 04:03PM  Best Dow Jones Stocks To Buy And Watch In Marc...
7   Mar-19-21 01:02PM  The Nasdaq's on the Rise Friday, and These 2 S...

Attempt:

df['Time'] = [x[-7:] for x in df['Date']]
df['Date'] = [x[:-7] for x in df['Date']]
# Some code that fills the date
# Then convert to datetime

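1 answer:

Answer 0: (score: 1)

Before using ffill(), split the column in two so that each row keeps its own time and only the Date part gets filled. You need to replace blank or whitespace-only strings with np.nan for ffill() to work, because ffill() only fills missing values. Then join the two columns back together and wrap the operation in pd.to_datetime to get the correct dtype.

Finally, you can drop the Time column.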
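To illustrate why the replacement step matters, here is a minimal sketch on a toy Series. The `dates` values are hypothetical stand-ins for the Date part after the split, with empty strings in the rows that had no date.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: empty strings stand in for rows with no date.
dates = pd.Series(["Mar-20-21", "", "Mar-19-21", "", ""])

# ffill() leaves empty strings alone because they are not missing values.
unchanged = dates.ffill()

# Once blank or whitespace-only strings become NaN, ffill() copies the
# last seen date down into the gaps.
filled = dates.replace(r"^\s*$", np.nan, regex=True).ffill()

print(unchanged.tolist())
print(filled.tolist())
```

The first list comes back identical to the input, while the second has every gap filled with the date above it.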

# Imports
import numpy as np
import pandas as pd

# Split the column
df[['Date','Time']] = df['Date'].str.split(' ',expand=True)

# Replace space with nan and use ffill()
df['Date'] = df['Date'].replace(r'^\s*$', np.nan, regex=True).ffill()

# Put the columns back and convert to datetime
df['Date'] = pd.to_datetime(df['Date'] + ' ' + df['Time'])

# Drop the time column
del df['Time']

This gives you:

df
                 Date                                           Headline
0 2021-03-20 04:03:00  Apple CEO Cook, executives on tentative list o...
1 2021-03-20 03:43:00  Apple CEO Cook, execs on tentative list of wit...
2 2021-03-19 22:19:00  Dow Jones Futures: Why This Market Rally Is So...
3 2021-03-19 18:13:00  Zuckerberg: Apples Privacy Move Could Spur Mor...
4 2021-03-19 17:45:00  Apple (AAPL) Dips More Than Broader Markets: W...
5 2021-03-19 16:17:00  Facebook Stock Jumps As Zuckerberg Changes Tun...
6 2021-03-19 16:03:00  Best Dow Jones Stocks To Buy And Watch In Marc...
7 2021-03-19 13:02:00  The Nasdaq's on the Rise Friday, and These 2 S...
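One caveat: the split-based code relies on the time-only rows producing an empty first piece (for example because they start with a space). If those rows hold just the time, str.split puts the time in the first column instead. The following is a hedged sketch of an alternative that does not depend on that detail. It uses str.extract with an optional date group and passes an explicit format to pd.to_datetime. The sample rows, the placeholder headlines, and the group names `day` and `time` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample: two rows carry a date, two rows carry only a time.
df = pd.DataFrame({
    "Date": ["Mar-20-21 04:03AM", "03:43AM", "Mar-19-21 10:19PM", "06:13PM"],
    "Headline": ["a", "b", "c", "d"],  # placeholder headlines
})

# Capture an optional date and the trailing time in separate named groups.
# Rows without a date get NaN in "day", so no replace step is needed.
pattern = r"^\s*(?P<day>[A-Za-z]{3}-\d{2}-\d{2})?\s*(?P<time>\d{1,2}:\d{2}[AP]M)\s*$"
parts = df["Date"].str.extract(pattern)

# Forward-fill the date, recombine, and parse with an explicit format.
df["Date"] = pd.to_datetime(
    parts["day"].ffill() + " " + parts["time"],
    format="%b-%d-%y %I:%M%p",
)
print(df)
```

Stating the format explicitly avoids relying on format inference, at the cost of failing loudly if a row does not match. Note that `%b` and `%p` are locale-dependent, so this assumes an English locale.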

Edit: If you want your Date to appear exactly as in your desired output, i.e. in the format 'Mar-20-21', do not wrap it in pd.to_datetime() and leave it as object dtype.
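A minimal sketch of that string-only variant follows. The sample rows and placeholder headlines are hypothetical, and the time-only rows are assumed to start with a space so that splitting on ' ' leaves an empty date piece.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; time-only rows are assumed to start with a space.
df = pd.DataFrame({
    "Date": ["Mar-20-21 04:03AM", " 03:43AM", "Mar-19-21 10:19PM", " 06:13PM"],
    "Headline": ["a", "b", "c", "d"],  # placeholder headlines
})

# Split into date and time, then forward-fill the blank dates.
df[["Date", "Time"]] = df["Date"].str.split(" ", expand=True)
df["Date"] = df["Date"].replace(r"^\s*$", np.nan, regex=True).ffill()

# Recombine as plain strings; without pd.to_datetime the column keeps
# holding text in the original 'Mar-20-21 04:03AM' style.
df["Date"] = df["Date"] + " " + df["Time"]
df = df.drop(columns="Time")
print(df)
```

Keep in mind that a text column sorts and compares alphabetically rather than chronologically, so this form is mainly useful for display.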