Question

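I have a DataFrame in which one column stores dates.

However, some of these dates are correctly formatted datetime objects, such as '2018-12-24 17:00:00', while others are not and are stored like '20181225'.

When I try to plot these with plotly, the incorrectly formatted values turn into epoch dates, which is a problem.

Is there a way to get a copy of the DataFrame that contains only the rows with correctly formatted dates?

I tried using

clean_dict= dailySum_df.where(dailySum_df[isinstance(dailySum_df['time'],datetime.datetime)])

but it does not work and fails with the error "Array conditional must be same shape as self".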

    dailySum_df = pd.DataFrame(list(cursors['dailySum']))

    trace = go.Scatter(
        x=dailySum_df['time'],
        y=dailySum_df['countMessageIn']

    )
    data = [trace]
    py.plot(data, filename='basic-line')

Answer 1

Try parsing the DataFrame's date column with dateutil.parser.parse and the pandas apply function.
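A minimal sketch of that suggestion, assuming every value in the column is a string that dateutil can parse (the sample data below is made up for illustration; a value with no recognizable date would make parser.parse raise an error):

```python
import pandas as pd
from dateutil import parser

# Hypothetical sample data mixing the two date formats from the question
df = pd.DataFrame({
    'time': ['2018-12-24 17:00:00', '20181225'],
    'countMessageIn': [1, 2],
})

# Parse each string individually; both formats end up as proper datetimes
df['time'] = df['time'].apply(parser.parse)
```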

Answer 2

Apply dateutil.parser; see also my answer here:

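import pandas as pd
import dateutil.parser as dparser

def myparser(x):
    # Return a datetime for parseable values, None for everything else
    try:
        return dparser.parse(x)
    except (ValueError, OverflowError, TypeError):
        return None

df = pd.DataFrame({'time': ['2018-12-24 17:00:00', '20181225', 'no date at all'], 'countMessageIn': [1, 2, 3]})
df['time'] = df['time'].apply(myparser)
# Keep only the rows whose date could be parsed
df = df[df['time'].notnull()]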

Input:

                  time  countMessageIn
0  2018-12-24 17:00:00               1
1             20181225               2
2       no date at all               3

Output:

                 time  countMessageIn
0 2018-12-24 17:00:00               1
1 2018-12-25 00:00:00               2

Unlike Gustavo's solution, this handles rows that contain no recognizable date at all and filters such rows out, as your question asks.
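If you would rather keep only the rows that already hold datetime objects, without parsing any strings, the isinstance check from the question can be applied element by element. This is a sketch under the assumption that the column has object dtype and mixes datetime.datetime values with plain strings; the sample data and the names is_datetime and clean_df are illustrative only:

```python
import datetime
import pandas as pd

# Hypothetical sample: one real datetime object and two plain strings
df = pd.DataFrame({
    'time': [datetime.datetime(2018, 12, 24, 17, 0), '20181225', 'no date at all'],
    'countMessageIn': [1, 2, 3],
})

# Build a boolean mask with one entry per row, then index with it
is_datetime = df['time'].apply(lambda v: isinstance(v, datetime.datetime))
clean_df = df[is_datetime].copy()
```

The original attempt called isinstance on the whole column, which yields a single boolean rather than one value per row; a per-row mask has the shape that boolean indexing expects.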

If your original time column may contain other text besides the date itself, add the fuzzy=True argument, as shown here.
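A sketch of the fuzzy variant, assuming the same try/except wrapper as above (the function name fuzzy_parser and the sample strings are made up for illustration). With fuzzy=True, dateutil skips tokens it does not recognize, so stray numbers in the surrounding text could be mistaken for date parts; check the results on your own data:

```python
import pandas as pd
import dateutil.parser as dparser

def fuzzy_parser(x):
    # fuzzy=True lets dateutil ignore unrecognized words around the date
    try:
        return dparser.parse(x, fuzzy=True)
    except (ValueError, OverflowError, TypeError):
        return None

# Hypothetical sample: one date embedded in free text, one clean date string
df = pd.DataFrame({
    'time': ['Today is January 1, 2047 at 8:21:00AM', '2018-12-24 17:00:00'],
    'countMessageIn': [1, 2],
})
df['time'] = df['time'].apply(fuzzy_parser)
# Drop rows where no date could be extracted
df = df[df['time'].notnull()]
```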

Filtering out incorrectly formatted datetime values in a Python DataFrame

2 answers: