Question

I'm parsing dates in a DataFrame whose data comes from a CSV file, and I get the ValueError shown below. I'm sure the format is correct.

My code:

import pandas as pd
from datetime import datetime
import csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
headers = ['Sensor Value','Date','Time']
df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV',names=headers)
print (df)

df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
x = df['Date']
y = df['Sensor Value']

# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()

plt.show()

My DataFrame:

0    Sensor Value         Date           Time
1               2   2017/02/17   19:06:17.188
2              72   2017/02/17   19:06:22.360
3              72   2017/02/17   19:06:27.348

Console error:

    new_values = map_f(values, arg)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
  File "C:/Users/Lala Rushan/PycharmProjects/newgraph/newgraph.py", line 10, in <lambda>
    df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
  File "C:\Users\Lala Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 500, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "C:\Users\Lala Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 337, in _strptime
    (data_string, format))
ValueError: time data 'Date' does not match format '%Y/%m/%d %H:%M:%S.%f'

CSV input:

Sensor Value    Date    Time
2    2017/02/17  19:06:17.188
72   2017/02/17  19:06:22.360
72   2017/02/17  19:06:27.348
72   2017/02/17  19:06:32.482
74   2017/02/17  19:06:37.515
70   2017/02/17  19:06:42.580

Answer 1

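I think you need to_datetime with the parameter errors='coerce', which turns values that cannot be parsed into NaT (the datetime equivalent of NaN):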

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

To check which rows are problematic:

print (df[pd.to_datetime(df['Date'], errors='coerce').isnull()])
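A self-contained sketch of these two steps, assuming a small hypothetical frame in which the header text 'Date' has ended up as a data row (which is what happens in the question). Passing an explicit format, here the question's '%Y/%m/%d %H:%M:%S.%f', is my own addition: it makes pandas treat anything that does not match as unparseable instead of guessing.

```python
import pandas as pd

# Hypothetical column in which the header text 'Date' was read as a data row
df = pd.DataFrame({'Date': ['Date', '2017/02/17 19:06:17.188', '2017/02/17 19:06:22.360']})

fmt = '%Y/%m/%d %H:%M:%S.%f'

# errors='coerce' turns values that cannot be parsed into NaT instead of raising ValueError
parsed = pd.to_datetime(df['Date'], format=fmt, errors='coerce')

# Show the problematic rows
bad_rows = df[parsed.isnull()]
print(bad_rows)

df['Date'] = parsed
print(df)
```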

But if you need to read the Date and Time columns into a single datetime column, use the parse_dates parameter of read_csv:

import pandas as pd
from io import StringIO

temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/17,19:06:42.580"""
#after testing, replace StringIO(temp) with 'C:/Users\Lala Rushan\Downloads\DataLog.CSV'; since that file starts with a header line, also pass header=0 so the header is not read as data
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
                 Datetime  Sensor Value
0 2017-02-17 19:06:17.188             2
1 2017-02-17 19:06:22.360            72
2 2017-02-17 19:06:27.348            72
3 2017-02-17 19:06:32.482            72
4 2017-02-17 19:06:37.515            74
5 2017-02-17 19:06:42.580            70

print (df.dtypes)
Datetime        datetime64[ns]
Sensor Value             int64
dtype: object
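Note that combining columns through a dict passed to parse_dates is deprecated in newer pandas releases (2.2 and later, as far as I know). A sketch of an equivalent that avoids it, assuming the same comma-separated sample data: read Date and Time as plain strings, join them, and parse once with pd.to_datetime and an explicit format.

```python
from io import StringIO

import pandas as pd

temp = """2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348"""

headers = ['Sensor Value', 'Date', 'Time']
# Read Date and Time as plain strings; no date parsing inside read_csv
df = pd.read_csv(StringIO(temp), names=headers, dtype={'Date': str, 'Time': str})

# Join the two text columns and parse them in one vectorized call
df['Datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%Y/%m/%d %H:%M:%S.%f')
df = df.drop(columns=['Date', 'Time'])
print(df)
print(df.dtypes)
```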

Combining the first solution with the second, using test data whose last row contains the nonexistent date 30.2.2017:

temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/30,19:06:42.580"""
#after testing, replace StringIO(temp) with 'C:/Users\Lala Rushan\Downloads\DataLog.CSV'; since that file starts with a header line, also pass header=0 so the header is not read as data
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
                  Datetime  Sensor Value
0  2017/02/17 19:06:17.188             2
1  2017/02/17 19:06:22.360            72
2  2017/02/17 19:06:27.348            72
3  2017/02/17 19:06:32.482            72
4  2017/02/17 19:06:37.515            74
5  2017/02/30 19:06:42.580            70

df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce')
print (df)
                 Datetime  Sensor Value
0 2017-02-17 19:06:17.188             2
1 2017-02-17 19:06:22.360            72
2 2017-02-17 19:06:27.348            72
3 2017-02-17 19:06:32.482            72
4 2017-02-17 19:06:37.515            74
5                     NaT            70 <- 30.2.2017 is replaced with NaT (the datetime equivalent of NaN)

print (df.dtypes)
Datetime        datetime64[ns]
Sensor Value             int64
dtype: object
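For the plot in the question, the NaT rows can then be dropped so matplotlib only sees valid timestamps. A minimal sketch, assuming a hypothetical two-row frame shaped like the output above; with errors='coerce' and an explicit format, the 30 February value becomes NaT:

```python
import pandas as pd

# Hypothetical frame shaped like the combined output above: one valid value, one 30 February
df = pd.DataFrame({
    'Datetime': ['2017/02/17 19:06:37.515', '2017/02/30 19:06:42.580'],
    'Sensor Value': [74, 70],
})

# An explicit format plus errors='coerce' turns the impossible date into NaT
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y/%m/%d %H:%M:%S.%f', errors='coerce')

# Keep only rows with a valid timestamp before plotting
clean = df.dropna(subset=['Datetime'])
print(clean)
```

The remaining rows can then be passed to plt.plot(clean['Datetime'], clean['Sensor Value']).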

Answer 2

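Remove the names=headers part, because it confuses pandas. By default, pandas treats the first line of the file as the header row. When you pass explicit column names, it assumes the first line is data, so the word Date from your header ends up as a value in the Date column, and that string obviously does not match your format.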
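A small reproduction of this, assuming a hypothetical comma-separated file with a header line like the question's DataLog.CSV. It also shows header=0, which, combined with names, tells pandas to replace the file's header line rather than read it as data, in case you want to keep your own column names:

```python
from io import StringIO

import pandas as pd

# Hypothetical file contents with a header line, similar to the question's DataLog.CSV
# (comma-separated here for simplicity)
csv_text = """Sensor Value,Date,Time
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360"""

headers = ['Sensor Value', 'Date', 'Time']

# With names= and no header=, pandas reads every line as data,
# so the header line becomes row 0
with_names = pd.read_csv(StringIO(csv_text), names=headers)
print(with_names.loc[0, 'Date'])  # prints: Date

# Without names=, the first line supplies the column names
default = pd.read_csv(StringIO(csv_text))
print(default.loc[0, 'Date'])  # prints: 2017/02/17

# names= together with header=0 replaces the file's header line with your own names
renamed = pd.read_csv(StringIO(csv_text), names=headers, header=0)
print(renamed.loc[0, 'Date'])  # prints: 2017/02/17
```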

EDIT: just change the read_csv line to:

df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV')

That way you no longer need to specify headers, so you can delete that line.

EDIT2:

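The problem is that your date and time are in separate fields, while your format string expects both. Create a new column called DateTime that combines the two, and then apply the strptime function to that column.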

In addition to removing the line I told you to delete, replace the df['Date'] = df['Date'].map line with:

df['DateTime'] = df['Date'] + " " + df['Time']
df['DateTime'] = df['DateTime'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
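A runnable sketch of this edit end to end, assuming the file is comma-separated with a header line (simulated here with io.StringIO and two sample rows):

```python
from datetime import datetime
from io import StringIO

import pandas as pd

# Simulated file with a header line (comma-separated is an assumption)
csv_text = """Sensor Value,Date,Time
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360"""

# No names=: the first line supplies the column names
df = pd.read_csv(StringIO(csv_text))

# Combine the text columns, then parse each combined string with strptime
df['DateTime'] = df['Date'] + ' ' + df['Time']
df['DateTime'] = df['DateTime'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
print(df[['DateTime', 'Sensor Value']])
```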

Answer 3

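I ran into a similar problem parsing dates in a large CSV file. In my case, some bad rows in the CSV triggered the error, so I simply removed them from the DataFrame before parsing the dates.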

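If you don't mind losing this information, you can do the following (my dates used '-' as the date separator; for the data in the question, change it to '/' in both the pattern and the format):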

from datetime import datetime

df = df[df['Date'].str.contains(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}', na=False)].copy()

timer = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')

df['Date'] = df['Date'].apply(timer)
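One caveat with this filter: the regular expression only checks the shape of the string, not whether the date exists, so a value such as 2017-02-30 still passes and strptime would then raise. A sketch of an alternative under the same '-' layout, swapping the regex-plus-strptime approach for pd.to_datetime with errors='coerce' and dropping the NaT rows (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical values in the answer's '-' layout: one good, one garbled, one impossible date
df = pd.DataFrame({'Date': ['2017-02-17 19:06:17.188', 'garbage', '2017-02-30 19:06:42.580']})

pattern = r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
shape_ok = df['Date'].str.contains(pattern, na=False)
print(shape_ok.tolist())  # [True, False, True]: 30 February passes the regex

# Parse everything, turning unparseable or impossible values into NaT, then drop them
parsed = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')
clean = df.assign(Date=parsed).dropna(subset=['Date'])
print(clean)
```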
