I am parsing dates in a DataFrame whose data comes from a CSV file, and I get the error shown below. I'm sure the format is correct.
My code:
import pandas as pd
from datetime import datetime
import csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
headers = ['Sensor Value','Date','Time']
df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV',names=headers)
print (df)
df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
x = df['Date']
y = df['Sensor Value']
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
My DataFrame:
0 Sensor Value Date Time
1 2 2017/02/17 19:06:17.188
2 72 2017/02/17 19:06:22.360
3 72 2017/02/17 19:06:27.348
Console error:
new_values = map_f(values, arg)
File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
File "C:/Users/Lala Rushan/PycharmProjects/newgraph/newgraph.py", line 10, in <lambda>
df['Date'] = df['Date'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
File "C:\Users\Lala Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 500, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "C:\Users\Lala Rushan\AppData\Local\Programs\Python\Python35\lib\_strptime.py", line 337, in _strptime
(data_string, format))
ValueError: time data 'Date' does not match format '%Y/%m/%d %H:%M:%S.%f'
CSV input:
Sensor Value Date Time
2 2017/02/17 19:06:17.188
72 2017/02/17 19:06:22.360
72 2017/02/17 19:06:27.348
72 2017/02/17 19:06:32.482
74 2017/02/17 19:06:37.515
70 2017/02/17 19:06:42.580
Answer 0 (score: 0)
I think you need to_datetime with the parameter errors='coerce', which converts values that cannot be parsed to NaN (NaT for datetimes):
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
To inspect the problematic rows:
print (df[pd.to_datetime(df['Date'], errors='coerce').isnull()])
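A small self-contained sketch of what errors='coerce' does, using made-up values modeled on the question: the literal header text 'Date' (which ends up in the data when names= is passed) and an impossible date both become NaT instead of raising, and the isnull() filter picks out exactly those rows. Passing an explicit format here is an addition for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: the header text 'Date' leaked into the data,
# one valid timestamp, and one impossible date (February 30)
s = pd.Series(['Date', '2017/02/17 19:06:17.188', '2017/02/30 19:06:42.580'])

# With errors='coerce', anything that does not parse becomes NaT instead of raising
parsed = pd.to_datetime(s, format='%Y/%m/%d %H:%M:%S.%f', errors='coerce')
print(parsed)

# Same idea as the isnull() check: show the original values that failed to parse
bad = s[parsed.isnull()]
print(bad)
```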
But if you need to read the Date and Time columns into a single datetime column, use the parse_dates parameter of read_csv:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas versions
temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/17,19:06:42.580"""
# after testing, replace StringIO(temp) with 'C:/Users\Lala Rushan\Downloads\DataLog.CSV' (the real file has a header row, so also pass header=0)
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
Datetime Sensor Value
0 2017-02-17 19:06:17.188 2
1 2017-02-17 19:06:22.360 72
2 2017-02-17 19:06:27.348 72
3 2017-02-17 19:06:32.482 72
4 2017-02-17 19:06:37.515 74
5 2017-02-17 19:06:42.580 70
print (df.dtypes)
Datetime datetime64[ns]
Sensor Value int64
dtype: object
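Passing a dict (or nested list) to parse_dates to combine columns was deprecated in pandas 2.2, so recent versions may warn or refuse it. The following is a sketch of an equivalent that should work across versions, assuming the same headerless sample data: read Date and Time as plain strings, join them, and parse once with an explicit format. The dtype argument and the column drop are choices made for this sketch, not part of the answer above.

```python
from io import StringIO

import pandas as pd

temp = """2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348"""

headers = ['Sensor Value', 'Date', 'Time']
# Keep Date and Time as text so they can be concatenated safely
df = pd.read_csv(StringIO(temp), names=headers, dtype={'Date': str, 'Time': str})

# Join the two text columns and parse them in one vectorized call
df['Datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'],
                                format='%Y/%m/%d %H:%M:%S.%f')
# Drop the original text columns, keeping Sensor Value and Datetime
df = df.drop(columns=['Date', 'Time'])
print(df)
print(df.dtypes)
```

An explicit format also catches malformed rows immediately. You can combine it with errors='coerce', as in the first solution, if you would rather get NaT for those rows.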
Here is the first solution combined with the second, using sample data whose last row contains the non-existent date 2017/02/30:
temp=u"""
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360
72,2017/02/17,19:06:27.348
72,2017/02/17,19:06:32.482
74,2017/02/17,19:06:37.515
70,2017/02/30,19:06:42.580"""
# after testing, replace StringIO(temp) with 'C:/Users\Lala Rushan\Downloads\DataLog.CSV' (the real file has a header row, so also pass header=0)
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(StringIO(temp), names=headers, parse_dates={'Datetime':['Date','Time']})
print (df)
Datetime Sensor Value
0 2017/02/17 19:06:17.188 2
1 2017/02/17 19:06:22.360 72
2 2017/02/17 19:06:27.348 72
3 2017/02/17 19:06:32.482 72
4 2017/02/17 19:06:37.515 74
5 2017/02/30 19:06:42.580 70
df['Datetime'] = pd.to_datetime(df['Datetime'], errors='coerce')
print (df)
Datetime Sensor Value
0 2017-02-17 19:06:17.188 2
1 2017-02-17 19:06:22.360 72
2 2017-02-17 19:06:27.348 72
3 2017-02-17 19:06:32.482 72
4 2017-02-17 19:06:37.515 74
5                     NaT            70  <- 2017/02/30 was replaced by NaT (the datetime equivalent of NaN)
print (df.dtypes)
Datetime datetime64[ns]
Sensor Value int64
dtype: object
Answer 1 (score: 0)
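Remove the names=headers part, because it confuses pandas. By default, pandas treats the first row as the header row. When you pass your own header names, pandas assumes the first row is data. That is why you get the error: the literal word Date is parsed as a value and does not match your format (and of course it can't).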
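Here is a minimal sketch of that behavior, using a made-up in-memory CSV with a header row shaped like the question's file. With names= alone, the header line is read as a data row, so 'Date' becomes a value. With the default, the first line becomes the column names. If you still want your own names, header=0 tells pandas to replace the file's header instead of treating it as data. That last option is an alternative to deleting names=headers, not something this answer suggests.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV with a header row, shaped like the question's DataLog.CSV
data = """Sensor Value,Date,Time
2,2017/02/17,19:06:17.188
72,2017/02/17,19:06:22.360"""
headers = ['Sensor Value', 'Date', 'Time']

# names= without header=: pandas assumes no header, so the header line becomes row 0
with_names = pd.read_csv(StringIO(data), names=headers)
print(with_names['Date'].iloc[0])  # the literal string 'Date'

# Default: the first line is used as the header
default = pd.read_csv(StringIO(data))
print(default['Date'].iloc[0])

# header=0 together with names=: the file's header row is replaced by your names
renamed = pd.read_csv(StringIO(data), names=headers, header=0)
print(renamed)
```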
EDIT: Just change the read_csv line to:
df = pd.read_csv('C:/Users\Lala Rushan\Downloads\DataLog.CSV')
That way you no longer need to specify headers, so you can delete that line.
EDIT2:
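The problem is that your date and time are in separate fields. Create a new column called DateTime that combines the two, then apply the strptime function to that column.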
Besides deleting the headers line I mentioned, replace the df['Date'] = df['Date'].map(...) line with:
df['DateTime'] = df['Date'] + " " + df['Time']
df['DateTime'] = df['DateTime'].map(lambda x: datetime.strptime(str(x), '%Y/%m/%d %H:%M:%S.%f'))
Answer 2 (score: 0)
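I ran into a similar problem parsing dates in a large CSV file. In my case, a few bad rows in the CSV triggered the error, so I simply dropped them from the DataFrame before parsing the dates.
If you don't mind losing that information, you can do the following: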
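from datetime import datetime
# keep only rows whose Date matches the expected timestamp layout (NaN counts as no match)
df = df[df['Date'].str.contains(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}', na=False)].copy()
timer = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df['Date'] = df['Date'].apply(timer)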
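The sketch below runs the filtering approach on made-up data. Note that this answer's dates use '-' separators, unlike the '/' in the question. Rows whose Date does not match the expected pattern are dropped before strptime runs, including a truncated timestamp and a missing value. In this sketch, na=False makes missing values count as non-matching so the boolean mask stays valid. The .copy() avoids pandas' SettingWithCopyWarning on the following assignment.

```python
from datetime import datetime

import pandas as pd

# Hypothetical data: two good timestamps, a truncated one, and a missing value
df = pd.DataFrame({'Date': ['2017-02-17 19:06:17.188',
                            '2017-02-17 19:06',
                            None,
                            '2017-02-17 19:06:22.360'],
                   'Sensor Value': [2, 72, 72, 74]})

# Full timestamp with milliseconds; the dot is escaped so it only matches a literal '.'
pattern = r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}'
df = df[df['Date'].str.contains(pattern, na=False)].copy()

timer = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f')
df['Date'] = df['Date'].apply(timer)
print(df)
```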