Solution update: using the link provided above, this is what I came up with:
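import pandas as pd

# Reading the CSV keeps Duration as "HH:MM:SS" strings
df = pd.read_csv('Book1.csv')
# Parse the strings into timestamps (pandas attaches a date to each time)
idx = pd.DatetimeIndex(df['Duration'])
df = df.set_index(idx)
# Decimal hours from the hour and minute fields (seconds are ignored)
df['Duration_Decimal'] = idx.hour + idx.minute / 60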
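If the seconds matter, a variant that treats the values as durations rather than clock times might look like the sketch below. It assumes the Duration column holds "HH:MM:SS" strings (as it does after read_csv); the sample DataFrame is made up for illustration. pd.to_timedelta parses each string as a time span, and dividing total_seconds() by 3600 keeps the seconds that the hour/minute formula drops.

```python
import pandas as pd

# Hypothetical sample standing in for the Duration column of Book1.csv
df = pd.DataFrame({"Duration": ["01:30:00", "00:00:00", "00:30:00", "00:30:00", "00:45:36"]})

# Parse "HH:MM:SS" strings as durations (not clock times),
# then convert total seconds to decimal hours, seconds included
df["Duration_Decimal"] = pd.to_timedelta(df["Duration"]).dt.total_seconds() / 3600

print(df["Duration_Decimal"].tolist())
```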
Start of the file:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
from datetime import datetime
df = pd.read_excel('Book1.xlsx', sheet_name='Sheet1')
This is the column I want to convert:
In: df.Duration.head()
Out:
0    01:30:00
1    00:00:00
2    00:30:00
3    00:30:00
4    00:00:00
Name: Duration, dtype: object
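The dtype: object line is easy to misread: it only says the column holds arbitrary Python objects, not that they are strings. A quick way to see what is actually stored is sketched below; the frame is hypothetical and deliberately mixes datetime.time values with one string so the check reports more than one type.

```python
import datetime
import pandas as pd

# Hypothetical frame mimicking what read_excel can return for a time-formatted
# column: datetime.time objects stored in an object-dtype column (plus one string)
df = pd.DataFrame({"Duration": [datetime.time(1, 30), datetime.time(0, 0), "00:30:00"]})

# dtype "object" only means "arbitrary Python objects"; inspect the real types
print(df["Duration"].dtype)
print(df["Duration"].map(type).value_counts())
```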
The function I wrote:
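def conversion_function(t):
    # Split an "HH:MM:SS" string and convert it to decimal hours
    (h, m, s) = t.split(':')
    # Seconds must be divided by 3600 to count as fractions of an hour
    return int(h) + int(m) / 60 + int(s) / 3600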
Testing the function:
In: conversion_function('01:30:00')
Out: 1.5
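When a column genuinely holds "HH:MM:SS" strings, the same split can be done for the whole column at once with pandas' string methods instead of apply. This is a sketch on a hypothetical sample; it will not work on datetime.time objects, because the .str accessor requires string values.

```python
import pandas as pd

# Hypothetical sample: a Duration column holding "HH:MM:SS" strings
df = pd.DataFrame({"Duration": ["01:30:00", "00:00:00", "00:30:00", "02:15:36"]})

# Vectorized split into three integer columns: hours (0), minutes (1), seconds (2)
parts = df["Duration"].str.split(":", expand=True).astype(int)

# Weighted sum: minutes / 60 and seconds / 3600 become fractions of an hour
df["Duration_2"] = parts[0] + parts[1] / 60 + parts[2] / 3600

print(df["Duration_2"].tolist())
```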
Inserting a new column (Duration_2) into the DataFrame by applying conversion_function (raises an AttributeError):
df['Duration_2'] = df['Duration'].apply(conversion_function)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-ad23f30d9b5a> in <module>()
----> 1 df['Duration_2'] = df['Duration'].apply(conversion_function)

D:\Python\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3190             else:
   3191                 values = self.astype(object).values
-> 3192                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3193 
   3194         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-3-d662e6fcae47> in conversion_function(t)
      1 def conversion_function(t):
----> 2     (h, m, s) = t.split(':')
      3     return int(h) + int(m)/60 + int(s)

AttributeError: 'datetime.time' object has no attribute 'split'
The function works when I call it on its own, but no matter how I tweak it, I can't get it to work on the DataFrame.
Answer:
Your data appears to already be in a datetime format. Your conversion_function, however, expects a string, which is why you get the error: split() is a string method.
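One way to make the original function tolerant of both kinds of input is to check the type before splitting. The name to_decimal_hours is hypothetical, and the sketch assumes every value is either a datetime.time object or an "HH:MM:SS" string; it also divides the seconds by 3600 so they count as fractions of an hour.

```python
import datetime

def to_decimal_hours(t):
    """Convert a duration to decimal hours (hypothetical helper).

    Accepts a datetime.time object (what read_excel returned here)
    or an "HH:MM:SS" string.
    """
    if isinstance(t, datetime.time):
        # Use the time attributes directly; no string parsing needed
        return t.hour + t.minute / 60 + t.second / 3600
    h, m, s = str(t).split(":")
    return int(h) + int(m) / 60 + int(s) / 3600

print(to_decimal_hours(datetime.time(1, 30)))  # 1.5
print(to_decimal_hours("01:30:00"))            # 1.5
```

With this in place, df['Duration'].apply(to_decimal_hours) should work whichever type the column holds.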
Since you're using pandas, I'd suggest using pandas' built-in datetime methods:
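import pandas as pd

data = ["01:30:00", "00:00:00", "00:30:00", "00:30:00", "00:00:00"]
# An explicit format avoids relying on pandas' format inference
time_data = pd.to_datetime(data, format="%H:%M:%S")
time_data.hour + time_data.minute / 60
# Float64Index([1.5, 0.0, 0.5, 0.5, 0.0], dtype='float64')
# (pandas >= 2.0 displays this as Index([...], dtype='float64'))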
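Note: the error you received indicates that your time data consists of datetime.time objects. Rather than converting to pandas-specific datetime objects, you can use the same attributes (hour, minute) directly on datetime.time: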
import datetime

# `data` is the list of strings from the previous snippet
# match OP's exact time format
time_data = [datetime.datetime.strptime(x, "%H:%M:%S").time() for x in data]
[x.hour + x.minute/60 for x in time_data]
# [1.5, 0.0, 0.5, 0.5, 0.0]
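Applied to the OP's DataFrame, the datetime.time approach reduces to a single apply call with no string parsing. The sample frame below is hypothetical, built to match the Duration values shown in the question; adding t.second / 3600 is an assumption in case the seconds are ever non-zero.

```python
import datetime
import pandas as pd

# Hypothetical frame: Duration holds datetime.time objects, as in the traceback
df = pd.DataFrame({"Duration": [datetime.time(1, 30), datetime.time(0, 0),
                                datetime.time(0, 30), datetime.time(0, 30),
                                datetime.time(0, 0)]})

# Read the hour/minute/second attributes of each datetime.time directly
df["Duration_2"] = df["Duration"].apply(
    lambda t: t.hour + t.minute / 60 + t.second / 3600
)

print(df["Duration_2"].tolist())  # [1.5, 0.0, 0.5, 0.5, 0.0]
```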