I have this DataFrame:
Start_date End_date hour1 hour2
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00
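I'm trying to create a new column with the duration based only on my columns hour2 and hour1 (the output needs to be a numeric value in seconds).
I created my hour columns with this code. Maybe the mistake is here.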
date_df['hour1'] = date_df['Start_date'].dt.time
date_df['hour2'] = date_df['End_date'].dt.time
date_df
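If only the time-of-day part matters, one option is to keep it as a Timedelta instead of converting it to datetime.time. This is a minimal sketch, assuming Start_date and End_date are already datetime64 columns (the .dt.time calls above only work on datetime-like columns). Subtracting each timestamp's own midnight (from .dt.normalize()) leaves the time since midnight, and pandas can subtract those values and convert the result to seconds. The intermediate names tod1 and tod2 are illustrative.

```python
import pandas as pd

# Sample data mirroring the two rows shown in the question
date_df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
    "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
})

# Time since midnight as Timedelta values (timedelta64 dtype), not datetime.time objects
tod1 = date_df["Start_date"] - date_df["Start_date"].dt.normalize()
tod2 = date_df["End_date"] - date_df["End_date"].dt.normalize()

# Timedeltas support subtraction; total_seconds() returns a float number of seconds
date_df["NewColumn2"] = (tod2 - tod1).dt.total_seconds()
print(date_df["NewColumn2"].tolist())  # [35100.0, 35100.0]
```

The result is negative whenever the end time of day is earlier than the start time of day, because the dates themselves are ignored.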
I tried this solution:
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-283-b75adc651706> in <module>
----> 1 date_df['hour2'] = pd.to_datetime(date_df['hour2'])
2 date_df['hour1'] = pd.to_datetime(date_df['hour1'])
3 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
801 result = arg.map(cache_array)
802 else:
--> 803 values = convert_listlike(arg._values, format)
804 result = arg._constructor(values, index=arg.index, name=arg.name)
805 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
457 assert format is None or infer_datetime_format
458 utc = tz == "utc"
--> 459 result, tz_parsed = objects_to_datetime64ns(
460 arg,
461 dayfirst=dayfirst,
~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2042
2043 try:
-> 2044 result, tz_parsed = tslib.array_to_datetime(
2045 data,
2046 errors=errors,
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
TypeError: <class 'datetime.time'> is not convertible to datetime
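pd.to_datetime does not accept datetime.time objects, which is what this traceback reports. A minimal sketch of one workaround, assuming hour1 and hour2 hold datetime.time objects as created by .dt.time: turn each time into its "HH:MM:SS" string and parse it with pd.to_timedelta, which produces Timedelta values that can be subtracted.

```python
import datetime
import pandas as pd

# hour1/hour2 hold datetime.time objects (object dtype), as produced by .dt.time
date_df = pd.DataFrame({
    "hour1": [datetime.time(12, 0), datetime.time(12, 0)],
    "hour2": [datetime.time(21, 45), datetime.time(21, 45)],
})

# str(datetime.time) gives "HH:MM:SS", which pd.to_timedelta can parse
h1 = pd.to_timedelta(date_df["hour1"].astype(str))
h2 = pd.to_timedelta(date_df["hour2"].astype(str))

# Difference as a float number of seconds
date_df["NewColumn2"] = (h2 - h1).dt.total_seconds()
print(date_df)
```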
I also tried this solution:
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
I get this error message:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
142 try:
--> 143 result = expressions.evaluate(op, left, right)
144 except TypeError:
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
232 if use_numexpr:
--> 233 return _evaluate(op, op_str, a, b) # type: ignore
234 return _evaluate_standard(op, op_str, a, b)
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b)
118 if result is None:
--> 119 result = _evaluate_standard(op, op_str, a, b)
120
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
67 with np.errstate(all="ignore"):
---> 68 return op(a, b)
69
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-286-bf4c33189e88> in <module>
----> 1 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self, other)
66
67 return new_method
~\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
341 lvalues = extract_array(left, extract_numpy=True)
342 rvalues = extract_array(right, extract_numpy=True)
--> 343 result = arithmetic_op(lvalues, rvalues, op)
344
345 return left._construct_result(result, name=res_name)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
188 else:
189 with np.errstate(all="ignore"):
--> 190 res_values = na_arithmetic_op(lvalues, rvalues, op)
191
192 return res_values
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
148 # will handle complex numbers incorrectly, see GH#32047
149 raise
--> 150 result = masked_arith_op(left, right, op)
151
152 if is_cmp and (is_scalar(result) or result is NotImplemented):
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
90 if mask.any():
91 with np.errstate(all="ignore"):
---> 92 result[mask] = op(xrav[mask], yrav[mask])
93
94 else:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
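The second error comes from Python itself: datetime.time does not define subtraction. A standard-library sketch of the usual workaround, attaching both times to the same arbitrary date (2000-01-01 here is just a placeholder) so that they become datetime objects, which can be subtracted:

```python
import datetime

t1 = datetime.time(12, 0, 0)
t2 = datetime.time(21, 45, 0)

# time - time raises TypeError, as in the traceback above
try:
    t2 - t1
except TypeError as exc:
    print("TypeError:", exc)

# Combine both times with the same (arbitrary) date, then subtract the datetimes
day = datetime.date(2000, 1, 1)
delta = datetime.datetime.combine(day, t2) - datetime.datetime.combine(day, t1)
print(delta.total_seconds())  # 35100.0
```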
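When I load the DataFrame the way the answer below suggests, the error actually goes away. But the same error still affects my original DataFrame (the real exercise), so I need to understand what I am doing wrong or what I should change to fix it.
How should I change the code?
Thanks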
Answer 0 (score: 0)
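I ran your code on my computer and it did not raise an error.
Your DataFrame values are not str.
They are already datetime.time objects, which is exactly what your error message says: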
TypeError: <class 'datetime.time'> is not convertible to datetime
Before running date_df['NewColumn2']=date_df['hour2']-date_df['hour1'], you should check the type of the values.
Below is the code I ran on my PC.
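A minimal sketch of how to check the types, assuming a DataFrame shaped like the question's: df.dtypes shows the column dtypes, and type() on a single element shows what an object column actually holds.

```python
import datetime
import pandas as pd

date_df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
})
date_df["hour1"] = date_df["Start_date"].dt.time

# Column dtypes: a datetime64 dtype for Start_date, object for hour1
print(date_df.dtypes)

# The Python type of one element of the object column
print(type(date_df["hour1"].iloc[0]))  # <class 'datetime.time'>
```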
import pandas as pd

date_df = pd.DataFrame(
{
"Start_date": ["2018-01-31 12:00:00", "2018-02-28 12:00:00"],
"End_date": ["2019-03-17 21:45:00", "2019-03-24 21:45:00"],
"hour1": ["12:00:00", "12:00:00"],
"hour2": ["21:45:00", "21:45:00"],
}
)
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
OK, now I understand what you did. You must first check the type of your values; this is very important.
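I think your 'Start_date' and 'End_date' are already datetime objects (the .dt.time accessor you used only works on datetime-like columns), so you can subtract them directly: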
your_date_df['NewColumn2'] = your_date_df['End_date'] - your_date_df['Start_date']
If you only want to display the time part of the difference, do this. First, import datetime:
import datetime
your_date_df['NewColumn2_onlyTime'] = your_date_df['NewColumn2'].apply(
lambda x: (datetime.datetime.min + x).time())
print(your_date_df)
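As a vectorized alternative to the apply/lambda above, a sketch assuming NewColumn2 is the timedelta64 column computed from End_date - Start_date: the .dt.days and .dt.seconds accessors split each non-negative Timedelta into whole days and the remaining seconds, which gives the time-only part as a number in seconds. The new column names are illustrative.

```python
import pandas as pd

your_date_df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
    "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
})
your_date_df["NewColumn2"] = your_date_df["End_date"] - your_date_df["Start_date"]

# Whole-day component of each Timedelta
your_date_df["NewColumn2_days"] = your_date_df["NewColumn2"].dt.days
# Seconds component, excluding whole days (valid as "time only" for non-negative durations)
your_date_df["NewColumn2_onlyTime_s"] = your_date_df["NewColumn2"].dt.seconds
print(your_date_df[["NewColumn2", "NewColumn2_days", "NewColumn2_onlyTime_s"]])
```

For negative durations pandas normalizes to negative days plus positive seconds, so check the sign before reading .dt.seconds as a time of day.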
index Start_date End_date hour1 hour2 NewColumn2 NewColumn2_onlyTime
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00 410 days 09:45:00 09:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00 389 days 09:45:00 09:45:00
Answer 1 (score: 0)
If you want the difference from start to end, you don't need to extract the hours. You can do this:
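import io
import pandas as pd

data = '''
Start_date  End_date  hour1  hour2
0  2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00  21:45:00
1  2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00  21:45:00'''
# Columns are separated by two or more spaces; single spaces stay inside the timestamps
df = pd.read_csv(io.StringIO(data), sep=r'\s{2,}', engine='python')
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
# Timedelta between the two timestamps, converted to a float number of seconds
df['deltadays_seconds'] = (df.End_date - df.Start_date).dt.total_seconds()
df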
Start_date End_date hour1 hour2 deltadays_seconds
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00 35459100.0
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00 33644700.0
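You could convert hour1 and hour2 instead, but they hold only the time-of-day part of the full date and time: their difference is 09:45:00 (35100 seconds) for both rows and ignores the days between the two dates.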