pandas: Find the difference (duration) between two time columns

Time: 2021-06-17 02:16:29

Tags: python pandas time

I have this DataFrame:

          Start_date         End_date           hour1     hour2      
0   2018-01-31 12:00:00 2019-03-17 21:45:00  12:00:00   21:45:00
1   2018-02-28 12:00:00 2019-03-24 21:45:00  12:00:00   21:45:00

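I am trying to create a new column holding the duration, based only on my hour2 and hour1 columns (the output needs to be a numeric value in seconds).

I created my hour columns with the code below. Maybe the error is here.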

date_df['hour1'] = date_df['Start_date'].dt.time
date_df['hour2'] = date_df['End_date'].dt.time
date_df

I tried this solution:

    date_df['hour2'] = pd.to_datetime(date_df['hour2'])
    date_df['hour1'] = pd.to_datetime(date_df['hour1'])
    date_df['NewColumn2']=date_df['hour2']-date_df['hour1'] 

Error:

        ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-283-b75adc651706> in <module>
    ----> 1 date_df['hour2'] = pd.to_datetime(date_df['hour2'])
          2 date_df['hour1'] = pd.to_datetime(date_df['hour1'])
          3 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
    
    ~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
        801             result = arg.map(cache_array)
        802         else:
    --> 803             values = convert_listlike(arg._values, format)
        804             result = arg._constructor(values, index=arg.index, name=arg.name)
        805     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
    
    ~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
        457         assert format is None or infer_datetime_format
        458         utc = tz == "utc"
    --> 459         result, tz_parsed = objects_to_datetime64ns(
        460             arg,
        461             dayfirst=dayfirst,
    
    ~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
       2042 
       2043     try:
    -> 2044         result, tz_parsed = tslib.array_to_datetime(
       2045             data,
       2046             errors=errors,
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    TypeError: <class 'datetime.time'> is not convertible to datetime

I also tried this solution:

    date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

I get this error message:

    TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
    232         if use_numexpr:
--> 233             return _evaluate(op, op_str, a, b)  # type: ignore
    234     return _evaluate_standard(op, op_str, a, b)

~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b)
    118     if result is None:
--> 119         result = _evaluate_standard(op, op_str, a, b)
    120 

~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
     67     with np.errstate(all="ignore"):
---> 68         return op(a, b)
     69 

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-286-bf4c33189e88> in <module>
----> 1 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    188     else:
    189         with np.errstate(all="ignore"):
--> 190             res_values = na_arithmetic_op(lvalues, rvalues, op)
    191 
    192     return res_values

~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    148             #  will handle complex numbers incorrectly, see GH#32047
    149             raise
--> 150         result = masked_arith_op(left, right, op)
    151 
    152     if is_cmp and (is_scalar(result) or result is NotImplemented):

~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
     90         if mask.any():
     91             with np.errstate(all="ignore"):
---> 92                 result[mask] = op(xrav[mask], yrav[mask])
     93 
     94     else:

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

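When I initially load the DataFrame using the suggestion given below, the error does go away. The problem is that the same error affects my original DataFrame (the real exercise), so I need to understand what I am doing wrong or what I should change to fix it.

How should I change the code?

Thanks

2 answers:

Answer 0: (score: 0)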

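I ran your code on my PC and it did not raise an error. The values in your DataFrame are not str.

They are already date/time objects (`datetime.time`, specifically). Your error message says so:

TypeError: <class 'datetime.time'> is not convertible to datetime

First run date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

Then check the type of the values.

Here is the code I ran on my PC.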

import pandas as pd

# Sample frame built from strings, so pd.to_datetime can parse every column.
date_df = pd.DataFrame(
    {
        "Start_date": ["2018-01-31 12:00:00", "2018-02-28 12:00:00"],
        "End_date": ["2019-03-17 21:45:00", "2019-03-24 21:45:00"],
        "hour1": ["12:00:00", "12:00:00"],
        "hour2": ["21:45:00", "21:45:00"],
    }
)
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2'] = date_df['hour2'] - date_df['hour1']

OK, now I understand what you did. You must check the type of your values first. This is very important.
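One way to check the types, as a minimal sketch. The frame here is a hypothetical reconstruction of the question's setup (datetime columns plus hour columns derived with `.dt.time`), not the asker's real data.

```python
import datetime
import pandas as pd

# Hypothetical sample frame mirroring the question's setup.
date_df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
    "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
})
date_df["hour1"] = date_df["Start_date"].dt.time
date_df["hour2"] = date_df["End_date"].dt.time

# Column-level dtypes: the .dt.time columns show up as generic "object".
print(date_df.dtypes)

# Element-level Python types inside one of the object columns.
element_types = set(date_df["hour1"].map(type))
print(element_types)
```

An `object` dtype alone does not say whether a column holds strings or `datetime.time` values, which is why looking at the element types helps here.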

I think your 'Start_date' and 'End_date' columns are already datetime.datetime objects.

your_date_df['NewColumn2'] = your_date_df['End_date'] - your_date_df['Start_date']

If you only want to show the time part of the difference, do this. First, import datetime:

import datetime
        
your_date_df['NewColumn2_onlyTime'] = your_date_df['NewColumn2'].apply(
    lambda x: (datetime.datetime.min + x).time())

print(your_date_df)

index   Start_date  End_date    hour1   hour2   NewColumn2  NewColumn2_onlyTime
0   2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00    21:45:00    410 days 09:45:00   09:45:00
1   2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00    21:45:00    389 days 09:45:00   09:45:00
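The question asks for a numeric value in seconds based only on the time of day. A possible sketch, assuming `Start_date` and `End_date` are datetime columns as above: subtract each timestamp's midnight (via `.dt.normalize()`) to get the time elapsed since midnight, then take the difference. The column name `tod_diff_seconds` is made up for this illustration.

```python
import pandas as pd

# Hypothetical sample frame with the question's values.
df = pd.DataFrame({
    "Start_date": pd.to_datetime(["2018-01-31 12:00:00", "2018-02-28 12:00:00"]),
    "End_date": pd.to_datetime(["2019-03-17 21:45:00", "2019-03-24 21:45:00"]),
})

# Time elapsed since midnight for each timestamp, as timedelta values.
start_tod = df["Start_date"] - df["Start_date"].dt.normalize()
end_tod = df["End_date"] - df["End_date"].dt.normalize()

# Difference between the two times of day, as float seconds.
df["tod_diff_seconds"] = (end_tod - start_tod).dt.total_seconds()
print(df)
```

With these values each row comes out to 35100.0 seconds (9 hours 45 minutes). The result is negative whenever the end time of day is earlier than the start time of day.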

Answer 1: (score: 0)

If you want the difference from start to end, you do not need to extract the hours. You can do this:

import io
import pandas as pd

data='''
          Start_date         End_date           hour1     hour2
0   2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00   21:45:00
1   2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00   21:45:00'''
# Split on two or more whitespace characters so "date time" stays in one field.
df = pd.read_csv(io.StringIO(data), sep=r' \s+', engine='python')
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
# Full start-to-end duration as float seconds.
df['deltadays_seconds'] = (df.End_date-df.Start_date).dt.total_seconds()
df

           Start_date            End_date     hour1     hour2  deltadays_seconds
0 2018-01-31 12:00:00 2019-03-17 21:45:00  12:00:00  21:45:00         35459100.0
1 2018-02-28 12:00:00 2019-03-24 21:45:00  12:00:00  21:45:00         33644700.0

You could convert to hour1 and hour2, but you would get the same answer. hour1 and hour2 are just representations of the full date and time.
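If the goal is the difference between the times of day alone (the question's hour1 and hour2) rather than the full start-to-end duration, and those columns already hold `datetime.time` objects, one option is to turn them into timedeltas first. This is a sketch under that assumption; the column name `duration_seconds` is made up for the example.

```python
import datetime
import pandas as pd

# Hypothetical frame whose hour columns hold datetime.time objects,
# as produced by .dt.time in the question.
date_df = pd.DataFrame({
    "hour1": [datetime.time(12, 0), datetime.time(12, 0)],
    "hour2": [datetime.time(21, 45), datetime.time(21, 45)],
})

# str(datetime.time(12, 0)) is "12:00:00", a form pd.to_timedelta can parse.
h1 = pd.to_timedelta(date_df["hour1"].astype(str))
h2 = pd.to_timedelta(date_df["hour2"].astype(str))

# Timedelta columns support subtraction; convert the result to float seconds.
date_df["duration_seconds"] = (h2 - h1).dt.total_seconds()
print(date_df)
```

Going through timedeltas avoids both errors from the question: `datetime.time` objects cannot be subtracted from each other or passed to `pd.to_datetime`, but their string form parses as a duration since midnight.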
