How to fix OverflowError: Overflow in int64 addition

Asked: 2019-06-23 03:29:28

Tags: python data-science

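I am trying to subtract column df['DOB'] from column df['date_of_admission'] to get the difference between the two and store the resulting age in the df['age'] column. However, I get this error: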

  

OverflowError: Overflow in int64 addition

 DOB          date_of_admission      age
 2000-05-07   2019-01-19 12:26:00        
 1965-01-30   2019-03-21 02:23:12        
 NaT          2018-11-02 18:30:10        
 1981-05-01   2019-05-08 12:26:00       
 1957-01-10   2018-12-31 04:01:15         
 1968-07-14   2019-01-28 15:05:09            
 NaT          2018-04-13 06:20:01 
 NaT          2019-02-15 01:01:57 
 2001-02-10   2019-03-21 08:22:00       
 1990-03-29   2018-11-29 03:05:03
.....         ......
.....         .....
.....         .....

I have tried the following:

import numpy as np
import pandas as pd
from datetime import datetime

df['age'] = (df['date_of_admission'] - df['DOB']).dt.days // 365

After computing the difference between the two, I expect to get the following age column:

age
26
69
NaN
58
.
.
.

3 Answers:

Answer 0: (score: 3)

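The OP is most likely using the MIMIC medical dataset, in which dates are scrambled to protect patient identity. Specifically, for patients older than 89, they shifted the date of birth by 300 years.

A time span that long overflows when represented as a pandas timedelta: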

pd.to_timedelta(300, unit="Y", box=False)
> numpy.timedelta64(-8979658473709551616,'ns')
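The negative number above can be reproduced with plain integer arithmetic. The sketch below assumes that a timedelta64[ns] value is stored as a signed 64-bit count of nanoseconds and that one year is taken as 365.2425 days (31,556,952 seconds); the variable names are illustrative only.

```python
# Largest value a signed 64-bit integer can hold, read here as nanoseconds.
INT64_MAX = 2**63 - 1
NS_PER_DAY = 24 * 60 * 60 * 10**9

# Assumption: one year = 365.2425 days = 31,556,952 seconds.
SECONDS_PER_YEAR = 31_556_952

max_days = INT64_MAX // NS_PER_DAY           # whole days that fit in int64 ns
max_years = max_days / 365.2425              # roughly 292 years

span_ns = 300 * SECONDS_PER_YEAR * 10**9     # 300 years in nanoseconds
fits = span_ns <= INT64_MAX                  # False: the span is too large
wrapped = span_ns - 2**64                    # value after 64-bit wraparound

print(max_days, int(max_years), fits, wrapped)
```

Under these assumptions the wrapped value equals the number shown in the output above, and any span longer than about 292 years cannot be stored at nanosecond resolution.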

When this happens inside a DataFrame operation, you get an error. Adapted from @tawab_shakeel's answer:

df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','1700-01-01'],
                   "date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2000-01-01 02:23:23"]})

df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

# Gives AttributeError: Can only use .dt accessor with datetimelike values
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

# Gives OverflowError: long too big to convert
pd.to_timedelta(df['date_of_admission']-df['DOB'])

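Any computation that converts to the timedelta64[ns] dtype will run into this problem.

As a workaround, you can use an apply operation instead and compute the age directly for each element: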

df['age'] = df.apply(lambda e: (e['date_of_admission'] - e['DOB']).days/365, axis=1)
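Dividing by 365 yields a fractional age and ignores leap days. If whole completed years are wanted, a per-row helper can compare calendar fields directly, which never goes through timedelta64[ns]. This is only a sketch: `age_in_years` is a hypothetical name, and it assumes both inputs are `datetime.date` objects with missing values represented as `None`.

```python
from datetime import date
from typing import Optional


def age_in_years(dob: Optional[date], admission: Optional[date]) -> Optional[int]:
    """Completed years between dob and admission, or None if either is missing."""
    if dob is None or admission is None:
        return None
    years = admission.year - dob.year
    # Subtract one if the birthday has not yet occurred in the admission year.
    if (admission.month, admission.day) < (dob.month, dob.day):
        years -= 1
    return years


print(age_in_years(date(2000, 5, 7), date(2019, 1, 19)))   # 18
print(age_in_years(date(1700, 1, 1), date(2000, 1, 1)))    # 300
print(age_in_years(None, date(2018, 11, 2)))               # None
```

With a DataFrame, such a helper could be passed to df.apply(..., axis=1) in place of the lambda above. Missing dates in pandas usually appear as NaT rather than None, so they would need to be filtered first, for example with pd.isna.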

Answer 1: (score: 1)

1). You are doing it correctly, but DOB contains only the date, while date_of_admission contains both date and time. Manipulate date_of_admission so that it contains only the date, and then you will get the result.

2). Here is your code with that change added, so that you get the result.

date_of_admission

Hope it helps you.

Answer 2: (score: 1)

Convert both columns to dates, then subtract:

import pandas as pd


df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date

df['DOB'] = pd.to_datetime(df['DOB']).dt.date

df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

Second test

# Now I have used the DOB and date_of_admission data from the question, and it works fine

df = pd.DataFrame(data={"DOB":['2000-05-07','1965-01-30','NaT'],
                   "date_of_admission":["2019-01-19 12:26:00","2019-03-21 02:23:12", "2018-11-02 18:30:10"]})

df['DOB'] = pd.to_datetime(df['DOB']).dt.date
df['date_of_admission'] = pd.to_datetime(df['date_of_admission']).dt.date
df['age'] = ((df['date_of_admission']-df['DOB']).dt.days) //365

Result:

DOB       date_of_admission   age
2000-05-07  2019-01-19       18.0
1965-01-30  2019-03-21       54.0
NaT         2018-11-02       NaN