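Thanks again for your patience; I am not the best communicator. Please let me know if there is any other information I should add.
My current data looks like this: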
"Identifier","Status","OPENED","Resolv","closed_on","duplicate_on","junked_on","unproducible_on","verified_on"
"xx1","D","2004-07-28","","","2004-08-26","","",""
"xx2","N","2010-03-02","","","","","",""
"xx3","U","2005-10-26","","","","","2005-11-01",""
"xx4","V","2006-06-30","2006-09-15","","","","","2006-11-20"
"xx5","R","2012-09-21","2013-06-06","","","","",""
"xx6","D","2009-11-25","","","2010-02-26","","",""
"xx7","D","2003-08-29","","","2003-08-29","","",""
"xx8","R","2003-06-06","2003-06-24","","","","",""
"xx9","R","2004-11-05","2004-11-15","","","","",""
"xx10","R","2008-02-21","2008-09-25","","","","",""
"xx11","R","2007-03-08","2007-03-21","","","","",""
"xx12","R","2011-08-22","2012-06-21","","","","",""
"xx13","J","2003-07-07","","","","2003-07-10","",""
"xx14","A","2008-09-24","","","","","",""
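I am trying to use the code below to add an age calculation column so that the data looks like the following. (Note that the first row returns "" for Age; that is the problem I am trying to solve. If a status has no date, I want to use today's date.)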
"Identifier","Status","OPENED","Resolv","closed_on","duplicate_on","junked_on","unproducible_on","verified_on","Age"
"xx1","J","2002-02-07","","","","","","",""
"xx2","J","2008-11-25","","","","2008-12-04","","",9.0
"xx3","C","2002-01-27","","2002-03-19","","","","",51.0
"xx4","V","2003-07-09","2003-07-10","","","","","2003-07-15",6.0
"xx5","D","2008-06-30","","","2008-09-09","","","",71.0
"xx6","R","2010-06-02","2010-06-11","","","","","",9.0
"xx7","R","2006-11-16","2006-12-12","","","","","",26.0
"xx8","R","2006-03-29","2006-03-31","","","","","",2.0
"xx9","R","2010-09-07","2010-10-05","","","","","",28.0
"xx10","U","2006-03-09","","","","","2006-06-20","",103.0
"xx11","R","2007-04-26","2007-05-01","","","","","",5.0
"xx12","C","2010-03-07","","2010-03-11","","","","",4.0
"xx13","R","2009-12-22","2010-05-31","","","","","",160.0
"xx14","R","2006-06-24","2006-06-28","","","","","",4.0
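However, when the date of the defect's status change is missing, the age function returns '', as in the first row of the sample output. The same is true for all 102 blank cells.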
from datetime import datetime as dtt
import pandas as pd
import numpy as np
import csv
# Age column calculation function
def defect_age(df):
    """Performs age calc and creates age col"""
    today = dtt.today()
    # List of terminal states
    terminal = ['R', 'V', 'D', 'J', 'U', 'C']
    # Date on which each state was reached
    resolved = pd.to_datetime(df.Resolv, errors='coerce')
    closed = pd.to_datetime(df.closed_on, errors='coerce')
    duplicate = pd.to_datetime(df.duplicate_on, errors='coerce')
    junked = pd.to_datetime(df.junked_on, errors='coerce')
    unproducible = pd.to_datetime(df.unproducible_on, errors='coerce')
    verified = pd.to_datetime(df.verified_on, errors='coerce')
    submitted = pd.to_datetime(df.OPENED, errors='coerce')
    # Age in days for each state (np.timedelta64 takes no errors argument)
    r = (resolved - submitted) / np.timedelta64(1, 'D')
    c = (closed - submitted) / np.timedelta64(1, 'D')
    d = (duplicate - submitted) / np.timedelta64(1, 'D')
    j = (junked - submitted) / np.timedelta64(1, 'D')
    u = (unproducible - submitted) / np.timedelta64(1, 'D')
    v = (verified - submitted) / np.timedelta64(1, 'D')
    # not terminal state
    s = (today - submitted) / np.timedelta64(1, 'D')
    date_calc = int(s)
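I am trying to populate the Age column. If the status is terminal and its date is not empty, use the date calculation above. For some reason, when the date for a terminal status is empty, it does not use the else clause I am attempting.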
    if df.Status in terminal:
        if df.Status == 'R' and df.Resolv != '':
            return r
        elif df.Status == 'C' and df.closed_on != '':
            return c
        elif df.Status == 'D' and df.duplicate_on != '':
            return d
        elif df.Status == 'J' and df.junked_on != '':
            return j
        elif df.Status == 'U' and df.unproducible_on != '':
            return u
        elif df.Status == 'V' and df.verified_on != '':
            return v
    else:
        return date_calc
# Read in the data
df = pd.read_csv('BigData.txt', low_memory=False)
# Use the defect_age function to create the new column
df['Age'] = df.apply(lambda row: defect_age(row), axis=1)
# Write the results to CSV
df.to_csv("data.csv", index=False, sep=',', quoting=csv.QUOTE_NONNUMERIC)
ROW 2511:
Identifier Status OPENED Resolv closed_on duplicate_on junked_on \
2511 xxxx5 J 2002-02-07 NaN NaN NaN NaN
unproducible_on verified_on
2511 NaN NaN
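Answer 0 (score: 1)
I put together a quick piece of code that uses the status to look up the date for the age calculation. If the status is not a terminal one, or its date is empty, it defaults to today's date.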
def toDateTime(s):
    return dtt.strptime(s, '%Y-%m-%d')

def defect_age(row):
    # Map each terminal status to the column holding its date
    status_dict = {'R': 'Resolv', 'V': 'verified_on',
                   'D': 'duplicate_on', 'J': 'junked_on',
                   'U': 'unproducible_on', 'C': 'closed_on'}
    submitted = toDateTime(row['OPENED'])
    status = row['Status']
    if status in status_dict:
        date_from_col = row[status_dict[status]]
        # Fall back to today when the terminal status has no date
        date = toDateTime(date_from_col) if date_from_col != '' else dtt.today()
    else:
        date = dtt.today()
    return (date - submitted).days
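This function plays the same role as the defect_age function above. You can now apply it to the data frame. The fillna('') call comes first because read_csv loads empty fields as NaN, and the != '' check only works on empty strings: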
df.fillna('', inplace=True)
df['Age'] = df.apply(defect_age, axis=1)
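For a large file, a column-wise version may be worth trying instead of a row-by-row apply. The following is only a sketch under a few assumptions: the helper name add_age, its today parameter, and the four-row sample are made up for illustration, and unparseable or missing dates are treated as "no date" through errors='coerce'. It picks the date column that matches each row's status, falls back to today's date when that date is missing or the status is not terminal, and takes the difference in days.

```python
import io

import pandas as pd

# Terminal status -> column holding the date that status was reached
STATUS_TO_COL = {'R': 'Resolv', 'V': 'verified_on', 'D': 'duplicate_on',
                 'J': 'junked_on', 'U': 'unproducible_on', 'C': 'closed_on'}


def add_age(df, today=None):
    """Return a copy of df with an 'Age' column in whole days.

    `today` is a hypothetical parameter that lets the reference date be
    fixed, which makes the result reproducible.
    """
    if today is None:
        today = pd.Timestamp.today().normalize()
    else:
        today = pd.Timestamp(today)

    opened = pd.to_datetime(df['OPENED'], errors='coerce')

    # Start with "no end date" everywhere, then fill in per status.
    end = pd.Series(pd.NaT, index=df.index)
    for status, col in STATUS_TO_COL.items():
        dates = pd.to_datetime(df[col], errors='coerce')
        # Keep the current value unless the row has this status.
        end = end.where(df['Status'] != status, dates)

    # Missing date or non-terminal status: measure up to today.
    end = end.fillna(today)

    out = df.copy()
    out['Age'] = (end - opened).dt.days
    return out


# Small made-up sample in the same layout as the question's data
sample = io.StringIO(
    "Identifier,Status,OPENED,Resolv,closed_on,duplicate_on,"
    "junked_on,unproducible_on,verified_on\n"
    "xx1,J,2002-02-07,,,,,,\n"
    "xx2,J,2008-11-25,,,,2008-12-04,,\n"
    "xx4,V,2003-07-09,2003-07-10,,,,,2003-07-15\n"
    "xx14,A,2008-09-24,,,,,,\n"
)
df = pd.read_csv(sample)
result = add_age(df, today='2020-01-01')
print(result[['Identifier', 'Status', 'Age']])
```

No fillna('') is needed here, because pd.to_datetime with errors='coerce' turns both NaN and empty strings into NaT, which the final fillna(today) then replaces.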