Why am I getting null values from a conditional statement?

Time: 2016-08-15 19:47:20

Tags: python pandas conditional

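Thanks again for your patience; I'm not the best communicator. If there is any other information I should add, please let me know.

My current data looks like this: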

"Identifier","Status","OPENED","Resolv","closed_on","duplicate_on","junked_on","unproducible_on","verified_on"
"xx1","D","2004-07-28","","","2004-08-26","","",""
"xx2","N","2010-03-02","","","","","",""
"xx3","U","2005-10-26","","","","","2005-11-01",""
"xx4","V","2006-06-30","2006-09-15","","","","","2006-11-20"
"xx5","R","2012-09-21","2013-06-06","","","","",""
"xx6","D","2009-11-25","","","2010-02-26","","",""
"xx7","D","2003-08-29","","","2003-08-29","","",""
"xx8","R","2003-06-06","2003-06-24","","","","",""
"xx9","R","2004-11-05","2004-11-15","","","","",""
"xx10","R","2008-02-21","2008-09-25","","","","",""
"xx11","R","2007-03-08","2007-03-21","","","","",""
"xx12","R","2011-08-22","2012-06-21","","","","",""
"xx13","J","2003-07-07","","","","2003-07-10","",""
"xx14","A","2008-09-24","","","","","",""

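I'm trying to use the code below to add a calculated Age column so that the data looks like the following. Note that the first row returns "" for Age, which is the problem I want to solve: if a status has no date, I want to use today's date.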

"Identifier","Status","OPENED","Resolv","closed_on","duplicate_on","junked_on","unproducible_on","verified_on","Age"
"xx1","J","2002-02-07","","","","","","",""
"xx2","J","2008-11-25","","","","2008-12-04","","",9.0
"xx3","C","2002-01-27","","2002-03-19","","","","",51.0
"xx4","V","2003-07-09","2003-07-10","","","","","2003-07-15",6.0
"xx5","D","2008-06-30","","","2008-09-09","","","",71.0
"xx6","R","2010-06-02","2010-06-11","","","","","",9.0
"xx7","R","2006-11-16","2006-12-12","","","","","",26.0
"xx8","R","2006-03-29","2006-03-31","","","","","",2.0
"xx9","R","2010-09-07","2010-10-05","","","","","",28.0
"xx10","U","2006-03-09","","","","","2006-06-20","",103.0
"xx11","R","2007-04-26","2007-05-01","","","","","",5.0
"xx12","C","2010-03-07","","2010-03-11","","","","",4.0
"xx13","R","2009-12-22","2010-05-31","","","","","",160.0
"xx14","R","2006-06-24","2006-06-28","","","","","",4.0

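However, when the date of the defect's status change is missing, the age function returns '' as shown in the image below. This is the case for all 102 blank cells.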

Missing data example

from datetime import datetime as dtt
import pandas as pd
import numpy as np
import csv

Age column calculation function

def defect_age(df):
    """Performs age calc and creates age col"""
    today = dtt.today()

List of terminal statuses:

    terminal = ['R', 'V', 'D', 'J', 'U', 'C']

Dates for each status

    resolved = pd.to_datetime(df.Resolv, errors='coerce')
    closed = pd.to_datetime(df.closed_on, errors='coerce')
    duplicate = pd.to_datetime(df.duplicate_on, errors='coerce')
    junked = pd.to_datetime(df.junked_on, errors='coerce')
    unproducible = pd.to_datetime(df.unproducible_on, errors='coerce')
    verified = pd.to_datetime(df.verified_on, errors='coerce')
    submitted = pd.to_datetime(df.OPENED, errors='coerce')

Date calculation by status

    r = (resolved - submitted) / np.timedelta64(1, 'D', errors='coerce')
    c = (closed - submitted) / np.timedelta64(1, 'D', errors='coerce')
    d = (duplicate - submitted) / np.timedelta64(1, 'D', errors='coerce')
    j = (junked - submitted) / np.timedelta64(1, 'D', errors='coerce')
    u = (unproducible - submitted) / np.timedelta64(1, 'D', errors='coerce')
    v = (verified - submitted) / np.timedelta64(1, 'D', errors='coerce')
    # not terminal state
    s = (today - submitted) / np.timedelta64(1, 'D', errors='coerce')
    date_calc = int(s)

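I'm trying to populate the Age column. If the status is terminal and its date is not empty, use the date calculation above. For some reason, when the date for a terminal status is empty, it does not use the else clause I'm attempting.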

    if df.Status in terminal:
        if df.Status == 'R' and df.Resolv != '':
            return r
        elif df.Status == 'C' and df.closed_on != '':
            return c
        elif df.Status == 'D' and df.duplicate_on != '':
            return d
        elif df.Status == 'J' and df.junked_on != '':
            return j
        elif df.Status == 'U' and df.unproducible_on != '':
            return u
        elif df.Status == 'V' and df.verified_on != '':
            return v
    else:
        return date_calc
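
A likely explanation for the blanks, sketched in plain Python without pandas. Two things seem to be at work. First, `read_csv` stores empty cells as NaN by default (row 2511 below shows NaN, not ''), and `NaN != ''` is True, so the inner branch is taken and returns an age computed from a missing date. Second, the `else` is paired with the outer `if`, so even when a date really is '', a terminal status never reaches it and the function implicitly returns None. The function `age_or_fallback` and its arguments are hypothetical stand-ins for the branch structure above, reduced to one status.

```python
import math

def age_or_fallback(status, resolved_on, resolved_age, fallback_age):
    """Hypothetical, simplified copy of the branch structure above."""
    terminal = ['R', 'V', 'D', 'J', 'U', 'C']
    if status in terminal:
        if status == 'R' and resolved_on != '':
            return resolved_age
        # No inner branch matched: execution falls off the end of the
        # function, so Python returns None. The outer else is skipped.
    else:
        return fallback_age

nan = float('nan')  # what read_csv stores for an empty cell by default

# Missing date read as NaN: NaN != '' is True, so the NaN age is returned.
from_nan = age_or_fallback('R', nan, nan, 99)
# Missing date stored as '': the inner test fails and None is returned.
from_empty = age_or_fallback('R', '', nan, 99)
# Non-terminal status: the only case that reaches the else branch.
from_open = age_or_fallback('N', '', nan, 99)

print(math.isnan(from_nan), from_empty, from_open)  # prints: True None 99
```

Both NaN and None end up as empty cells when the frame is written with `to_csv`, which would match the blank Age values described above.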

Read in the data

df = pd.read_csv('BigData.txt', low_memory=False)

Use the defect_age function

Create the new column
df['Age'] = df.apply(lambda row: defect_age(row), axis=1)

Write the results to CSV

df.to_csv("data.csv", index=False, sep=',', quoting=csv.QUOTE_NONNUMERIC)

ROW 2511:

      Identifier Status      OPENED Resolv closed_on duplicate_on junked_on  \
2511  xxxx5           J  2002-02-07    NaN       NaN          NaN       NaN   

     unproducible_on verified_on  
2511             NaN         NaN  

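1 Answer:

Answer 0 (score: 1)

I put together some quick code that basically uses the status to look up the date for the age calculation. If the status is not a terminal one, it defaults to today's date.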

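from datetime import datetime as dtt

def toDateTime(s):
    """Parse a 'YYYY-MM-DD' string into a datetime."""
    return dtt.strptime(s, '%Y-%m-%d')

def defect_age(row):
    """Return the defect's age in days for one row."""
    # Terminal status -> column holding the date that status was reached
    status_dict = {'R': 'Resolv', 'V': 'verified_on',
                   'D': 'duplicate_on', 'J': 'junked_on',
                   'U': 'unproducible_on', 'C': 'closed_on'}

    submitted = toDateTime(row['OPENED'])
    status = row['Status']

    if status in status_dict:
        date_from_col = row[status_dict[status]]
        # Missing date (empty string after fillna('')): fall back to today
        date = toDateTime(date_from_col) if date_from_col != '' else dtt.today()
    else:
        # Non-terminal status: the age runs up to today
        date = dtt.today()

    return (date - submitted).days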

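This function is equivalent to the defect_age function above. You can now apply it to the DataFrame. The fillna call comes first so that missing dates become empty strings, which is what the function checks for: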

df.fillna('', inplace=True)
df['Age'] = df.apply(defect_age, axis=1)
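
To check the fallback behaviour without the full `BigData.txt`, here is a minimal sketch that runs the same per-row logic on four rows taken from the question (three sample rows plus row 2511). It reads them with the standard library's `csv.DictReader`, which yields '' for empty fields, so no `fillna` step is needed. The `today` parameter, the `parse_date` helper, and the fixed date 2016-08-15 are assumptions added here so the result is repeatable; the answer's function calls `dtt.today()` instead.

```python
import csv
import io
from datetime import datetime as dtt

# Terminal status -> column holding the date that status was reached
STATUS_TO_COLUMN = {'R': 'Resolv', 'V': 'verified_on',
                    'D': 'duplicate_on', 'J': 'junked_on',
                    'U': 'unproducible_on', 'C': 'closed_on'}

def parse_date(text):
    return dtt.strptime(text, '%Y-%m-%d')

def defect_age(row, today):
    """Age in days; falls back to `today` when no status date exists."""
    submitted = parse_date(row['OPENED'])
    date_col = STATUS_TO_COLUMN.get(row['Status'])
    date_text = row[date_col] if date_col else ''
    end = parse_date(date_text) if date_text else today
    return (end - submitted).days

SAMPLE = "\n".join([
    '"Identifier","Status","OPENED","Resolv","closed_on","duplicate_on",'
    '"junked_on","unproducible_on","verified_on"',
    '"xx2","N","2010-03-02","","","","","",""',
    '"xx4","V","2006-06-30","2006-09-15","","","","","2006-11-20"',
    '"xx13","J","2003-07-07","","","","2003-07-10","",""',
    '"xxxx5","J","2002-02-07","","","","","",""',
])

today = dtt(2016, 8, 15)  # fixed date (assumption) for repeatable output
ages = {row['Identifier']: defect_age(row, today)
        for row in csv.DictReader(io.StringIO(SAMPLE))}
print(ages)
```

Here `xx13` and `xx4` use their own status dates, while `xx2` (non-terminal) and `xxxx5` (terminal but with no junked date) are both measured up to `today`. With a DataFrame, the same function could be applied as `df.apply(lambda row: defect_age(row, today), axis=1)` after `df.fillna('')`.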