How can I keep the Date column after applying my functions?

Asked: 2018-11-30 00:46:36

Tags: pandas python-2.7

My initial DataFrame looks like this:

import pandas as pd
df = pd.DataFrame({'serialNo':['aaaa','aaaa','cccc','ffff'],
               'Date':['2018-09-15','2018-09-16','2018-09-15','2018-09-19'],
               'moduleLocation':   ['face','head','stomach','legs'],
               'moduleName':   ['singing', 'dance','booze', 'vocals'],
               'warning': [4402, 3747 ,5555,8754],
               'failed':[0,3462,5161,3262]})

I run the following steps to clean the data. First, I cast every column to string:

all_columns = list(df)
df[all_columns] = df[all_columns].astype(str)

Then I use a function that concatenates the values of one column per serial number:

def concatenate(diagnostics, field, target):
    diagnostics.sort_values(by=['serialNo',field],inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    diagnostics[target] = \
    diagnostics.groupby(['serialNo'], as_index=False)[field].transform(lambda s: ','.join(filter(None, s)))
    diagnostics.drop([field],axis=1,inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    return diagnostics

module = concatenate(df[['serialNo','moduleName']], 'moduleName', 'Module')
Warn = concatenate(df[['serialNo','warning']], 'warning', 'Warn')
Err = concatenate(df[['serialNo','failed']], 'failed', 'Err')
Location = concatenate(df[['serialNo','moduleLocation']], 'moduleLocation', 'Location')

diag_final = pd.merge(module,Warn,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Err,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Location,on=['serialNo'],how='inner')

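The problem is that the Date column is no longer present in the diag_final DataFrame, and I would like to keep it. I don't want to change the existing functions; I only want each serial number to carry its corresponding date(s). How can I achieve this?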

1 Answer:

Answer 0 (score: 0)

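A serial number can have more than one date (in your sample, 'aaaa' appears on both 2018-09-15 and 2018-09-16), so there is no single date to carry over. You will have to concatenate the dates per serial number, just as you did for moduleLocation and moduleName.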

dates = concatenate(df[['serialNo','Date']], 'Date', 'Date_cat')
diag_final = pd.merge(diag_final,dates,on=['serialNo'],how='inner')
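
For reference, here is a minimal end-to-end sketch of that idea on a trimmed version of the sample data. It is an illustration under some assumptions, not a drop-in replacement: the `concatenate` below follows the same steps as the one in the question, but works on an explicit copy instead of using `inplace=True` (that change is mine, to avoid modifying a slice of `df`), and only `moduleName` and `Date` are included to keep it short.

```python
import pandas as pd

# Trimmed sample data from the question, with every column cast to string.
df = pd.DataFrame({
    'serialNo': ['aaaa', 'aaaa', 'cccc', 'ffff'],
    'Date': ['2018-09-15', '2018-09-16', '2018-09-15', '2018-09-19'],
    'moduleName': ['singing', 'dance', 'booze', 'vocals'],
}).astype(str)


def concatenate(diagnostics, field, target):
    # Same steps as the question's function, but on a copy rather than
    # in place (assumption: the caller does not rely on the in-place edits).
    out = diagnostics.drop_duplicates().sort_values(by=['serialNo', field]).copy()
    # Join all non-empty values of `field` within each serial number.
    out[target] = out.groupby('serialNo')[field].transform(
        lambda s: ','.join(filter(None, s)))
    # Drop the raw column and collapse to one row per serial number.
    return out.drop(columns=[field]).drop_duplicates()


module = concatenate(df[['serialNo', 'moduleName']], 'moduleName', 'Module')
dates = concatenate(df[['serialNo', 'Date']], 'Date', 'Date_cat')

diag_final = pd.merge(module, dates, on=['serialNo'], how='inner')
print(diag_final)
```

One thing to keep in mind: each column is sorted and joined independently, so the n-th entry of Date_cat does not necessarily belong to the n-th entry of Module. For 'aaaa', the modules come out as 'dance,singing' while the dates come out as '2018-09-15,2018-09-16', even though 'singing' was the one recorded on 2018-09-15.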