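I have a set of functions that clean up data in a data frame (populated with fake data). Each function usually works on an individual column, and sometimes that column does not exist. I would like to create one function that wraps all of the sub-functions, but obviously each sub-function only works if its column is present.
Input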
structure(list(HospitalNumber = c(" J6044658\n", " Y6417773\n"
), PatientName = c(" Jargon, Victoria\n", " Powell, Destiny\n"
), DOB = c(" 1943-10-13\n", " 1946-12-29\n"), GeneralPractitioner = c(" Dr\n Martin, Marche\n",
" Dr\n al-Safi, Lutfiyya\n"), Dateofprocedure = structure(c(14559,
14045), class = "Date"), ClinicalDetails = c(" Ongoing antral gastritis despite treatment with PPI,Reflux sx\n,High dyshagia OGD - fundic gastritis\n,Chronic diarrhoea/colonic biopsies,Currently on steriod for IgG4 disease\n,Food bolus obstruction\n\n4 specimen\n Nature of specimen: Nature of specimen as stated on pot = 'proximal body lesser curve polyps x4 ',Specimen A- Nature of specimen as stated on request form = 'GREATER CURVE ',Nature of specimen as stated on request form = 'Gastric polyp '\n",
" Quadrantic biopsies were taken at\n,OGD - only 3cm sliding hiatus\n\n7 specimen\n Nature of specimen: Nature of specimen as stated on pot = 'RECTAL POLYPS X3 ',Nature of specimen as stated on pot = 'fundus polyps x4 ',Nature of specimen as stated on request form = 'DUODENAL BX ',Nature of specimen as stated on pot = 'Papilloma at 36 cm oesophagus ',a) Nature of specimen as stated on request form = 'D2 bx x 2' ,Nature of specimen as stated on pot = 'Oesophagus 26 cm '\n"
), Macroscopicdescription = c(" 3 specimens collected the largest measuring 3 x 2 x 1 mm and the smallest 2 x 1 x 5 mm\n",
" 4 specimens collected the largest measuring 4 x 4 x 4 mm and the smallest 5 x 3 x 1 mm\n"
), Histology = c(" Two biopsies consist of small bowel mucosa and are within normal histological limits\n\n",
" modified giemsa stain\n,These are biopsies of gastric mucosa ,There is no evidence of coeliac disease\n,The nuclei are hyperchromatic,\n,There is no granulomatous inflammation\n,The appearances are in keeping with a reactive/chemical gastritis,features including basal layer hyperplasia and reactive nucelar changes with underlying\n,These are two biopsies of squamous epithelium within normal limits,fibromuscularisation of the lamina propria and mild chronic inflammation\n,These biopsies of columnar mucosa show focal acute inflammation, moderate chronic inflammation\n\n"
), Diagnosis = c(" Distal transverse colon polyp excision:- tubular adenoma, low grade dysplasia\n,Ileo-caecal valve, biopsies:\n,Stomach antrum biopsies:- normal mucosa\n,- Up to 34 eosinophils per high power field,Stomach, biopsy - Mild chronic inflammation\n",
" Rectum, polyp biopsy: - Tubular adenoma with mild dysplasia,- Raised intra-epithelial lymphocytes ,Duodenum, biopsies - within normal histological limits\n,B GI biopsy - DISTAL OESOPHAGUS X2, MID OESO X3, PROX OESO X2\n,Oesophagus, biopsies : - Minimal chronic inflammation,Sigmoid colon, polypectomy: - Tubular adenoma with moderate dysplasia,Oesophagus polyps biopsies:- 2 x papillomas\n,Duodenum biopsies:- normal\n"
)), .Names = c("HospitalNumber", "PatientName", "DOB", "GeneralPractitioner",
"Dateofprocedure", "ClinicalDetails", "Macroscopicdescription",
"Histology", "Diagnosis"), row.names = 1:2, class = "data.frame")
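Aim
I would like the user to be able to name the columns they want to pass in to the function (with a variable number of arguments to allow this), so that the function works out which sub-functions to use from the column names passed in.
Desired output
The function should then return the data frame with all the changes made by the sub-functions.
I have something like the following, but I suspect it has a lot of errors.
Function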
ParentFunction<-(x,...){
args <- list(...)
pp<-if(!is.null(args[['DOB']])){ DOB_CleanupFunction(DOB)}
pp<-if(!is.null(args[['AColumn']])){AnotherCleanUpFunction(AColumn)}
return(pp)
}
Usage
ParentFunction(pp, DOB='DOB', ProcedureDate='DateofProcedure', ClinicalDetails='ClinicalDetails', Diagnosis='Diagnosis')
Answer 0 (score: 0)
Here is a solution that should work:
parent_function <- function(df, cols) {
if ("DOB" %in% cols) df$DOB <- DOB_CleanupFunction(df$DOB)
if ("AColumn" %in% cols) df$AColumn <- AnotherCleanupFunction(df$AColumn)
return(df)
}
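This function takes two arguments:
df - the data frame you want to clean
cols - a character vector containing the names of the columns to clean
A usage example: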
df_clean <- parent_function(df, cols = c("DOB", "AColumn"))
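To run the example end to end, the cleanup functions need definitions. Below is a minimal sketch, assuming stand-in versions of DOB_CleanupFunction and AnotherCleanupFunction (the real ones are not shown in the question). It also adds a check that each requested column is actually present in the data frame, since the question mentions that columns are sometimes missing.

```r
# Stand-in cleaners (assumptions; the question does not show the real ones).
DOB_CleanupFunction <- function(x) as.Date(trimws(x))
AnotherCleanupFunction <- function(x) toupper(trimws(x))

parent_function <- function(df, cols) {
  # Only touch columns that were requested AND are present in df.
  cols <- intersect(cols, names(df))
  if ("DOB" %in% cols) df$DOB <- DOB_CleanupFunction(df$DOB)
  if ("AColumn" %in% cols) df$AColumn <- AnotherCleanupFunction(df$AColumn)
  df
}

# Small example data frame; AColumn is deliberately absent.
df <- data.frame(
  DOB = c(" 1943-10-13\n", " 1946-12-29\n"),
  stringsAsFactors = FALSE
)

df_clean <- parent_function(df, cols = c("DOB", "AColumn"))
str(df_clean)
```

Without the intersect() guard, asking for a column that is not in df would pass NULL to the cleaner and would probably fail when the result is assigned back.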
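From your problem description, you do not need the parent function to take an arbitrary number of arguments. IMHO, implementing the parent function with two arguments is clearer, and it should serve the stated purpose.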
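If the variadic call style from the question is still preferred, one possible sketch is to treat each named argument in `...` as a mapping from a role to the actual column name, and to look the cleaner up in a table. The names parent_function_dots, cleaners, clean_dob and clean_text are hypothetical and only serve the illustration.

```r
# Hypothetical stand-in cleaners (not from the question).
clean_dob <- function(x) as.Date(trimws(x))
clean_text <- function(x) trimws(x)

# Lookup table: role name -> cleanup function.
cleaners <- list(
  DOB = clean_dob,
  ClinicalDetails = clean_text,
  Diagnosis = clean_text
)

parent_function_dots <- function(df, ...) {
  mapping <- list(...)  # e.g. list(DOB = "DOB", Diagnosis = "Diagnosis")
  for (role in names(mapping)) {
    col <- mapping[[role]]
    # Skip roles with no registered cleaner and columns missing from df.
    if (!is.null(cleaners[[role]]) && col %in% names(df)) {
      df[[col]] <- cleaners[[role]](df[[col]])
    }
  }
  df
}

df <- data.frame(
  DOB = " 1943-10-13\n",
  Diagnosis = " Stomach, biopsy\n",
  stringsAsFactors = FALSE
)

# ClinicalDetails is requested but absent from df, so it is skipped.
out <- parent_function_dots(
  df,
  DOB = "DOB",
  Diagnosis = "Diagnosis",
  ClinicalDetails = "ClinicalDetails"
)
```

Adding a new cleaner then only means adding an entry to the lookup table, at the cost of a less explicit function signature than the two-argument version.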