How to iteratively add to a data frame in a loop

Time: 2019-04-01 17:08:12

Tags: r

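Background

I have a data frame like the one shown below (made of synthetic data, for those who are interested). It consists of semi-structured text. The text is separated by headers. The header titles are always the same, but some headers are sometimes absent from a report (those that are present always appear in the same order).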

Data

Mydata <- structure(list(OGDReportWhole = c("Hospital: Random NHS Foundation Trust\nHospital Number: J6044658\nPatient Name:  Jargon, Victoria\nGeneral Practitioner: Dr. Martin, Marche\nDate of procedure:  2009-11-11\nEndoscopist: Dr. Sullivan, Shelby\nSecond endoscopist: Dr. al-Basha, Mahfoodha\nMedications: Fentanyl  12.5mcg\nMidazolam  6mg\nInstrument:  FG5\nExtent of Exam:  GOJ\nIndications: Follow-up ULCER HEALING\nProcedure Performed: Gastroscopy (OGD)\nFindings:  No evidence of Barrett's oesophagus, short 2 cn hiatus hernia.,Oesophageal biopsies taken from three levels as requested.,OGD today to assess for ulceration/ongoing bleeding.,Diaphragmatic pinch:40cm .,She has a small hiatus hernia .,We will re-book for 2 weeks, rebanding.,Tiny erosions at the antrum.,Biopsies taken from top of stricture-metal marking clips in situ.,The varices flattened well with air insufflation.,He is on Barrett's Screeling List in October 2017 at St Thomas'.\nHALO 90 done with good effect\nEndoscopic Diagnosis:  Post chemo-radiotherapy stricture ", 
"Hospital: Random NHS Foundation Trust\nHospital Number: Y6417773\nPatient Name:  Powell, Destiny\nGeneral Practitioner: Dr. al-Safi, Lutfiyya\nDate of procedure:  2008-06-15\nEndoscopist: Dr. Kekich, Annabelle\nSecond endoscopist: Dr. Needham, April\nMedications: Fentanyl  125mcg\nMidazolam  7mg\nInstrument:  FG6\nExtent of Exam:  Pylorus\nIndications: Weight Loss\nProcedure Performed: Gastroscopy (OGD)\nFindings:  Duodenum: Duodenitis with a small erosion .,STOMACH: diffuse gastritis with angiodysplasia and punctate bleeding site on greater curve mid body - no obvious ulcer- antrum scar ?,No immediate complications.,Z-line at: 38cm - Bravo placed at 32cm- good positionat check endoscopy.\n\nEndoscopic Diagnosis:  Esophageal candidiasis "
)), row.names = 1:2, class = "data.frame")

My current solution

I created a function that extracts text based on a list of character delimiters (the names of the headers).

At the moment it takes the data frame (x), the name of the text column (y), a start header and an end header, and creates a new column named after the start header.

This works, I think, and I run it iteratively over the list of headers. The function is:

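#' @param x the data frame
#' @param y the name of the column to extract from
#' @param stra the header that starts the span to extract
#' @param strb the header that ends the span to extract
#' @param t the name of the column to create
Extractor2 <- function(x, y, stra, strb, t) {
  x <- data.frame(x)
  # Build the column name: everything except letters, digits and commas
  # becomes a space, then the spaces are dropped
  t <- gsub("[^[:alnum:],]", " ", t)
  t <- gsub(" ", "", t, fixed = TRUE)

  # Capture everything from stra to strb; dotall lets "." span newlines
  x[, t] <- stringr::str_extract(
    x[, y],
    stringr::regex(paste(stra, "(.*)", strb, sep = ""), dotall = TRUE)
  )
  # Drop anything from a literal backslash onwards
  x[, t] <- gsub("\\\\.*", "", x[, t])

  names(x[, t]) <- gsub(".", "", names(x[, t]), fixed = TRUE)
  x[, t] <- gsub("       ", "", x[, t])
  # Strip the start and end headers from the extracted text
  x[, t] <- gsub(stra, "", x[, t], fixed = TRUE)
  if (strb != "") {
    x[, t] <- gsub(strb, "", x[, t], fixed = TRUE)
  }
  x[, t] <- gsub("       ", "", x[, t])
  # ColumnCleanUp() is a helper that is not defined in this post
  x[, t] <- ColumnCleanUp(x[, t])
  return(x)
}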
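
To show the core extraction step in isolation, here is a minimal sketch on a single string. It swaps stringr for base R (regexpr and regmatches with perl = TRUE), and the variable names report, pattern, hit and value are illustrative only. The input is a fragment of the first report.

```r
# A fragment of the first report, used as a single input string
report <- "Instrument:  FG5\nExtent of Exam:  GOJ\nIndications: Follow-up ULCER HEALING"
stra <- "Instrument:"
strb <- "Extent of Exam:"

# (?s) lets "." match newlines, mirroring dotall = TRUE in stringr::regex()
pattern <- paste0("(?s)", stra, "(.*)", strb)
hit <- regmatches(report, regexpr(pattern, report, perl = TRUE))

# Remove both headers as fixed strings, then trim surrounding whitespace
value <- gsub(stra, "", hit, fixed = TRUE)
value <- trimws(gsub(strb, "", value, fixed = TRUE))
value
```

Because the headers are pasted into a regular expression unescaped, this sketch (like the function) assumes they contain no regex metacharacters such as parentheses or dots.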

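Question

I want the function to accept just a string (rather than a data frame and then a column name) and then add it to an empty data frame (including the original string).

I'm not sure how to convert the function from taking a data frame and adding to it, to adding an inputString to an empty data frame. I want it to create the same output as the current function.

I'd welcome general criticism of the function, and any better way to achieve what I'm trying to do.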

**Answer**

OK, thanks to @M-M... I was being a bit slow.

The answer is simple: just create an empty data frame using the list of delimiters and go from there...

# Headers in the order they appear in the reports
EndoscTree <- c('Hospital Number:', 'Patient Name:', 'General Practitioner:',
                'Date of procedure:', 'Endoscopist:', 'Second endoscopist:',
                'Medications:', 'Instrument:', 'Extent of Exam:', 'Indications:',
                'Procedure Performed:', 'Findings:', 'Endoscopic Diagnosis:')
# Mydata is the data frame holding the OGDReportWhole column.
# Each pass adds one column: the text between header i and header i + 1.
for (i in seq_len(length(EndoscTree) - 1)) {
  Mydata <- Extractor2(Mydata, 'OGDReportWhole', EndoscTree[i],
                       EndoscTree[i + 1], EndoscTree[i])
}
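
For the string-based version asked about in the question, one possible sketch is given here. It is not the approach used in the loop: instead of extracting between fixed pairs of headers, it locates each header in the string and cuts at the next header that is actually present, so an absent header gives NA in its own column only. The names extract_sections, reports and headers are hypothetical, the input strings are toy examples rather than the data in the question, and the sketch assumes headers occur at most once and in the listed order.

```r
# Hypothetical helper: turn one report string into a one-row data frame
# with the original text plus one column per header.
extract_sections <- function(input_string, delimiters) {
  # Start position of each header in the text; -1 means the header is absent
  starts <- vapply(
    delimiters,
    function(d) as.integer(regexpr(d, input_string, fixed = TRUE)),
    integer(1),
    USE.NAMES = FALSE
  )
  present <- which(starts > 0)
  # Each section ends just before the next header that is present;
  # the last section runs to the end of the text
  ends <- c(starts[present][-1] - 1L, nchar(input_string))

  values <- rep(NA_character_, length(delimiters))
  for (k in seq_along(present)) {
    i <- present[k]
    from <- starts[i] + nchar(delimiters[i])
    values[i] <- trimws(substr(input_string, from, ends[k]))
  }

  out <- data.frame(Original = input_string, t(values),
                    stringsAsFactors = FALSE)
  # Column names are the headers with non-alphanumeric characters removed
  names(out) <- c("Original", gsub("[^[:alnum:]]", "", delimiters))
  out
}

# Toy inputs (made up for illustration); the second lacks "Patient Name:"
reports <- c(
  "Hospital Number: A1\nPatient Name: Doe, Jane\nFindings: Normal",
  "Hospital Number: B2\nFindings: Small hiatus hernia"
)
headers <- c("Hospital Number:", "Patient Name:", "Findings:")

# Build the result from nothing: one row per input string, bound together
result <- do.call(rbind, lapply(reports, extract_sections, delimiters = headers))
result
```

Binding the rows once with do.call(rbind, ...) avoids growing a data frame inside a loop, which tends to get slow as the number of reports increases.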
