Difficulty combining lists, characters, and numbers into a data frame

Date: 2018-03-11 20:39:31

Tags: r

I'm lost on how to combine my data into a usable data frame. I have a list of lists, character vectors, and number vectors. Here is a working example of my code so far:

    remove(list=ls())

    # Headers for each of my column names
    headers <- c("name","p","c","prophylaxis","control","inclusion","exclusion","conversion excluded","infection criteria","age criteria","mean age","age sd")

    #_name = author and year
    #_p = no. in experimental arm
    #_c = no. in control arm
    #_abx = antibiotic used
    #_con = control used
    #_inc = inclusion criteria
    #_exc = exclusion criteria
    #_coexc = was conversion to open excluded?
    #_infxn = infection criteria
    #_agecrit = age criteria
    #_agemean = mean age of study
    #_agesd = sd age of study

    # Passos 2016
    passos_name <- c("Passos","2016")
    passos_p <- 50
    passos_c <- 50
    passos_abx <- "cefazolin 1g at induction"
    passos_con <- "none"
    passos_inc <- c("elective LC","symptomatic cholelithiasis","low risk")
    passos_exc <- c("renal impairment","hepatic impairment","immunosuppression","regular steroid use","antibiotics within 48H","acute cholecystitis","choledocolithiasis")
    passos_coexc <- TRUE
    passos_infxn <- c("temperature >37.8C","tachycardia","asthenia","local pain","local purulence")
    passos_agecrit <- NULL
    passos_agemean <- 48
    passos_agesd <- 13.63
    passos <- list(passos_name,passos_p,passos_c,passos_abx,passos_con,passos_inc,passos_exc,passos_coexc,passos_infxn,passos_agecrit,passos_agemean,passos_agesd)
    names(passos) <- headers

    # Darzi 2016
    darzi_name <- c("Darzi","2016")
    darzi_p <- 182
    darzi_c <- 247
    darzi_abx <- c("cefazolin 1g 30min prior to induction","cefazolin 1g 6H after induction","cefazolin 1g 12H after induction")
    darzi_con <- "NaCl"
    darzi_inc <- c("elective LC","first time abdominal surgery")
    darzi_exc <- c("antibiotics within 7 days","immunosuppression","acute cholecystitis","choledocolithiasis","cholangitis","obstructive jaundice",
                   "pancreatitis","previous biliary tract surgery","previous ERCP","DM","massive intraoperative bleeding","antibiotic allergy","major thalassemia",
                   "empyema")
    darzi_coexc <- TRUE
    darzi_infxn <- c("temperature >38C","local purulence","intra-abdominal collection")
    darzi_agecrit <- c(">18", "<75")
    darzi_agemean <- 43.75
    darzi_agesd <- 13.30
    darzi <- list(darzi_name,darzi_p,darzi_c,darzi_abx,darzi_con,darzi_inc,darzi_exc,darzi_coexc,darzi_infxn,darzi_agecrit,darzi_agemean,darzi_agesd)
    names(darzi) <- headers

    # Matsui 2014
    matsui_name <- c("Matsui","2014")
    matsui_p <- 504
    matsui_c <- 505
    matsui_abx <- c("cefazolin 1g at induction","cefazolin 1g 12H after induction","cefazolin 1g 24H after induction")
    matsui_con <- "none"
    matsui_inc <- "elective LC"
    matsui_exc <- c("emergent","concurrent surgery","regular insulin use","regular steroid use","antibiotic allergy","HD","antibiotics within 7 days","hepatic impairment","chemotherapy")
    matsui_coexc <- FALSE
    matsui_infxn <- c("local purulence","intra-abdominal collection","distant infection","temperature >38C")
    matsui_agecrit <- ">18"
    matsui_agemean <- NULL
    matsui_agesd <- NULL
    matsui <- list(matsui_name,matsui_p,matsui_c,matsui_abx,matsui_con,matsui_inc,matsui_exc,matsui_coexc,matsui_infxn,matsui_agecrit,matsui_agemean,matsui_agesd)
    names(matsui) <- headers

    # Find unique exclusion criteria in order to create the list of all possible levels
    exc <- ls()[grepl("_exc",ls())]
    exclist <- sapply(exc,get)
    exc.levels <- unique(unlist(exclist,use.names = F))

    # Find unique inclusion criteria in order to create the list of all possible levels
    inc <- ls()[grepl("_inc",ls())]
    inclist <- sapply(inc,get)
    inc.levels <- unique(unlist(inclist,use.names = F))

    # Find unique antibiotics in order to create the list of all possible levels
    abx <- ls()[grepl("_abx",ls())]
    abxlist <- sapply(abx,get)
    abx.levels <- unique(unlist(abxlist,use.names = F))

    # Find unique controls in order to create the list of all possible levels
    con <- ls()[grepl("_con",ls())]
    conlist <- sapply(con,get)
    con.levels <- unique(unlist(conlist,use.names = F))

    # Find unique age criteria in order to create the list of all possible levels
    agecrit <- ls()[grepl("_agecrit",ls())]
    agecritlist <- sapply(agecrit,get)
    agecrit.levels <- unique(unlist(agecritlist,use.names = F))

I have been struggling with:

1) Converting each of the _exc, _inc, _abx, _con, and _agecrit lists into factors, using the levels generated at the end of the code block. I have been trying to use a for loop, such as:

    for (x in exc) {
      as.name(x) <- factor(get(x),levels = exc.levels)
    }

This only creates a variable x, storing the last parsed list as a factor.

2) Combining all of my data into a data frame, formatted as follows:

    name, p, c, prophylaxis, control, inclusion, exclusion, conversion excluded, infection criteria, age criteria, mean age, age sd
    "Passos 2016", 50, 50, "cefazolin 1g at induction", "none", ["elective LC","symptomatic cholelithiasis","low risk"], ["renal impairment","hepatic impairment","immunosuppression","regular steroid use","antibiotics within 48H","acute cholecystitis","choledocolithiasis"], TRUE, ["temperature >37.8C","tachycardia","asthenia","local pain","local purulence"], NULL, 48, 13.63
    ...
    # [] = factors
    # columns correspond to each study's variables (i.e. passos_name, passos_p, passos_c, etc.)
    # rows correspond to each study (i.e. passos, darzi, matsui)

I have tried various solutions from StackOverflow, but have not found anything that works.

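I suspect my data may not be suitable for a data frame, mainly because I am trying to store lists of factors in a column. Is that allowed? If not, how is this type of data usually stored? My goal is to be able to compare these different criteria across studies in a meaningful way.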

1 Answer:

Answer 0 (score: 2)

This is too long for a comment, so I am making it an "answer":

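First, look at what happens here:

    data.frame(name = "Passos, 2016", p = 50)
              name  p
    1 Passos, 2016 50

    data.frame(name = c("Passos", "2016"), p = 50)
        name  p
    1 Passos 50
    2   2016 50

In the first call, we create a data frame with a column "name" that holds a single entry, "Passos, 2016" (one character string carrying two pieces of information), and a column "p". All is well. In the second call, I specify the column "name" using c("Passos", "2016"), as you did. That is a two-element vector, so we get two rows in the data frame: one with the name Passos and one with the name 2016, while the column p is recycled.

Obviously, the latter is probably not what you want. It still works, though, because R simply recycles the shorter vector. Now, what do you think will happen if I add a vector with three elements?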
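One way to check this is the small sketch below. The third column x and its values are made up purely for illustration; they do not come from the question. The call is wrapped in tryCatch so that the script keeps running:

```r
# Hypothetical third column `x` with three elements, added only for illustration.
result <- tryCatch(
  data.frame(name = c("Passos", "2016"), p = 50, x = 1:3),
  error = function(e) e
)

# Lengths 2, 1 and 3 cannot be recycled to a common number of rows,
# so data.frame() signals an error instead of building a data frame.
inherits(result, "error")
conditionMessage(result)
```

Recycling only works when every longer length is a whole multiple of the shorter ones, which is not the case for 2 and 3.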

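This highlights the main problem with what you are doing: you are trying to build a data frame out of many vectors of different lengths. In some cases that is fine, namely when you want the shorter vectors to be repeated (in R, we call this "recycling"), but that does not look like what you want here.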
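A quick way to see the mismatch is to look at the length of every element before calling data.frame. The sketch below uses a cut-down version of the passos list from the question (only four of the twelve fields), so treat it as an illustration rather than the full data:

```r
# A cut-down version of the `passos` list from the question (four fields only).
passos <- list(
  name           = c("Passos", "2016"),
  p              = 50,
  inclusion      = c("elective LC", "symptomatic cholelithiasis", "low risk"),
  `age criteria` = NULL
)

# lengths() reports the length of each list element.
# A NULL element has length 0, so it cannot fill a data frame cell.
field_lengths <- lengths(passos)
field_lengths
```

Any field whose length is not 1 needs a decision: collapse it into a single value, give it its own table, or store it in a list-column.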

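So my suggestion is: try to picture a matrix, and make sure you understand what each element (row and column) should be. Then specify your data accordingly. If in doubt, look up "tidy data".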
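As a rough sketch of what that could look like for the data in the question: single-valued fields go into one table with one row per study, and a multi-valued field such as the inclusion criteria goes into a second "long" table with one row per study and criterion. The column names conversion_excluded, mean_age, age_sd and criterion are my own choices, and missing values are written as NA rather than NULL. This is only one possible layout, not the only correct one:

```r
# One row per study for the single-valued fields.
# NA (not NULL) marks a missing value, because NULL has length zero.
studies <- data.frame(
  name               = c("Passos 2016", "Darzi 2016", "Matsui 2014"),
  p                  = c(50, 182, 504),
  c                  = c(50, 247, 505),
  conversion_excluded = c(TRUE, TRUE, FALSE),
  mean_age           = c(48, 43.75, NA),
  age_sd             = c(13.63, 13.30, NA),
  stringsAsFactors   = FALSE
)

# Multi-valued field in "long" form: one row per study and criterion.
inclusion <- data.frame(
  name = c("Passos 2016", "Passos 2016", "Passos 2016",
           "Darzi 2016", "Darzi 2016", "Matsui 2014"),
  criterion = c("elective LC", "symptomatic cholelithiasis", "low risk",
                "elective LC", "first time abdominal surgery", "elective LC"),
  stringsAsFactors = FALSE
)

# Comparing studies then becomes a simple cross-tabulation.
tab <- table(inclusion$criterion, inclusion$name)
tab

# Alternative: a list-column, where each cell holds a whole character vector.
# split() groups the criteria by study; indexing by studies$name restores row order.
studies$inclusion <- unname(split(inclusion$criterion, inclusion$name)[studies$name])
studies$inclusion[[1]]
```

A list-column is allowed in a data frame, but many functions expect atomic columns, so the long table is usually easier to filter, count, and compare.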