I'm lost on how to combine my data into a usable data frame. I have lists of character and number vectors.
Here is a working example of my code so far:
remove(list=ls())
# Headers for each of my column names
headers <- c("name","p","c","prophylaxis","control","inclusion","exclusion","conversion excluded","infection criteria","age criteria","mean age","age sd")
#_name = author and year
#_p = no. in experimental arm.
#_c = no. in control arm
#_abx = antibiotic used
#_con = control used
#_inc = inclusion criteria
#_exc = exclusion criteria
#_coexc = was conversion to open excluded?
#_infxn = infection criteria
#_agecrit = age criteria
#_agemean = mean age of study
#_agesd = sd age of study
# Passos 2016
passos_name <- c("Passos","2016")
passos_p <- 50
passos_c <- 50
passos_abx <- "cefazolin 1g at induction"
passos_con <- "none"
passos_inc <- c("elective LC","symptomatic cholelithiasis","low risk")
passos_exc <- c("renal impairment","hepatic impairment","immunosuppression","regular steroid use","antibiotics within 48H","acute cholecystitis","choledocolithiasis")
passos_coexc <- TRUE
passos_infxn <- c("temperature >37.8C","tachycardia","asthenia","local pain","local purulence")
passos_agecrit <- NULL
passos_agemean <- 48
passos_agesd <- 13.63
passos <- list(passos_name,passos_p,passos_c,passos_abx,passos_con,passos_inc,passos_exc,passos_coexc,passos_infxn,passos_agecrit,passos_agemean,passos_agesd)
names(passos) <- headers
# Darzi 2016
darzi_name <- c("Darzi","2016")
darzi_p <- 182
darzi_c <- 247
darzi_abx <- c("cefazolin 1g 30min prior to induction","cefazolin 1g 6H after induction","cefazolin 1g 12H after induction")
darzi_con <- "NaCl"
darzi_inc <- c("elective LC","first time abdominal surgery")
darzi_exc <- c("antibiotics within 7 days","immunosuppression","acute cholecystitis","choledocolithiasis","cholangitis","obstructive jaundice",
"pancreatitis","previous biliary tract surgery","previous ERCP","DM","massive intraoperative bleeding","antibiotic allergy","major thalassemia",
"empyema")
darzi_coexc <- TRUE
darzi_infxn <- c("temperature >38C","local purulence","intra-abdominal collection")
darzi_agecrit <- c(">18", "<75")
darzi_agemean <- 43.75
darzi_agesd <- 13.30
darzi <- list(darzi_name,darzi_p,darzi_c,darzi_abx,darzi_con,darzi_inc,darzi_exc,darzi_coexc,darzi_infxn,darzi_agecrit,darzi_agemean,darzi_agesd)
names(darzi) <- headers
# Matsui 2014
matsui_name <- c("Matsui","2014")
matsui_p <- 504
matsui_c <- 505
matsui_abx <- c("cefazolin 1g at induction","cefazolin 1g 12H after induction","cefazolin 1g 24H after induction")
matsui_con <- "none"
matsui_inc <- "elective LC"
matsui_exc <- c("emergent","concurrent surgery","regular insulin use","regular steroid use","antibiotic allergy","HD","antibiotics within 7 days","hepatic impairment","chemotherapy")
matsui_coexc <- FALSE
matsui_infxn <- c("local purulence","intra-abdominal collection","distant infection","temperature >38C")
matsui_agecrit <- ">18"
matsui_agemean <- NULL
matsui_agesd <- NULL
matsui <- list(matsui_name,matsui_p,matsui_c,matsui_abx,matsui_con,matsui_inc,matsui_exc,matsui_coexc,matsui_infxn,matsui_agecrit,matsui_agemean,matsui_agesd)
names(matsui) <- headers
# Find unique exclusion criteria in order to create the list of all possible levels
exc <- ls()[grepl("_exc",ls())]
exclist <- sapply(exc,get)
exc.levels <- unique(unlist(exclist,use.names = F))
# Find unique inclusion criteria in order to create the list of all possible levels
inc <- ls()[grepl("_inc",ls())]
inclist <- sapply(inc,get)
inc.levels <- unique(unlist(inclist,use.names = F))
# Find unique antibiotics in order to create the list of all possible levels
abx <- ls()[grepl("_abx",ls())]
abxlist <- sapply(abx,get)
abx.levels <- unique(unlist(abxlist,use.names = F))
# Find unique controls in order to create the list of all possible levels
con <- ls()[grepl("_con",ls())]
conlist <- sapply(con,get)
con.levels <- unique(unlist(conlist,use.names = F))
# Find unique age criteria in order to create the list of all possible levels
agecrit <- ls()[grepl("_agecrit",ls())]
agecritlist <- sapply(agecrit,get)
agecrit.levels <- unique(unlist(agecritlist,use.names = F))
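The five near-identical level-collection blocks above can be folded into a single loop over the field suffixes. This is a minimal sketch, assuming the same prefix_suffix variable naming; the toy vectors and the names env, fields and all_levels are hypothetical. Anchoring the pattern with $ keeps a suffix such as "_con" from also matching a longer name.

```r
# Hypothetical toy data using the question's prefix_suffix naming
passos_exc <- c("renal impairment", "acute cholecystitis")
darzi_exc  <- c("acute cholecystitis", "pancreatitis")
passos_con <- "none"
darzi_con  <- "NaCl"

env <- environment()          # where the study variables live
fields <- c("exc", "con")     # the question also uses inc, abx and agecrit

# One list entry per field: the unique values across all studies
all_levels <- lapply(setNames(fields, fields), function(f) {
  # "$" anchors the suffix, so "_con" cannot also match e.g. "passos_control"
  vars <- ls(envir = env, pattern = paste0("_", f, "$"))
  unique(unlist(mget(vars, envir = env), use.names = FALSE))
})
```

all_levels$exc then plays the role of exc.levels, and adding a field is a matter of extending fields.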
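I've been struggling to:
1) Convert each of the _exc, _inc, _abx, _con and _agecrit lists into a factor, using the levels generated at the end of the code block. I've been trying to use a for loop, e.g.:
for (x in exc) {
  as.name(x) <- factor(get(x),levels = exc.levels)
}
This only creates a variable x, which stores the last parsed list as a factor.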
2) Combine all of my data into a data frame, formatted as follows:
name, p, c, prophylaxis, control, inclusion, exclusion, conversion excluded, infection criteria, age criteria, mean age, age sd
"Passos 2016", 50, 50, "cefazolin 1g at induction", "none", ["elective LC","symptomatic cholelithiasis","low risk"], ["renal impairment","hepatic impairment","immunosuppression","regular steroid use","antibiotics within 48H","acute cholecystitis","choledocolithiasis"], TRUE, ["temperature >37.8C","tachycardia","asthenia","local pain","local purulence"], NULL, 48, 13.63
...
# [] = factors
# columns correspond to each study's variables (i.e. passos_name, passos_p, passos_c, etc.)
# rows correspond to each study (i.e. passos, darzi, matsui)
I've tried various solutions on StackOverflow but haven't found anything that works.
I suspect my data may not be suitable for a data frame, mainly because I'm trying to store lists of factors in a column. Is this allowed? If not, how is this type of data usually stored? My goal is to be able to meaningfully compare these different criteria across studies.
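Both points can be sketched together in base R, assuming trimmed, hypothetical stand-ins for two of the studies (the names inc_factors, studies and has_symptomatic are also illustrative). For point 1, as.name(x) <- ... cannot assign to a variable whose name is held in a string; assign() can, and lapply() over a named list avoids the loose variables altogether. For point 2, data frames do accept list-columns, where each cell holds a whole vector; wrapping the list in I() stops data.frame() from spreading it over several columns. Storing a missing age criterion as character(0) instead of NULL is an assumption about how "none" should be coded.

```r
# Hypothetical, trimmed stand-ins for two of the question's studies
passos_inc <- c("elective LC", "symptomatic cholelithiasis", "low risk")
darzi_inc  <- c("elective LC", "first time abdominal surgery")

# Point 1: fetch the vectors by name and convert them all to factors
inc <- c("passos_inc", "darzi_inc")
inc.levels <- unique(unlist(mget(inc), use.names = FALSE))
inc_factors <- lapply(mget(inc), factor, levels = inc.levels)

# If separate variables are really wanted, assign() takes the name as a string
for (x in inc) {
  assign(x, factor(get(x), levels = inc.levels))
}

# Point 2: list-columns hold one vector of any length per row
studies <- data.frame(
  name         = c("Passos 2016", "Darzi 2016"),
  p            = c(50, 182),
  c            = c(50, 247),
  inclusion    = I(unname(inc_factors)),
  age_criteria = I(list(character(0), c(">18", "<75"))),  # NULL in the question
  mean_age     = c(48, 43.75),
  stringsAsFactors = FALSE
)

# Example comparison: which studies include symptomatic cholelithiasis?
has_symptomatic <- sapply(studies$inclusion,
                          function(v) "symptomatic cholelithiasis" %in% v)
studies$name[has_symptomatic]
```

This works, but every comparison goes through sapply() over a list-column, which gets clumsy as the questions get more detailed; a long, one-row-per-criterion layout is usually easier to filter and tabulate.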
Answer
This is too long for a comment, so I've made it an "answer":
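First, look at what happens here:
data.frame(name = "Passos, 2016", p = 50)
          name  p
1 Passos, 2016 50
data.frame(name = c("Passos", "2016"), p = 50)
    name  p
1 Passos 50
2   2016 50
In the first one, we create a data frame with a column "name" holding a single entry, "Passos, 2016" (one character string carrying two pieces of information), and a column "p". All good. In the second, I specified the column "name" as above, but using c("Passos", "2016"). That is a two-element vector, so we get two rows in the data frame, one named Passos and one named 2016, and the column p is recycled.
R simply recycles the shorter vector. Now, what do you think happens if I add a vector with three elements?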
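This highlights the main problem with what you're doing: you're trying to build a data frame from many vectors of different lengths. In some cases that's fine, if you want the shorter vectors repeated (in R we call this "recycling"), but it doesn't look like that's what you want here.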
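The three-element question can be checked directly. data.frame() only recycles a shorter argument when its length divides the longest one evenly, so a length-2 name column and a length-3 column cannot be reconciled and the call stops with an error. The column x below is a hypothetical addition for illustration.

```r
# Two names, one p, and a hypothetical three-element column x:
# 3 is not a multiple of 2, so no recycling is possible
msg <- tryCatch(
  data.frame(name = c("Passos", "2016"), p = 50, x = 1:3),
  error = function(e) conditionMessage(e)
)
print(msg)  # reports the clashing row counts

# With four elements, 4 is a multiple of both 2 and 1, so R recycles silently
ok <- data.frame(name = c("Passos", "2016"), p = 50, x = 1:4)
print(ok)
```

The second case is arguably the more dangerous one: the names are repeated without any warning, producing rows that look valid but pair each study with the wrong values.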
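So, my suggestion: try to picture a matrix, and make sure you understand what each element (row and column) should be. Then specify your data accordingly. If in doubt, look up "tidy data".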
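One way to act on the "tidy data" suggestion, sketched under the assumption that criteria are compared study by study: store each multi-valued field as its own long table with one row per (study, criterion) pair, and keep the scalar fields (p, c, mean age, ...) in a separate one-row-per-study table. The criteria below are trimmed subsets of the question's Passos and Darzi lists; the names exclusion, exc_long and tab are hypothetical.

```r
# Exclusion criteria per study, trimmed from the question
exclusion <- list(
  "Passos 2016" = c("renal impairment", "immunosuppression",
                    "acute cholecystitis", "choledocolithiasis"),
  "Darzi 2016"  = c("immunosuppression", "acute cholecystitis",
                    "choledocolithiasis", "cholangitis")
)

# Long ("tidy") layout: one row per (study, criterion) pair
exc_long <- data.frame(
  study     = rep(names(exclusion), lengths(exclusion)),
  criterion = factor(unlist(exclusion, use.names = FALSE)),
  stringsAsFactors = FALSE
)

# Cross-tabulate to compare criteria across studies (1 = criterion used)
tab <- table(exc_long$criterion, exc_long$study)
print(tab)
```

Every row has the same shape, so no recycling is involved, and adding a study or a criterion only adds rows.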