Question

I have a list of data frames (created by reading each file in a directory); each data frame contains 4 columns:

hit_name  (character)
hit_raw_value (real)
hit_norm_value (real)
hit_significance (real)

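In addition, I want to add 3 columns to each data frame, called drugname, dose and group. Within a given data frame, the value of each of these columns is the same for all rows, and it can be obtained by parsing the name of that data frame, which has the format "drugname_dose_group_date_studyname". For example, one data frame is called "tylenol_5mg_group1_oct14_pilotstudy", so the 'drugname' column I want to add would take the value 'tylenol', the 'dose' column would take the value '5mg', and the 'group' column would take the value 'group1'.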

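The code I tried below sets x$drugname to a string containing the names of all the columns already present in that data frame, instead of setting it to the drug name.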

all_files = lapply(paste(mydir,filenames,sep="") ,read.delim) 
names(all_files) = gsub(".txt","", filenames)
lapply( all_files,
     function(x) {
         x$drugname = gsub(".+?\\_(.+?)\\_(.+?)\\_(.+?)\\_.+", "\\1", deparse(quote(x)))   
         x$dose = gsub(".+?\\_(.+?)\\_(.+?)\\_(.+?)\\_.+", "\\2", deparse(quote(x)))        
         x$group = gsub(".+?\\_(.+?)\\_(.+?)\\_(.+?)\\_.+", "\\3", deparse(quote(x)))
})

Answer 1

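deparse(quote(x)) returns the string "x" inside lapply, because quote(x) only captures the symbol x (the function's argument), not the name of the list element being processed.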
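To see this concretely, here is a minimal sketch. The list `toy` and its contents are made up for illustration and simply stand in for `all_files`.

```r
# Toy list standing in for all_files (names and values are hypothetical).
toy <- list(
  tylenol_5mg_group1_oct14_pilotstudy  = data.frame(hit_name = "a"),
  aspirin_10mg_group2_oct14_pilotstudy = data.frame(hit_name = "b")
)

# Inside the anonymous function, quote(x) is just the symbol x,
# so deparse() yields the literal string "x" for every element.
seen <- unlist(lapply(toy, function(x) deparse(quote(x))))
print(unname(seen))

# The element names live on the list itself, not on x.
print(names(toy))
```

The names therefore have to be taken from `names()` of the list rather than recovered from inside the function.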

Could you try the following loop instead?

for (i in names(all_files)){
    # i is already the data frame's name, so split it directly on "_"
    newCols = strsplit(i, "_")[[1]]
    all_files[[i]]$drugname = newCols[1]
    all_files[[i]]$dose     = newCols[2]
    # ...
}
# stack all data frames into a single data frame
do.call(rbind, all_files)

Answer 2

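This code worked for looping over the names of the data frames and filling in the new $drugname and $dose columns appropriately; thanks for your help: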

for (i in names(all_files)){
  # i is the name of the current data frame; split it into its "_"-separated parts
  newCols = unlist(strsplit(i, "_"))
  all_files[[i]]$drugname = newCols[1]
  all_files[[i]]$dose     = newCols[2]
  # ...
}
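For readers who would rather stay with the apply family, the same idea can be sketched with Map(), which hands each data frame to the function together with its name. This is an alternative to the loop, not the code used in the answers; the list `toy`, its values, and the helper `add_name_columns` are hypothetical and only serve as an illustration.

```r
# Toy list standing in for all_files (names and values are hypothetical).
toy <- list(
  tylenol_5mg_group1_oct14_pilotstudy =
    data.frame(hit_name = c("a", "b"), hit_raw_value = c(1.0, 2.0)),
  aspirin_10mg_group2_oct14_pilotstudy =
    data.frame(hit_name = "c", hit_raw_value = 3.0)
)

# Hypothetical helper: split the name on "_" and copy the first three
# fields into new columns (each value is recycled across all rows).
add_name_columns <- function(df, nm) {
  parts <- strsplit(nm, "_", fixed = TRUE)[[1]]
  df$drugname <- parts[1]
  df$dose     <- parts[2]
  df$group    <- parts[3]
  df  # return the modified data frame, not the last assignment
}

# Map() pairs each data frame with its own name.
toy <- Map(add_name_columns, toy, names(toy))

# Optionally stack everything into one data frame.
combined <- do.call(rbind, toy)
print(combined)
```

Returning `df` explicitly matters here: a function whose last expression is an assignment returns only the assigned value, so the modified data frame would otherwise be lost.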

Loop over a list of data frames and parse each data frame's name to retrieve the values for new columns
