Dynamic column creation based on the data frame suffix

Time: 2016-08-23 13:14:53

Tags: r

I have multiple .csv files with different suffixes in a folder. For example:

Data_Software
Data_Hardware
Data_Manufacturing ....

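& many other similar .csv files. I want to create a new column in each dataset, say "type", that contains the suffix of the corresponding file, i.e. every observation in the type column of Data_Software should say Software, and every observation in Data_Hardware should say Hardware.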

Can anyone help?

2 answers:

Answer 0: (score: 0)

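Try this. Apologies, I have assumed these are data.frames in your environment; if that is not the case, feel free to ignore this or suggest changes: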

# Data frames in your environment
Data_Tom <- iris
Data_Dick <- iris
Data_Harry <- iris

# Get the names of the objects whose names start with "Data_"
objs <- ls(pattern = "^Data_")

# Add the suffix as a new column
objs <- lapply(objs, 
               function(x){
                 # Strip the "Data_" prefix to get the suffix
                 type <- sub("^Data_", "", x)
                 df <- get(x)
                 cbind(df, Type = type)
               })

# Combine them together, you might not need this
combine <- do.call(rbind, objs)
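
The question is about .csv files in a folder rather than objects already in the environment, so the same idea can be applied at read time. The following is only a minimal base R sketch under stated assumptions: the temporary folder `suffix_demo` and the sample files written into it are hypothetical stand-ins for the real data, and the variable names `files`, `suffixes`, `dfs`, and `combined` are made up for illustration.

```r
# Hypothetical setup: write three small CSV files to a temporary folder
my_dir <- file.path(tempdir(), "suffix_demo")
dir.create(my_dir, showWarnings = FALSE)
for (s in c("Software", "Hardware", "Manufacturing")) {
  write.csv(head(iris, 3), file.path(my_dir, paste0("Data_", s, ".csv")),
            row.names = FALSE)
}

# Full paths of the Data_*.csv files
files <- list.files(my_dir, pattern = "^Data_.*\\.csv$", full.names = TRUE)

# Suffix = file name without the extension and the "Data_" prefix
suffixes <- sub("^Data_", "", tools::file_path_sans_ext(basename(files)))

# Read each file and attach its suffix as a constant Type column
dfs <- Map(function(f, s) cbind(read.csv(f), Type = s), files, suffixes)
names(dfs) <- suffixes

# Optional: stack everything into one data frame
combined <- do.call(rbind, dfs)
```

Keeping the data frames in a named list (`dfs$Hardware`, `dfs$Software`, ...) avoids having to look objects up by name with `ls()` and `get()`.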

Answer 1: (score: 0)

Although I would not recommend it, I would probably do it like this:

library(data.table) # needed for fread and :=

# Get a list of all csv files in the directory
my_dir <- "my_path_here"
FILES <- list.files(path = my_dir, pattern = "\\.csv$", full.names = TRUE, recursive = FALSE)

# Read every file, add a Type column, and assign the table in the global environment
for (f in FILES) {
  # File name without the path, the extension, and the Data_ prefix
  suffix <- sub("^Data_", "", tools::file_path_sans_ext(basename(f)))
  assign(suffix, fread(f, header = TRUE)[, Type := suffix], envir = .GlobalEnv)
}

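This creates one table per csv file. Each table is named after its file, with the path, the extension, and Data_ stripped. A Type column containing that table name is added as each file is read.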
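
For readers who share the reservation about assigning many objects into the global environment, one possible alternative is to keep the tables in a named list. The following is a rough sketch, not the answerer's method: the temporary folder `list_demo` and its two sample files are hypothetical, and it relies on `rbindlist()` filling the `idcol` column from the list names.

```r
library(data.table)  # fread, fwrite, rbindlist

# Hypothetical setup: two small CSV files in a temporary folder
my_dir <- file.path(tempdir(), "list_demo")
dir.create(my_dir, showWarnings = FALSE)
fwrite(head(iris, 2), file.path(my_dir, "Data_Software.csv"))
fwrite(head(iris, 2), file.path(my_dir, "Data_Hardware.csv"))

files <- list.files(my_dir, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its suffix so the names can label the rows later
names(files) <- sub("^Data_", "", tools::file_path_sans_ext(basename(files)))

# One data.table per file, held in a named list
tables <- lapply(files, fread)

# Stack the tables; idcol fills Type from the list names
combined <- rbindlist(tables, idcol = "Type")
```

Individual tables remain available as `tables$Software` and `tables$Hardware`, and `combined` carries the suffix of the source file in its `Type` column.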