Question

I want to add a variable to ALL dataframes in my global environment and make the value of the newly added column equal to the dataframe name.

Product=c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","C","C","C")
Day=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")

data1=data.frame(Product, Day)

Product2=c("Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Y","Y","Y","X","X","X")
Day2=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")

data2=data.frame(Product2, Day2)

I want to add a column in both dataframes whose value is equal to the dataframe name, i.e. newvar="data1" for data1 and newvar="data2" for data2. My actual list of data frames is much longer than this.

Any help is greatly appreciated.

Thanks!

Answer 1

If the 'data.frame' object names are 'data' followed by a number, and we already know how many objects there are, we can use paste0 to build the object names as strings

  nm1 <- paste0('data', 1:2)

Another option is to use ls with the pattern argument, which helps when there are hundreds of objects in the global environment and we don't know how many of them match.

  nm1 <- ls(pattern='^data\\d+')
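If the objects do not share a naming pattern, one possible sketch is to fetch everything from an environment and keep only the data frames. The environment env and the objects in it are made up for illustration; in an interactive session .GlobalEnv would take its place.

```r
# Hypothetical environment standing in for the global environment
env <- new.env()
assign("data1", data.frame(Product = c("A", "B")), envir = env)
assign("data2", data.frame(Product2 = c("Z", "Y")), envir = env)
assign("not_a_df", 1:3, envir = env)

# Fetch every object by name, then keep only the data frames
all_objs <- mget(ls(envir = env), envir = env)
dfs <- Filter(is.data.frame, all_objs)

# The names of the surviving elements are the data frame names
nm1 <- names(dfs)
nm1
# [1] "data1" "data2"
```

Filtering on is.data.frame avoids picking up vectors or functions whose names happen to match a pattern.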

Get the objects in a list using mget, and create a new column ('newvar') by cbinding with Map. Map pairs each dataset in the list with the corresponding object name, so every data frame gets a column filled with its own name.

  lst <- Map(cbind, mget(nm1), newvar= nm1)
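As a small self-contained illustration of this step, the sketch below uses two shortened stand-ins for the question's data frames (the two-row contents are assumptions made to keep the example brief).

```r
# Shortened stand-ins for the data frames in the question
data1 <- data.frame(Product = c("A", "B"), Day = c("Monday", "Tuesday"))
data2 <- data.frame(Product2 = c("Z", "Y"), Day2 = c("Monday", "Tuesday"))

nm1 <- c("data1", "data2")

# Collect the objects into a named list
dfs <- mget(nm1)

# Map calls cbind(dfs[[i]], newvar = nm1[i]) for each i;
# the single name is recycled to fill every row
lst <- Map(cbind, dfs, newvar = nm1)

names(lst)
# [1] "data1" "data2"
names(lst$data1)
# [1] "Product" "Day"     "newvar"
```

The result keeps the object names as list names, so lst$data1 and lst$data2 can be accessed directly.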

It is better to keep the data frames in a list, since all further operations can be done within it. But if the original objects need to be updated in the global environment, list2env is an option (not recommended, though)

  list2env(lst, envir=.GlobalEnv)
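To see what list2env does without overwriting anything in the global environment, here is a minimal sketch that writes into a scratch environment instead. The environment target and the one-row data frames are hypothetical.

```r
# A named list of data frames, as produced by the Map step
lst <- list(
  data1 = data.frame(Product = "A", newvar = "data1"),
  data2 = data.frame(Product2 = "Z", newvar = "data2")
)

# Working within the list: one call covers every data frame
n_rows <- sapply(lst, nrow)

# Hypothetical scratch environment; each list element becomes
# an object named after its list name
target <- new.env()
list2env(lst, envir = target)

ls(target)
# [1] "data1" "data2"
```

Any existing object with the same name in the target environment is replaced silently, which is one reason to prefer staying in the list.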

It may also be useful to read all the files (.csv/.txt) into a list directly rather than creating individual objects. For example, we can read all the files in the working directory by

   files <- list.files()
   lst <- lapply(files, read.csv, stringsAsFactors=FALSE)

The arguments may need some changes according to the delimiter.
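When reading from files, the name column can be filled in at read time. The following is only a sketch: the directory csv_demo, the two files written into it, and the choice of the file name (without extension) as the value are all assumptions for illustration.

```r
# Hypothetical scratch directory with two small CSV files
dir_path <- file.path(tempdir(), "csv_demo")
dir.create(dir_path, showWarnings = FALSE)
write.csv(data.frame(Product = c("A", "B")),
          file.path(dir_path, "data1.csv"), row.names = FALSE)
write.csv(data.frame(Product2 = c("Z", "Y")),
          file.path(dir_path, "data2.csv"), row.names = FALSE)

# Restrict to .csv files and keep full paths so read.csv can find them
files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# Use the file name without its extension as the dataset name
nm <- tools::file_path_sans_ext(basename(files))

# Read each file and add its name as a column
lst <- Map(function(path, name) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$newvar <- name
  df
}, files, nm)
names(lst) <- nm

lst$data2$newvar
# [1] "data2" "data2"
```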

Answer 2

Here's a function to which you can pass an arbitrary number of named data.frames; it returns a named list of those data.frames with the requested column added. Note that, as written, the new column is named after the object (e.g. data1) rather than newvar. Using the list2env function (as in @akrun's answer) you can then put the results in whatever environment you want. (You could also modify the function to produce that side-effect automatically.)

f <- function(...) {
    objnames <- as.character(substitute(c(...)))[-1]
    obj <- list(...)
    out <- mapply(function(x, col) {
        x[, col] <- col
        x
    }, obj, objnames, SIMPLIFY = FALSE)
    setNames(out, objnames)
}

Here's how to use it:

list2env(f(data1,data2), .GlobalEnv)
# <environment: R_GlobalEnv>
str(data1)
# 'data.frame':   18 obs. of  3 variables:
#  $ Product: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
#  $ Day    : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 2 6 7 5 ...
#  $ data1  : chr  "data1" "data1" "data1" "data1" ...
str(data2)
# 'data.frame':   18 obs. of  3 variables:
#  $ Product2: Factor w/ 3 levels "X","Y","Z": 3 3 3 3 3 3 3 3 3 3 ...
#  $ Day2    : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 2 6 7 5 ...
#  $ data2   : chr  "data2" "data2" "data2" "data2" ...

If you had a large number of named objects that you wanted to pass without listing them explicitly in f(), you could do something like:

list2env(do.call(f, sapply(ls(pattern = "data"), as.name)), .GlobalEnv)

which would have the same result.
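The side-effect variant mentioned above could look roughly like the following sketch. The function name add_name_col and its col and envir arguments are hypothetical, and the column is called newvar here to match the question rather than being named after the object.

```r
add_name_col <- function(..., col = "newvar", envir = parent.frame()) {
  # Capture the argument expressions as strings; this assumes each
  # argument is passed as a bare object name
  objnames <- as.character(substitute(c(...)))[-1]
  obj <- list(...)

  # Add a column holding the object's own name to each data frame
  out <- Map(function(x, nm) {
    x[[col]] <- nm
    x
  }, obj, objnames)
  names(out) <- objnames

  # Side effect: overwrite the originals in the chosen environment
  list2env(out, envir = envir)
  invisible(out)
}

# Shortened stand-ins for the data frames in the question
data1 <- data.frame(Product = c("A", "B"))
data2 <- data.frame(Product2 = c("Z", "Y"))

add_name_col(data1, data2)
data1$newvar
# [1] "data1" "data1"
```

Defaulting envir to parent.frame() means the objects are updated wherever the function is called from, which is the global environment when called at top level.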

R: Add a new variable to dataframes whose value is equal to the name of the dataframes
