Ordering data frame using variable as column name

Time: 2015-07-28 16:17:29

Tags: r

I have a couple of data frames that I want to order by their last column. I've been trying for a while without success. The main idea is to write a function so that I don't have to repeat this for each data frame. The function I'm building is this:

    order_dataF = function(x){
            tCol = colnames(x[length(x)])
            print(tCol)
            #x <- x[with(x, order(-tCol),)]
            #x <- x[with(x, order(-(paste(tCol))),)]
            #x[do.call( order, x[,match(tCol,names(x))]),]
            #x <- x[order(x$tCol),]
    }

All the commented-out lines are attempts I tested; none of them works as expected. I know this is because order needs the column name rather than the variable I'm giving it.

tCol always gives me the last column name. When I run this function, this is the result:

[1] "TotalSearches"
Error in -(paste(tCol)) : invalid argument to unary operator
Calls: main ... [.data.frame -> with -> with.default -> eval -> eval -> order
Execution halted

I'm printing tCol to check that it really contains the last column name; in this case it holds exactly what I need.

Perhaps this is a silly problem with an easy solution, but I cannot move forward, it is slowing me down, and I'm frustrated.

I also see that this looks like a duplicate, but it is not: nobody has asked quite the right question (perhaps not even me). The idea is "Order by the content of a string variable which is obtained from the data frame column names".

1 answer:

Answer 0 (score: 3)

Generally, don't try to use with (or other "nonstandard" evaluation functions like subset) inside functions.

order_by_last_col = function(df) {
    df[order(df[, ncol(df)]), ]
}

# test
order_by_last_col(mtcars)

If you are using column names stored as character strings, you must use [ rather than $. $ is also a non-standard evaluation shortcut: it never evaluates the code that comes after it, it just looks for a column literally called whatever was typed there (tCol in your case). If you'd rather use names than indices (as above), do it this way with [:

order_by_last_col = function(df) {
    last_col_name = tail(names(df), 1)
    df[order(df[, last_col_name]), ]
}
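
The commented-out attempts in the question use order(-tCol), which suggests a descending sort was wanted. A minimal sketch of that variant follows; the function name order_by_last_col_desc is made up for illustration, and it assumes the last column is something order can sort (numeric or character):

```r
# Hypothetical helper (not from the question): sort by the last column,
# largest values first.
order_by_last_col_desc <- function(df) {
    last_col_name <- tail(names(df), 1)
    # decreasing = TRUE avoids unary minus, so it also works for
    # character columns; drop = FALSE keeps one-column data frames intact
    df[order(df[[last_col_name]], decreasing = TRUE), , drop = FALSE]
}

# test: the last column of mtcars is carb
sorted <- order_by_last_col_desc(mtcars)
head(sorted, 3)
```

Using decreasing = TRUE rather than negating the column is a design choice here: negation only works on numeric data.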

Edit: Just a few more experiments to see why your initial attempts didn't work. They don't need to be in a function to fail; they simply never work.

col = "wt"
mtcars$col # NULL
with(mtcars, head(col)) # "wt"
mtcars[, match(col, names(mtcars))] # this does work but is unnecessarily long
mtcars[, col] # works, easy
mtcars[[col]] # also works
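
The "invalid argument to unary operator" error in the question appears to come from the same confusion: the minus sign is applied to the column name (a character string) rather than to the column's values. A small sketch, assuming a numeric column such as wt; the variable names res and heaviest are only for illustration:

```r
col <- "wt"

# Negating the name itself fails: a character string cannot be negated.
res <- tryCatch(-col, error = function(e) conditionMessage(e))
print(res)

# Negating the values of the column works for numeric columns,
# giving a descending sort.
heaviest <- mtcars[order(-mtcars[[col]]), ]
head(heaviest, 3)
```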