aggregate() fails to omit columns containing NA values with na.rm=TRUE and na.action=NULL

Date: 2019-04-16 23:45:31

Tags: r aggregate-functions

I wanted to get the same output as the one on: https://www.r-bloggers.com/how-to-aggregate-data-in-r/

My output is:

  Group.1 Group.2 Name Role Shift Salary  Age
1    Cook  Dinner   NA   NA    NA   1800 25.0
2 Manager  Dinner   NA   NA    NA   2000 41.0
3  Server  Dinner   NA   NA    NA   1650 27.5
4    Cook   Lunch   NA   NA    NA   1200 24.0
5 Manager   Lunch   NA   NA    NA   2200 32.0
6  Server   Lunch   NA   NA    NA   1350 24.0

The Name, Role and Shift columns contain only NAs. Adding "na.rm=TRUE" and "na.action=NULL" did not make any difference.

I also keep receiving warnings:

Warning messages:
1: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA

How do I modify the aggregate() call so that it omits the unnecessary columns and/or NA values, without having to resort to dplyr?

Thanks

agg = aggregate(data,
                 by = list(data$Role, data$Shift),
                 FUN = mean, na.rm=TRUE, na.action=NULL)

1 answer:

Answer 0 (score: 1)

Let's take a look at your aggregate call

aggregate(data, by = list(data$Role, data$Shift), FUN = mean)

Here you are calculating the average of values across all columns of data by data$Role and data$Shift (which are your grouping variables).

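The warning is pretty self-explanatory in telling you that you are trying to calculate the mean of non-numeric entries. data$Name, data$Role and data$Shift are all non-numeric columns, so mean() returns NA for each of them whether or not you set na.rm = TRUE. Note also that the data frame method of aggregate() has no na.action argument; it is simply passed on to mean(), which ignores it.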
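You can check this outside of aggregate() with a couple of made-up vectors (the values below are hypothetical, not your data):

```r
# Hypothetical values, for illustration only
roles <- c("Cook", "Server", NA)

# mean() rejects non-numeric input before it ever looks at na.rm,
# so the result is NA (the warning is suppressed here to keep output clean)
res_chr <- suppressWarnings(mean(roles, na.rm = TRUE))

# na.rm only helps when the vector is numeric to begin with
res_num <- mean(c(1200, NA, 1800), na.rm = TRUE)

print(res_chr)  # NA
print(res_num)  # 1500
```

So the NAs in your output are not missing values in the data; they come from asking for the mean of text columns.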

I assume you are after

aggregate(. ~ Role + Shift, data = data[, -1], FUN = mean)
#     Role  Shift Salary  Age
#1    Cook Dinner   1800 25.0
#2 Manager Dinner   2000 41.0
#3  Server Dinner   1650 27.5
#4    Cook  Lunch   1200 24.0
#5 Manager  Lunch   2200 32.0
#6  Server  Lunch   1350 24.0

The . (dot) here denotes all variables in data except the ones on the RHS of the ~ (tilde). Notice how we exclude data$Name by passing data[, -1] as the data argument to aggregate (this relies on Name being the first column).
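Since you asked about na.action: it is an argument of the formula interface only, and its default, na.omit, drops every row that has an NA in any of the variables before grouping. Here is a minimal sketch of the difference, using a small made-up data frame (staff and its values are hypothetical):

```r
# Hypothetical data with one missing Salary (not the asker's data)
staff <- data.frame(
  Role   = c("Cook", "Cook", "Server", "Server"),
  Shift  = c("Lunch", "Lunch", "Lunch", "Lunch"),
  Salary = c(1000, NA, 1600, 1700),
  Age    = c(22, 26, 30, 25)
)

# Default na.action = na.omit: the whole second row is dropped,
# so its Age (26) does not count towards the Cook average either
agg_omit <- aggregate(. ~ Role + Shift, data = staff, FUN = mean)

# na.pass keeps every row; na.rm = TRUE then drops NAs column by column
agg_pass <- aggregate(. ~ Role + Shift, data = staff, FUN = mean,
                      na.rm = TRUE, na.action = na.pass)

print(agg_omit)
print(agg_pass)
```

With na.omit the Cook row averages Age over one person only; with na.pass plus na.rm = TRUE it uses both ages and only skips the missing Salary.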

Or using the by syntax

aggregate(data[, c("Salary", "Age")], by = list(data$Role, data$Shift), FUN = "mean")

Here the x argument holds only the columns whose values you want to aggregate, and by defines the groups to aggregate them over.
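If you would rather not type the column names, one option is to pick the numeric columns programmatically and to name the elements of by, so the result has Role and Shift instead of Group.1 and Group.2. A sketch with a made-up data frame (staff and its values are hypothetical):

```r
# Hypothetical staff data, for illustration only
staff <- data.frame(
  Name   = c("Ann", "Bob", "Cid", "Dee"),
  Role   = c("Cook", "Cook", "Server", "Server"),
  Shift  = c("Lunch", "Lunch", "Dinner", "Dinner"),
  Salary = c(1000, 1400, 1600, 1700),
  Age    = c(22, 26, 30, 25),
  stringsAsFactors = FALSE
)

# Keep only the columns that mean() can actually work on
num_cols <- names(staff)[sapply(staff, is.numeric)]

# Naming the list elements gives the grouping columns readable names
agg <- aggregate(staff[num_cols],
                 by = list(Role = staff$Role, Shift = staff$Shift),
                 FUN = mean, na.rm = TRUE)

print(agg)
```

Because Name, Role and Shift never reach mean(), the warning goes away, and na.rm = TRUE now does what you expect for any missing Salary or Age values.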


In response to your comment, to aggregate only by Role

aggregate(cbind(Salary, Age) ~ Role, data = data[, -1], FUN = mean)
#    Role Salary   Age
#1    Cook   1500 24.50
#2 Manager   2100 36.50
#3  Server   1500 25.75