I have this data frame:
> head(merged.tables)
Store DayOfWeek Date Sales Customers Open Promo StateHoliday SchoolHoliday StoreType
1 1 5 2015-07-31 5263 555 1 1 0 1 c
2 1 6 2013-01-12 4952 646 1 0 0 0 c
3 1 5 2014-01-03 4190 552 1 0 0 1 c
4 1 3 2014-12-03 6454 695 1 1 0 0 c
5 1 3 2013-11-13 3310 464 1 0 0 0 c
6 1 7 2013-10-27 0 0 0 0 0 0 c
Assortment CompetitionDistance CompetitionOpenSinceMonth CompetitionOpenSinceYear Promo2
1 a 1270 9 2008 0
2 a 1270 9 2008 0
3 a 1270 9 2008 0
4 a 1270 9 2008 0
5 a 1270 9 2008 0
6 a 1270 9 2008 0
Promo2SinceWeek Promo2SinceYear PromoInterval
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 NA NA
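I want to extract a data frame showing the average of Sales where Open equals 1, grouped by StoreType. I used this command because I thought it was the best approach: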
merged.tables[StateHoliday==1,mean(na.omit(Sales)),by=StoreType]
But I got this error:
Error in `[.data.frame`(merged.tables, StateHoliday == 0, mean(na.omit(Sales)), : unused arguments (by = StoreType)
I searched but could not find an answer for this error. Thanks for your help!
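Answer 0: (score: 4)
I made a mistake.
I realized what the problem was: my data was not in data.table format, so `[` did not accept the by argument.
Example: copy <- data.table(data)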
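A minimal sketch of that fix, assuming the data.table package is installed. The toy data below is made up for illustration and only stands in for merged.tables; the name merged.dt is also hypothetical. The by argument is understood by data.table's `[` method, which is why a plain data.frame reports it as unused.

```r
library(data.table)

# Hypothetical toy data standing in for merged.tables
merged.tables <- data.frame(
  StoreType = c("a", "a", "c", "c"),
  Sales     = c(100, 200, 300, NA),
  Open      = c(1, 1, 1, 1)
)

# Convert to a data.table so that `[` accepts i, j and by
merged.dt <- as.data.table(merged.tables)

# Mean of Sales for open stores, computed per StoreType
avg.sales <- merged.dt[Open == 1,
                       .(AverageSales = mean(Sales, na.rm = TRUE)),
                       by = StoreType]
print(avg.sales)
```

Passing na.rm = TRUE to mean() has the same effect here as wrapping Sales in na.omit(), and filtering on Open matches what the question describes in words.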
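Answer 1: (score: 1)
There are many ways to apply a function to a group of values in a data frame. I suggest two:
- the dplyr package, which arranges the data in a way that answers the question;
- tapply(), which applies a function to each group of values.
For each store type, I want the average sales of the stores whose Open value equals 1.
Note: the data frame below contains only a few of the columns posted in the OP.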
# install necessary package
install.packages( pkgs = "dplyr" )
# load necessary package
library( dplyr )
# create data frame
merged.tables <-
data.frame(
Store = c( 1, 1, 1, 2, 2, 2 )
, StoreType = rep( x = c( "s", "m", "l" ) , times = 2)
, Sales = round( x = runif( n = 6, min = 3000, max = 6000 ) , digits = 0 )
, Open = c( 1, 1, 0, 0, 1, 1 )
, stringsAsFactors = FALSE
)
# view the data
merged.tables
# Store StoreType Sales Open
# 1 1 s 4608 1
# 2 1 m 4017 1
# 3 1 l 4210 0
# 4 2 s 4833 0
# 5 2 m 3818 1
# 6 2 l 3090 1
# dplyr method
merged.tables %>%
group_by( StoreType ) %>%
filter( Open == 1 ) %>%
summarise( AverageSales = mean( x = Sales , na.rm = TRUE ) )
# A tibble: 3 x 2
# StoreType AverageSales
# <chr> <dbl>
# 1 l 3090
# 2 m 3918
# 3 s 4608
# tapply method
# create the condition
# that 'Open' must be equal to one
Open.equals.one <- which( merged.tables$Open == 1 )
# apply the condition to
# both X and INDEX
tapply( X = merged.tables$Sales[ Open.equals.one ]
, INDEX = merged.tables$StoreType[ Open.equals.one ]
, FUN = mean
, na.rm = TRUE # just in case your data does have NA values in the `Sales` column, this removes them from the calculation
)
# l m s
# 3090.0 3917.5 4608.0
# end of script #
If you need more conditions later, I recommend looking at other related SO posts, such as How to combine multiple conditions to subset a data-frame using "OR"? and Why is `[` better than `subset`?
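As a rough illustration of adding a second condition, here is a sketch that extends the tapply() approach. The toy data and the choice of Promo as the extra condition are assumptions made for the example, not something taken from the OP's real data.

```r
# Hypothetical toy data; column names follow the question
merged.tables <- data.frame(
  StoreType = c("a", "a", "c", "c", "c"),
  Sales     = c(100, 200, 300, 500, 700),
  Open      = c(1, 1, 1, 0, 1),
  Promo     = c(1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Combine conditions with & (AND); use | instead for OR.
# which() returns row positions and drops any NA comparisons.
keep <- which(merged.tables$Open == 1 & merged.tables$Promo == 1)

# Apply the combined condition to both X and INDEX
avg.sales <- tapply(
  X     = merged.tables$Sales[keep],
  INDEX = merged.tables$StoreType[keep],
  FUN   = mean,
  na.rm = TRUE
)
avg.sales
```

Building the row index once and reusing it for both X and INDEX keeps the two vectors aligned, which is the same precaution the tapply() example above takes.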