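I am trying to return the mean of each group, following this SO post, but the solution does not seem to work in this case. Can someone explain why I still get a global mean?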
tmp = tempfile(fileext = ".xlsx")
download.file(url = "https://www.bls.gov/emp/ind-occ-matrix/occupation.xlsx", destfile = tmp, mode="wb")
library(readxl)
csv <- read_excel(tmp,sheet=8)
########################################################
colnames(csv)<-c("title","code","Occupation Type","Employment2014","Employment2024" ,"EmploymentChange2014-24.Num","EmploymentChange2014-24.Percent","Percent self employed2014","Job openings due to growth and replacements2014-24","Median annual wage2015","Typical education needed for entry","Work experience in a related occupation","Typical on-the-job training needed")
csv<-csv[csv[,3]=="Line item",]
csv$"Median annual wage2015"<-as.numeric(csv$"Median annual wage2015")
library(dplyr)
csv%>%group_by(csv$"Typical education needed for entry")%>%summarise(n=n(),mean=mean(csv$"Median annual wage2015",na.rm=T))
Answer 0 (score: 0)
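Your dplyr usage is not quite right. Remove the csv$ prefix, as in the code below. Written as csv$"Median annual wage2015", the column is taken from the whole data frame, so mean is computed over all rows regardless of the grouping. With the bare column name, mean gets its data from the context of the dplyr chain and therefore respects the groups created by group_by.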
library(dplyr)
csv %>%
  group_by(`Typical education needed for entry`) %>%
  summarise(n = n(),
            mean = mean(`Median annual wage2015`, na.rm = TRUE))
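To make the difference concrete, here is a minimal sketch on a small made-up data frame. The names df, education, and wage are hypothetical and the values are not taken from the BLS spreadsheet; they only illustrate how the two spellings behave.

```r
library(dplyr)

# Toy data (hypothetical values, not from the BLS file)
df <- data.frame(
  education = c("Bachelor", "Bachelor", "High school", "High school"),
  wage      = c(80, 100, 30, 50)
)

# df$wage always refers to the full column of df, ignoring the grouping,
# so every group reports the same global mean.
global <- df %>%
  group_by(education) %>%
  summarise(n = n(), mean = mean(df$wage, na.rm = TRUE))

# The bare column name is evaluated within each group.
grouped <- df %>%
  group_by(education) %>%
  summarise(n = n(), mean = mean(wage, na.rm = TRUE))

print(global)
print(grouped)
```

In the first result the mean column holds the same value in every row, which is the symptom described in the question; in the second it differs per group.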
In addition, you can indent your code with tabs to make it more readable.