Question

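I am trying to return the mean for each group, following this SO post, but the solution does not seem to work in this case. Can someone explain why I still get a global mean?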

tmp = tempfile(fileext = ".xlsx")
download.file(url = "https://www.bls.gov/emp/ind-occ-matrix/occupation.xlsx", destfile = tmp, mode="wb")
library(readxl)
csv <- read_excel(tmp,sheet=8)
########################################################
colnames(csv)<-c("title","code","Occupation Type","Employment2014","Employment2024" ,"EmploymentChange2014-24.Num","EmploymentChange2014-24.Percent","Percent self employed2014","Job openings due to growth and replacements2014-24","Median annual wage2015","Typical education needed for entry","Work experience in a related occupation","Typical on-the-job training needed")
csv<-csv[csv[,3]=="Line item",]
csv$"Median annual wage2015"<-as.numeric(csv$"Median annual wage2015")

library(dplyr)
csv%>%group_by(csv$"Typical education needed for entry")%>%summarise(n=n(),mean=mean(csv$"Median annual wage2015",na.rm=T))

Answer 1

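Your dplyr call is not quite right. Remove the csv$ prefixes, as shown here. With csv$, mean() takes its data from outside the context of the dplyr chain, so it sees the whole column and ignores the grouping set up by group_by.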

library(dplyr)
csv %>%
  group_by(`Typical education needed for entry`) %>%
  summarise(n = n(),
    mean = mean(`Median annual wage2015`, na.rm = TRUE))
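
To see the difference in isolation, here is a minimal sketch on a small made-up data frame. The names toy, education and wage are hypothetical and only stand in for the BLS table; they do not come from the original data.

```r
library(dplyr)

# Hypothetical toy data standing in for the BLS table
toy <- data.frame(
  education = c("Bachelor", "Bachelor", "High school", "High school"),
  wage      = c(80000, 100000, 30000, 50000)
)

# toy$wage always refers to the full column, so every group
# receives the mean of all four rows
global <- toy %>%
  group_by(education) %>%
  summarise(n = n(), mean = mean(toy$wage, na.rm = TRUE))

# The bare column name is evaluated within each group
grouped <- toy %>%
  group_by(education) %>%
  summarise(n = n(), mean = mean(wage, na.rm = TRUE))

print(global)
print(grouped)
```

In the first result both rows should carry the same overall mean, while in the second each education level gets its own value.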

Also, you can use line breaks and indentation (for example, tabs) when typing your code to make it more readable.

dplyr returns the global mean when columns are specified