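I have a data frame containing 6 years of data. Each year has the same variables. I am trying to find the mean of each variable across the 6 years. Values are missing (NA) in different rows each year. In this example, I am trying to get the average headcount of girls over the 6 years.
I tried using mutate and the pipe operator, but it doesn't seem to work. I get strange results in which all of my columns are duplicated.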
roughcopy2$headcount_girls_mean <-
roughcopy2 %>%
mutate(headcount_girls_mean=rowMeans(.[ ,
c("headcount_total_girls_rounded_1314","headcount_total_girls_rounded_1415",
"headcount_total_girls_rounded_1516" ,
"headcount_total_girls_rounded_1617",
"headcount_total_girls_1718",
"headcount_total_girls_1819")], na.rm=TRUE))
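This code duplicates every column in my data frame and prefixes each duplicated column name with "headcount_girls_mean". My original dataset, roughcopy2, has 150 columns. After running the command above I get 300 columns: the last 150 are identical to the first 150, but their names are prefixed with "headcount_girls_mean".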
Answer 0 (score: 1)
Using a hypothetical sample data frame:
roughcopy2 <- data.frame("headcount_total_girls_rounded_1314"=c(1,4,2,4,8),
"headcount_total_girls_rounded_1415"=c(2, NA, 4, NA,8),
"headcount_total_girls_rounded_1516"=c(6,8,10,12,14),
"headcount_total_girls_rounded_1617"=c(4,5,5,3,2),
"headcount_total_girls_1718"=c(8,5,9,NA,2),
"headcount_total_girls_1819"=c(NA,2,4,7,3))
If you want the mean of each column, you can simply do:
means <- as.numeric(colMeans(x=roughcopy2, na.rm = TRUE))
However, if you want the mean across several columns for each row:
library(dplyr)  # provides mutate(), select() and starts_with()
roughcopy2 <- mutate(roughcopy2,
  headcount_mean = rowMeans(select(roughcopy2, starts_with("headcount")),
                            na.rm = TRUE))
It should output (the other columns are omitted here, but they are still in the data frame):
  headcount_total_girls_1718 headcount_total_girls_1819 headcount_mean
1                          8                         NA           4.20
2                          5                          2           4.80
3                          9                          4           5.67
4                         NA                          7           6.50
5                          2                          3           6.17
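As a side note on the duplicated columns in the question: there, the right-hand side of `roughcopy2$headcount_girls_mean <-` is the whole data frame returned by mutate(), so an entire data frame gets stored inside one column and shows up as prefixed copies. Below is a minimal base R sketch of the same row mean, assigning only the numeric vector from rowMeans() to the new column. The `school_id` column and the shortened three-column sample are hypothetical stand-ins for the real 150-column data.

```r
# Hypothetical, shortened sample standing in for the real data frame
roughcopy2 <- data.frame(
  school_id = 1:3,  # hypothetical non-headcount column
  headcount_total_girls_rounded_1314 = c(1, 4, 2),
  headcount_total_girls_1718 = c(8, NA, 9),
  headcount_total_girls_1819 = c(NA, 2, 4)
)

# Columns to average, listed explicitly as in the question
girl_cols <- c("headcount_total_girls_rounded_1314",
               "headcount_total_girls_1718",
               "headcount_total_girls_1819")

# rowMeans() returns one number per row, so the right-hand side is a plain
# numeric vector and exactly one column is added
roughcopy2$headcount_girls_mean <- rowMeans(roughcopy2[, girl_cols], na.rm = TRUE)
```

Listing the columns explicitly, rather than using starts_with("headcount"), keeps other columns that happen to share the prefix (for example a boys' headcount) out of the mean.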
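You should post a sample of your original data frame and a general idea of the output you expect.
Answer 1 (score: 1)
I'm still not sure what you intend, but if you want the mean of each column, you should be able to use the following. This answer is based on my comment and on @Pedro_Henrique's answer: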
library(tidyverse)
roughcopy2 <- tibble("headcount_total_girls_rounded_1314"=c(1,4,2,4,8),
"headcount_total_girls_rounded_1415"=c(2, NA, 4, NA,8),
"headcount_total_girls_rounded_1516"=c(6,8,10,12,14),
"headcount_total_girls_rounded_1617"=c(4,5,5,3,2),
"headcount_total_girls_1718"=c(8,5,9,NA,2),
"headcount_total_girls_1819"=c(NA,2,4,7,3))
roughcopy2 %>%
gather(headcount_year, count) %>%
group_by(headcount_year) %>%
summarise(mean_count = mean(count, na.rm = TRUE))
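In current tidyr releases gather() is marked as superseded in favor of pivot_longer(). The following is a sketch of the same per-column means written with pivot_longer(), assuming tidyr 1.0.0 or later is installed; the result name `yearly_means` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, built with data.frame()
roughcopy2 <- data.frame(
  headcount_total_girls_rounded_1314 = c(1, 4, 2, 4, 8),
  headcount_total_girls_rounded_1415 = c(2, NA, 4, NA, 8),
  headcount_total_girls_rounded_1516 = c(6, 8, 10, 12, 14),
  headcount_total_girls_rounded_1617 = c(4, 5, 5, 3, 2),
  headcount_total_girls_1718 = c(8, 5, 9, NA, 2),
  headcount_total_girls_1819 = c(NA, 2, 4, 7, 3)
)

yearly_means <- roughcopy2 %>%
  # Reshape to long form: one row per (column name, value) pair
  pivot_longer(everything(),
               names_to = "headcount_year",
               values_to = "count") %>%
  group_by(headcount_year) %>%
  # Mean per original column, ignoring missing values
  summarise(mean_count = mean(count, na.rm = TRUE))
```

The result has one row per original column, which is the same information colMeans() returns, but as a data frame that can be joined or plotted directly.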