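I am working with a dataset on the unemployment rate in UK regions over 22 months.
I split the original dataset in two: one containing the regions with higher unemployment rates (df1) and another containing the regions with lower unemployment rates (df2).
The desired output is the same for both, so I am only posting the structure of df1.
df1 currently contains the monthly unemployment rate for five regions.
I want to calculate the average unemployment rate across the regions for each month (i.e. the average of North East, London, etc. for Jan 19, Feb 19, and so on up to Oct 20).
The point is that once the regions' unemployment rates are combined into a single average, I can have one graph instead of five different ones.
Expected output: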
Date | Region | Unemployment rate
01-2019 | ABC | (A_Jan19 + B_Jan19 + C_Jan19) / 3
02-2019 | ABC | (A_Feb19 + B_Feb19 + C_Feb19) / 3
03-2019 | ABC | (A_Mar19 + B_Mar19 + C_Mar19) / 3
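etc.
So, instead of having 5 values per month (one per region), I want to add up the regions' values and divide by the number of regions, for each month.
Here is the structure of df1: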
structure(list(
Date = structure(c(17897, 17897, 17897, 17897,
17897, 17928, 17928, 17928, 17928, 17928, 17956, 17956, 17956,
17956, 17956, 17987, 17987, 17987, 17987, 17987, 18017, 18017,
18017, 18017, 18017, 18048, 18048, 18048, 18048, 18048, 18078,
18078, 18078, 18078, 18078, 18109, 18109, 18109, 18109, 18109,
18140, 18140, 18140, 18140, 18140, 18170, 18170, 18170, 18170,
18170, 18201, 18201, 18201, 18201, 18201, 18231, 18231, 18231,
18231, 18231, 18262, 18262, 18262, 18262, 18262, 18293, 18293,
18293, 18293, 18293, 18322, 18322, 18322, 18322, 18322, 18353,
18353, 18353, 18353, 18353, 18383, 18383, 18383, 18383, 18383,
18414, 18414, 18414, 18414, 18414, 18444, 18444, 18444, 18444,
18444, 18475, 18475, 18475, 18475, 18475, 18506, 18506, 18506,
18506, 18506, 18536, 18536, 18536, 18536, 18536), class = "Date"),
Region = structure(c(4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L,
9L, 4L, 6L, 7L, 8L, 9L, 4L, 6L, 7L, 8L, 9L),
.Label = c("England",
"South East", "South West", "London", "East of England",
"East Midlands", "West Midlands", "Yorkshire and The Humber",
"North East", "North West"), class = "factor"),
Unemployment.rate = c(4.2102766429572,
4.68247349426148, 5.0708122696351, 5.23113585152962, 5.05625777763551,
4.45850956493638, 4.24086209425895, 5.20425572086481, 4.90649662696461,
5.58119346747183, 4.36960549219723, 4.02517515965457, 5.07463979478007,
4.74861899849302, 5.41295614949722, 4.2765275404374, 4.29397104451947,
4.95863831882363, 4.92741739593892, 5.69156027694963, 4.2650375361128,
4.23454968410189, 4.79139912788739, 5.02305883708418, 5.5878529496241,
4.54049887070026, 4.28118824655063, 4.56621383409869, 5.02948552097342,
5.34849310422496, 4.63523851140925, 4.63665149464923, 4.15610221124255,
4.28827168334814, 4.97071907922267, 4.63148007856079, 4.50379542173275,
3.98279027057451, 4.00981283870947, 5.80674097480643, 4.5449089097835,
4.46358064141772, 4.09111105457073, 3.90122545742185, 5.85180583091048,
4.50615604436695, 3.65653388653173, 4.4653881330391, 4.08974888999112,
6.11361138828401, 4.31177130663949, 3.86911315140672, 4.31748261760943,
4.34062792253313, 6.21086689536757, 4.28854311714984, 3.58533538113168,
4.43826006085208, 4.47398990035041, 6.11583334445995, 4.4614986334698,
3.93320874039025, 4.50210360585639, 4.58329815843159, 6.1811363458787,
4.4993016103369, 4.02503140646339, 4.81764323428107, 4.71840892982655,
5.61192961811575, 4.66797282030472, 3.76788548732822, 5.02382063022771,
4.27033347501753, 5.40098295976569, 4.63121679655635, 3.67161258712684,
4.80322174913054, 3.91339590231661, 5.20229523339659, 5.10845457998552,
3.97182605242641, 4.85515814694348, 3.78242013517353, 4.97115704468143,
4.6437916194869, 4.3194319371037, 4.41226516242903, 3.75797094178592,
5.16820059074221, 4.98077486925899, 4.38753537321373, 4.37107017836121,
3.98499236263049, 5.15965087736712, 5.2511686249283, 4.39271393019063,
4.62628095567074, 4.16298001615593, 6.62714213785116, 5.95104220347072,
4.89588411607636, 4.9378241924801, 4.65307341597827, 6.67088507450695,
6.33714099073375, 5.32040137455687, 5.402969264185, 5.15177120913334,
6.56889233919367)),
row.names = c(NA, -110L), class = c("tbl_df", "tbl", "data.frame"))
Answer 0 (score: 1)
We can format the 'Date' column to year-month, or use as.yearmon from zoo, use that together with 'Region' as the grouping columns, and summarise the mean of 'Unemployment.rate':
library(dplyr)
library(zoo)
df1 %>%
group_by(year_mon = as.yearmon(Date), Region) %>%
summarise(Mean_unemp = mean(Unemployment.rate, na.rm = TRUE), .groups = 'drop')
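The format() route mentioned above, as an alternative to as.yearmon, could look roughly like the sketch below. The data frame `toy` and its values are made up for illustration and stand in for df1; `res_fmt` is likewise just an illustrative name.

```r
library(dplyr)

# Toy stand-in for df1 (hypothetical values, same column names as df1)
toy <- data.frame(
  Date = as.Date(c("2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01")),
  Region = c("London", "North East", "London", "North East"),
  Unemployment.rate = c(4.0, 5.0, 4.5, 5.5)
)

# format() turns each Date into a character label such as "2019-01";
# putting the year first keeps the labels in chronological order when sorted
res_fmt <- toy %>%
  group_by(year_mon = format(Date, "%Y-%m"), Region) %>%
  summarise(Mean_unemp = mean(Unemployment.rate, na.rm = TRUE), .groups = "drop")
```

Unlike as.yearmon, this needs no extra package, but the grouping column is plain text rather than a date-like class.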
Or, if it is based only on the month:
df1 %>%
group_by(Month = format(Date, "%m"), Region) %>%
summarise(Mean_unemp = mean(Unemployment.rate, na.rm = TRUE), .groups = 'drop')
If it is based only on 'Region' (grouping by 'Date' alone works the same way):
df1 %>%
group_by(Region) %>%
summarise(Mean_unemp = mean(Unemployment.rate, na.rm = TRUE), .groups = 'drop')
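The expected output in the question has one value per month, averaged across the regions, which suggests grouping by 'Date' alone. A minimal sketch of that, assuming a made-up data frame `toy` in place of df1 (the names `avg_by_month` and `n_regions` are illustrative, not from the original answer):

```r
library(dplyr)

# Toy stand-in for df1 (hypothetical values, same column names as df1)
toy <- data.frame(
  Date = as.Date(c("2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01")),
  Region = c("London", "North East", "London", "North East"),
  Unemployment.rate = c(4.0, 5.0, 4.5, 5.5)
)

# One row per Date: the mean over all regions present in that month,
# plus a count of how many regions went into each mean
avg_by_month <- toy %>%
  group_by(Date) %>%
  summarise(Mean_unemp = mean(Unemployment.rate, na.rm = TRUE),
            n_regions = n(),
            .groups = "drop")
```

Because the result has a single series over time, it can be drawn as one line rather than one line per region.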
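Answer 1 (score: 1)
Using base R:
#Code: convert Date to a month-year label (this overwrites the Date column)
df1$Date <- format(df1$Date, '%b-%Y')
#Aggregate: mean unemployment rate for each Date/Region combination
out <- aggregate(Unemployment.rate ~ ., data = df1, mean, na.rm = TRUE)
Output: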
输出:
head(out)
      Date Region Unemployment.rate
1 Apr-2019 London          4.276528
2 Apr-2020 London          4.631217
3 Aug-2019 London          4.631480
4 Aug-2020 London          5.251169
5 Dec-2019 London          4.288543
6 Feb-2019 London          4.458510
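Another option, by month only:
#Code: run this on the original df1, where Date is still of class Date
df1$Date <- format(df1$Date, '%b')
#Aggregate: mean unemployment rate for each month/Region combination
out <- aggregate(Unemployment.rate ~ ., data = df1, mean, na.rm = TRUE)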
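In the formula interface of aggregate, the dot on the right-hand side stands for every other column in the data, here 'Date' and 'Region'. The sketch below illustrates this on a made-up data frame `toy` (a stand-in for df1; the object names `by_all`, `by_both` and `by_date` are illustrative) and shows that naming only 'Date' gives the average across regions.

```r
# Toy stand-in for df1 (hypothetical values, same column names as df1)
toy <- data.frame(
  Date = as.Date(c("2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01")),
  Region = c("London", "North East", "London", "North East"),
  Unemployment.rate = c(4.0, 5.0, 4.5, 5.5)
)

# `~ .` groups by all remaining columns, i.e. Date and Region
by_all <- aggregate(Unemployment.rate ~ ., data = toy, FUN = mean)

# Spelling the grouping columns out gives the same result
by_both <- aggregate(Unemployment.rate ~ Date + Region, data = toy, FUN = mean)

# Naming only Date averages over the regions within each date
by_date <- aggregate(Unemployment.rate ~ Date, data = toy, FUN = mean)
```

Since `by_date` holds one row per date, a single call such as plot(by_date$Date, by_date$Unemployment.rate, type = "l") would draw the combined series.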