我有以下数据:
ID city gender total 0-4 5-9 10-14
1 A1 city1 M 120 30 50 40
2 A1 city1 F 100 20 45 35
3 B1 city1 F 130 35 50 45
4 B1 city1 M 150 30 60 60
5 C1 city2 M 140 40 50 50
6 C1 city2 F 135 35 45 55
7 D1 city2 M 145 40 55 50
8 D1 city2 F 165 65 35 65
9 E1 city2 M 155 50 50 55
10 E1 city2 F 160 50 45 65
这里0-4岁,5-9岁,10-14岁是年龄组。
如何以下列形式轻松更改此数据:
ID city total male female meanage male%
1 A1 city1 220 120 100 7.57 54
2 B1 city1 280 150 130 8.5 53
3 C1 city2 --- --- --- ---- ----
4 D1 city2 -- --- --- --- -----
5 E1 city2 -- ---- ---- ----- -------
感谢。
答案 0 :(得分:0)
你真的想做很多复杂的转换。我看不到一步一步完成这一切的简单方法。首先,这里的样本数据来自
dd<-data.frame(
ID = paste0(rep(LETTERS[1:5],each=2),"1"),
city = rep(c("city1", "city2"), c(4,6)),
gender = c("M", "F", "F", "M", "M", "F", "M", "F", "M", "F"),
total = c(120, 100, 130, 150, 140, 135, 145, 165, 155, 160),
X0.4 = c(30, 20, 35, 30, 40, 35, 40, 65, 50, 50),
X5.9 = c(50, 45, 50, 60, 50, 45, 55, 35, 50, 45),
X10.14 = c(40, 35, 45, 60, 50, 55, 50, 65, 55, 65)
)
现在我们需要在开始重塑数据之前计算总加权年龄
tt<-transform(dd,
wage=2*X0.4 + 7*X5.9 + 12*X10.14,
X0.4=NULL,
X5.9=NULL,
X10.14=NULL
)
现在我们将重塑数据。它还不够宽,而且#34;因此,我们将融化它,并扩大它,以便我们为每个性别的总和和工资列。
library(reshape2)
mm<-melt(tt, id.vars=c("ID","city","gender"))
cc<-dcast(mm, ID+city~gender+variable)
现在我们拥有每行所需的所有数据,因此我们进行最后一组转换以获取我们想要的列
transform(cc, total=F_total+M_total,
male=M_total,
female=F_total,
meanage = (F_wage+M_wage)/(M_total+F_total),
maleperc = M_total/(M_total+F_total) * 100,
F_total=NULL, M_total=NULL,
F_wage=NULL, M_wage=NULL
)
哪个产生
ID city total male female meanage maleperc
1 A1 city1 220 120 100 7.568182 54.54545
2 B1 city1 280 150 130 7.714286 53.57143
3 C1 city2 275 140 135 7.545455 50.90909
4 D1 city2 310 145 165 7.161290 46.77419
5 E1 city2 315 155 160 7.317460 49.20635
所以我不确定我同意你的平均年龄B1。如果您正在使用其他公式计算该值,请随时更新代码。
答案 1 :(得分:0)
这或多或少会对数据的前几行产生影响。 (我认为你对B1的meanage
的计算是关闭的,但是根据年龄范围计算平均值对我来说似乎并不合适。)
library('dplyr')
library('reshape2')
yourdata <- data.frame(ID = c("A1", "A1", "B1", "B1", "C1", "C1"),
city = c(rep("city1", 4), "city2", "city2"),
gender = c("M", "F", "F", "M", "M", "F"),
total = c(120, 100, 130, 150, 140, 135),
X0_4 = c(30, 20, 35, 30, 40, 35),
X5_9 = c(50, 45, 50, 60, 50, 45),
X10_14 = c(40, 35, 45, 60, 50, 55))
# total counts of each age group for each ID
age_counts <- yourdata %>%
group_by(ID) %>%
summarize(X0_4 = sum(X0_4), X5_9 = sum(X5_9), X10_14 = sum(X10_14))
# use dcast to get 'male' and 'female' columns
data_wide <- dcast(yourdata, ID + city ~ gender, value.var = 'total')
# merge back with age counts
data_full <- merge(data_wide, age_counts, by = 'ID')
# final table
data_full %>%
group_by(ID) %>%
mutate(total = sum(F, M),
meanage = ((X0_4 * ((0 + 4) / 2) + X5_9 * ((5 + 9) / 2) + X10_14 *
((10 + 14) / 2)) / total),
male_percent = ((M * 100) %/% total)) %>%
select(ID, city, total, M, F, meanage, male_percent)