I want to convert data frame A into data frame B:
A = data.frame(male = c(3, 5), female = c(1,2))
B = data.frame(male = c(1,1,1,1,1,1,1,1,0,0,0), female = c(0,0,0,0,0,0,0,0,1,1,1))
This is my current approach:
new <- with(A, data.frame(male = c(rep(1, sum(male)), rep(0, sum(female))),
                          female = c(rep(0, sum(male)), rep(1, sum(female)))))
This gives me the data frame I want.
However, my real data frame (A) is more complex than this example. Is there a better way?
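One way to generalize the rep() idea to any number of count columns is to index rows of an identity matrix: row k of diag(n) is the 0/1 indicator for category k, so repeating row indices by the column totals builds B directly. This is a minimal sketch that assumes every column of A holds counts; the names counts and idx are illustrative only.

```r
# Counts per category (same A as in the question)
A <- data.frame(male = c(3, 5), female = c(1, 2))

# Total count per column: male = 8, female = 3
counts <- colSums(A)

# One category index per individual: 1 repeated 8 times, then 2 repeated 3 times
idx <- rep(seq_along(counts), counts)

# Row k of the identity matrix is the 0/1 indicator for category k
B <- as.data.frame(diag(length(counts))[idx, , drop = FALSE])
names(B) <- names(counts)

B
```

Adding a third count column to A needs no code change, because the identity matrix grows with the number of columns.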
Update
The data frame can be more complex, for example:
A = data.frame(month = c("July", "August"), male = c(5, 3), female = c(2,1))
which should be converted to
data.frame(month = c(rep("July", 5), rep("July", 2), rep("August", 3), rep("August", 1)),
male = c(rep(1, 5), rep(0, 2), rep(1, 3), rep(0, 1)),
female = c(rep(0, 5), rep(1, 2), rep(0, 3), rep(1, 1)))
# month male female
#1 July 1 0
#2 July 1 0
#3 July 1 0
#4 July 1 0
#5 July 1 0
#6 July 0 1
#7 July 0 1
#8 August 1 0
#9 August 1 0
#10 August 1 0
#11 August 0 1
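The rule behind this expected output can be written down once for any number of count columns: each input row is repeated once per individual, and each individual gets the indicator row for its category. The sketch below assumes the id column is named explicitly and every other column holds counts; expand_counts() is a hypothetical helper, not a function from any package.

```r
A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

# Hypothetical helper: one 0/1 indicator row per individual,
# keeping the id column(s) and the original row order.
expand_counts <- function(df, id_cols) {
  count_cols <- setdiff(names(df), id_cols)
  counts <- as.matrix(df[count_cols])
  # Flatten row by row: row 1 male, row 1 female, row 2 male, row 2 female
  n <- as.vector(t(counts))
  # Which input row and which category each individual belongs to
  row_idx <- rep(rep(seq_len(nrow(df)), each = length(count_cols)), n)
  cat_idx <- rep(rep(seq_along(count_cols), times = nrow(df)), n)
  # Row k of the identity matrix is the indicator for category k
  ind <- diag(length(count_cols))[cat_idx, , drop = FALSE]
  colnames(ind) <- count_cols
  out <- cbind(df[row_idx, id_cols, drop = FALSE], as.data.frame(ind))
  rownames(out) <- NULL
  out
}

res <- expand_counts(A, "month")
res
```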
Thanks.
Answer 0 (score: 2)
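We can do this in the tidyverse. gather reshapes the data to "long" format, uncount expands the rows using the "val" column, then we create a column of 1s, group by "month", create a sequence column ("ind"), and spread the data from "long" back to "wide".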
library(tidyverse)
gather(A, sex, val, -month) %>%
uncount(val) %>%
mutate(val = 1) %>%
group_by(month = factor(month, levels = month.name)) %>%
mutate(ind = row_number()) %>%
spread(sex, val, fill = 0) %>%
select(month, male, female)
# A tibble: 11 x 3
# Groups: month [2]
# month male female
# <fct> <dbl> <dbl>
# 1 July 1 0
# 2 July 1 0
# 3 July 1 0
# 4 July 1 0
# 5 July 1 0
# 6 July 0 1
# 7 July 0 1
# 8 August 1 0
# 9 August 1 0
#10 August 1 0
#11 August 0 1
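gather() and spread() still work but are superseded in current tidyr. A minimal sketch of the same pipeline with pivot_longer() and pivot_wider(), assuming tidyr 1.1 or later (which accepts a scalar values_fill):

```r
library(dplyr)
library(tidyr)

A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

res <- A %>%
  # long format: one row per (month, sex) with its count
  pivot_longer(-month, names_to = "sex", values_to = "count") %>%
  # repeat each row `count` times (the count column is dropped)
  uncount(count) %>%
  # sequence number within each month keeps every individual on its own row
  group_by(month) %>%
  mutate(ind = row_number(), val = 1) %>%
  ungroup() %>%
  # back to wide; missing combinations become 0
  pivot_wider(names_from = sex, values_from = val, values_fill = 0) %>%
  select(month, male, female)

res
```

As far as I know, pivot_wider() keeps rows in order of first appearance rather than sorting them, so the factor(month, levels = month.name) step should not be needed to keep July before August.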
Or use similar logic with data.table:
library(data.table)
A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2,1))
dcast(melt(setDT(A), id.vars = 'month')[, rep(1, value),
.(month, variable)], month + rowid(month) ~ variable,
value.var = 'V1', fill = 0)[, month_1 := NULL][]
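Within data.table, the reshape can also be skipped entirely by building the 0/1 columns per group with rep(). This sketch assumes there is exactly one row per month (otherwise c(male, female) would no longer have length 2); `by` keeps groups in order of first appearance.

```r
library(data.table)

DT <- data.table(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

# Inside j, `male` and `female` refer to the group's original column values
res <- DT[, .(male   = rep(c(1, 0), c(male, female)),
              female = rep(c(0, 1), c(male, female))),
          by = month]
res
```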
Answer 1 (score: 1)
You can use inverse.rle:
male <- c(1, 0)
female <- c(0, 1)
inverse.rle(list(lengths = sapply(A, sum), values = male))
# [1] 1 1 1 1 1 1 1 1 0 0 0
inverse.rle(list(lengths = sapply(A, sum), values = female))
# [1] 0 0 0 0 0 0 0 0 1 1 1
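inverse.rle() just repeats each value by its length, so the same vectors can be produced with rep() and a vector `times` argument. A small check on the simple A from the question:

```r
A <- data.frame(male = c(3, 5), female = c(1, 2))
counts <- sapply(A, sum)   # male = 8, female = 3

# Expand the (lengths, values) pair with inverse.rle() ...
a <- inverse.rle(list(lengths = counts, values = c(1, 0)))
# ... and with rep(), which repeats each element of x by the matching entry of times
b <- rep(c(1, 0), counts)

identical(a, b)
```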
Now let's apply this method to your more complex data:
library(dplyr)
split(A, A$month) %>% # split the data by month
lapply(function(x) data.frame(month = x[, 1], # build one data.frame per month: the month column plus the 0/1 male and female columns
male = inverse.rle(list(lengths = sapply(x[, 2:3], sum), values = c(1, 0))), # with very large data you could compute the sapply() once outside this lapply(), though it probably makes little difference
female = inverse.rle(list(lengths = sapply(x[, 2:3], sum), values = c(0, 1))))) %>%
do.call(dplyr::bind_rows, .) %>% # bind the list of data frames; dplyr's bind_rows() is used because rbind() produces awkward row names
arrange(match(month, month.name)) # split() returns the months in alphabetical order, so restore calendar order
Result:
month male female
1 July 1 0
2 July 1 0
3 July 1 0
4 July 1 0
5 July 1 0
6 July 0 1
7 July 0 1
8 August 1 0
9 August 1 0
10 August 1 0
11 August 0 1
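The final arrange() step is only needed because split() orders groups alphabetically. A base R sketch that avoids it (and the dplyr dependency) by splitting on a factor whose levels follow the original row order:

```r
A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

# Levels in order of appearance, so split() returns July before August
pieces <- split(A, factor(A$month, levels = unique(A$month)))

res <- do.call(rbind, lapply(pieces, function(x) {
  counts <- c(sum(x$male), sum(x$female))
  data.frame(month  = x$month[1],
             male   = rep(c(1, 0), counts),
             female = rep(c(0, 1), counts))
}))
rownames(res) <- NULL  # drop the "July.1"-style names that rbind() creates
res
```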
Answer 2 (score: 1)
Not sure whether a package handles this, but in base R we can use apply:
do.call(rbind, apply(A, 1, function(x) {
y <- as.numeric(x[-1])
data.frame(month = rep(x[1], sum(y)), male = rep(c(1, 0), c(y[1], y[2])),
female = rep(c(0, 1), c(y[1], y[2]))) #Thanks @iod for simplifying
}))
# month male female
#1 July 1 0
#2 July 1 0
#3 July 1 0
#4 July 1 0
#5 July 1 0
#6 July 0 1
#7 July 0 1
#8 August 1 0
#9 August 1 0
#10 August 1 0
#11 August 0 1
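Here we create one data frame per row of A, with the month as the first column. In the male column, the number of 1s is the male count and the number of 0s is the female count (that is, the total minus the number of males); the female column is the mirror image. Because apply() converts the data frame to a character matrix, the counts have to be turned back into numbers with as.numeric().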
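The per-row work can also be done without apply(), which sidesteps the character-matrix conversion. A minimal sketch using Map() over the two count columns; is_male is an illustrative name, and the code assumes exactly two categories so that female is simply the complement of male.

```r
A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

# TRUE for each male, FALSE for each female, month by month
is_male <- unlist(Map(function(m, f) rep(c(TRUE, FALSE), c(m, f)),
                      A$male, A$female))

res <- data.frame(
  month  = rep(A$month, A$male + A$female),  # each month repeated by its total
  male   = as.numeric(is_male),
  female = as.numeric(!is_male)
)
res
```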