Convert a data frame into another data frame of 0s and 1s in R

Time: 2018-12-28 00:52:04

Tags: r dataframe

I want to convert data frame A into data frame B.

A = data.frame(male = c(3, 5), female = c(1,2))

B = data.frame(male = c(1,1,1,1,1,1,1,1,0,0,0), female = c(0,0,0,0,0,0,0,0,1,1,1))

I have this approach:

new <- data.frame(male = c(rep(1, sum(A$male)), rep(0, sum(A$female))), female = c(rep(0, sum(A$male)), rep(1, sum(A$female))))

This gives me the data frame I want.

However, since my original data frame (A) is more complex than the example, is there a better way?

Update

The data frame may be more complex, for example:

A = data.frame(month = c("July", "August"), male = c(5, 3), female = c(2,1))

which should be converted to

data.frame(month = c(rep("July", 5), rep("July", 2), rep("August", 3), rep("August", 1)),
       male = c(rep(1, 5), rep(0, 2), rep(1, 3), rep(0, 1)),
       female = c(rep(0, 5), rep(1, 2), rep(0, 3), rep(1, 1)))

#    month male female
#1    July    1      0
#2    July    1      0
#3    July    1      0
#4    July    1      0
#5    July    1      0
#6    July    0      1
#7    July    0      1
#8  August    1      0
#9  August    1      0
#10 August    1      0
#11 August    0      1

Thanks.

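3 answers:

Answer 0: (score: 2)

We can do this with the tidyverse. gather reshapes the data into "long" format, uncount then expands the rows according to the "val" column, we create a column of 1s, group by "month", create a sequence column ("ind"), and spread reshapes from "long" back to "wide".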

library(tidyverse)
gather(A, sex, val, -month) %>%
    uncount(val) %>% 
    mutate(val = 1) %>%
    group_by(month = factor(month, levels = month.name)) %>% 
    mutate(ind = row_number()) %>%
    spread(sex, val, fill = 0) %>%
    select(month, male, female)
# A tibble: 11 x 3
# Groups:   month [2]
#   month   male female
#   <fct>  <dbl>  <dbl>
# 1 July       1      0
# 2 July       1      0
# 3 July       1      0
# 4 July       1      0
# 5 July       1      0
# 6 July       0      1
# 7 July       0      1
# 8 August     1      0
# 9 August     1      0
#10 August     1      0
#11 August     0      1
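The same reshape-then-expand idea can be sketched with the newer tidyr verbs. This is only a minimal sketch, assuming tidyr 1.0.0 or later (where pivot_longer is available); the column names `sex` and `n` are illustrative choices, not part of the original answer. Because pivot_longer keeps the original row order, the months stay in the order they appear in A without converting them to a factor.

```r
library(dplyr)
library(tidyr)

A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

B <- A %>%
  # one row per (month, sex) pair, with the count in column "n" (illustrative name)
  pivot_longer(-month, names_to = "sex", values_to = "n") %>%
  # repeat each row n times; uncount() drops the weight column by default
  uncount(n) %>%
  # turn the sex label into two 0/1 indicator columns
  mutate(male = as.numeric(sex == "male"),
         female = as.numeric(sex == "female")) %>%
  select(month, male, female)

print(B)
```

Building the indicator columns directly from `sex` avoids the round trip back to wide format, at the cost of naming the two categories explicitly.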

Or use similar logic with data.table:

library(data.table)
dcast(melt(setDT(A), id.var = 'month')[, rep(1, value), 
 .(month, variable)], month + rowid(month) ~ variable, 
    value.var = 'V1', fill = 0)[, month_1 := NULL][]

Data

A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2,1))

Answer 1: (score: 1)

You can use inverse.rle (shown here on the original two-column A):

male<-c(1,0)
female<-c(0,1)
inverse.rle(list(lengths=sapply(A,sum),values=male))
 [1] 1 1 1 1 1 1 1 1 0 0 0
inverse.rle(list(lengths=sapply(A,sum),values=female))
 [1] 0 0 0 0 0 0 0 0 1 1 1
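As a brief illustration of why this works: inverse.rle expands a run-length description (a list of `lengths` and `values`) back into a full vector, which amounts to repeating each value the given number of times. The sketch below hard-codes the totals from the simple example (8 males, 3 females) in a variable called `counts`, which is an illustrative name.

```r
# Totals from the simple example: 8 males, 3 females (hard-coded for illustration)
counts <- c(male = 8, female = 3)

# Expand the run-length description into the full 0/1 "male" column
male_col <- inverse.rle(list(lengths = counts, values = c(1, 0)))

# The same vector built directly with rep()
male_rep <- rep(c(1, 0), times = counts)

print(male_col)
print(all(male_col == male_rep))
```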

Now let's apply this approach to your more complex data:

library(dplyr) # for %>%, bind_rows and arrange

split(A, A$month) %>% # split the data by month
  lapply(function(x) data.frame(month = x[, 1], # for each month, build a data.frame with a month column plus male and female columns of zeros and ones
    male = inverse.rle(list(lengths = sapply(x[, 2:3], sum), values = c(1, 0))), # if the data is very big, you might want to do the sapply outside of this lapply, but I doubt it would make a big difference
    female = inverse.rle(list(lengths = sapply(x[, 2:3], sum), values = c(0, 1))))) %>%
  do.call(dplyr::bind_rows, .) %>% # bind the list of data frames together; dplyr's bind_rows is used because rbind formats the row names poorly
  arrange(match(month, month.name)) # split() orders the months alphabetically, so restore calendar order

Result:

    month male female
1    July    1      0
2    July    1      0
3    July    1      0
4    July    1      0
5    July    1      0
6    July    0      1
7    July    0      1
8  August    1      0
9  August    1      0
10 August    1      0
11 August    0      1

Answer 2: (score: 1)

I'm not sure whether there is a package that handles this, but in base R we can use apply:

do.call(rbind, apply(A, 1, function(x) {
   y <- as.numeric(x[-1])
  data.frame(month = rep(x[1], sum(y)), male = rep(c(1, 0), c(y[1], y[2])), 
             female = rep(c(0, 1), c(y[1], y[2]))) #Thanks @iod for simplifying
})) 


#    month male female
#1    July    1      0
#2    July    1      0
#3    July    1      0
#4    July    1      0
#5    July    1      0
#6    July    0      1
#7    July    0      1
#8  August    1      0
#9  August    1      0
#10 August    1      0
#11 August    0      1

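Here we create one data frame per row of A, with the month as the first column. In the "male" column we repeat 1 once for each male and 0 once for each female; the "female" column is built the other way round.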
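The same result can also be sketched without a row-wise apply, using only vectorised rep calls. This is a minimal sketch rather than part of the original answer; it assumes the count columns are named `male` and `female` as in the question, and `counts` is an illustrative helper variable.

```r
A <- data.frame(month = c("July", "August"), male = c(5, 3), female = c(2, 1))

# Counts interleaved row by row: July male, July female, August male, August female
counts <- as.vector(t(as.matrix(A[, c("male", "female")])))

B <- data.frame(
  # each month is repeated once per person counted in that month
  month  = rep(A$month, times = A$male + A$female),
  # the pattern 1,0 (or 0,1) per month, expanded by the interleaved counts
  male   = rep(rep(c(1, 0), times = nrow(A)), times = counts),
  female = rep(rep(c(0, 1), times = nrow(A)), times = counts)
)

print(B)
```

This avoids the character conversion that apply performs on a data frame with mixed column types, so the counts never have to be parsed back with as.numeric.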