Converting binary columns into a single column in R

Time: 2018-03-13 13:23:54

Tags: r multiple-columns

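I am new to R and only have basic skills so far. I looked at functions such as melt() and gather(), but I could not get them to work for me. What I want to do is convert the data below. Note that the Has Own House / Renting / Homeless columns only contain 1 and 0, and at most one of them can be 1 per row (you cannot be renting and homeless at the same time).

For example: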

Passenger ID ///  Has Own House /// Renting /// Homeless /// Age /// Gender
1                  1                 0           0            21      Male
2                  0                 1           0            24     Female

I would like the data to look like this:

Passenger ID /// Housing       /// Age /// Gender
1                Has own house     21      Male
2                Renting           24      Female

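When it comes to prediction, could you also tell me whether the layout above (separate binary columns) would be better in terms of speed, or whether having everything in one column is the better solution?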
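
One way to explore the prediction question is to look at what R's formula interface does with a single factor column. The sketch below uses a small made-up vector named `housing` (not the data from the question) and only shows that `model.matrix()` expands a factor back into 0/1 indicator columns, dropping the first level as the reference under the default treatment contrasts. It does not measure speed.

```r
# Hypothetical example vector; the level names mirror the question's columns
housing <- factor(
  c("Has_Own_House", "Renting", "Homeless", "Renting"),
  levels = c("Has_Own_House", "Renting", "Homeless")
)

# Build the design matrix that formula-based modelling functions such as lm() would use
design <- model.matrix(~ housing)

# One intercept column plus one 0/1 indicator per non-reference level
colnames(design)
```

For formula-based models the single-column layout therefore tends to end up as indicator columns again, so any speed difference would have to be checked by timing on the real data.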

3 answers:

Answer 0 (score: 1)

Try this:

library(tidyverse)
# importing your data
df <- read_table("Passenger_ID    Has_Own_House   Renting   Homeless   Age Gender
1                  1                 0           0     21      Male
2                  0                 1           0     24     Female")

Then run:

df %>% 
  gather(Housing, value, -Passenger_ID, -Age, -Gender) %>% 
  filter(value==1) %>% 
  select(-value)

The output is:

# A tibble: 2 x 4
#   Passenger_ID   Age Gender Housing      
#          <int> <int> <chr>  <chr>        
# 1            1    21 Male   Has_Own_House
# 2            2    24 Female Renting   
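
In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. A minimal sketch of the same reshaping with the newer function follows. It assumes a recent tidyr is installed, and it rebuilds the two example rows by hand so the block is self-contained. The name `housing_long` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same two example rows as above, built by hand
df <- tibble(
  Passenger_ID  = 1:2,
  Has_Own_House = c(1L, 0L),
  Renting       = c(0L, 1L),
  Homeless      = c(0L, 0L),
  Age           = c(21L, 24L),
  Gender        = c("Male", "Female")
)

housing_long <- df %>%
  # Stack the three indicator columns into name/value pairs
  pivot_longer(
    cols      = c(Has_Own_House, Renting, Homeless),
    names_to  = "Housing",
    values_to = "value"
  ) %>%
  # Keep only the indicator that is switched on for each passenger
  filter(value == 1) %>%
  select(-value)

housing_long
```

As with the `gather()` version, a row whose three indicators are all 0 would be dropped by the `filter()` step.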

Answer 1 (score: 0)

Using ifelse in base R:

# Load Data
dat <- structure(list(Passenger_ID = 1:2, Has_Own_House = c(1L, 0L),
         Renting = 0:1, Homeless = c(0L, 0L), Age = c(21L, 24L), Gender = structure(c(2L,
         1L), .Label = c("Female", "Male"), class = "factor")), .Names = c("Passenger_ID",
         "Has_Own_House", "Renting", "Homeless", "Age", "Gender"), class = "data.frame", row.names = c(NA,
         -2L))

# Assign new column "Housing" based on testing nested ifelse statements:

dat2 <- within(dat, Housing <- ifelse(Has_Own_House==1, "Has_Own_House",
                           ifelse(Renting==1, "Renting",
                           ifelse(Homeless==1, "Homeless", NA))))

# Remove extra columns
dat2$Has_Own_House <- NULL
dat2$Renting <- NULL
dat2$Homeless <- NULL

Yielding:

>dat2

Passenger_ID Age Gender       Housing
       1  21   Male Has_Own_House
       2  24 Female       Renting

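Answer 2 (score: 0)

In base R, you can assign the new column in one line: apply a function over every row of the data frame (the 1 argument of apply) that uses which to find the position of the value 1 and returns the corresponding column name: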

df = data.frame('Passenger ID' = 1:5,
                'Has Own House' = c(1,0,0,1,0),
                'Renting' = c(0,1,0,0,0),
                'Homeless' = c(0,0,1,0,1),
                'Age'=21:25,
                'Gender' = c('Male', 'Female', 'Male', 'Female', 'Male'))


df$HOUSING = apply(df[, 2:4], 1, function(x) names(df)[2:4][which(x==1)])
df
#   Passenger.ID Has.Own.House Renting Homeless Age Gender       HOUSING
# 1            1             1       0        0  21   Male Has.Own.House
# 2            2             0       1        0  22 Female       Renting
# 3            3             0       0        1  23   Male      Homeless
# 4            4             1       0        0  24 Female Has.Own.House
# 5            5             0       0        1  25   Male      Homeless