根据R中的分类预测值分离数据框

时间:2018-04-29 20:36:04

标签: r dataframe

我有一个数据框,称之为d,包含一个连续变量和两个分类(0/1)变量。

这是一个例子

structure(list(s = c(35.33, 39.51, 42.35, 42.35, 43.62, 43.77, 44.28, 44.32,44.74, 44.81, 47.71, 48.05, 48.13, 48.75, 49.4,49.44, 49.98, 50.27, 50.33, 50.54, 50.97, 51.2, 51.67, 51.94, 52.05, 52.7, 52.74, 52.82, 52.92, 54.17, 54.38, 54.57, 54.71, 55.53, 55.71, 56.11, 56.24, 56.29, 56.53, 57.16, 57.53, 58.04, 58.6, 58.8, 59.01, 59.26, 59.48, 59.61, 59.98, 60.54, 60.85, 61.89,62.01, 62.8, 63.22, 63.38, 63.78, 63.95, 67.08, 67.24, 67.54, 68.69, 70.16, 70.59, 72.15, 72.87, 76.69), age = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L), .Label = c(">=30", "<30"), class = "factor"), sex = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L), .Label = c("Men", "Women"), class = "factor")), .Names = c("s", "age", "sex"), row.names = c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L,15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L), class = "data.frame")

我想创建包含相同连续变量的4个数据框,每个可能的分类变量组合一个:00,01,10,11。如何在R中执行此操作?

1 个答案:

答案 0 :(得分:1)

您可以使用split()功能执行此操作:

# Create a list holding the four dataframes
list.of.dfs <- split(df, paste(df$age, df$sex, sep="_"))

# check the result
lapply(list.of.dfs, head)
#> $`<30_Men`
#>        s age sex
#> 10 44.74 <30 Men
#> 12 47.71 <30 Men
#> 16 49.40 <30 Men
#> 18 49.98 <30 Men
#> 19 50.27 <30 Men
#> 20 50.33 <30 Men
#> 
#> $`<30_Women`
#>        s age   sex
#> 4  42.35 <30 Women
#> 5  42.35 <30 Women
#> 6  43.62 <30 Women
#> 7  43.77 <30 Women
#> 8  44.28 <30 Women
#> 11 44.81 <30 Women
#> 
#> $`>=30_Men`
#>        s  age sex
#> 15 48.75 >=30 Men
#> 25 51.94 >=30 Men
#> 27 52.70 >=30 Men
#> 37 56.11 >=30 Men
#> 38 56.24 >=30 Men
#> 40 56.53 >=30 Men
#> 
#> $`>=30_Women`
#>        s  age   sex
#> 1  35.33 >=30 Women
#> 3  39.51 >=30 Women
#> 9  44.32 >=30 Women
#> 14 48.13 >=30 Women
#> 21 50.54 >=30 Women
#> 30 52.92 >=30 Women