Convert level counts into separate variables, including 0 for missing counts

Time: 2018-03-21 18:39:47

Tags: r dataframe levels

I have a data frame (table) that contains frequency counts (Freq) for the 2 levels (F, I) of a categorical variable (Fert).

  

table[1:10]

    FemID Sperm  Week Fert Freq
1:   269  High    1    F    4
2:   269  High    1    I    5
3:   273  High    1    F    6
4:   274  High    1    I    1
5:   275  High    1    I    1
6:   276  High    1    I    1
7:   278   Low    1    I    1
8:   280   Low    1    I    1
9:   281   Low    1    I    1
10:   282   Low    1    I    5

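I would like to convert this into a data frame in which the two levels of Fert (I and F) become separate variables for each value of FemID, with 0 where the count for a level is missing, like this: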

    FemID Sperm  Week Fert Infert
1:   269  High    1    4    5
2:   273  High    1    6    0
3:   274  High    1    0    1
4:   275  High    1    0    1
5:   276  High    1    0    1

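Any ideas or suggestions? I suspect this needs a loop, but I am not sure how to set one up. Perhaps in two parts, one that creates the two new variables and one that fills in the 0s?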

2 Answers:

Answer 0 (score: 0)

You can use spread from tidyr:

> library(tidyr)
> df %>% spread(Fert,Freq)
  FemID Sperm Week  F  I
1   269  High    1  4  5
2   273  High    1  6 NA
3   274  High    1 NA  1
4   275  High    1 NA  1
5   276  High    1 NA  1
6   278   Low    1 NA  1
7   280   Low    1 NA  1
8   281   Low    1 NA  1
9   282   Low    1 NA  5
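
The output above still has NA where a level has no count, whereas the question asks for 0. spread() accepts a fill argument for this case. A minimal sketch, assuming the same ten rows as in the question (rebuilt inline here without the row labels so the block runs on its own; the name wide is only illustrative):

```r
library(tidyr)

# Same ten rows as the question's table (row labels omitted).
df <- read.table(text = "FemID Sperm Week Fert Freq
269 High 1 F 4
269 High 1 I 5
273 High 1 F 6
274 High 1 I 1
275 High 1 I 1
276 High 1 I 1
278 Low 1 I 1
280 Low 1 I 1
281 Low 1 I 1
282 Low 1 I 5", header = TRUE, stringsAsFactors = FALSE)

# fill = 0 replaces the NA that spread() would otherwise put in
# cells for FemID/Fert combinations that have no row in the input.
wide <- spread(df, Fert, Freq, fill = 0)

# Rename the level columns F and I as in the desired output.
names(wide)[names(wide) == "F"] <- "Fert"
names(wide)[names(wide) == "I"] <- "Infert"
wide
```

With this, FemID 273 gets Infert 0 and FemID 274 gets Fert 0 instead of NA.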

You can also adjust the variable names:

> df %>% spread(Fert,Freq) %>% 
      setNames(c("FemID","Sperm","Week","Fert","Infert"))
  FemID Sperm Week Fert Infert
1   269  High    1    4      5
2   273  High    1    6     NA
3   274  High    1   NA      1
4   275  High    1   NA      1
.... the rest is truncated

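You can also filter on the NAs (filter comes from dplyr), here keeping only the rows that have an F count:

> library(dplyr)
> df %>% spread(Fert,Freq) %>% 
    setNames(c("FemID","Sperm","Week","Fert","Infert")) %>% 
    filter(!is.na(Fert))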
  FemID Sperm Week Fert Infert
1   269  High    1    4      5
2   273  High    1    6     NA
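
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which can fill the missing counts through values_fill. The following is only a sketch of that alternative, not part of the original answer; it assumes the same ten rows as the question, and relabelling the levels before pivoting is one possible way to get the Fert and Infert column names:

```r
library(tidyr)

# Same ten rows as the question's table (row labels omitted).
df <- read.table(text = "FemID Sperm Week Fert Freq
269 High 1 F 4
269 High 1 I 5
273 High 1 F 6
274 High 1 I 1
275 High 1 I 1
276 High 1 I 1
278 Low 1 I 1
280 Low 1 I 1
281 Low 1 I 1
282 Low 1 I 5", header = TRUE, stringsAsFactors = FALSE)

# Relabel the levels first so the new columns get their final names.
df$Fert <- ifelse(df$Fert == "F", "Fert", "Infert")

# One column per level; combinations with no input row are set to 0.
wide <- pivot_wider(df,
                    names_from  = Fert,
                    values_from = Freq,
                    values_fill = list(Freq = 0))
wide
```

The result is a tibble with one row per FemID and no NA in the two count columns, so no separate filtering or replacement step is needed.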

Answer 1 (score: 0)

Since your data is in a data.table, dcast is a good option:

  library(data.table)
  setDT(df)
  dcast(df, FemID+Sperm+Week~Fert, value.var = "Freq") 

  # Or, more concisely (... stands for all remaining columns):

  dcast(df, ...~Fert, value.var = "Freq")


  #    FemID Sperm Week  F  I
  # 1:   269  High    1  4  5
  # 2:   273  High    1  6 NA
  # 3:   274  High    1 NA  1
  # 4:   275  High    1 NA  1
  # 5:   276  High    1 NA  1
  # 6:   278   Low    1 NA  1
  # 7:   280   Low    1 NA  1
  # 8:   281   Low    1 NA  1
  # 9:   282   Low    1 NA  5
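
As shown, dcast leaves NA for the missing level, while the question asks for 0. dcast also has a fill argument, and setnames can rename the level columns by reference. A minimal sketch, assuming the same ten rows as the question (the names dt and wide are only illustrative):

```r
library(data.table)

# Same ten rows as the question's table.
dt <- data.table(
  FemID = c(269, 269, 273, 274, 275, 276, 278, 280, 281, 282),
  Sperm = c(rep("High", 6), rep("Low", 4)),
  Week  = 1,
  Fert  = c("F", "I", "F", "I", "I", "I", "I", "I", "I", "I"),
  Freq  = c(4, 5, 6, 1, 1, 1, 1, 1, 1, 5)
)

# fill = 0 is used for FemID/Fert combinations absent from the input.
wide <- dcast(dt, ... ~ Fert, value.var = "Freq", fill = 0)

# Rename the level columns F and I to match the desired output.
setnames(wide, c("F", "I"), c("Fert", "Infert"))
wide
```

Here FemID 273 ends up with Infert 0, and the females with only an I count get Fert 0.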

Data

  df <- read.table(text = "FemID Sperm  Week Fert Freq
  1:   269  High    1    F    4
  2:   269  High    1    I    5
  3:   273  High    1    F    6
  4:   274  High    1    I    1
  5:   275  High    1    I    1
  6:   276  High    1    I    1
  7:   278   Low    1    I    1
  8:   280   Low    1    I    1
  9:   281   Low    1    I    1
  10:   282   Low    1    I    5", header = TRUE, stringsAsFactors = FALSE)