I have a data frame (table) that contains frequency counts (Freq) for the 2 levels (F, I) of a categorical variable (Fert).
table[1:10]
FemID Sperm Week Fert Freq
1: 269 High 1 F 4
2: 269 High 1 I 5
3: 273 High 1 F 6
4: 274 High 1 I 1
5: 275 High 1 I 1
6: 276 High 1 I 1
7: 278 Low 1 I 1
8: 280 Low 1 I 1
9: 281 Low 1 I 1
10: 282 Low 1 I 5
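I would like to convert this into a data frame in which the two levels of Fert (I and F) are separate variables for each value of FemID, with 0 where a level has no count, like this: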
FemID Sperm Week Fert Infert
1: 269 High 1 4 5
2: 273 High 1 6 0
3: 274 High 1 1 0
4: 275 High 1 1 0
5: 276 High 1 1 0
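Any ideas or suggestions? I feel like this needs a loop, but I'm not sure how to set one up. Perhaps in two parts: one that creates the two new variables and one that fills in the 0s?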
Answer 0 (score: 0)
You can use spread from tidyr:
> library(tidyr)
> df %>% spread(Fert,Freq)
FemID Sperm Week F I
1 269 High 1 4 5
2 273 High 1 6 NA
3 274 High 1 NA 1
4 275 High 1 NA 1
5 276 High 1 NA 1
6 278 Low 1 NA 1
7 280 Low 1 NA 1
8 281 Low 1 NA 1
9 282 Low 1 NA 5
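The question asks for 0 rather than NA where a level has no count. As a minimal sketch, spread takes a fill argument that supplies a value for the missing combinations. The df below is rebuilt by hand from the ten rows in the question so the snippet runs on its own, and the name wide is only illustrative.

```r
library(tidyr)

# Same ten rows as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  FemID = c(269, 269, 273, 274, 275, 276, 278, 280, 281, 282),
  Sperm = rep(c("High", "Low"), times = c(6, 4)),
  Week  = 1,
  Fert  = c("F", "I", "F", "I", "I", "I", "I", "I", "I", "I"),
  Freq  = c(4, 5, 6, 1, 1, 1, 1, 1, 1, 5),
  stringsAsFactors = FALSE
)

# fill = 0 writes 0 wherever a FemID has no row for a given Fert level
wide <- spread(df, Fert, Freq, fill = 0)
wide
```

With fill = 0 the F and I columns contain no NA, so no separate step is needed to fill in the zeros.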
You can also adjust the variable names:
> df %>% spread(Fert,Freq) %>%
setNames(c("FemID","Sperm","Week","Fert","Infert"))
FemID Sperm Week Fert Infert
1 269 High 1 4 5
2 273 High 1 6 NA
3 274 High 1 NA 1
4 275 High 1 NA 1
.... the rest is truncated
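You can then filter on the NAs, for example to keep only the rows that have a Fert count (filter comes from dplyr, so load it first):
> library(dplyr)
> df %>% spread(Fert,Freq) %>%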
setNames(c("FemID","Sperm","Week","Fert","Infert")) %>%
filter(!is.na(Fert))
FemID Sperm Week Fert Infert
1 269 High 1 4 5
2 273 High 1 6 NA
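In tidyr 1.0.0 and later, spread is superseded by pivot_wider. The sketch below shows the same reshape with pivot_wider, filling missing counts with 0 and renaming the new columns by name instead of by position. It assumes the same ten rows as the question, rebuilt by hand here; the name wide is only illustrative.

```r
library(tidyr)

# Same ten rows as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  FemID = c(269, 269, 273, 274, 275, 276, 278, 280, 281, 282),
  Sperm = rep(c("High", "Low"), times = c(6, 4)),
  Week  = 1,
  Fert  = c("F", "I", "F", "I", "I", "I", "I", "I", "I", "I"),
  Freq  = c(4, 5, 6, 1, 1, 1, 1, 1, 1, 5),
  stringsAsFactors = FALSE
)

# One column per Fert level; missing level/FemID combinations become 0
wide <- pivot_wider(
  df,
  names_from  = Fert,
  values_from = Freq,
  values_fill = list(Freq = 0)
)

# Rename by name so the result does not depend on column order
names(wide)[names(wide) == "F"] <- "Fert"
names(wide)[names(wide) == "I"] <- "Infert"
wide
```

Renaming by name is a little more verbose than setNames with a full vector, but it keeps working if the identifier columns change.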
Answer 1 (score: 0)
Since your data is in a data.table, dcast is a good choice:
library(data.table)
setDT(df)
dcast(df, FemID+Sperm+Week~Fert, value.var = "Freq")
# Or, more concisely, let ... stand for all remaining columns:
dcast(df, ...~Fert, value.var = "Freq")
# FemID Sperm Week F I
# 1: 269 High 1 4 5
# 2: 273 High 1 6 NA
# 3: 274 High 1 NA 1
# 4: 275 High 1 NA 1
# 5: 276 High 1 NA 1
# 6: 278 Low 1 NA 1
# 7: 280 Low 1 NA 1
# 8: 281 Low 1 NA 1
# 9: 282 Low 1 NA 5
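To get the 0s and the column names requested in the question, dcast also accepts a fill argument, and setnames can rename the result in place. This is a minimal sketch that assumes the same ten rows as the question, rebuilt by hand as a data.table; the name wide is only illustrative.

```r
library(data.table)

# Same ten rows as in the question, rebuilt so the snippet is self-contained
df <- data.table(
  FemID = c(269, 269, 273, 274, 275, 276, 278, 280, 281, 282),
  Sperm = rep(c("High", "Low"), times = c(6, 4)),
  Week  = 1,
  Fert  = c("F", "I", "F", "I", "I", "I", "I", "I", "I", "I"),
  Freq  = c(4, 5, 6, 1, 1, 1, 1, 1, 1, 5)
)

# fill = 0 replaces the NA that dcast would otherwise produce
wide <- dcast(df, FemID + Sperm + Week ~ Fert, value.var = "Freq", fill = 0)

# Rename the level columns by reference (no copy of the table is made)
setnames(wide, c("F", "I"), c("Fert", "Infert"))
wide
```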
Data
df <- read.table(text = "FemID Sperm Week Fert Freq
1: 269 High 1 F 4
2: 269 High 1 I 5
3: 273 High 1 F 6
4: 274 High 1 I 1
5: 275 High 1 I 1
6: 276 High 1 I 1
7: 278 Low 1 I 1
8: 280 Low 1 I 1
9: 281 Low 1 I 1
10: 282 Low 1 I 5", header = TRUE, stringsAsFactors = FALSE)