I'm coming back to R after a year away and want to use rpart to build a classification tree.
My data looks like this:
Category, Shape, Color, Yes, No
A, Square, Blue, 3, 2
B, Triangle, Blue, 2, 4
etc.
Any suggestions for reshaping it into the form below so that I can use rpart? (I believe rpart needs the data laid out like this.)
ID, Shape, Color, Result
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, No
A, Square, Blue, No
B, Triangle, Blue, Yes
etc...
Thanks!
Answer 0: (score: 2)
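You can use melt from reshape2, then repeat the rows with rep:
library(reshape2)
s <- melt(df, id.vars = c('Category', 'Shape', 'Color'))
# repeat each row as many times as its count in `value`
s[rep(seq_len(nrow(s)), s$value), ]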
    Category    Shape Color variable value
1          A   Square  Blue      Yes     3
1.1        A   Square  Blue      Yes     3
1.2        A   Square  Blue      Yes     3
2          B Triangle  Blue      Yes     2
2.1        B Triangle  Blue      Yes     2
3          A   Square  Blue       No     2
3.1        A   Square  Blue       No     2
4          B Triangle  Blue       No     4
4.1        B Triangle  Blue       No     4
4.2        B Triangle  Blue       No     4
4.3        B Triangle  Blue       No     4
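To try this end to end without any extra packages, here is a minimal base R sketch of the same idea (stack the counts, then repeat rows with rep). It assumes a data frame `df` containing only the two rows shown in the question, since the rest is elided there; the names `ids`, `long` and `expanded` are just illustrative, and `Result` is borrowed from the layout the asker wants.

```r
# Sample data: only the two rows shown in the question (the rest is elided there).
df <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

# Stack the Yes and No counts into long format (the job melt does above).
ids  <- df[c("Category", "Shape", "Color")]
long <- rbind(
  cbind(ids, Result = "Yes", value = df$Yes),
  cbind(ids, Result = "No",  value = df$No)
)

# Repeat each row `value` times, then drop the count column and the
# 1.1, 1.2, ... row names that the indexing leaves behind.
expanded <- long[rep(seq_len(nrow(long)), long$value), ]
expanded$value <- NULL
rownames(expanded) <- NULL

# A classification tree needs a categorical response.
expanded$Result <- factor(expanded$Result, levels = c("Yes", "No"))
```

With the data in this shape, something like rpart(Result ~ Shape + Color, data = expanded, method = "class") should accept it. Note that with only a handful of rows rpart's default minsplit will likely leave the tree unsplit.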
Answer 1: (score: 1)
melt the data into long format, then repeat each variable as many times as given in the value column.
library(data.table)
melt(setDT(dat),1:3)[,rep(variable,value),by=.(Category,Shape,Color)]
    Category    Shape Color  V1
 1:        A   Square  Blue Yes
 2:        A   Square  Blue Yes
 3:        A   Square  Blue Yes
 4:        A   Square  Blue  No
 5:        A   Square  Blue  No
 6:        B Triangle  Blue Yes
 7:        B Triangle  Blue Yes
 8:        B Triangle  Blue  No
 9:        B Triangle  Blue  No
10:        B Triangle  Blue  No
11:        B Triangle  Blue  No
Using the tidyverse:
library(tidyverse)
dat %>%
  rowwise() %>%
  mutate(var = list(rep(c("Yes", "No"), c(Yes, No)))) %>%
  select(-Yes, -No) %>%
  ungroup() %>%
  unnest(cols = var)
   Category Shape    Color var
   <fct>    <fct>    <fct> <chr>
 1 A        Square   Blue  Yes
 2 A        Square   Blue  Yes
 3 A        Square   Blue  Yes
 4 A        Square   Blue  No
 5 A        Square   Blue  No
 6 B        Triangle Blue  Yes
 7 B        Triangle Blue  Yes
 8 B        Triangle Blue  No
 9 B        Triangle Blue  No
10 B        Triangle Blue  No
11 B        Triangle Blue  No
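As a possible alternative to the rowwise/unnest route, tidyr also has uncount(), which repeats rows according to a count column. The sketch below is not from the answers above; it assumes tidyr 1.0 or later (for pivot_longer), a data frame `dat` holding just the two rows shown in the question, and the illustrative names `Result`, `n` and `expanded_tidy`.

```r
library(tidyr)

# Sample data: only the two rows shown in the question (the rest is elided there).
dat <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

# Stack the Yes/No counts into long format: one row per outcome, count in `n`.
long <- pivot_longer(dat, cols = c(Yes, No),
                     names_to = "Result", values_to = "n")

# Repeat each row `n` times; uncount() drops the `n` column by default.
expanded_tidy <- uncount(long, n)
```

Here `Result` comes out as a character column, so it may need wrapping in factor() before being handed to rpart.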