I am struggling with a data transformation in R. The data I receive looks like this:
input <- data.frame(AF = sample(0:1, 100, replace=TRUE),
CAD = sample(0:1, 100, replace=TRUE),
CHF = sample(0:1, 100, replace=TRUE),
DEM = sample(0:1, 100, replace=TRUE),
DIAB = sample(0:1, 100, replace=TRUE))
input$Counts <- rowSums(input)
The output I want to achieve is:
output <- data.frame(Condition = c('AF', 'CAD', 'CHF', 'DEM', 'DIAB'),
'1' = sample(11:20, 5, replace=TRUE),
'2' = sample(11:20, 5, replace=TRUE),
'3' = sample(11:20, 5, replace=TRUE),
'4' = sample(11:20, 5, replace=TRUE),
'5' = sample(11:20, 5, replace=TRUE))
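Each cell is the number of observations that match both the condition (now in the first column) and the row sum (now spread across separate columns).
My solution is below, but I wonder whether there is a more elegant one?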
data.frame(Condition = colnames(input[ ,1:5]),
"One" = c(nrow(input[input$AF==1 & input$Counts==1,]),
nrow(input[input$CAD==1 & input$Counts==1,]),
nrow(input[input$CHF==1 & input$Counts==1,]),
nrow(input[input$DEM==1 & input$Counts==1,]),
nrow(input[input$DIAB==1 & input$Counts==1,])),
"Two" = c(nrow(input[input$AF==1 & input$Counts==2,]),
nrow(input[input$CAD==1 & input$Counts==2,]),
nrow(input[input$CHF==1 & input$Counts==2,]),
nrow(input[input$DEM==1 & input$Counts==2,]),
nrow(input[input$DIAB==1 & input$Counts==2,])),
"Three" = c(nrow(input[input$AF==1 & input$Counts==3,]),
nrow(input[input$CAD==1 & input$Counts==3,]),
nrow(input[input$CHF==1 & input$Counts==3,]),
nrow(input[input$DEM==1 & input$Counts==3,]),
nrow(input[input$DIAB==1 & input$Counts==3,])),
"Four" = c(nrow(input[input$AF==1 & input$Counts==4,]),
nrow(input[input$CAD==1 & input$Counts==4,]),
nrow(input[input$CHF==1 & input$Counts==4,]),
nrow(input[input$DEM==1 & input$Counts==4,]),
nrow(input[input$DIAB==1 & input$Counts==4,])),
"Five" = c(nrow(input[input$AF==1 & input$Counts==5,]),
nrow(input[input$CAD==1 & input$Counts==5,]),
nrow(input[input$CHF==1 & input$Counts==5,]),
nrow(input[input$DEM==1 & input$Counts==5,]),
nrow(input[input$DIAB==1 & input$Counts==5,])),
"Six" = c(nrow(input[input$AF==1 & input$Counts==6,]),
nrow(input[input$CAD==1 & input$Counts==6,]),
nrow(input[input$CHF==1 & input$Counts==6,]),
nrow(input[input$DEM==1 & input$Counts==6,]),
nrow(input[input$DIAB==1 & input$Counts==6,]))
)
Answer 0 (score: 1)
Here is a solution using aggregate.
myMat <- t(aggregate(.~Counts, data=input, FUN=sum)[-1,-1])
myMat
      2  3  4  5 6
AF    3 10 15 15 2
CAD   2 14 16 18 2
CHF   2 14 18 16 2
DEM   4  8 16 18 2
DIAB  5 14 22 17 2
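The first argument of aggregate, . ~ Counts, is a formula that says to perform some operation on every column, grouped by Counts. The second argument specifies the data set, and the third says that the desired operation is sum. The first row and the first column are dropped from the output with [-1, -1], as they are not relevant to the desired result. This output is then transposed with t. To change the column names, you can use colnames<-, as in
colnames(myMat) <- c("One", "Two", "Three", "Four", "Five")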
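The column labels 2 through 6 in the printed matrix are leftover row numbers from the aggregate result, not Counts values, and dropping the first row by position assumes that a Counts == 0 group exists. A minimal sketch of a variant that filters and labels by the actual Counts values instead; the seed, the helper names conditions and agg, and the matrix-based construction of input are assumptions made for illustration and are not part of the original answer:

```r
# Reproducible stand-in for the question's data (the seed is an arbitrary choice)
set.seed(42)
conditions <- c("AF", "CAD", "CHF", "DEM", "DIAB")
input <- as.data.frame(
  matrix(sample(0:1, 500, replace = TRUE), ncol = 5,
         dimnames = list(NULL, conditions))
)
input$Counts <- rowSums(input)

# Sum every condition column within each value of Counts
agg <- aggregate(. ~ Counts, data = input, FUN = sum)

# Keep only the groups where at least one condition is present
agg <- agg[agg$Counts > 0, ]

# Transpose the condition columns and label each column by its Counts value
myMat <- t(agg[, conditions])
colnames(myMat) <- agg$Counts
myMat
```

Filtering on agg$Counts > 0 rather than on row position keeps the result correct even when no observation has a row sum of zero.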
Answer 1 (score: 0)
You can also use dplyr and tidyr to switch between long and wide formats (although in this particular case, using aggregate is easier):
library(dplyr)
library(tidyr)
# take the input dataset
input %>%
# transform to long format
gather(condition, measurement,AF:DIAB) %>%
# summarise by Counts and condition
group_by(Counts, condition) %>%
summarise(measure = sum(measurement)) %>%
# transform back to the desired wide format
spread(Counts, measure)
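In current tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. A sketch of the same pipeline with those functions, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument); the seed, the name wide, and the extra filter that drops the Counts == 0 group are assumptions added for illustration:

```r
library(dplyr)
library(tidyr)

# Reproducible stand-in for the question's data (the seed is an arbitrary choice)
set.seed(42)
input <- data.frame(AF   = sample(0:1, 100, replace = TRUE),
                    CAD  = sample(0:1, 100, replace = TRUE),
                    CHF  = sample(0:1, 100, replace = TRUE),
                    DEM  = sample(0:1, 100, replace = TRUE),
                    DIAB = sample(0:1, 100, replace = TRUE))
input$Counts <- rowSums(input)

wide <- input %>%
  # long format: one row per observation and condition
  pivot_longer(AF:DIAB, names_to = "condition", values_to = "measurement") %>%
  # observations with no condition at all contribute nothing
  filter(Counts > 0) %>%
  # count the observations per condition and row sum
  group_by(condition, Counts) %>%
  summarise(measure = sum(measurement), .groups = "drop") %>%
  # wide format: one column per Counts value
  pivot_wider(names_from = Counts, values_from = measure)

wide
```

Grouping by condition first gives one row per condition, which matches the layout asked for in the question.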