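I am trying to find the average number of matching instances in R.
I want to know when all 3 columns == 1, when all 3 columns == 0, and finally both cases combined.
This does not work: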
mean(test$direction == test$pred.lm == test$pred.svm)
This gives me the proportion of rows where the direction column equals the pred.lm column:
mean(test$direction == test$pred.lm)
Example:
direction pred.lm pred.svm
2018-07-20 0 0 0
2018-07-23 1 0 0
2018-07-24 0 0 1
2018-07-25 1 1 1
2018-07-26 1 1 1
2018-07-27 0 0 0
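Rows 1, 4, 5, and 6 all match. I want the average number of times the three columns match when they are all == 0 and when they are all == 1, and finally for all matches regardless of 0 or 1.
Data: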
library(xts)
df <- structure(c(0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0,
1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1,
1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0,
0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1), index = structure(c(1532044800,
1532304000, 1532390400, 1532476800, 1532563200, 1532649600, 1532908800,
1532995200, 1533081600, 1533168000, 1533254400, 1533513600, 1533600000,
1533686400, 1533772800, 1533859200, 1534118400, 1534204800, 1534291200,
1534377600, 1534464000, 1534723200, 1534809600, 1534896000, 1534982400,
1535068800, 1535328000, 1535414400, 1535500800, 1535587200, 1535673600,
1536019200, 1536105600, 1536192000, 1536278400, 1536537600, 1536624000,
1536710400, 1536796800, 1536883200, 1537142400, 1537228800, 1537315200,
1537401600, 1537488000, 1537747200, 1537833600, 1537920000, 1538006400,
1538092800, 1538352000, 1538438400, 1538524800, 1538611200, 1538697600,
1538956800, 1539043200, 1539129600, 1539216000, 1539302400, 1539561600,
1539648000, 1539734400, 1539820800, 1539907200, 1540166400, 1540252800,
1540339200, 1540425600, 1540512000, 1540771200, 1540857600, 1540944000,
1541030400, 1541116800, 1541376000, 1541462400, 1541548800, 1541635200,
1541721600, 1541980800, 1542067200, 1542153600, 1542240000, 1542326400,
1542585600, 1542672000, 1542758400, 1542931200, 1543190400, 1543276800,
1543363200, 1543449600, 1543536000), tzone = "UTC", tclass = "Date"), class = c("xts",
"zoo"), .indexCLASS = "Date", .indexTZ = "UTC", tclass = "Date", tzone = "UTC", src = "yahoo", updated = structure(1544903554.77594, class = c("POSIXct",
"POSIXt")), .Dim = c(94L, 3L), .Dimnames = list(NULL, c("direction",
"pred.lm", "pred.svm")))
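Answer 0: (score: 3)
You're close; the main problem is that every comparison must be binary, that is, between exactly two operands. R does not let you chain == (an expression of the form a == b == c is a syntax error), so combine pairwise comparisons with &, as in: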
# All 1's
with(df, mean(direction == pred.lm & pred.lm == pred.svm & pred.svm == 1))
# [1] 0.1276596
# All 0's
with(df, mean(direction == pred.lm & pred.lm == pred.svm & pred.svm == 0))
# [1] 0.393617
# All equal
with(df, mean(direction == pred.lm & pred.lm == pred.svm))
# [1] 0.5212766
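To check the logic by hand, the same pairwise comparisons can be applied to the six-row sample from the question. This is only a sketch: `sample_df` is a hypothetical plain data frame rebuilt from the sample (dates dropped, no xts), not the full dataset.

```r
# Six-row sample from the question, rebuilt as a plain data frame
# (assumption: the dates are dropped, so xts is not needed here).
sample_df <- data.frame(
  direction = c(0, 1, 0, 1, 1, 0),
  pred.lm   = c(0, 0, 0, 1, 1, 0),
  pred.svm  = c(0, 0, 1, 1, 1, 0)
)

# Each == compares exactly two vectors; & combines the results row by row.
all_equal <- with(sample_df, direction == pred.lm & pred.lm == pred.svm)
all_ones  <- all_equal & sample_df$pred.svm == 1
all_zeros <- all_equal & sample_df$pred.svm == 0

which(all_equal)  # rows 1, 4, 5 and 6, as stated in the question
mean(all_ones)    # 2 of 6 rows
mean(all_zeros)   # 2 of 6 rows
mean(all_equal)   # 4 of 6 rows
```

Taking `mean()` of a logical vector counts TRUE as 1 and FALSE as 0, so each result is the share of rows that satisfy the condition.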
However, you can do better:
# All 1's
mean(rowSums(df) == ncol(df))
# [1] 0.1276596
# All 0's
mean(rowSums(df) == 0)
# [1] 0.393617
# All equal
mean(rowSums(df) %in% c(0, ncol(df)))
# [1] 0.5212766
This is not only shorter but also works when df has more than three columns.
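The row-sum trick relies on the data containing only 0s and 1s: a row sums to ncol(df) exactly when every entry is 1, and to 0 exactly when every entry is 0. A minimal sketch with four columns, where `match_rates` and the matrix `m` are hypothetical names introduced for illustration and the input is assumed to contain no NA values:

```r
# Hypothetical helper (not part of the original answer): share of rows that
# are all 1, all 0, or all identical. Assumes a numeric matrix or data frame
# holding only 0s and 1s, with no NA values.
match_rates <- function(x) {
  s <- rowSums(x)
  k <- ncol(x)
  c(all_ones  = mean(s == k),
    all_zeros = mean(s == 0),
    all_equal = mean(s %in% c(0, k)))
}

# Four columns instead of three; only rows 1 and 2 are uniform.
m <- cbind(a = c(0, 1, 1, 0),
           b = c(0, 1, 0, 0),
           c = c(0, 1, 1, 0),
           d = c(0, 1, 1, 1))

match_rates(m)
```

If the columns could hold values other than 0 and 1, the row sum would no longer identify uniform rows, and the pairwise comparisons would be the safer choice.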
Answer 1: (score: 1)
What I understand from your question is that you want the mean over all rows whose values are 0,0,0 or 1,1,1.
> rowno.<-NULL
> for(i in 1:nrow(df))
+ if(all(df[i,]==0) || all(df[i,]==1))
+ rowno.<-c(rowno.,i)
> rowno.
[1] 1 4 5 6 7 8 9 11 12 17 19 20 21 23 25 31 32 33
[19] 34 36 38 39 45 48 51 53 54 55 56 57 58 65 68 69 71 73
[37] 74 75 77 78 80 82 83 86 89 91 92 93 94
> mean(rowno.)
[1] 47.91837
If you want the mean of all observations in those rows, you can get it with:
> mean(df[rowno.,])
[1] 0.244898
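Note that mean(rowno.) averages the row numbers themselves, so it is not the share of matching rows; that share would be length(rowno.) / nrow(df), which is 49 / 94 for the data above. A vectorized sketch of the same idea on the six-row sample from the question, where `sample_m`, `matches` and `rowno` are hypothetical names and the sample is rebuilt as a plain matrix rather than an xts object:

```r
# Six-row sample from the question as a plain numeric matrix
# (assumption: dates dropped, values are only 0 or 1).
sample_m <- cbind(direction = c(0, 1, 0, 1, 1, 0),
                  pred.lm   = c(0, 0, 0, 1, 1, 0),
                  pred.svm  = c(0, 0, 1, 1, 1, 0))

# A row matches when every entry is 0 or every entry is 1.
matches <- rowSums(sample_m == 0) == ncol(sample_m) |
           rowSums(sample_m == 1) == ncol(sample_m)

rowno <- which(matches)          # indices of the matching rows
length(rowno) / nrow(sample_m)   # share of rows that match
mean(sample_m[rowno, ])          # mean of the values inside the matching rows
```

This avoids growing a vector inside a loop, and it keeps the two quantities apart: the share of matching rows, and the mean of the values within those rows.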