The dataset shows a pattern of clustered groups. Here is the dataset:
index <- c(1:30)
a <- c(0,1,0,0,0,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,0,0,1,0,1,1,1,0,1,0)
b <- c(1,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1,1)
c <- c(1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
d <- c(0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1)
df <- data.frame(cbind(index, a, b, c, d))
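In general, the task is to identify the indices (i.e., the column "index") at which a data column (i.e., a, b, c, d) and its neighboring column both show at least three consecutive values of 1.
For example, the following data should output 2, 3, 4, 7, 8, 9: col-c and col-d both contain a run of 1s covering indices 2 to 4, and col-b and col-c both contain one covering indices 7 to 9.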
index  1 2 3 4 5 6 7 8 9
col-a  0 0 0 0 0 1 1 0 1
col-b  1 0 1 0 1 1 1 1 1
col-c  0 1 1 1 0 0 1 1 1
col-d  1 1 1 1 0 0 0 0 0
For the full dataset above, the result should be the indices: 1, 2, 3, 6, 7, 8, 12, 13, 14, 17, 18, 19, 27.
Answer 0: (score: 3)
This is not the most scalable solution, but it returns the desired result.
It is shown first, with the four columns hard-coded; a version that scales to n columns follows.
# convert 1s that do not have at least runs of length 3 to 0
df[LETTERS[1:4]] <- lapply(df[-1], function(x) {
  tmp <- rle(x)
  tmp$values[tmp$lengths < 3] <- 0L
  inverse.rle(tmp)
})
# add neighbor columns, then use logical subsetting to return the relevant indices
df$index[pmax(df[[LETTERS[1]]] + df[[LETTERS[2]]],
              df[[LETTERS[2]]] + df[[LETTERS[3]]],
              df[[LETTERS[3]]] + df[[LETTERS[4]]]) > 1]
# [1]  1  2  3  6  7  8 12 13 14 17 18 19 27
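To see what the rle() / inverse.rle() step does to a single column, here is a small sketch on col-b from the 9-index example in the question. The variable names `x`, `r` and `cleaned` are mine, purely for illustration.

```r
# col-b from the small example in the question
x <- c(1, 0, 1, 0, 1, 1, 1, 1, 1)

# run-length encoding: one entry per run of identical values
r <- rle(x)
r$lengths  # 1 1 1 1 5
r$values   # 1 0 1 0 1

# any run shorter than 3 is set to 0, so isolated 1s disappear
r$values[r$lengths < 3] <- 0

# expand the runs back into a vector of the original length
cleaned <- inverse.rle(r)
cleaned    # 0 0 0 0 1 1 1 1 1
```

Only the run of five 1s at positions 5 to 9 survives; the lone 1s at positions 1 and 3 are zeroed out.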
The scalable version starts the same way as above, only storing the result in a new object for convenience:
# convert 1s that do not have at least runs of length 3 to 0, put into list
l <- lapply(df[-1], function(x) {
  tmp <- rle(x)
  tmp$values[tmp$lengths < 3] <- 0L
  inverse.rle(tmp)
})
Now use Map to return a list of the sums of adjacent columns, and use do.call with pmax to return the element-wise maximum.
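A minimal sketch of that Map / do.call / pmax step, rebuilt from the description rather than copied from the answer. Pairing `head(l, -1)` with `tail(l, -1)` to line up each column with its right-hand neighbor is my assumption, and the names `sums` and `res` are illustrative only.

```r
# data from the question
a <- c(0,1,0,0,0,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,0,0,1,0,1,1,1,0,1,0)
b <- c(1,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1,1)
c <- c(1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
d <- c(0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1)
df <- data.frame(index = 1:30, a, b, c, d)

# convert 1s that do not have at least runs of length 3 to 0, put into list
l <- lapply(df[c("a", "b", "c", "d")], function(x) {
  tmp <- rle(x)
  tmp$values[tmp$lengths < 3] <- 0L
  inverse.rle(tmp)
})

# assumed pairing: each column plus its right-hand neighbor (a+b, b+c, c+d)
sums <- Map(`+`, head(l, -1), tail(l, -1))

# element-wise maximum over all pairs; a value of 2 means both neighbors
# are inside a run of at least three 1s at that index
res <- df$index[do.call(pmax, unname(sums)) > 1]
res
# [1]  1  2  3  6  7  8 12 13 14 17 18 19 27
```

I select the data columns by name here rather than with df[-1], because the first snippet adds columns A to D to df, and those would otherwise be picked up as extra neighbors.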
Answer 1: (score: 2)
# For each pair of neighboring columns, flag the positions that lie inside a
# window of three consecutive 1s in both columns, then take the union over all pairs.
sort(Reduce(union,
  lapply(lapply(2:(ncol(df) - 1), function(j) c(j, j + 1)), function(cols) {
    which(rowSums(sapply(df[cols], function(x)
      sapply(seq_along(x), function(i) {
        sum(x[max(1, i - 1):min(i + 1, length(x))]) == 3 |
          sum(x[max(1, i - 2):i]) == 3 |
          sum(x[i:min(i + 2, length(x))]) == 3
      }))) > 1)
  })))
# [1]  1  2  3  6  7  8 12 13 14 17 18 19 27
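The inner test in this answer is easier to follow on a single vector. Here is a small sketch, again on col-b from the 9-index example in the question; `x`, `n` and `in_run` are illustrative names, not part of the answer.

```r
# col-b from the small example in the question
x <- c(1, 0, 1, 0, 1, 1, 1, 1, 1)
n <- length(x)

# position i is in a run of at least three 1s if any of the three windows
# of length 3 that contain it (centered, ending at i, starting at i) sums to 3;
# windows truncated at the edges have fewer than 3 elements and cannot sum to 3
in_run <- sapply(seq_along(x), function(i) {
  sum(x[max(1, i - 1):min(i + 1, n)]) == 3 ||
    sum(x[max(1, i - 2):i]) == 3 ||
    sum(x[i:min(i + 2, n)]) == 3
})

which(in_run)
# [1] 5 6 7 8 9
```

This flags the same positions that the rle() approach keeps for this vector, so the two answers differ in how they detect runs rather than in what they detect.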