Question

The dataset shows a pattern of clustered groups. Here is the dataset:

index <- c(1:30)
a <- c(0,1,0,0,0,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,0,0,1,0,1,1,1,0,1,0)
b <- c(1,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1,1)
c <- c(1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
d <- c(0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1)
df <- data.frame(cbind(index, a, b, c, d))

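In general, the task is to identify the indices (the "index" column) at which a data column (a, b, c, or d) and its neighboring column both show a run of at least three consecutive 1s.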

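For example, the small table below should output 2,3,4,7,8,9: col-c and col-d both have runs of 1s covering indices 2 to 4, and col-b and col-c both have runs of 1s covering indices 7 to 9.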

index 1 2 3 4 5 6 7 8 9
col-a 0 0 0 0 0 1 1 0 1
col-b 1 0 1 0 1 1 1 1 1
col-c 0 1 1 1 0 0 1 1 1
col-d 1 1 1 1 0 0 0 0 0

For the dataset df above, the result should be the indices: 1,2,3,6,7,8,12,13,14,17,18,19,27.

Answer 1

This is not the most scalable solution, but it returns the desired result.

# convert 1s that are not part of a run of at least length 3 to 0
df[LETTERS[1:4]] <- lapply(df[-1], function(x) {
  tmp <- rle(x)
  tmp$values[tmp$lengths < 3] <- 0L
  inverse.rle(tmp)
})

# add neighboring columns, then use logical subsetting to return the relevant indices
df$index[(pmax(df[[LETTERS[1]]] + df[[LETTERS[2]]],
               df[[LETTERS[2]]] + df[[LETTERS[3]]],
               df[[LETTERS[3]]] + df[[LETTERS[4]]]) > 1)]
# [1]  1  2  3  6  7  8 12 13 14 17 18 19 27

Here is a version that scales to n columns. The first step is the same as above, except that the result is stored in a new object for convenience. Start from the original df, before the columns A to D were added, because df[-1] takes every column except index.

# convert 1s that are not part of a run of at least length 3 to 0, put into list
l <- lapply(df[-1], function(x) {
  tmp <- rle(x)
  tmp$values[tmp$lengths < 3] <- 0L
  inverse.rle(tmp)
})

Now, use Map to return a list of the sums of adjacent columns, and use pmax with do.call to return the element-wise maximum.
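The final expression for the n-column version does not appear in the text, so the following is only a sketch of what the Map, do.call, and pmax step might look like. It rebuilds the question's data so that it runs on its own; the names keep_long_runs, pair_sums, and result are hypothetical and were chosen here for illustration.

```r
# Rebuild the question's data (columns named directly to avoid masking c()).
df <- data.frame(
  index = 1:30,
  a = c(0,1,0,0,0,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,0,0,1,0,1,1,1,0,1,0),
  b = c(1,1,1,0,0,1,1,1,0,0,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1,1),
  c = c(1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
  d = c(0,0,0,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,0,1,1,1)
)

# Hypothetical helper: zero out every run shorter than min_len.
keep_long_runs <- function(x, min_len = 3) {
  tmp <- rle(x)
  tmp$values[tmp$lengths < min_len] <- 0
  inverse.rle(tmp)
}

# One cleaned vector per data column.
l <- lapply(df[-1], keep_long_runs)

# Sum each column with its right-hand neighbor: (a+b, b+c, c+d).
pair_sums <- unname(Map(`+`, l[-length(l)], l[-1]))

# Element-wise maximum over all pair sums; 2 means both neighbors are in a run.
result <- df$index[do.call(pmax, pair_sums) > 1]
result
# [1]  1  2  3  6  7  8 12 13 14 17 18 19 27
```

Because l[-length(l)] and l[-1] are built from whatever columns follow index, the same lines should work unchanged for any number of data columns.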

Answer 2

sort(Reduce(union,
       lapply(lapply(2:(ncol(df) - 1), function(j) c(j, j + 1)), function(cols) {
           which(rowSums(sapply(df[cols], function(x)
             sapply(1:length(x), function(i) {
               sum(x[max(1, i - 1):min(i + 1, length(x))]) == 3 |
                 sum(x[max(1, i - 2):i]) == 3 |
                 sum(x[i:min(i + 2, length(x))]) == 3
             }))) > 1)
         })))
# [1]  1  2  3  6  7  8 12 13 14 17 18 19 27
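The nested calls above are easier to follow once the pieces have names. Below is a rough sketch of the same window idea, applied to the small 9-index example from the question. It is not the answerer's code: in_run_of_three, ex, flags, and adjacent_both are hypothetical names, and the window test only considers full length-3 windows, which should be equivalent for 0/1 data.

```r
# TRUE where position i lies inside some length-3 window made entirely of 1s.
in_run_of_three <- function(x) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    starts <- (i - 2):i                               # window starts that cover i
    starts <- starts[starts >= 1 & starts + 2 <= n]   # keep windows inside the vector
    any(vapply(starts, function(s) sum(x[s:(s + 2)]) == 3, logical(1)))
  }, logical(1))
}

# The small example table from the question.
ex <- data.frame(
  index = 1:9,
  a = c(0,0,0,0,0,1,1,0,1),
  b = c(1,0,1,0,1,1,1,1,1),
  c = c(0,1,1,1,0,0,1,1,1),
  d = c(1,1,1,1,0,0,0,0,0)
)

# Logical matrix: one column of run flags per data column.
flags <- sapply(ex[-1], in_run_of_three)

# Compare each column with its right-hand neighbor.
adjacent_both <- flags[, -ncol(flags), drop = FALSE] & flags[, -1, drop = FALSE]

# Keep indices where at least one adjacent pair is flagged together.
res <- ex$index[rowSums(adjacent_both) > 0]
res
# [1] 2 3 4 7 8 9
```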

How to find indices based on a clustered population
