Using multiple windows on a single stream with Kapacitor

Time: 2016-09-20 15:09:47

Tags: influxdb kapacitor

Goal: I want to send a notification 5 minutes after an alert state begins, and then one every 30 minutes after that.

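I tried using .count() and time functions, but that went nowhere. I don't want to have to calculate that mess, and I couldn't find a way to make it user-friendly and reliable.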

The solution I'm using now is two streams, each with its own window.

var initialData = stream
    |from()
        .database(db)
        .measurement(metricType)
        .retentionPolicy(rPolicy)
        .groupBy(group)
        .where(lambda: "cpu" == 'cpu-total')
        .where(lambda: "host" =~ hostFilter)
    // window first, so mean() averages each 10m window
    |window()
        .period(10m)
        .every(5m)
        .align()
    |mean(metric)
        .as('initialStat')

var continuousData = stream
    |from()
        .database(db)
        .measurement(metricType)
        .retentionPolicy(rPolicy)
        .groupBy(group)
        // filter on the "cpu" tag; the variable metricType ('cpu') never equals 'cpu-total'
        .where(lambda: "cpu" == 'cpu-total')
        .where(lambda: "host" =~ hostFilter)
    |window()
        .period(10m)
        .every(30m)
        .align()
    |mean(metric)
        .as('continuousStat')

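Aside from the fact that this seems odd, I need separate calculations for each stream, and I also need separate |alert() nodes. The first alert node only notifies on state changes, but the second one can't do that, because I want a reminder every N minutes. Another problem is that the first |alert() node sends an OK notification, and the second one also sends an OK N minutes later.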

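I feel there must be a better way to do this. I suppose I could use an if statement in the second |alert() node so that it doesn't send a notification on OK, since the first |alert() node already handles that. I haven't figured out how to do that yet, but I'm sure it's possible. I also don't want to fight TICKscript; I know it isn't a full language (Issue 741).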

The full TICKscript is below:
// CONFIGURATION PARAMETERS

// Alerting

var emailAddress = '$EMAIL'
var pagerdutyKey = '$PD'
var slackChannel = '$SLACK'

// Static Thresholds in percent cpu steal used
var warn = 85
var crit = 95

// Dynamic thresholds in number of std deviations
var warnSig = 2.5
var critSig = 3.5

// Print INFO level (every result will be an alert)
// AlertNode.StateChangesOnly will also need to be disabled
// NOTE:
// INFO level alerts will be disregarded by the pagerduty handler, this is not configurable.
var debug = FALSE

// Datastream
// Define the data that will be acted upon
var db           = 'telegraf'
var group        = 'host'
var metricType   = 'cpu'
var metric       = 'time_steal'
var rPolicy      = 'default'

// Regex used to filter on a subset of hosts
var hostFilter = /.+/

// Window
var dataPeriod            = 10m
var initialFrequency      = 5m
var continuousFrequency   = 30m

// DATAFRAME
var initialData = stream
    |from()
        .database(db)
        .measurement(metricType)
        .retentionPolicy(rPolicy)
        .groupBy(group)
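        // filter on the "cpu" tag, not the metricType variable
        .where(lambda: "cpu" == 'cpu-total')
        .where(lambda: "host" =~ hostFilter)
    // window first, so mean() averages each dataPeriod window
    |window()
        .period(dataPeriod)
        .every(initialFrequency)
        .align()
    |mean(metric)
        .as('initialStat')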

var continuousData = stream
    |from()
        .database(db)
        .measurement(metricType)
        .retentionPolicy(rPolicy)
        .groupBy(group)
        .where(lambda: "cpu" == 'cpu-total')
        .where(lambda: "host" =~ hostFilter)
    |window()
        .period(dataPeriod)
        .every(continuousFrequency)
        .align()
    |mean(metric)
        .as('continuousStat')

// Calculations
var initialCalculation = initialData
    |eval(lambda: sigma("initialStat"))
        .as('initialSigma')
        .keep()

var continuousCalculation = continuousData
    |eval(lambda: sigma("continuousStat"))
        .as('continuousSigma')
        .keep()

// ALERT CONDITIONS
var initialCondition = initialCalculation
    |alert()
        .id('{{ index .Tags "host"  }}')
        .message('{{ .ID  }} is {{ .Level  }}: CPU STEAL USAGE {{ index .Fields "initialStat" }}% SHORT')
        .details('this is an alert')
        .stateChangesOnly()
        .info(lambda: debug)
        // reference the fields produced above; "stat" and "sigma" do not exist
        .warn(lambda: "initialStat" > warn OR
            "initialSigma" > warnSig)
        .crit(lambda: "initialStat" > crit OR
            "initialSigma" > critSig)

var continuousCondition = continuousCalculation
    |alert()
        .id('{{ index .Tags "host"  }}')
        .message('{{ .ID  }} is {{ .Level  }}: CPU STEAL USAGE {{ index .Fields "continuousStat" }}% LONG')
        .details('this is an alert')
        .info(lambda: debug)
        .warn(lambda: "continuousStat" > warn OR
            "continuousSigma" > warnSig)
        .crit(lambda: "continuousStat" > crit OR
            "continuousSigma" > critSig)

// ACTIONS
continuousCondition
        // .log('/tmp/alerts/cpu_steal_usage_alerts')
        // .slack()
        // .channel(slackChannel)
        .email(emailAddress)
        .pagerDuty()
                .serviceKey(pagerdutyKey)

initialCondition
        // .log('/tmp/alerts/cpu_steal_usage_alerts')
        // .slack()
        // .channel(slackChannel)
        .email(emailAddress)
        .pagerDuty()
                .serviceKey(pagerdutyKey)

1 Answer:

Answer 0 (score: 1)

Apparently I can use multiple windows in a single stream node.

stream
    |from()
        .database(db)
        .measurement(metricType)
        .retentionPolicy(rPolicy)
        .groupBy(group)
        .where(lambda: "cpu" == 'cpu-total')
        .where(lambda: "host" =~ hostFilter)
    |window()
        .period(dataPeriod)
        .every(initialFrequency)
        .align()
    |mean(metric)
        .as('initialStat')
    |window()
        .period(dataPeriod)
        .every(continuousFrequency)
        .align()
    // after the first mean() only 'initialStat' remains, so average that field
    |mean('initialStat')
        .as('continuousStat')
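
Chained like this, the second window only receives the output of the first mean(), so it averages the 5-minute rolling means rather than the raw points. If two independent windows over the raw data are wanted, a minimal, untested sketch is to store the from() node in a variable and branch from it, since a TICKscript variable holding a node can feed more than one pipeline. It assumes the variables db, rPolicy, metricType, group, hostFilter, metric, dataPeriod, initialFrequency and continuousFrequency from the full script in the question; the name data is hypothetical.

```tickscript
// Assumes the variables defined in the question's full script.
// Select the data once...
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rPolicy)
        .measurement(metricType)
        .groupBy(group)
        .where(lambda: "cpu" == 'cpu-total')
        .where(lambda: "host" =~ hostFilter)

// ...then branch 1: 10m window emitted every 5m
var initialData = data
    |window()
        .period(dataPeriod)
        .every(initialFrequency)
        .align()
    |mean(metric)
        .as('initialStat')

// ...and branch 2: 10m window emitted every 30m, over the same raw points
var continuousData = data
    |window()
        .period(dataPeriod)
        .every(continuousFrequency)
        .align()
    |mean(metric)
        .as('continuousStat')
```

Each branch can then get its own eval and alert chain as in the question, while the from() selection is written only once.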

Still working on the OK problem, though.
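
One way to approach the OK problem is to let only the 5-minute alert report recoveries and suppress OK events on the 30-minute reminder alert. This is a sketch, assuming a Kapacitor release whose AlertNode supports .noRecoveries() (check the AlertNode documentation for your version), and it would take the place of the continuousCondition definition in the question, reusing continuousCalculation and the threshold variables defined there.

```tickscript
// Reminder alert: without stateChangesOnly() it alerts on every 30m
// window while the level is WARNING/CRITICAL; noRecoveries() drops its
// OK events, so only the short-interval alert announces the recovery.
var continuousCondition = continuousCalculation
    |alert()
        .id('{{ index .Tags "host" }}')
        .message('{{ .ID }} is {{ .Level }}: CPU STEAL USAGE {{ index .Fields "continuousStat" }}% LONG')
        .noRecoveries()
        .warn(lambda: "continuousStat" > warn OR "continuousSigma" > warnSig)
        .crit(lambda: "continuousStat" > crit OR "continuousSigma" > critSig)
        .email(emailAddress)
```

Alternatively, if your Kapacitor version accepts a duration in .stateChangesOnly() (for example .stateChangesOnly(30m)), a single alert node on the 5-minute stream may cover both the first notification and the 30-minute reminders; check the AlertNode documentation for how that option treats repeated OK events.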