Scala Spark: get the mean for each time interval

Time: 2018-10-16 09:40:51

Tags: scala apache-spark apache-spark-sql

My input is a Spark DataFrame:

EventTime,Signal
0,-65
10,-63
20,-71
40,-65
50,-62
80,-81
90,-84
100,-81
...
85460,-71
85480,-66
85490,-89
85500,-80

I want the mean of Signal for every 900 seconds of EventTime, with output like this:

EventTime, MeanSignal
0, mean 
900, mean 
1800, mean
...
85500, mean

My problem is that EventTime does not advance in regular steps, so I can't split the DataFrame into parts of equal length...

2 answers:

Answer 0 (score: 0)

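You can add a new column computed as EventTime / 900 (integer division, so every row in the same 900-second window gets the same value) and group by that column.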
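
A minimal sketch of this approach in Spark's Scala API. It is not the answerer's original code, which did not survive; it assumes EventTime holds non-negative whole seconds and Signal is numeric, uses the built-in `floor` function instead of a UDF, and names the output columns as in the question. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, floor}

object MeanPer900Seconds {
  // Replace EventTime with the start of its window (0, 900, 1800, ...) and
  // average Signal per window. Assumes EventTime is non-negative seconds.
  def meanPerWindow(df: DataFrame, windowSeconds: Int = 900): DataFrame =
    df.withColumn("EventTime", floor(col("EventTime") / windowSeconds) * windowSeconds)
      .groupBy("EventTime")
      .agg(avg("Signal").as("MeanSignal"))
      .orderBy("EventTime")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("mean-per-900s").getOrCreate()
    import spark.implicits._
    // A few of the rows shown in the question.
    val input = Seq((0, -65), (10, -63), (20, -71), (85460, -71), (85500, -80))
      .toDF("EventTime", "Signal")
    meanPerWindow(input).show()
    spark.stop()
  }
}
```

Because each row is labeled by the window it falls into, uneven spacing between events does not matter; a window that contains no events simply produces no output row.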

In the result, EventTime 0 stands for values between 0 and 899, and so on.
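
The boundary rule can be checked without a cluster. The sketch below applies the same arithmetic to a plain Scala collection, using only the rows visible in the question (the elided middle rows are not reproduced). `BucketMeans` and its methods are hypothetical names, and integer division equals floor only for non-negative times.

```scala
object BucketMeans {
  // Width of each averaging window, in the same units as EventTime (seconds).
  val WindowSeconds = 900

  // Start of the window containing eventTime: 0-899 -> 0, 900-1799 -> 900, ...
  // Integer division truncates toward zero, which equals floor for non-negative times.
  def windowStart(eventTime: Int): Int = eventTime / WindowSeconds * WindowSeconds

  // Group (EventTime, Signal) pairs by window start and average Signal in each window.
  def meanPerWindow(rows: Seq[(Int, Int)]): Seq[(Int, Double)] =
    rows
      .groupBy { case (t, _) => windowStart(t) }
      .map { case (start, group) => start -> group.map(_._2).sum.toDouble / group.size }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      (0, -65), (10, -63), (20, -71), (40, -65), (50, -62), (80, -81), (90, -84), (100, -81),
      (85460, -71), (85480, -66), (85490, -89), (85500, -80))
    meanPerWindow(sample).foreach { case (start, mean) => println(s"$start, $mean") }
  }
}
```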

Answer 1 (score: 0)

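OK, here is my solution, with help from other posts: I created a Bucket column from EventTime by integer division by 900, so each bucket is one 900-second category, then grouped by the bucket and took the mean.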

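    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, mean, udf}

    object SignalMeans {
      // Bucket index of an event: 0 for 0-899, 1 for 900-1799, ...
      def toBucket(input: Int): Int = input / 900

      // Start time of a bucket, so bucket 0 is labeled 0 as in the requested output.
      // Use (input + 1) * 900 instead to label each bucket by its end.
      def getTime(input: Int): Int = input * 900

      // Wrap the helpers as Spark UDFs (assumes EventTime is an integer column).
      val toBucketUDF = udf(toBucket _)
      val getTimeUDF = udf(getTime _)

      def meanPerBucket(data_input: DataFrame): DataFrame = {
        val df = data_input.withColumn("Bucket", toBucketUDF(col("EventTime")))

        val finalDF = df.groupBy("Bucket")
          .agg(mean("Signal").as("MeanSignal"))
          .orderBy("Bucket")
          .withColumn("EventTime", getTimeUDF(col("Bucket")))
          .drop("Bucket")
          .select("EventTime", "MeanSignal")

        finalDF
      }
    }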