My input is a Spark DataFrame:
EventTime,Signal
0,-65
10,-63
20,-71
40,-65
50,-62
80,-81
90,-84
100,-81
...
85460,-71
85480,-66
85490,-89
85500,-80
I would like to get the mean of Signal over every 900 seconds of EventTime, with output like this:
EventTime, MeanSignal
0, mean
900, mean
1800, mean
...
85500, mean
My problem is that EventTime does not advance in regular steps, so I cannot simply split the DataFrame into chunks of equal length...
Answer 0 (score: 0)
You can add a new column computed as EventTime / 900 (integer division) and group by that column.
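The grouping described above might look like the following minimal sketch. It assumes the question's column names (EventTime, Signal). The object name WindowedMean, the helper meanPerWindow, its width parameter, and the last two sample rows are made up for illustration. Unlike a plain EventTime / 900, it multiplies the window index back by 900 so that the labels read 0, 900, 1800, ... as in the requested output.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, floor}

object WindowedMean {
  // Hypothetical helper: averages Signal over fixed windows of `width` seconds.
  // Each row is assigned to the window that starts at floor(EventTime / width) * width.
  def meanPerWindow(df: DataFrame, width: Int = 900): DataFrame =
    df.withColumn("EventTime", floor(col("EventTime") / width) * width)
      .groupBy("EventTime")
      .agg(avg("Signal").as("MeanSignal"))
      .orderBy("EventTime")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("windowed-mean")
      .getOrCreate()
    import spark.implicits._

    // First three rows come from the question; the last two are invented
    // so that a second window appears in the output.
    val df = Seq((0, -65), (10, -63), (20, -71), (900, -80), (910, -90))
      .toDF("EventTime", "Signal")

    meanPerWindow(df).show()
    spark.stop()
  }
}
```

Because the window is derived from each row's own EventTime, irregular gaps between rows do not matter: a window with no rows is simply absent from the result.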
In the result, an EventTime of 0 stands for values between 0 and 899, and so on.
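Answer 1 (score: 0)
OK, here is my solution, thanks to the other posts: I created a Bucket column from the integer division of EventTime by 900 to form categories, then grouped by bucket and took the mean.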
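import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, mean, udf}

// Map an EventTime to the index of its 900-second window (integer division).
def toBucket(input: Int): Int = input / 900

// Map a window index back to a time label: the end of that window.
def getTime(input: Int): Int = (input + 1) * 900

val toBucketUDF = udf(toBucket _)
val getTimeUDF = udf(getTime _)

// Name and signature are assumed; the original function header is not shown.
def meanPerBucket(data_input: DataFrame): DataFrame = {
  val df = data_input.withColumn("Bucket", toBucketUDF(col("EventTime")))
  val finalDF = df.groupBy("Bucket")
    .agg(mean("Signal").as("MeanSignal")) // column names as in the question
    .orderBy("Bucket")
    .withColumn("EventTime", getTimeUDF(col("Bucket")))
    .drop("Bucket")
  finalDF
}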
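To see what the two helper functions do to individual values, here is a small plain-Scala sketch without Spark. It reuses the arithmetic of toBucket and getTime from the answer; the object name BucketArithmetic, the Width constant, and the choice of sample times are assumptions made for illustration.

```scala
object BucketArithmetic {
  // Window length in seconds, as used in the answer.
  val Width = 900

  // Index of the window an EventTime falls into (integer division).
  def toBucket(eventTime: Int): Int = eventTime / Width

  // Label for a window: the time at which that window ends.
  def getTime(bucket: Int): Int = (bucket + 1) * Width

  def main(args: Array[String]): Unit = {
    for (t <- Seq(0, 899, 900, 85500)) {
      val b = toBucket(t)
      println(s"EventTime=$t -> Bucket=$b -> label=${getTime(b)}")
    }
  }
}
```

With this arithmetic, times 0 through 899 share bucket 0 and are labelled 900, so every label marks the end of its window. If the labels should instead start at 0 as in the question's expected output, returning bucket * 900 from getTime would do that.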