How do I parse unescaped JSON received as input by a Lambda function?

Time: 2019-02-04 13:31:57

Tags: java json parsing aws-lambda escaping

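I am using a web scraping tool (Parsehub) to extract data. When an extraction finishes, Parsehub sends information about the run (as JSON) to an AWS Lambda function that I use as a webhook. However, this JSON is not properly escaped, so Lambda throws an error (for example, "Could not parse request body"). How can I escape the JSON string so that Lambda does not throw an error? I have also tested the function from Eclipse.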

I have used simple Java types as input (https://docs.aws.amazon.com/lambda/latest/dg/java-programming-model-req-resp.html). I have also tried a POJO (https://docs.aws.amazon.com/lambda/latest/dg/java-handler-io-type-pojo.html) and a byte-stream implementation (https://docs.aws.amazon.com/lambda/latest/dg/java-handler-io-type-stream.html) as input, but it still throws a JSON parsing error.

Here is part of my Lambda handler code:

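import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class LambdaFunctionHandler implements RequestHandler<Object, String> {

    @Override
    public String handleRequest(Object input, Context context) {
        // Log the deserialized input for debugging
        System.out.println("input - " + input);
        return "response";
    }
}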


This is the JSON that Parsehub sends to Lambda:

{
    "run_token": "I have removed this",
    "status": "complete",
    "md5sum": "90dc9753513a248502414e8d5345a6de /phfiles/ty6qie7-ut5C.gz ",
    "custom_proxies": "",
    "data_ready": 1,
    "template_pages": {},
    "start_time": "2019-01-30T11:01:58",
    "owner_email": "I have removed this",
    "webhook": "https://api endpoint of lambda function",
    "is_empty": false,
    "project_token": "I have removed this",
    "end_time": "2019-01-30T11:02:19",
    "start_running_time": "2019-01-30T11:01:59",
    "options_json": "{"recoveryRules": "{}", "rotateIPs": false, "sendEmail": true, "allowPerfectSimulation": false, "ignoreDisabledElements": true, "webhook": "https://api endpoint of lambda function", "outputType": "csv", "customProxies": "", "preserveOrder": false, "startTemplate": "main_template", "allowReselection": false, "proxyDisableAdblock": false, "proxyCustomRotationHybrid": false, "maxWorkers": "0", "loadJs": true, "startUrl": "https://address of the website from which data is extracted", "startValue": "{}", "maxPages": "0", "proxyAllowInsecure": false}",
    "start_value": "{}",
    "start_template": "main_template",
    "pages": 2,
    "start_url": "https://address of the website from which data is extracted"
}


This is the output in my CloudWatch logs:

Lambda invocation failed with status: 400. Lambda request id: eecd695e-61e7-47d9-bc27-04628c99e158
Execution failed: Could not parse request body into json: Unrecognized token 'run_token': was expecting ('true', 'false' or 'null')
at [Source: [B@36f6b2e9; line: 1, column: 11]


This is the output in my Eclipse console:

Invoking function...
==================== INVOCATION ERROR ====================
com.amazonaws.services.lambda.model.InvalidRequestContentException: Could not parse request body into json: Unexpected character ('r' (code 114)): was expecting comma to separate Object entries
at [Source: [B@1ade7b2b; line: 15, column: 21] (Service: AWSLambda; Status Code: 400; Error Code: InvalidRequestContentException; Request ID: b46bf0b4-4bb2-4bc0-aa13-81457349153c)

We can see that the "options_json": "{"recoveryRules": "{}", ....... part of the JSON is not escaped. I cannot change the JSON that Parsehub sends; I can only process the data in Lambda.

1 Answer:

Answer 0 (score: 0)

I may be late to the party, but I ran into this problem too, and these are my conclusions:

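  • API Gateway can manage two different protocols (API types), which AWS calls REST and HTTP.
  • The HTTP type has "routes". Each route has an integration, and the integration has a payload format version.
  • When you design a webhook in the simplest way, most choices are already made for you, so you can use the default catch-all route and payload format v2.0 for a seamless integration between API Gateway and Lambda.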

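With this setup, every request goes straight into the Lambda event as one big JSON object: headers, requestContext, body and so on. The body is not deserialized; it is simply the value of the 'body' property of this big JSON, in escaped string form.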

So when the request reaches the Lambda function, you have to deserialize the body yourself to get an object. In a Node.js Lambda, you would do:

exports.handler = async (bigEvent, context) => {
    // Deserialize just the body, which arrives as a JSON string
    const event = JSON.parse(bigEvent.body);
    console.log('value1 =', event.key1);
    return event.key1;
};

To clarify, bigEvent looks something like this:

{
  version: '2.0',
  routeKey: 'POST /endpoint',
  rawPath: '/endpoint',
  rawQueryString: '',
  headers: {
    accept: '*/*',
    ...
  },
  requestContext: {
    accountId: '123456789012',
    ....
  },
  body: '{\n    "key1": "importantDatum",\n    "key2": "..."\n}',
  isBase64Encoded: false
}
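
Note that the event also carries an isBase64Encoded flag, which a parsing step can take into account. The following is a minimal sketch under stated assumptions, not a definitive implementation: parseBody is a hypothetical helper name (not part of any AWS SDK), and it assumes the body is UTF-8 JSON text.

```javascript
// Hypothetical helper: extracts and parses the JSON body of an
// API Gateway HTTP API (payload format 2.0) event.
function parseBody(bigEvent) {
    // No body at all (e.g. a GET request): nothing to parse
    if (bigEvent.body === undefined || bigEvent.body === null) {
        return null;
    }
    // The body arrives as a string; decode it first if it is base64-encoded
    const text = bigEvent.isBase64Encoded
        ? Buffer.from(bigEvent.body, 'base64').toString('utf8')
        : bigEvent.body;
    // JSON.parse throws a SyntaxError if the text is not valid JSON
    return JSON.parse(text);
}
```

Inside the handler this would be used as `const event = parseBody(bigEvent);`. A body that is not valid JSON, such as one with an unescaped nested string, still makes JSON.parse throw, but the error is raised inside your own code, where a try/catch can decide what to do with it.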

If you want to respond with JSON, you should serialize it (with JSON.stringify(...)) before sending it.
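
As an illustration, a handler that replies with JSON could look like the sketch below. The statusCode/headers/body shape is the response format that Lambda proxy integrations accept; jsonResponse is a hypothetical helper name, and the `received` and `error` fields are made up for the example.

```javascript
// Hypothetical helper: wraps any serializable value in an HTTP-style response
function jsonResponse(statusCode, payload) {
    return {
        statusCode: statusCode,
        headers: { 'Content-Type': 'application/json' },
        // The body must be a string, so serialize the object explicitly
        body: JSON.stringify(payload),
    };
}

const handler = async (bigEvent) => {
    try {
        // Deserialize the incoming body, then answer with a JSON document
        const event = JSON.parse(bigEvent.body);
        return jsonResponse(200, { received: event.key1 });
    } catch (err) {
        // A missing or malformed JSON body ends up here
        return jsonResponse(400, { error: 'Invalid JSON body' });
    }
};
```

In a real deployment the function would be exported as `exports.handler = handler;`. Returning a 400 for unparseable input is a design choice of this sketch; logging the raw body instead may be more useful while debugging a webhook.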