Question

I have a small web scraper that crawls websites and captures information such as:

Page content
Page technology
Headers
Cookies
Requests

I am now working in AWS Lambda and trying to extract individual elements, but I am struggling to parse the data correctly.

To get the data, I do

const parsedData = JSON.parse(data);

If I then do

console.log(parsedData)

I get a valid JSON object (shortened for readability)

{
    "Scrape": {
        "PageContent": [
            {
                "tag": "title",
                "text": "Stack Overflow - Where Developers Learn, Share, & Build Careers"
            },
            {
                "tag": "span",
                "text": "Stack Overflow"
            },
            {
                "tag": "span",
                "text": "new"
            },
            {
                "tag": "h4",
                "text": "Try Stack Overflow for Business"
            },
            {
                "tag": "span",
                "text": "rev 2019.5.3.33574"
            }
        ],
        "PageTech": [
            {
                "name": "DoubleClick for Publishers (DFP)",
                "confidence": "100",
                "version": null,
                "icon": "DoubleClick.svg",
                "website": "http://www.google.com/dfp",
                "categories": [
                    {
                        "36": "Advertising Networks"
                    }
                ]
            },
            {
                "name": "Elementor",
                "confidence": "100",
                "version": null,
                "icon": "Elementor.png",
                "website": "https://elementor.com",
                "categories": [
                    {
                        "51": "Landing Page Builders"
                    }
                ]
            },
            {
                "name": "MySQL",
                "confidence": "0",
                "version": null,
                "icon": "MySQL.svg",
                "website": "http://mysql.com",
                "categories": [
                    {
                        "34": "Databases"
                    }
                ]
            }
        ],
        "PageHeaders": {
            "status": [
                "200"
            ],
            "cache-control": [
                "private"
            ],
            "content-type": [
                "text/html; charset=utf-8"
            ],
            "content-encoding": [
                "gzip"
            ],
            "x-frame-options": [
                "SAMEORIGIN"
            ],
            "x-request-guid": [
                "f7b0f406-a047-49b0-a822-18549cec5510"
            ],
            "strict-transport-security": [
                "max-age=15552000"
            ],
            "content-security-policy": [
                "upgrade-insecure-requests"
            ],
            "accept-ranges": [
                "bytes"
            ],
            "date": [
                "Sat, 04 May 2019 15:31:45 GMT"
            ],
            "via": [
                "1.1 varnish"
            ],
            "x-served-by": [
                "cache-dca17748-DCA"
            ],
            "x-cache": [
                "MISS"
            ],
            "x-cache-hits": [
                "0"
            ],
            "x-timer": [
                "S1556983905.306505,VS0,VE17"
            ],
            "vary": [
                "Accept-Encoding,Fastly-SSL"
            ],
            "x-dns-prefetch-control": [
                "off"
            ],
            "set-cookie": [
                "prov=c352d0ff-77bc-1152-f587-c60c5b156354; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly"
            ],
            "content-length": [
                "52808"
            ]
        },
        "PageCookies": [
            {
                "name": "__qca",
                "value": "P0-513591828-1556983906778",
                "domain": ".stackoverflow.com",
                "path": "/",
                "expires": 1590852706,
                "size": 31,
                "httpOnly": false,
                "secure": false,
                "session": false
            },
            {
                "name": "prov",
                "value": "c352d0ff-77bc-1152-f587-c60c5b156354",
                "domain": ".stackoverflow.com",
                "path": "/",
                "expires": 2682374400.32448,
                "size": 40,
                "httpOnly": true,
                "secure": false,
                "session": false
            },
            {
                "name": "notice-so4",
                "value": "!1",
                "domain": "stackoverflow.com",
                "path": "/",
                "expires": 1558620000,
                "size": 12,
                "httpOnly": false,
                "secure": false,
                "session": false
            }
        ],
        "PageRequests": [
            {},
            "request url:",
            "https://stackoverflow.com/",
            "request url:",
            "https://csi.gstatic.com/csi?s=ampad&ctx=2&puid=1~1556983907538&qqid=CNPSx4WZguICFREahgodlIgJLA&rt=a4a.link.i.43.8.2.1s.0.1ncx.1mpg~aa.script.j.4c.b.8.0.0.toq.tl3~simg.img.z.1w.1.5.0.0.bob.bjn~vu.img.z.29.2.h.0.0.87.0&met.a4a=dcl.0~ol.423~nvs.1556983907063~ini.1556983907539",
            "request url:",
            "https://pagead2.googlesyndication.com/pcs/activeview?xai=AKAOjssfcs88N7qDU9Qt-wDm3tVGQ1IHYMTJIMVe7GXXRffOfT5bDoX5Q2J-38ZpgGqsphazKd2DBC22PTJF-bHEAW9artqsaPzETpUg8026EaA&sig=Cg0ArKJSzNKjZvqCXbUqEAE&id=ampim&o=1268,478&d=300,250&ss=800,600&bs=1920,1080&mcvt=1019&mtos=0,0,1019,1019,1019&tos=0,0,1019,0,0&tfs=136&tls=1155&g=100&h=100&pt=423&tt=1168&rpt=423&rst=1556983907063&r=v&adk=2451320170&avms=ampa"
        ]
    }
}

When I try to extract information from the JSON, I keep getting

undefined

returned as the value.

Examples of what I have tried:

console.log(parsedData.Scrape.PageContent)
console.log(parsedData.Scrape.PageContent[0].text)
console.log(parsedData[1][1])
console.log(parsedData.Scrape)
console.log(parsedData[1]['Scrape'])
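
One thing worth checking is whether `parsedData` is really an object. The sketch below assumes, without confirmation from anything shown here, that `data` might have been JSON-encoded twice. In that case `JSON.parse` returns a string, `console.log` prints it so that it looks like a JSON object, and every property lookup yields undefined. The sample payload and the helper `parseUntilObject` are hypothetical names used only for illustration.

```javascript
// Hypothetical sample: a small payload that was JSON-encoded twice.
const sample = {
  Scrape: { PageContent: [{ tag: "title", text: "Stack Overflow" }] },
};
const doubleEncoded = JSON.stringify(JSON.stringify(sample));

// Hypothetical helper: keep parsing while the result is still a string.
// JSON.parse throws if a string is not valid JSON, so callers may want try/catch.
function parseUntilObject(input, maxDepth = 3) {
  let value = input;
  for (let i = 0; i < maxDepth && typeof value === "string"; i++) {
    value = JSON.parse(value);
  }
  return value;
}

// A single parse of double-encoded data still gives a string.
const once = JSON.parse(doubleEncoded);
console.log(typeof once);   // "string"
console.log(once.Scrape);   // undefined

// Parsing until a non-string comes back makes the properties reachable.
const parsed = parseUntilObject(doubleEncoded);
console.log(typeof parsed); // "object"
console.log(parsed.Scrape.PageContent[0].text);
```

If `typeof parsedData` already reports "object", this assumption does not apply and the cause lies elsewhere.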

The desired output is to be able to get the individual elements so that I can write them to a database.
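
To make the goal concrete, here is a minimal sketch of the kind of extraction intended, assuming `parsedData` is a real object with the shape shown above. The function name `toRows` and the row layouts are hypothetical; the actual database schema is not decided here.

```javascript
// Trimmed-down object with the same shape as the output shown above.
const parsedData = {
  Scrape: {
    PageContent: [{ tag: "title", text: "Stack Overflow" }, { tag: "span", text: "new" }],
    PageTech: [{ name: "MySQL", confidence: "0", version: null, categories: [{ "34": "Databases" }] }],
    PageHeaders: { status: ["200"], "content-type": ["text/html; charset=utf-8"] },
    PageCookies: [{ name: "prov", value: "c352d0ff", domain: ".stackoverflow.com" }],
    PageRequests: [{}, "request url:", "https://stackoverflow.com/"],
  },
};

// Hypothetical helper: flatten each section into plain rows for a database.
function toRows(scrape) {
  const content = (scrape.PageContent || []).map(({ tag, text }) => ({ tag, text }));
  const tech = (scrape.PageTech || []).map((t) => ({
    name: t.name,
    confidence: Number(t.confidence), // confidence arrives as a string
    categories: (t.categories || []).flatMap((c) => Object.values(c)),
  }));
  // Each header value is an array of strings; join them into one value.
  const headers = Object.entries(scrape.PageHeaders || {}).map(
    ([name, values]) => ({ name, value: values.join(", ") })
  );
  const cookies = (scrape.PageCookies || []).map(
    ({ name, value, domain }) => ({ name, value, domain })
  );
  // PageRequests mixes an empty object and "request url:" labels with URLs.
  const requests = (scrape.PageRequests || []).filter(
    (r) => typeof r === "string" && r !== "request url:"
  );
  return { content, tech, headers, cookies, requests };
}

const rows = toRows(parsedData.Scrape);
console.log(rows.content[0].text);
console.log(rows.tech[0].categories);
console.log(rows.requests);
```

Each array in the result could then be written to its own table, one row per element.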

Extract elements from JSON
