Node-RED: parsing data from a web page

Time: 2016-12-09 16:26:02

Tags: parsing node-red

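I need to fetch data from an online HTML table, parse it, and find some values in it. The data I need is in the table with the class zjrtbl, i.e. <table class="zjrtbl" border="0">. This is the page I want to parse. It is the timetable for a local bus stop.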

  1. How do I get this table into a variable?
  2. How do I parse the data so that I end up with, say, a 2D array of this table?

Edit 2:

I now have this setup:

    [{"id":"a9fffc.914a1008","type":"inject","z":"2988145.2ee976c","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":120,"y":140,"wires":[["889103ea.58886"]]},{"id":"889103ea.58886","type":"http request","z":"2988145.2ee976c","name":"","method":"GET","ret":"txt","url":"http://jizdnirady.idnes.cz/ceskebudejovice/zjr/?date=9.12.2016%20P%C3%A1&l=Trol%205&f=Strakonick%C3%A1%20-%20obchodn%C3%AD%20z%C3%B3na&t=Ro%C5%BEnov%20-%20to%C4%8Dna&wholeweek=true&ttn=CesBud&submit=true","tls":"","x":290,"y":140,"wires":[["9e5d61b1.d8747"]]},{"id":"9e5d61b1.d8747","type":"html","z":"2988145.2ee976c","name":"","tag":".zjrtbl","ret":"text","as":"single","x":430,"y":140,"wires":[["2ff681d4.5dcade"]]},{"id":"db94b76f.32ea58","type":"http in","z":"2988145.2ee976c","name":"","url":"/idos","method":"get","swaggerDoc":"","x":120,"y":100,"wires":[["889103ea.58886"]]},{"id":"5c2e34b0.692dbc","type":"http response","z":"2988145.2ee976c","name":"http","x":1170,"y":140,"wires":[]},{"id":"a9aa5336.dbaaa","type":"function","z":"2988145.2ee976c","name":"connector","func":"msg.payload = msg.payload;\nreturn msg;","outputs":1,"noerr":0,"x":840,"y":140,"wires":[["d8ed8e61.482"]]},{"id":"65013729.cd6df8","type":"function","z":"2988145.2ee976c","name":"split to array","func":"var arr = msg.payload.replace(/\\s+/g, ' ').split(' ');\nmsg.arr = arr;\nreturn msg;","outputs":1,"noerr":0,"x":690,"y":140,"wires":[["a9aa5336.dbaaa"]]},{"id":"daa81b7.bdcc1e8","type":"debug","z":"2988145.2ee976c","name":"payload","active":true,"console":"false","complete":"payload","x":1180,"y":180,"wires":[]},{"id":"2ff681d4.5dcade","type":"split","z":"2988145.2ee976c","name":"","splt":"","x":550,"y":140,"wires":[["65013729.cd6df8"]]},{"id":"d8ed8e61.482","type":"function","z":"2988145.2ee976c","name":"assemble array","func":"msg.payload = \"\";\nfor (var i = 0; i < msg.arr.length; i++) {\n    msg.payload += \"[\" + msg.arr[i] + \"]\";\n}\n\nmsg.statusCode = 200;\nreturn msg;","outputs":1,"noerr":0,"x":1000,"y":140,"wires":[["5c2e34b0.692dbc","daa81b7.bdcc1e8"]]}]
    

Now it looks good, but there is still one small glitch: it does not separate the hours from the minutes.

1 answer:

Answer 0: (score: 0)

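HTML is very hard to parse. It is not always well-formed XML, so tools such as XPath tend to fail on it.

The HTML node lets you use CSS-style selectors to scrape parts of a web page, so something like this may get you closer.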

[{"id":"b5b2b310.4dfc5","type":"inject","z":"59370ac1.51144c","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":138.5,"y":120,"wires":[["d7e0478a.997b8"]]},{"id":"d7e0478a.997b8","type":"http request","z":"59370ac1.51144c","name":"","method":"GET","ret":"txt","url":"http://jizdnirady.idnes.cz/ceskebudejovice/zjr/?date=9.12.2016%20P%C3%A1&l=Trol%205&f=Strakonick%C3%A1%20-%20obchodn%C3%AD%20z%C3%B3na&t=Ro%C5%BEnov%20-%20to%C4%8Dna&wholeweek=true&ttn=CesBud&submit=true","tls":"","x":332.5,"y":141,"wires":[["a8ff0654.6ac8d"]]},{"id":"a8ff0654.6ac8d","type":"html","z":"59370ac1.51144c","name":"","tag":".zjrtbl","ret":"html","as":"single","x":530.5,"y":146,"wires":[["984182c2.bb29e8"]]},{"id":"984182c2.bb29e8","type":"debug","z":"59370ac1.51144c","name":"","active":true,"console":"false","complete":"false","x":752.5,"y":205,"wires":[]}]

This pulls out just the table. You can then use another HTML node to extract the other parts from it.
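To get from there to a 2D array, one option is a function node after that second HTML node. The following is only a rough sketch under several assumptions: the second HTML node uses a selector such as `.zjrtbl tr`, returns the HTML content of each element, and delivers them as a single message containing an array, so that `msg.payload` is an array of row HTML strings. The helper name `rowsToTable` and the sample rows are made up for illustration; the real page's markup may well differ, and a regular expression like this only copes with simple, non-nested cells.

```javascript
// Sketch for a Node-RED function node placed after a second HTML node.
// Assumption: the input is an array of strings, each the inner HTML of one <tr>.
function rowsToTable(rows) {
    return rows.map(function (rowHtml) {
        var cells = [];
        // Match each <td>...</td> or <th>...</th> pair (no nested tables).
        var cellPattern = /<t[dh][^>]*>([\s\S]*?)<\/t[dh]>/gi;
        var match;
        while ((match = cellPattern.exec(rowHtml)) !== null) {
            var text = match[1]
                .replace(/<[^>]*>/g, " ")  // drop tags nested inside the cell
                .replace(/&nbsp;/g, " ")   // treat non-breaking spaces as spaces
                .replace(/\s+/g, " ")      // collapse runs of whitespace
                .trim();
            cells.push(text);
        }
        return cells;
    });
}

// Hypothetical sample rows, NOT taken from the real timetable page.
var sampleRows = [
    "<td>5</td><td>12 42</td>",
    "<td>6</td><td><b>02</b> 22 <b>42</b></td>"
];
var table = rowsToTable(sampleRows);

// Inside the function node itself, the last lines would instead be:
//   msg.payload = rowsToTable(msg.payload);
//   return msg;
```

Because every cell stays a separate array element, a layout where the first cell of a row holds the hour and the following cells hold the minutes would keep the two apart, which the whitespace-only split does not.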