How can I get data from this website?

Time: 2019-01-20 05:27:07

Tags: web-scraping windows-10 read-data screen-grab data-capture

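There is a site here (http://www.tsetmc.com/Loader.aspx?ParTree=151311&i=46741025610365786#) where each field of the table (marked with a yellow square) shows information about one particular day. What I need to do is read only the حجم row of each field, that is, the part I have marked with red squares in the following pictures (you have to go to the tab indicated in the first picture to see the second picture):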

[Two screenshots: the table with its fields marked by yellow squares, and the حجم row marked by red squares]

and write them to a text file (stored on my computer), like this:

6.832 M (14%), 40.475 M (85%), 248,000 (0%), 47.059 M (99%)
605,000 (3%), 15.277 M (96%), 478,714 (3%), 15.404 M (96%)
8.102 M (42%), 10.751 M (57%), 9.599 M (50%), 9.253 M (49%)
215,937 (2%), 9.417 M (97%), 1.115 M (11%), 8.518 M (88%)
3.351 M (15%), 18.284 M (84%), 5.987 M (27%), 15.647 M (72%)

But I don't know whether this is possible. If it is, what is the simplest way to do it? (I use Windows 10.)

  

Edit: I completed step 3 successfully and ran the node extract.js command in step 4. I got the following result:

[ 'حجم, 47.059 M (99%), 248,000 (0%), 40.475 M (85%), 6.832 M (14%)',
  'حجم, 15.404 M (96%), 478,714 (3%), 15.277 M (96%), 605,000 (3%)',
  'حجم, 9.253 M (49%), 9.599 M (50%), 10.751 M (57%), 8.102 M (42%)',
  'حجم, 8.518 M (88%), 1.115 M (11%), 9.417 M (97%), 215,937 (2%)',
  'حجم, 15.647 M (72%), 5.987 M (27%), 18.284 M (84%), 3.351 M (15%)',
  'حجم, 21.848 M (93%), 1.501 M (6%), 21.648 M (92%), 1.701 M (7%)',
  'حجم, 30.845 M (95%), 1.3 M (4%), 30.663 M (95%), 1.482 M (4%)',
  'حجم, 9.914 M (64%), 5.474 M (35%), 9.938 M (64%), 5.45 M (35%)',
  'حجم, 10.775 M (97%), 250,000 (2%), 10.995 M (99%), 30,000 (0%)',
  'حجم, 21.328 M (91%), 2.027 M (8%), 22.315 M (95%), 1.04 M (4%)',
  'حجم, 19.588 M (92%), 1.54 M (7%), 21.048 M (99%), 80,000 (0%)',
  'حجم, 12.554 M (96%), 418,000 (3%), 11.504 M (88%), 1.468 M (11%)',
  'حجم, 14.98 M (92%), 1.299 M (7%), 16.135 M (99%), 144,008 (0%)',
  'حجم, 10.878 M (95%), 502,040 (4%), 11.378 M (99%), 2,040 (0%)',
  'حجم, 10.012 M (97%), 275,000 (2%), 10.287 M (100%), 0 (0%)',
  'حجم, 11.992 M (95%), 500,000 (4%), 11.707 M (93%), 785,244 (6%)',
  'حجم, 16.492 M (95%), 820,000 (4%), 17.056 M (98%), 256,241 (1%)',
  'حجم, 19.639 M (98%), 378,384 (1%), 20.017 M (100%), 0 (0%)',
  'حجم, 13.781 M (95%), 639,609 (4%), 14.161 M (98%), 260,000 (1%)',
  'حجم, 31.797 M (99%), 300,507 (0%), 26.089 M (81%), 6.009 M (18%)',
  'حجم, 18.159 M (99%), 30,391 (0%), 15.914 M (87%), 2.275 M (12%)',
  'حجم, 21.271 M (95%), 1.01 M (4%), 21.501 M (96%), 780,000 (3%)',
  'حجم, 17.322 M (62%), 10.615 M (37%), 19.437 M (69%), 8.5 M (30%)',
  'حجم, 37.817 M (97%), 1.03 M (2%), 34.125 M (87%), 4.722 M (12%)',
  'حجم, 55.396 M (99%), 211,000 (0%), 52.507 M (94%), 3.1 M (5%)',
  'حجم, 23.141 M (98%), 420,000 (1%), 23.461 M (99%), 100,000 (0%)',
  'حجم, 46.215 M (82%), 9.919 M (17%), 49.764 M (88%), 6.371 M (11%)',
  'حجم, 1.26 M (100%), 0 (0%), 1.26 M (100%), 0 (0%)',
  'حجم, 35.89 M (99%), 251,000 (0%), 35.921 M (99%), 220,000 (0%)',
  'حجم, 48.509 M (88%), 6.349 M (11%), 54.052 M (98%), 806,362 (1%)',
  'حجم, 41.018 M (91%), 4.006 M (8%), 41.564 M (92%), 3.46 M (7%)',
  'حجم, 40.02 M (99%), 100,000 (0%), 39.22 M (97%), 900,000 (2%)',
  'حجم, 36.974 M (99%), 30,000 (0%), 36.549 M (98%), 455,500 (1%)',
  'حجم, 35.739 M (99%), 230,000 (0%), 33.104 M (92%), 2.866 M (7%)',
  'حجم, 19.627 M (100%), 0 (0%), 18.877 M (96%), 750,000 (3%)',
  'حجم, 19.603 M (81%), 4.379 M (18%), 23.982 M (100%), 0 (0%)',
  'حجم, 10.186 M (97%), 250,000 (2%), 10.436 M (100%), 0 (0%)',
  'حجم, 15.414 M (98%), 250,500 (1%), 15.465 M (98%), 200,000 (1%)',
  'حجم, 21.571 M (97%), 665,000 (2%), 22.236 M (100%), 0 (0%)',
  'حجم, 15.537 M (98%), 250,000 (1%), 15.787 M (100%), 0 (0%)',
  'حجم, 21.422 M (98%), 221,004 (1%), 21.243 M (98%), 400,000 (1%)',
  'حجم, 30.662 M (92%), 2.375 M (7%), 33.036 M (100%), 0 (0%)',
  'حجم, 39.287 M (98%), 455,000 (1%), 39.742 M (100%), 0 (0%)',
  'حجم, 53.141 M (89%), 6.11 M (10%), 59.131 M (99%), 120,000 (0%)',
  'حجم, 23.587 M (98%), 255,000 (1%), 23.842 M (100%), 0 (0%)',
  'حجم, 17.043 M (98%), 255,000 (1%), 17.298 M (100%), 0 (0%)',
  'حجم, 33.51 M (96%), 1.25 M (3%), 34.75 M (99%), 10,000 (0%)',
  'حجم, 36.408 M (99%), 15,000 (0%), 28.248 M (77%), 8.175 M (22%)',
  'حجم, 32.367 M (98%), 480,000 (1%), 31.535 M (96%), 1.312 M (3%)',
  'حجم, 54.773 M (95%), 2.68 M (4%), 43.936 M (76%), 13.517 M (23%)',
  'حجم, 58.955 M (95%), 2.54 M (4%), 41.234 M (67%), 20.262 M (32%)',
  'حجم, 45.222 M (99%), 15,000 (0%), 40.215 M (88%), 5.023 M (11%)',
  'حجم, 43.487 M (97%), 1.225 M (2%), 43.902 M (98%), 810,008 (1%)',
  'حجم, 35.46 M (91%), 3.18 M (8%), 38.33 M (99%), 310,000 (0%)',
  'حجم, 39.42 M (99%), 90,927 (0%), 36.722 M (92%), 2.789 M (7%)',
  'حجم, 41.024 M (99%), 312,000 (0%), 35.814 M (86%), 5.522 M (13%)',
  'حجم, 32.718 M (99%), 277,978 (0%), 30.995 M (93%), 2.001 M (6%)',
  'حجم, 1.12 M (100%), 0 (0%), 1.12 M (100%), 0 (0%)',
  'حجم, 2.015 M (86%), 325,000 (13%), 2.34 M (100%), 0 (0%)',
  'حجم, 40.402 M (95%), 2.109 M (4%), 42.511 M (100%), 0 (0%)',
  'حجم, 41.726 M (86%), 6.372 M (13%), 48.098 M (100%), 0 (0%)',
  'حجم, 39.444 M (97%), 1.14 M (2%), 39.551 M (97%), 1.033 M (2%)',
  'حجم, 4.14 M (100%), 0 (0%), 3.14 M (75%), 1,000,000 (24%)',
  'حجم, 43.447 M (96%), 1.743 M (3%), 44.292 M (98%), 898,000 (1%)',
  'حجم, 56.023 M (98%), 864,338 (1%), 52.627 M (92%), 4.26 M (7%)',
  'حجم, 14.062 M (99%), 8,008 (0%), 12.055 M (85%), 2.015 M (14%)',
  'حجم, 56.557 M (84%), 10.413 M (15%), 66.47 M (99%), 500,000 (0%)',
  'حجم, 7.971 M (69%), 3.481 M (30%), 11.452 M (100%), 0 (0%)',
  'حجم, 38.85 M (86%), 5.864 M (13%), 44.494 M (99%), 220,000 (0%)',
  'حجم, 53.151 M (99%), 105,000 (0%), 51.039 M (95%), 2.217 M (4%)',
  'حجم, 51.861 M (79%), 13.352 M (20%), 64.603 M (99%), 610,000 (0%)',
  'حجم, 2.025 M (80%), 500,000 (19%), 2.525 M (100%), 0 (0%)',
  'حجم, 67.428 M (95%), 3.294 M (4%), 68.538 M (96%), 2.184 M (3%)',
  'حجم, 52.373 M (87%), 7.211 M (12%), 58.408 M (98%), 1.176 M (1%)',
  'حجم, 12.073 M (80%), 3.01 M (19%), 14.583 M (96%), 500,000 (3%)',
  'حجم, 47.369 M (99%), 424,000 (0%), 30.168 M (63%), 17.626 M (36%)',
  'حجم, 3.401 M (100%), 0 (0%), 1.039 M (30%), 2.363 M (69%)',
  'حجم, 52.213 M (99%), 247,000 (0%), 41.872 M (79%), 10.588 M (20%)',
  'حجم, 73.585 M (98%), 1.356 M (1%), 38.911 M (51%), 36.029 M (48%)',
  'حجم, 67.943 M (97%), 1.622 M (2%), 35.571 M (51%), 33.995 M (48%)',
  'حجم, 2.653 M (100%), 0 (0%), 2.003 M (75%), 650,000 (24%)',
  'حجم, 32.055 M (99%), 18,408 (0%), 24.301 M (75%), 7.772 M (24%)',
  'حجم, 16.989 M (98%), 209,000 (1%), 9.598 M (55%), 7.6 M (44%)',
  'حجم, 34.906 M (95%), 1.64 M (4%), 21.129 M (57%), 15.417 M (42%)',
  'حجم, 14.669 M (98%), 150,000 (1%), 7.852 M (52%), 6.967 M (47%)',
  'حجم, 23.542 M (98%), 289,600 (1%), 23.102 M (96%), 729,782 (3%)',
  'حجم, 27.87 M (98%), 450,000 (1%), 21.461 M (75%), 6.859 M (24%)',
  'حجم, 48.785 M (98%), 500,000 (1%), 30.683 M (62%), 18.603 M (37%)',
  'حجم, 22.839 M (93%), 1.518 M (6%), 16.242 M (66%), 8.115 M (33%)',
  'حجم, 15.683 M (96%), 631,500 (3%), 13.316 M (81%), 2.999 M (18%)',
  'حجم, 15.715 M (96%), 630,000 (3%), 15.436 M (94%), 908,399 (5%)',
  'حجم, 11.776 M (90%), 1.305 M (9%), 13.081 M (100%), 0 (0%)',
  'حجم, 12.492 M (85%), 2.057 M (14%), 14.149 M (97%), 400,000 (2%)',
  'حجم, 11.909 M (100%), 0 (0%), 11.818 M (99%), 91,008 (0%)',
  'حجم, 21.404 M (99%), 140,000 (0%), 17.8 M (82%), 3.744 M (17%)',
  'حجم, 22.115 M (89%), 2.718 M (10%), 21.969 M (88%), 2.864 M (11%)',
  'حجم, 23.146 M (97%), 637,396 (2%), 21.881 M (92%), 1.902 M (7%)',
  'حجم, 35.986 M (94%), 1.92 M (5%), 25.749 M (67%), 12.156 M (32%)',
  'حجم, 16.064 M (93%), 1.179 M (6%), 17.104 M (99%), 139,467 (0%)',
  'حجم, 19.314 M (85%), 3.284 M (14%), 22.408 M (99%), 189,500 (0%)',
  ... 84 more items ]
(node:13916) UnhandledPromiseRejectionWarning: TypeError [ERR_INVALID_CALLBACK]: Callback must be a function
    at maybeCallback (fs.js:129:9)
    at Object.writeFile (fs.js:1159:14)
    at C:\Users\m\Desktop\GetData\extract.js:21:14
    at process._tickCallback (internal/process/next_tick.js:68:7)
(node:13916) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:13916) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process

The command ended with a non-zero exit code.

But there is no store.txt file!
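
The stack trace suggests that the `fs.writeFile` call on line 21 of extract.js was not given a callback function. On the Node.js version shown, that throws ERR_INVALID_CALLBACK before anything is written, which would explain the missing store.txt. Here is a minimal sketch of a write that avoids this, assuming the rows are already collected in an array. `saveRows` and the demo file path are hypothetical names, not part of the original script:

```javascript
// Sketch only: saveRows and demoPath are hypothetical, not the original extract.js.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write one row per line. writeFileSync takes no callback and throws on failure,
// so a failed write cannot pass silently.
function saveRows(rows, filePath) {
  fs.writeFileSync(filePath, rows.join('\n') + '\n', 'utf8');
}

// Usage: write two sample rows to a temporary file and read them back.
const demoPath = path.join(os.tmpdir(), 'store-demo.txt');
saveRows(['6.832 M (14%), 40.475 M (85%)', '605,000 (3%), 15.277 M (96%)'], demoPath);
console.log(fs.readFileSync(demoPath, 'utf8'));
```

If the asynchronous `fs.writeFile` is preferred, passing a function as the last argument (for example one that logs the error, if any) should avoid the same TypeError.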

1 Answer:

Answer 0: (score: 2)

  1. Download and install Node.js and npm from here: https://www.npmjs.com/get-npm

  2. Create a folder anywhere on your PC, create a file named extract.js in it, and paste in the following code.

    Code

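  3. Then open a command prompt in that folder and run "npm install puppeteer" (this may take a few minutes to finish).

  4. Then run "node extract.js".

  5. After a successful run, you will find a "store.txt" file in the same folder, containing your expected result.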
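
The rows printed in the question start with the حجم label and list the four values in the reverse of the order requested there. As a rough sketch, assuming each row is a string whose cells are separated by a comma and a space (as in the output quoted in the question), the lines could be reshaped before saving. `toOutputLine` is a hypothetical helper, not part of the original answer:

```javascript
// Hypothetical helper: drop the leading label cell and reverse the remaining values.
function toOutputLine(row) {
  // Split on comma + space so thousands separators such as "248,000" stay intact.
  const cells = row.split(', ');
  return cells.slice(1).reverse().join(', ');
}

// Two rows copied from the output quoted in the question.
const sampleRows = [
  'حجم, 47.059 M (99%), 248,000 (0%), 40.475 M (85%), 6.832 M (14%)',
  'حجم, 15.404 M (96%), 478,714 (3%), 15.277 M (96%), 605,000 (3%)',
];

// One reshaped row per line, ready to be written to a text file.
const text = sampleRows.map(toOutputLine).join('\n');
console.log(text);
```

Splitting on the two-character separator rather than on a bare comma is what keeps values like 478,714 in one piece.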