Puppeteer: loading more Instagram comments with headless Puppeteer

Time: 2019-02-19 22:31:04

Tags: node.js puppeteer scrape headless google-chrome-headless

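I am parsing Instagram comments (https://instagram.com/p/shortcode/). To get all the comments on a post, I click "Load more comments" until the button disappears.

This works well when the browser is launched with {headless: false}. When it is launched headless, however, my clicks have no effect at all and the code gets stuck in a loop.

I need to run the script from an Ubuntu server, so it has to work headless.

This is what I'm doing: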

Puppeteer setup:

const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth")();
puppeteer.use(pluginStealth);

// Note: the awaits below run inside an async function.
const browser = await puppeteer.launch({headless: true, args:['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu']});
const page = await browser.newPage();

// Intercept requests so that stylesheets, fonts and images are never downloaded.
await page.setRequestInterception(true);

page.on('request', (req) => {
    if(req.resourceType() === 'stylesheet' || req.resourceType() === 'font' || req.resourceType() === 'image'){
        req.abort();
    }
    else {
        req.continue();
    }
});

The function that checks whether the "Load more comments" button is still present (credit to AJC24):

// Resolves to true if the selector becomes visible within 2 seconds, false otherwise.
const isElementVisible = async (page, cssSelector) => {
  let visible = true;
  await page
    .waitForSelector(cssSelector, { visible: true, timeout: 2000 })
    .catch(() => {
      visible = false;
    });
  return visible;
};

Then my while loop keeps clicking the button for as long as this check reports it as visible.

As I said, this works fine with {headless: false}. With {headless: true}, I get stuck in the while loop.
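One way to keep a loop like this from spinning forever is to cap the number of clicks. The following is a minimal sketch, not the code from the question: `clickUntilGone`, `maxClicks` and the returned `exhausted` flag are hypothetical names, and the selector for the button is left to the caller. It relies only on `page.waitForSelector` and `page.click`.

```javascript
// Hypothetical helper: click `selector` until it stops appearing, but give up
// after `maxClicks` rounds so a click that has no effect cannot loop forever.
async function clickUntilGone(page, selector, maxClicks = 50) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // waitForSelector rejects on timeout; treat that as "the button is gone".
    const handle = await page
      .waitForSelector(selector, { visible: true, timeout: 2000 })
      .catch(() => null);
    if (!handle) {
      return { clicks, exhausted: false };
    }
    // The button may vanish between the check and the click, so ignore click errors.
    await page.click(selector).catch(() => {});
    clicks += 1;
  }
  // The button was still visible after maxClicks attempts.
  return { clicks, exhausted: true };
}
```

If `exhausted` comes back `true`, the button never went away. That matches the headless symptom and suggests the clicks are not reaching the page, rather than that the post has an unusually large number of comments.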

I also tried launching Puppeteer in headful mode on my Ubuntu server using Xvfb. It launched successfully, but I still get stuck in the while loop.
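Because the loop stalls on the server both headless and under Xvfb, it may help to record what the browser there actually receives before the first click. The following is a minimal sketch, assuming a writable output directory; `dumpPageState`, `outDir` and `label` are hypothetical names. It uses `page.screenshot`, `page.content` and `page.evaluate`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: save a screenshot, the current HTML and the user agent
// so the page seen on the server can be compared with the one seen locally.
async function dumpPageState(page, outDir, label) {
  fs.mkdirSync(outDir, { recursive: true });
  const screenshotPath = path.join(outDir, `${label}.png`);
  const htmlPath = path.join(outDir, `${label}.html`);

  await page.screenshot({ path: screenshotPath, fullPage: true });
  fs.writeFileSync(htmlPath, await page.content());

  // Runs in the browser context and reports the user agent the site sees.
  const userAgent = await page.evaluate(() => navigator.userAgent);
  return { screenshotPath, htmlPath, userAgent };
}
```

Calling it right after navigation, for example `await dumpPageState(page, './debug', 'before-click')`, shows whether the server is being given a different page (such as a login prompt) than the local machine. The screenshot will look unstyled while the request interception above blocks stylesheets and images.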

0 answers:

No answers yet