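I am scraping Instagram comments (https://instagram.com/p/shortcode/). To get all the comments on a post, I click "Load more comments" until the button disappears.
This works well. However, when launched headless, my clicks have no effect at all and the code gets stuck in a loop.
I need to run the script from an Ubuntu server, so it has to work headless.
Here is what I am doing:
Puppeteer: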
import socket
import threading
import psycopg2
import datetime
import resource

# Raise the open-file limit so many client sockets can stay open at once.
resource.setrlimit(resource.RLIMIT_NOFILE, (65536, 65536))

bind_ip = '0.0.0.0'
bind_port = 1339

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((bind_ip, bind_port))
server.listen(5000)  # max backlog of connections

connection = psycopg2.connect(user="USER", password="PASS", host="127.0.0.1", port="5432", database="DB")
cursor = connection.cursor()
print('Listening on {}:{}'.format(bind_ip, bind_port))


def handle_client_connection(client_socket, address):
    while True:
        request = client_socket.recv(1024)
        if not request:
            # An empty read means the client closed the connection.
            client_socket.close()
            break
        request_arr = request.decode('ascii', 'ignore').split(',')
        # print('Received {}'.format(request))
        now = datetime.datetime.now()
        if request_arr[0] == '*HQ':
            try:
                nowdate = now.strftime("%Y-%m-%d")
                nowtime = now.strftime("%H:%M:%S")
                record_datetime = now.strftime("%Y-%m-%d %H:%M:%S")
                postgres_insert_query = """ INSERT INTO public.listener (id) VALUES (%s)"""
                # Query parameters must be a sequence, hence the one-element tuple.
                record_to_insert = (request_arr[1],)
                cursor.execute(postgres_insert_query, record_to_insert)
                connection.commit()
            except Exception:
                # Reset the failed transaction so later inserts can still run.
                connection.rollback()


while True:
    client_sock, address = server.accept()
    print('Accepted connection from {}:{}'.format(address[0], address[1]))
    client_handler = threading.Thread(
        target=handle_client_connection,
        args=(client_sock, address[0],)
    )
    client_handler.start()
Function for checking whether the "Load more comments" button is still present (credit to AJC24):
const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth")();
puppeteer.use(pluginStealth);

// The await calls below assume an enclosing async function.
const browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-gpu']
});
const page = await browser.newPage();

// Skip stylesheets, fonts and images; let every other request through.
await page.setRequestInterception(true);
page.on('request', (req) => {
  const type = req.resourceType();
  if (type === 'stylesheet' || type === 'font' || type === 'image') {
    req.abort();
  } else {
    req.continue();
  }
});
And then my while loop:
const isElementVisible = async (page, cssSelector) => {
  let visible = true;
  await page
    .waitForSelector(cssSelector, { visible: true, timeout: 2000 })
    .catch(() => {
      visible = false;
    });
  return visible;
};
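The loop itself does not appear in the snippets above. As a minimal sketch only, a loop built on isElementVisible might look like the following. The name clickUntilGone, the selector argument and the maxClicks cap are assumptions, not taken from the original script, and the helper is repeated so the block is self-contained. The cap makes the loop fail loudly instead of spinning forever when clicks have no effect, which is the symptom described for headless mode.

```javascript
// Helper from the question: true if the selector becomes visible within 2 s.
const isElementVisible = async (page, cssSelector) => {
  let visible = true;
  await page
    .waitForSelector(cssSelector, { visible: true, timeout: 2000 })
    .catch(() => {
      visible = false;
    });
  return visible;
};

// Hypothetical loop: keep clicking while the button is visible,
// but give up after maxClicks so a no-op click cannot loop forever.
const clickUntilGone = async (page, cssSelector, maxClicks = 200) => {
  let clicks = 0;
  while (await isElementVisible(page, cssSelector)) {
    if (clicks >= maxClicks) {
      throw new Error(`"${cssSelector}" is still visible after ${maxClicks} clicks`);
    }
    await page.click(cssSelector);
    clicks += 1;
  }
  return clicks; // number of clicks it took for the button to disappear
};
```

Called as clickUntilGone(page, selector) inside the async function that owns the page, it resolves with the click count once the button is gone, or rejects if the cap is reached.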
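As I said, this works fine with {headless: false}. With {headless: true}, I get stuck in the while loop.
I also tried launching Puppeteer in headful mode on my Ubuntu server using Xvfb. It launches successfully, but I still get stuck in the while loop.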
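One way to see what the browser is actually rendering when the loop stalls is to capture a screenshot and count how many elements match the button selector. The sketch below is only a debugging aid under stated assumptions: captureDebugState, the selector argument and the screenshot path are hypothetical names, while page.$$, page.screenshot and page.url are standard Puppeteer Page methods.

```javascript
// Hypothetical debugging helper: records what the page looks like right now.
const captureDebugState = async (page, cssSelector, screenshotPath) => {
  // All elements currently matching the selector, visible or not.
  const matches = await page.$$(cssSelector);
  // Full-page screenshot written to the given path.
  await page.screenshot({ path: screenshotPath, fullPage: true });
  return { url: page.url(), matchCount: matches.length };
};
```

For example, logging the result of captureDebugState(page, selector, 'headless.png') just before each click shows whether the button exists in the DOM at all in headless mode. Because the script aborts stylesheet, font and image requests, the screenshot will look unstyled.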