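I'm using Node.js and Puppeteer to scrape some data. I've made several attempts, but I'm having trouble getting the 2nd through 7th values I want (the six cells that follow each company name).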
Here is one of the outputs I got in the console:
$ node app.js
Company 1
our error TypeError: formRow.evaluate is not a function
at main (/home/web/app.js:36:37)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
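The error says `formRow.evaluate is not a function`: in the Puppeteer version used here, an ElementHandle has no `evaluate` method. Two call forms that do exist are passing the handle as an argument to `page.evaluate`, and `ElementHandle.$eval`, which takes a selector first and the function second. A minimal sketch with hypothetical helper names (`handleText`, `firstCellText`), assuming `page` is a Puppeteer Page and the handles come from `page.$` / `page.$$`:

```javascript
// Hypothetical helper: read an element's innerText by handing its
// ElementHandle to page.evaluate, which accepts handles as arguments.
async function handleText(page, handle) {
  return page.evaluate(el => el.innerText, handle);
}

// Hypothetical helper: read the text of the first <td> inside a row handle.
// ElementHandle.$eval(selector, pageFunction) runs pageFunction on the first
// descendant of the row that matches the selector.
async function firstCellText(rowHandle) {
  return rowHandle.$eval('td', td => td.innerText);
}
```

Inside the loop, `await firstCellText(formRow)` would then replace the `formRow.evaluate(...)` call.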
This is the HTML I'm working with:
<body>
<table summary="">...</table>
<table summary="">...</table>
<div>
<table summary="">
<tbody>
<tr>
<td></td>
<td></td>
<td valign="top" bgcolor="#E6E6E6" align="left">
<a href="/count=100">Company 1</a>
</td>
</tr>
<tr nowrap="nowrap" valign="top" align="left">
<td nowrap="nowrap">4</td>
<td nowrap="nowrap"><a href="/index.htm">[html]</a><a href="/abx.txt">[text]</a></td>
<td class="small">Categorie 1<br>Accession Number: 1243689234
</td>
<td nowrap="nowrap">2018-08-14<br>16:35:41</td>
<td nowrap="nowrap">2018-08-14</td>
<td nowrap="nowrap" align="left">
<a href="/count=100">001-32722</a><br>181018204
</td>
</tr>
<tr>
<td></td>
<td></td>
<td valign="top" bgcolor="#E6E6E6" align="left">
<a href="/count=100">Company 2</a>
</td>
</tr>
<tr nowrap="nowrap" valign="top" align="left">
<td nowrap="nowrap">4</td>
<td nowrap="nowrap"><a href="/index.htm">[html]</a><a href="/abx.txt">[text]</a></td>
<td class="small">Categorie 2<br>Accession Number: 0001179110
</td>
<td nowrap="nowrap">2018-08-14<br>16:35:41</td>
<td nowrap="nowrap">2018-08-14</td>
<td nowrap="nowrap" align="left">
<a href="/count=100">001-32722</a><br>181018204
</td>
</tr>
....
</tbody>
</table>
</div>
<form>...</form>
...
<table summary="">...</table>
</body>
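Comparing this HTML with the desired result listed at the end of the post, each value appears to be the first non-blank piece of text in its cell: the text before the `<br>`, or the first link's label (`[html]`, whereas `innerText` of that cell would give `[html][text]`). A minimal sketch of that rule as a hypothetical helper; it only reads `childNodes` and `textContent`, so it works on any DOM element:

```javascript
// Hypothetical helper: return the first non-blank text inside a table cell.
// <td>Categorie 1<br>Accession Number: ...</td>  -> "Categorie 1"
// <td><a>[html]</a><a>[text]</a></td>            -> "[html]"
// Whitespace-only text nodes (line breaks in the markup) and empty
// elements such as <br> are skipped.
function firstText(cell) {
  for (const node of cell.childNodes) {
    const text = (node.textContent || '').trim();
    if (text !== '') return text;
  }
  return '';
}
```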
This is my Puppeteer setup so far. The first value (the company name) works fine. app.js:
const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('some page');
    const table = await page.waitForSelector('body div table[summary]');

    // Cells holding the company names (title rows)
    const titles = await page.$$('body div table[summary] tr td[bgcolor]');
    console.log(titles.length);

    // Data rows (the six cells I want)
    const tableRows = await page.$$('body div table[summary] tr[nowrap]');
    console.log(tableRows.length);

    for (let i = 0; i < tableRows.length; i++) {
      // Company name: this part works
      const ciks = await page.$$('body div table[summary] tr td[bgcolor]');
      const cik = ciks[i];
      const button = await cik.$('body div table[summary] tr td[bgcolor] a');
      const titleName = await page.evaluate(button => button.innerText, button);
      console.log(titleName);

      // Cells of the matching data row: this is where I'm stuck
      const formRows = await page.$$('body div table[summary] tr[nowrap]');
      const formRow = formRows[i];
      const tableCell = await formRow.$('body div table[summary] tr[nowrap] td');
      const cell = await tableCell.$eval(() => {
        document.querySelector('body div table[summary] tr[nowrap] td:nth-child(1)');
      });
      console.log(cell);
      //const cell = await tableCell.$eval('td', td => td.innerText);
      //console.log(cell);
    }
    console.log('\n');
    console.log('done');
    await browser.close();
  } catch (e) {
    console.log('our error', e);
  }
})();
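A sketch of one way to collect all seven values per company in a single pass, assuming the markup shown above (a title row whose `td[bgcolor]` holds an `<a>` with the company name, followed by a `tr[nowrap]` data row) and an already-navigated Puppeteer `page`. `scrapeFilings` is a hypothetical name. It uses `page.$$eval`, so the whole walk over the rows happens in the browser; because Puppeteer serializes that callback and runs it in the page, the first-non-blank-text logic has to be declared inside it rather than taken from Node.

```javascript
// Hypothetical helper: returns [{ company, cells: [six strings] }, ...].
// `page` is assumed to be a Puppeteer Page that has already loaded the table.
async function scrapeFilings(page) {
  await page.waitForSelector('body div table[summary]');
  return page.$$eval('body div table[summary] tr', rows => {
    // Runs in the browser: first non-blank text of a cell
    // ("Categorie 1" rather than "Categorie 1\nAccession Number: ...").
    const firstText = cell => {
      for (const node of cell.childNodes) {
        const text = (node.textContent || '').trim();
        if (text !== '') return text;
      }
      return '';
    };

    const results = [];
    let company = null;
    for (const row of rows) {
      const link = row.querySelector('td[bgcolor] a');
      if (link) {
        // Title row: remember the company for the data row that follows
        company = link.textContent.trim();
      } else if (row.hasAttribute('nowrap') && company !== null) {
        // Data row: one value per <td>
        results.push({ company, cells: Array.from(row.cells, firstText) });
      }
    }
    return results;
  });
}
```

Inside `main()`, this would replace the whole `for` loop: `const records = await scrapeFilings(page);`. Walking rows in document order avoids relying on the i-th title cell and the i-th data row lining up.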
In the attempt above I was only trying to get the 1st and 2nd values, but this is the end result I want:
Company 1
4
[html]
Categorie 1
2018-08-14
2018-08-14
001-32722
Company 2
4
[html]
Categorie 2
2018-08-14
2018-08-14
001-32722
...
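Once each company's values are collected, printing them in exactly this layout is a flattening step. A sketch, assuming each company has been gathered into a hypothetical `{ company, cells }` object where `cells` holds the six strings; `formatFilings` is likewise a hypothetical name:

```javascript
// Hypothetical helper: company name first, then one value per line.
function formatFilings(records) {
  const lines = [];
  for (const { company, cells } of records) {
    lines.push(company, ...cells);
  }
  return lines.join('\n');
}
```

For example, `console.log(formatFilings(records))` prints `Company 1`, then `4`, `[html]`, `Categorie 1` and so on, one per line.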
I'm running this with Chromium 68 on 32-bit Ubuntu 16.04...