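Puppeteer - Search for an element by div class - Return all of the element's div classes

Time: 2018-08-01 17:10:30

Tags: javascript html puppeteer

I have a site I need to scrape, where I look up a div by a specific class and retrieve the list of classes on that div.

For example, if we have the code: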

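// HTML on site
<div class="main">Main Stuff</div>
<div class="class1 class2 specialclass">Other Stuff</div>
<div class="footer">Footer Stuff</div>

I need to search for "specialclass" as a div class and return that div's list of classes, so I want to get back: class1 class2 specialclass

I'm using the Wikibooks site as an example and running the following code: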

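// Puppeteer code
const puppeteer = require('puppeteer');
const devices = require('puppeteer/DeviceDescriptors');

// await is only valid inside an async function, so wrap the script in one
(async () => {
  const browser = await puppeteer.launch();

  const page = await browser.newPage();
  await page.goto('https://www.wikibooks.org/');

  // Read the text of the first element with the class "lang1"
  const myclassname = await page.evaluate(() =>
    document.querySelector('.lang1').innerText);

  console.log(myclassname);

  await browser.close();
})();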

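It searches for a div with the class lang1 (the default-language div near the top of the screen) and returns the element's text, but I can't figure out what to replace innerText with so that it returns all of the element's classes instead.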

2 answers:

Answer 0 (score: 4)

Consider the following element from the webpage you specified:

<div class="central-featured-lang lang1" lang="en">...</div>

You can use className or getAttribute('class') to get the contents of the element's class attribute:

const myclassname = await page.evaluate(() => document.querySelector('.lang1').className);

console.log(myclassname); // Returns "central-featured-lang lang1"

Alternatively, you can use classList, which returns an iterable list (a DOMTokenList) of the element's classes.
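The classList approach could look roughly like the sketch below. The helper name getClassList and its parameters are hypothetical and not part of the original answer; page is assumed to be a Puppeteer Page. The DOMTokenList is copied into a plain array because values returned from page.evaluate have to be serializable to reach Node.

```javascript
// Hypothetical helper (not from the original answer): returns the classes
// of the first element matching `selector` as a plain array of strings,
// or null when nothing matches. `page` is assumed to be a Puppeteer Page.
async function getClassList(page, selector) {
  return page.evaluate((sel) => {
    const element = document.querySelector(sel);
    // Array.from copies the DOMTokenList into a plain array, which
    // serializes cleanly when returned from the page to Node.
    return element ? Array.from(element.classList) : null;
  }, selector);
}
```

With the page from the question, this would be called as `const classes = await getClassList(page, '.lang1');`.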

Answer 1 (score: 0)

Use

.getAttribute("class");

For example

var x = document.getElementsByTagName("H1")[0].getAttribute("class");
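In the Puppeteer setting from the question, the same getAttribute call has to run inside the page rather than in Node. One way to do that is page.$eval, sketched below. The helper name getClassAttribute is hypothetical and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper (not from the original answer): returns the raw
// class attribute string of the first element matching `selector`.
// `page` is assumed to be a Puppeteer Page.
async function getClassAttribute(page, selector) {
  // page.$eval runs document.querySelector(selector) in the page and passes
  // the matched element to the callback; it throws if nothing matches.
  return page.$eval(selector, (el) => el.getAttribute('class'));
}
```

It would be called as `const classes = await getClassAttribute(page, '.lang1');`, which yields the attribute as a single space-separated string.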