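Scraping HTML into a JS object using request / cheerio

Time: 2015-07-25 01:09:19

Tags: javascript node.js cheerio

I'm new to Cheerio and JS. I'm trying to write all of the pitcher names and their associated stats into a JSON object, like this: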

var pitchers = {
    name: 'Justin Verlander',
    era: 6.62,
    // etc...
    // etc...
}

This is the HTML I'm trying to scrape:

<tr class="">
<td class="stat-name-width"><img src="../../style/assets/img/mlb/team-logos/tigers.png" height="20"/>  
<span class="pitcher-name">Justin Verlander</span> 
<div class="fantasy-blue inline fantasy-data pitcher-salary-fd">$7,100</div>   
<small class="text-muted pitches">(R)</small> 
<small class="text-muted matchup">(@ BOS)</small></td>
        <td class="stat-stat-width fantasy-blue fantasy-points">
        <td class="stat-stat-width">0-3</td>
        <td class="stat-stat-width">6.62</td>
        <td class="stat-stat-width">1.50</td>
        <td class="stat-stat-width">5.82</td>
        <td class="stat-stat-width">3.18</td>
        <td class="stat-stat-width">2.12</td>
        <td class="stat-stat-width">5.67</td>
        <td class="stat-stat-width">1.03x</td>
        <td class="stat-stat-width">0.96x</td>
        <td class="stat-stat-width">1.09x</td>
        <td class="stat-stat-width">0.90x</td>
</tr> 

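There are about 30 pitchers on the same page with the same structure.

This is what I've got so far:

test = $('span.pitcher-name').text(); gives me all of the pitcher names, not just one.

Obviously I'm not even close... I can't figure out how to tie the elements that go with each pitcher name to a javascript object... any help is greatly appreciated!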

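2 Answers:

Answer 0: (score: 1)

It looks like what you want is the $().each() function.

With this function you can iterate over every instance of a tag and run a callback on each one, like this: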

var someObjArr = [];

$('span.pitcher-name').each(function(i, element){

    //Get the text from cheerio.
    var text = $(this).text();

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){

        someObjArr[i] = {};
    };

    //Update the name property of our object with the text value.
    someObjArr[i].name = text;
}); 

$('div.pitcher-salary-fd').each(function(i, element){

    //Get the text from cheerio.
    var text = $(this).text();

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){

        someObjArr[i] = {};
    };

    //Update the salary property of our object with the text value.
    someObjArr[i].salary = text;
}); 

console.log(someObjArr); //[ { name: 'Justin Verlander', salary: '$7,100' } ]

One of the best things about this function is that it works synchronously, so it behaves much like a for loop and is easy to follow.

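Keep in mind that you can print out each matched element through $(this) inside the callback. This is especially useful when you need to work out exactly which tag or property to target. For example: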

$('span.pitcher-name').each(function(i, element){

    //Return the entire element.
    var pitcherNameElement = $(this);

    //Prints all of the element's properties.
    console.log(pitcherNameElement); 

});

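Now, retrieving something more abstract, such as an array of items from the same table row, gets slightly more complicated. To do it, we need to use the $().each() function on the table rows and then check each child for a matching class. That way we can keep using the same index.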

$('tr').each(function(i, element){

    //get all children of a table row
    var children = $(this)['0'].children;

    //this array will hold the matchup data
    var matchupArr = [];

    //class to extract
    var statClass = 'stat-stat-width';

    //for loop-ing the children
    for(var myInt=0; myInt<children.length; myInt++){

        //the next element of this child
        var next = children[myInt].next;

        //sometimes next is undefined
        if(next != undefined){

            //get the html attribs of the next element
            var attribs = next.attribs;

            //sometimes the next element has no attribs
            if(attribs != undefined){

                //class of the next element
                var myClass = attribs.class;

                //if the next element's class is the one we want
                if(myClass == statClass){

                    //push it to our matchup array
                    matchupArr.push(next.children[0].data);
                };
            };
        };
    };

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){

        someObjArr[i] = {};
    };

    //Update the matchup property of our object with our array.
    if(matchupArr.length >0){
        someObjArr[i].matchups = matchupArr;
    };
});

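This is a bit of a hack, but it shows the underlying concept. A method that lets you run a callback on every child C of a parent P would be a great addition to the library. But alas, we live in an imperfect world.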

Good luck, and happy scraping!

Answer 1: (score: 0)

Have you looked at the documentation? It has plenty of examples of how to traverse a page's elements.

For example:

$('span.pitcher-name').next() // the sibling right after each name, e.g. <div class="fantasy-blue inline fantasy-data pitcher-salary-fd">$7,100</div>