DOM methods for navigating a document with getElementsByTag

Time: 2019-04-17 11:58:07

Tags: javascript java html jsoup element

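I am using the JSoup API to read data from an HTML page in a Java program. I fetch the values with getElementsByTag("H4"), which gives me the full content of each tag. The problem is that for the Outlook list item I do not want the child tag's value included in the result.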

Java code used to read the values:

Element instrumentContent = doc.select("div.comp-fs-instrument-content").get(i);
if (null != instrumentContent) {

    Elements elementsByTag = instrumentContent.getElementsByTag("LI");
    Elements instrumentCategory = elementsByTag.get(0).getElementsByTag("H4");
    Elements Ratings = elementsByTag.get(1).getElementsByTag("H4");
    Elements Outlook = elementsByTag.get(2).getElementsByTag("H4");
    System.out.println("Outlook======" + Outlook);

    strInstrument = Optional.ofNullable(instrumentCategory).filter(s -> !s.isEmpty())
            .map(s -> s.first().html()).orElse("-");
    strRating = Optional.ofNullable(Ratings).filter(s -> !s.isEmpty())
            .map(s -> s.first().html()).orElse("-");

    strOutlook = Optional.ofNullable(Outlook).filter(s -> !s.isEmpty())
            .map(s -> s.first().parent().html()).orElse("-");

}

The input HTML is:

<div class="comp-fs-instrument-content">
 <ul class="clearfix">
  <li prid="164910"> <span>Instrument Category</span> <h4>Long Term</h4> </li>
  <li> <span>Ratings</span> <h4>CRISIL B- (Issuer Not Cooperating)</h4> </li>
  <!-- Updated on 5th May start-->
  <li class="third-col"> <span>Outlook</span> <h4>Stable <span> as of October 24, 2018</span></h4> </li>
  <!-- Updated on 5th May ends-->
  <li class="view-instr-btn text-center"> <a href="javascript:;" class="view-instr-button">View Instrument</a> </li>
 </ul>
</div>

The output I get for Outlook, which still contains the child span that I do not want:

Stable <span> as of October 23, 2018</span>
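
If the goal is only the text that belongs directly to the h4 ("Stable"), without the nested span, one option in jsoup is Element.ownText(), which returns an element's own text and skips the text of its child elements. The following is a minimal sketch rather than a definitive fix. It assumes jsoup is on the classpath, and the class name OutlookOwnText, the helper ownTextOfFirstH4 and the shortened HTML string are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of the original program.
class OutlookOwnText {

    // Returns the text owned directly by the first <h4>, ignoring child
    // elements such as the nested <span>. Falls back to "-" when there is
    // no <h4> or it has no text of its own.
    static String ownTextOfFirstH4(String html) {
        Document doc = Jsoup.parse(html);
        Element h4 = doc.select("h4").first(); // null when nothing matches
        if (h4 == null || h4.ownText().isEmpty()) {
            return "-";
        }
        return h4.ownText();
    }

    public static void main(String[] args) {
        // Shortened version of the Outlook list item from the question.
        String html = "<ul><li class=\"third-col\"> <span>Outlook</span> "
                + "<h4>Stable <span> as of October 24, 2018</span></h4> </li></ul>";
        System.out.println(ownTextOfFirstH4(html)); // prints "Stable"
    }
}
```

In the code from the question, the same idea would mean mapping the Outlook element with s.first().ownText() instead of s.first().parent().html(). By contrast, html() keeps the inner markup and text() would also include the span's text.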

0 answers:

No answers yet.