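I am using the JSOUP API to read data from an HTML page in a Java program. I fetch the values with getElementsByTag("H4"), which gives me all the values of that tag. The problem is that the Outlook value also contains the child tag's content, and I do not want that child tag value.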
Java code used to read the values:
// Fragment: doc (presumably the parsed jsoup Document), i, and the str* variables are declared elsewhere.
Element instrumentContent = doc.select("div.comp-fs-instrument-content").get(i);
if (null != instrumentContent) {
    // One <li> per field: instrument category, ratings, outlook.
    Elements elementsByTag = instrumentContent.getElementsByTag("LI");
    Elements instrumentCategory = elementsByTag.get(0).getElementsByTag("H4");
    Elements Ratings = elementsByTag.get(1).getElementsByTag("H4");
    Elements Outlook = elementsByTag.get(2).getElementsByTag("H4");
    System.out.println("Outlook======" + Outlook);
    // Fall back to "-" when a field has no <h4>.
    strInstrument = Optional.ofNullable(instrumentCategory).filter(s -> !s.isEmpty())
            .map(s -> s.first().html()).orElse("-");
    strRating = Optional.ofNullable(Ratings).filter(s -> !s.isEmpty())
            .map(s -> s.first().html()).orElse("-");
    // html() keeps nested markup, so the child <span> ends up in the Outlook value.
    strOutlook = Optional.ofNullable(Outlook).filter(s -> !s.isEmpty())
            .map(s -> s.first().parent().html()).orElse("-");
}
The input HTML is:
<div class="comp-fs-instrument-content">
  <ul class="clearfix">
    <li prid="164910"> <span>Instrument Category</span> <h4>Long Term</h4> </li>
    <li> <span>Ratings</span> <h4>CRISIL B- (Issuer Not Cooperating)</h4> </li>
    <!-- Updated on 5th May start-->
    <li class="third-col"> <span>Outlook</span> <h4>Stable <span> as of October 24, 2018</span></h4> </li>
    <!-- Updated on 5th May ends-->
    <li class="view-instr-btn text-center"> <a href="javascript:;" class="view-instr-button">View Instrument</a> </li>
  </ul>
</div>
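The output I am getting for Outlook is:
Stable <span> as of October 23, 2018</span>
I need only the text of the h4 itself, without the child span:
Stable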
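One possible way to drop the nested span: jsoup's Element.ownText() returns only the text that belongs directly to an element and skips the text of its children, so replacing s.first().parent().html() with s.first().ownText() for the Outlook field would likely yield just "Stable". The sketch below illustrates the same "own text only" idea using only the JDK's DOM parser, so it runs without jsoup on the classpath. The class OwnTextDemo and the helpers ownText and parseFragment are hypothetical names introduced for illustration, and parsing with an XML parser works here only because this particular fragment is well-formed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class OwnTextDemo {

    // Concatenates only the text nodes that are direct children of the element,
    // skipping child elements such as the nested <span>.
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Hypothetical helper: parses a well-formed fragment and returns its root element.
    // Real-world HTML is often not well-formed XML and should go through jsoup instead.
    static Element parseFragment(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element h4 = parseFragment("<h4>Stable <span> as of October 24, 2018</span></h4>");
        System.out.println(ownText(h4)); // prints "Stable"

        // Assumed jsoup equivalent inside the original code (requires jsoup):
        //   strOutlook = Outlook.isEmpty() ? "-" : Outlook.first().ownText();
    }
}
```

If the date is also needed separately, the nested span can still be read on its own, for example with a selector such as "h4 > span" in jsoup.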