The following is a simplified version of the HTML.
<html>
...
<div class="info">
<span class="time">2017.01.16</span>
</div>
<div class="related_group">
<ul class="related_list">
<li>
<p class="info">
<span class="time">2016.12.28</span>
</p>
</li>
</ul>
</div>
<div class="info">
<span class="time">2017.01.26</span>
</div>
<div class="related_group">
<ul class="related_list">
<li>
<p class="info">
<span class="time">2017.01.30</span>
</p>
</li>
</ul>
</div>
...
</html>
This pattern repeats many times, and I want to extract only dates such as 2017.01.16 and 2017.01.26.
So I used Beautiful Soup in Python.
for item in soup.find_all("span", {"class" : "time"}):
    source=source+str(item.find_all(text=True))
This code finds the dates, but it also picks up the unwanted ones, 2016.12.28 and 2017.01.30.
To get a more precise result, I tried find_next_siblings:
for item in soup.find_next_siblings("span", {"class" : "time"}):
    source=source+str(item.find_next_siblings())
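As you might guess, it does not work. I did look up the reference documentation and read it, but my English is not good enough to follow it. If you don't mind, could you help me fix the code?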
Answer 0: (score: 1)
Try this:
from bs4 import BeautifulSoup
html=""" <html>
<div class="info">
<span class="time">2017.01.16</span>
</div>
<div class="related_group">
<ul class="related_list">
<li>
<p class="info">
<span class="time">2016.12.28</span>
</p>
</li>
</ul>
</div>
<div class="info">
<span class="time">2017.01.26</span>
</div>
<div class="related_group">
<ul class="related_list">
<li>
<p class="info">
<span class="time">2017.01.30</span>
</p>
</li>
</ul>
</div>
</html>"""
soup = BeautifulSoup(html, 'html.parser')
s = soup.find_all('div', class_=['info', 'related_group'])
s = iter(s)
for a in s:
    print(a.text.strip(), '---', next(s).text.strip())
Output:
2017.01.16 --- 2016.12.28
2017.01.26 --- 2017.01.30
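If only the first date of each pair is needed, as the question asks, one option is a CSS selector that matches just the span.time elements sitting directly inside a div.info. This is a minimal sketch, assuming the built-in html.parser and a shortened copy of the sample HTML; the name main_dates is only illustrative.

```python
from bs4 import BeautifulSoup

# Shortened sample in the same shape as the HTML from the question.
html = """
<html>
<div class="info">
<span class="time">2017.01.16</span>
</div>
<div class="related_group">
<ul class="related_list">
<li>
<p class="info">
<span class="time">2016.12.28</span>
</p>
</li>
</ul>
</div>
<div class="info">
<span class="time">2017.01.26</span>
</div>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.info > span.time" matches only spans that are direct children of
# <div class="info">, so the span inside <p class="info"> is skipped.
main_dates = [
    span.get_text(strip=True)
    for span in soup.select("div.info > span.time")
]
print(main_dates)
```

The child combinator (>) is what excludes the related dates: they live under p.info, not div.info.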
Answer 1: (score: 0)
soup.find_all('div', class_='info')
Out:
[<div class="info">
<span class="time">2017.01.16</span>
</div>, <div class="info">
<span class="time">2017.01.26</span>
</div>]
The tags you want are under these div tags.
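On the find_next_siblings attempt from the question: the method looks at the siblings of the object it is called on, and the top-level soup object has none, so the loop body never runs. Called on a specific tag it behaves as expected. A rough sketch, assuming a compact version of the sample HTML; from_root and pairs are illustrative names.

```python
from bs4 import BeautifulSoup

# Compact sample: each div.info is followed by a sibling div.related_group.
html = """
<div class="info"><span class="time">2017.01.16</span></div>
<div class="related_group"><p class="info"><span class="time">2016.12.28</span></p></div>
<div class="info"><span class="time">2017.01.26</span></div>
<div class="related_group"><p class="info"><span class="time">2017.01.30</span></p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# The soup object itself has no siblings, so this result is empty.
from_root = soup.find_next_siblings("span", {"class": "time"})

# Starting from each div.info, look for the next sibling div.related_group.
pairs = []
for info in soup.find_all("div", class_="info"):
    related = info.find_next_sibling("div", class_="related_group")
    related_date = related.get_text(strip=True) if related is not None else None
    pairs.append((info.get_text(strip=True), related_date))

print(len(from_root))
print(pairs)
```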
Answer 2: (score: 0)
How about this:
times = []
items = soup.find_all('div', {"class" : "info"})
for item in items:
    tmp = item.select(".time")
    t = tmp[0].text
    times.append(t)
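The same idea can be wrapped in a small function that also copes with a div.info that has no time span (tmp[0] would raise IndexError in that case). This is only a sketch: extract_times and the sample string are made up for illustration, and html.parser is assumed.

```python
from bs4 import BeautifulSoup


def extract_times(html):
    """Return the text of span.time found inside each <div class="info">."""
    soup = BeautifulSoup(html, "html.parser")
    times = []
    for item in soup.find_all("div", class_="info"):
        span = item.find("span", class_="time")
        # Skip info blocks without a time span instead of raising an error.
        if span is not None:
            times.append(span.get_text(strip=True))
    return times


# Illustrative sample: one div.info has no date, and one date sits in p.info.
sample = """
<div class="info"><span class="time">2017.01.16</span></div>
<div class="related_group"><p class="info"><span class="time">2016.12.28</span></p></div>
<div class="info"></div>
<div class="info"><span class="time">2017.01.26</span></div>
"""

print(extract_times(sample))
```

Because find_all is restricted to div tags, the p class="info" blocks inside related_group are never visited.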