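I'm trying to collect the following siblings up to a certain sibling, but I still can't figure out how to do it. I tried using the class names with preceding-sibling and following-sibling, but the result is wrong.
My HTML is: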
<div class="MainClass">
<div class="InfoClass">
<div class="left-wrap">
<span class="date">2 August 2020</span>
</div>
</div>
<div class="DataClass">
<em class="Code">
<span>1</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>2</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>3</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>4</span>
</em>
</div>
<div class="InfoClass">
<div class="left-wrap">
<span class="date">15 August 2020</span>
</div>
</div>
<div class="DataClass">
<em class="Code">
<span>5</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>6</span>
</em>
</div>
</div>
Here is my Python code:
mainClass = driver.find_elements_by_xpath("//div[@class='MainClass']//following-sibling::div[@class='InfoClass']")
for header in mainClass:
    kDate = header.find_element_by_xpath(".//span[@class='date']").text
    print(kDate)
    datarows = header.find_elements_by_xpath("following-sibling::div[@class='DataClass' and preceding-sibling::div[@class='DataClass']]")
    for datarow in datarows:
        code = datarow.find_element_by_xpath(".//em[@class='Code']").text
        print("Code : " + code)
The result I get:
2 August 2020
2
3
4
5
6
15 August 2020
5
6
What I want as a result is the "Code" values grouped by date:
2 August 2020
1
2
3
4
15 August 2020
5
6
Answer 0 (score: 2)
Regarding your expected output: why not just extract the text from all the span elements, since they are already in the right order? For example, with LXML:
data = tree.xpath("//span/text()")
print(*data, sep="\n")
Output:
2 August 2020
1
2
3
4
15 August 2020
5
6
If you really want to use a loop and build a dictionary, here is a suggestion. First, the data:
data = """<div class="MainClass">
<div class="InfoClass">
<div class="left-wrap">
<span class="date">2 August 2020</span>
</div>
</div>
<div class="DataClass">
<em class="Code">
<span>1</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>2</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>3</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>4</span>
</em>
</div>
<div class="InfoClass">
<div class="left-wrap">
<span class="date">15 August 2020</span>
</div>
</div>
<div class="DataClass">
<em class="Code">
<span>5</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>6</span>
</em>
</div>
</div>"""
Then the code:
import lxml.html
tree = lxml.html.fromstring(data)
dates = [el.text for el in tree.xpath("//span[@class='date']")]
print(dates)
dc = []
for els in dates:
    lists = [el.text for el in tree.xpath("//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]")]
    dc.append(lists)
print(dc)
dictionary = dict(zip(dates, dc))
print(dictionary)
Comments:
First, extract the dates into a list. Then everything relies on the following XPath (the one you're looking for?) to get the corresponding data classes:
//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]
Here +els+ is the previously fetched date.
Finally, build the dictionary. The code is written for LXML; to make it work with Selenium, just replace tree.xpath with its Selenium equivalent, driver.find_elements_by_xpath.
Output (dates, data classes, dictionary):
['2 August 2020', '15 August 2020']
[['1', '2', '3', '4'], ['5', '6']]
{'2 August 2020': ['1', '2', '3', '4'], '15 August 2020': ['5', '6']}
Edit: if you need to print the dictionary, you can use:
for keys, values in dictionary.items():
    print(keys)
    print(*values, sep='\n')
This prints the dates and codes in the requested format.
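Instead of running one XPath query per date, the "following siblings until the next InfoClass" grouping can also be done in a single pass over the direct children of MainClass: each InfoClass div starts a new group, and every DataClass div after it is added to that group. The sketch below uses only the standard library's xml.etree.ElementTree and assumes the markup is well-formed XML, as in the sample here (real pages often are not, and then an HTML parser such as lxml is needed). The compact HTML string and the function name group_codes_by_date are illustrative.

```python
import xml.etree.ElementTree as ET

# Same structure as the question's markup, compacted onto fewer lines.
HTML = """<div class="MainClass">
  <div class="InfoClass"><div class="left-wrap"><span class="date">2 August 2020</span></div></div>
  <div class="DataClass"><em class="Code"><span>1</span></em></div>
  <div class="DataClass"><em class="Code"><span>2</span></em></div>
  <div class="DataClass"><em class="Code"><span>3</span></em></div>
  <div class="DataClass"><em class="Code"><span>4</span></em></div>
  <div class="InfoClass"><div class="left-wrap"><span class="date">15 August 2020</span></div></div>
  <div class="DataClass"><em class="Code"><span>5</span></em></div>
  <div class="DataClass"><em class="Code"><span>6</span></em></div>
</div>"""


def group_codes_by_date(html):
    """Return {date: [code, ...]} by walking MainClass's children in order."""
    main = ET.fromstring(html)  # the root element is the MainClass div
    groups = {}
    current = None
    for child in main:  # direct children only, in document order
        cls = child.get("class")
        if cls == "InfoClass":
            # A new header: the DataClass divs up to the next InfoClass belong to it.
            current = child.find(".//span[@class='date']").text.strip()
            groups.setdefault(current, [])  # a repeated date extends its list
        elif cls == "DataClass" and current is not None:
            groups[current].append(child.find(".//em[@class='Code']/span").text.strip())
    return groups


for date, codes in group_codes_by_date(HTML).items():
    print(date)
    print(*codes, sep="\n")
```

The walk relies only on sibling order, so it does not need to build an XPath string from each date.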
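Answer 1 (score: 1)
Since all the divs containing dates and data are at the same level under the MainClass div, a single generic XPath that matches every span containing a date or a code gives the desired result:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://bilalzamel.htmlsave.net/")
mainClass = driver.find_elements_by_xpath("//div[@class='MainClass']//span")
for mc in mainClass:
    kDate = mc.text
    print(kDate)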
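Printing every span in order flattens the grouping, but because the spans come back in document order, the grouping can be recovered in the same loop: a span with class date starts a new group, and every other span belongs to the most recent date. A minimal sketch with the standard library's xml.etree.ElementTree, assuming well-formed markup; the HTML string and the name group_spans are illustrative. With Selenium, the class of each element could be read with get_attribute("class") instead of span.get("class").

```python
import xml.etree.ElementTree as ET

# Same structure as the question's markup, compacted onto fewer lines.
HTML = """<div class="MainClass">
  <div class="InfoClass"><div class="left-wrap"><span class="date">2 August 2020</span></div></div>
  <div class="DataClass"><em class="Code"><span>1</span></em></div>
  <div class="DataClass"><em class="Code"><span>2</span></em></div>
  <div class="DataClass"><em class="Code"><span>3</span></em></div>
  <div class="DataClass"><em class="Code"><span>4</span></em></div>
  <div class="InfoClass"><div class="left-wrap"><span class="date">15 August 2020</span></div></div>
  <div class="DataClass"><em class="Code"><span>5</span></em></div>
  <div class="DataClass"><em class="Code"><span>6</span></em></div>
</div>"""


def group_spans(html):
    """Group span texts under the nearest preceding span with class 'date'."""
    groups = {}
    current = None
    # iter("span") yields every descendant <span> in document order,
    # like //div[@class='MainClass']//span
    for span in ET.fromstring(html).iter("span"):
        text = (span.text or "").strip()
        if span.get("class") == "date":
            current = text
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(text)
    return groups


print(group_spans(HTML))
# {'2 August 2020': ['1', '2', '3', '4'], '15 August 2020': ['5', '6']}
```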
Answer 2 (score: 1)
I found a way to display the text you want:
mainClassText = driver.find_element_by_xpath("//div[@class='MainClass']").text
print(mainClassText)
If you prefer, you can also convert it into a list:
mainClassTextList = mainClassText.split("\n")
for ele in mainClassTextList:
    print(ele)
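The element text comes back as one line per span, so the lines can be regrouped by recognizing which of them are dates. A sketch assuming every header has the form 2 August 2020 (day, full English month name, year), which datetime.strptime parses with "%d %B %Y" under the default C locale; is_date and group_lines are illustrative names, and the string below stands in for mainClassText.

```python
from datetime import datetime


def is_date(line):
    """Return True if the line parses as a date such as '2 August 2020'."""
    try:
        # %d accepts one- or two-digit days; %B is the full month name
        # (English month names under the default C locale).
        datetime.strptime(line, "%d %B %Y")
        return True
    except ValueError:
        return False


def group_lines(text):
    """Group each line of the MainClass text under the date line before it."""
    groups = {}
    current = None
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if is_date(line):
            current = line
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(line)
    return groups


# Stand-in for mainClassText, matching the text displayed by this answer.
main_class_text = "2 August 2020\n1\n2\n3\n4\n15 August 2020\n5\n6"
print(group_lines(main_class_text))
# {'2 August 2020': ['1', '2', '3', '4'], '15 August 2020': ['5', '6']}
```

Detecting dates by format is fragile if a code could ever look like a date, so the class-based approaches are safer when the markup is available.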
In both cases, this displays:
2 August 2020
1
2
3
4
15 August 2020
5
6
Answer 3 (score: 1)
You can use the same simple code as for your previous question, but if the dates are not unique, you can use a list to collect the correct values. Each .Code element takes its date from the nearest date span before it, so every code is paired with 2 August 2020 or 15 August 2020 as appropriate:
codes = list()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    # (preceding::...)[last()] is the date span closest to this code
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    codes.append({"date": date, "code": code})
for c in codes:
    print(f'date: {c["date"]}, code: {c["code"]}')
Output:
date: 2 August 2020, code: 1
date: 2 August 2020, code: 2
date: 2 August 2020, code: 3
date: 2 August 2020, code: 4
date: 15 August 2020, code: 5
date: 15 August 2020, code: 6
If you want a dict with the dates as keys and the codes as values:
codes = dict()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    if date in codes:
        codes[date].append(code)
    else:
        codes.update({date: [code]})
for k, v in codes.items():
    print(f'{k} : {v}')
This prints each date followed by the list of its codes.
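The if/else that either creates or extends a date's list can be folded into collections.defaultdict(list): in the Selenium loop, codes = defaultdict(list) followed by codes[date].append(code) would do the same job. The sketch below runs without a browser by using hypothetical (date, code) pairs that mirror the list output in this answer; group_pairs is an illustrative name.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Collect codes per date; dates keep the order in which they were first seen."""
    codes = defaultdict(list)
    for date, code in pairs:
        codes[date].append(code)  # a missing key starts as an empty list
    return dict(codes)


# Hypothetical (date, code) pairs standing in for values read from the page.
pairs = [
    ("2 August 2020", "1"),
    ("2 August 2020", "2"),
    ("2 August 2020", "3"),
    ("2 August 2020", "4"),
    ("15 August 2020", "5"),
    ("15 August 2020", "6"),
]
for k, v in group_pairs(pairs).items():
    print(f"{k} : {v}")
```

Because the grouping key is the date text, two headers with the same date end up merged into one list, which is exactly the case where the list-of-pairs version above keeps more information.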