Splitting and merging node text from a results table?

Time: 2018-07-09 15:10:20

Tags: python selenium-webdriver xpath webdriver

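How can I add ";" separators so that some of the text in each row is split and some is merged, while keeping the existing structure of the program?

Split:

The tag text "HammarbyvsOstersunds" should be split into "Hammarby; vs; Ostersunds;".

Merge:

The tag text "Expected;In;Play;start;selling;time:" should be merged into "Expected In Play start selling time:".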

# -*- coding:UTF-8 -*-
import sys
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("url")

table = ['; '.join(["; ".join( j.text.split(" ")) for j in i.find_elements_by_class_name('couponRow') if j.text]) for i in driver.find_elements_by_xpath('//*[@id="todds"]//div[@class="couponTable"]') if i.text]

for line in table:
    print(line)
driver.close()

Output:

Monday;Matches
MON;41;HammarbyvsOstersunds;Expected;In;Play;start;selling;time:
10/07;01:00;1.85;3.50;3.35
Tuesday;Matches
TUE;1;FrancevsBelgium;Expected;In;Play;start;selling;time:
11/07;02:00;2.38;2.82;2.95
Wednesday;Matches
WED;1;CroatiavsEngland;Expected;In;Play;start;selling;time:
12/07;02:00;3.45;2.80;2.15

Expected result:

Monday; Matches;
MON;41;Hammarby; vs; Ostersunds;Expected In Play start selling time: ; 10/07;01:00;1.85;3.50;3.35
Tuesday; Matches;
TUE;1;France;vs;Belgium; Expected In Play start selling time:;11/07;02:00;2.38;2.82;2.95
Wednesday;Matches
WED;1;Croatia;vs;England;Expected In Play start selling time: ;12/07;02:00;3.45;2.80;2.15

1 answer:

Answer 0: (score: 1)

As already suggested, you should split the task into smaller, manageable pieces of code.
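A minimal sketch of what "smaller pieces" could look like, assuming each row's text has already been read from the page and broken into single lines. The helper names `split_teams` and `format_match_line` are hypothetical, and the pattern assumes both team names start with a capital letter and are joined by a literal "vs".

```python
import re

# Two capitalised names joined by "vs", e.g. "HammarbyvsOstersunds".
TEAM_PAIR = re.compile(r"([A-Z]\w+)vs([A-Z]\w+)")


def split_teams(token):
    """Split 'HammarbyvsOstersunds' into ['Hammarby', 'vs', 'Ostersunds']."""
    match = TEAM_PAIR.fullmatch(token)
    if match is None:
        return [token]  # leave anything unexpected untouched
    return [match.group(1), "vs", match.group(2)]


def format_match_line(line):
    """Format one text line as ';'-separated fields.

    A match line becomes day;number;home;vs;away;note, where the trailing
    words are merged into a single note field. Any other line (heading,
    date and odds) simply gets one field per word.
    """
    parts = line.split()
    if len(parts) < 3 or TEAM_PAIR.fullmatch(parts[2]) is None:
        return ";".join(parts)
    day, number, teams = parts[:3]
    note = " ".join(parts[3:])  # keep the remaining words together
    return ";".join([day, number] + split_teams(teams) + [note])


print(format_match_line("MON 41 HammarbyvsOstersunds Expected In Play start selling time:"))
print(format_match_line("10/07 01:00 1.85 3.50 3.35"))
```

Keeping the split and the merge in separate functions makes each one easy to check against a sample string before Selenium is involved at all.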

I experimented with your code a bit, but it is hard to get a perfect result out of it. This is what I ended up with:

import re

table = ['; '.join([" ".join( j.text.split(" ")) for j in i.find_elements_by_class_name('couponRow') if j.text]) for i in driver.find_elements_by_xpath('//*[@id="todds"]//div[@class="couponTable"]') if i.text]
lines = [' ; '.join(t.split('\n')) for t in table]
# Put the two team names of each match into their own ';'-delimited field.
result = [re.sub(r"([A-Z]\w+)vs([A-Z]\w+)", r'; \1 vs \2;', l, flags=re.MULTILINE) for l in lines]

Result:

['Tuesday Matches; TUE 1 ; France vs Belgium; Expected In Play start selling time: ; 11/07 02:00 --- --- ---',
 'Wednesday Matches; WED 1 ; Croatia vs England; Expected In Play start selling time: ; 12/07 02:00 --- --- ---']

Not perfect, but possibly good enough.
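If the uneven spacing around the semicolons matters, one possible clean-up pass is sketched below. The helper name `tidy_separators` is hypothetical, and the sketch assumes that no field is supposed to contain a semicolon of its own.

```python
import re


def tidy_separators(line):
    """Normalise ';' separators in an already formatted line."""
    line = re.sub(r"\s*;\s*", ";", line)  # drop whitespace around each ';'
    line = re.sub(r";{2,}", ";", line)    # collapse runs of ';' into one
    return line.strip("; ")               # no leading or trailing separator


sample = "TUE 1 ; France vs Belgium; Expected In Play start selling time: ; 11/07 02:00"
print(tidy_separators(sample))
```

This only tidies the delimiters; it does not decide which words belong in which field.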

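The main problem is that your selectors are rather broad, so they pick up the raw text of whole rows.
You could instead narrow them down to the nodes you are really interested in, for example: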

teams = [team.text for team in driver.find_elements_by_xpath('//div[@class="cteams"]/span/span[@class="teamname"]')]

...and so on.
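Once the team names come from their own nodes, no regex is needed to separate them. The sketch below assumes that every match contributes exactly two `teamname` spans in document order, which the page markup would need to confirm; the helper names `pair_teams` and `team_fields` are hypothetical, and the hard-coded list stands in for the `teams` list built from the XPath above.

```python
def pair_teams(teams):
    """Group a flat list of team names into (home, away) tuples."""
    if len(teams) % 2 != 0:
        raise ValueError("expected an even number of team names")
    return list(zip(teams[0::2], teams[1::2]))


def team_fields(pairs):
    """Render each pair as the 'home;vs;away' part of an output row."""
    return ["{};vs;{}".format(home, away) for home, away in pairs]


# Stand-in for the list read from the page with Selenium.
teams = ["Hammarby", "Ostersunds", "France", "Belgium", "Croatia", "England"]
for field in team_fields(pair_teams(teams)):
    print(field)
```

The same idea applies to the other columns: read each one from its own node, then join the values with ";" at the very end.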