Selecting multiple values with Python and XPath

Posted: 2017-02-24 00:42:25

Tags: python xpath lxml

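I can select single values with XPath in Python without any problem, but how do I combine several single XPath expressions into one result? Here is a sample snippet of the HTML source (r.content):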

<div class="members">
    <h2>Members</h2>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
          <span class="profile-link">
            <a href="/account/view-profile/KonterBolet">
              <img class="achievement" src="36.png" alt="Completed 36" title="Completed 36">KonterA</a>
          </span>
          <span class="memberType">Leader</span>
        </span>
    </div>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
          <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
            <a href="/account/view-profile/mardok">
              <img class="achievement" src="35.png" alt="Completed 35" title="Completed 35">mardok</a>
            <a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
          </span>
          <span class="memberType">Officer</span>
        </span>
    </div>
</div>

I use Python requests to fetch the content and lxml to parse it:

import requests
from lxml import html

# SITE_URL is the address of the page being scraped
ses = requests.Session()
r = ses.get(SITE_URL)
webContent = html.fromstring(r.content)

First XPath:
acc = webContent.xpath("//span/a[contains(@href,'account/view-profile')]/text()")
Result:
['KonterA', 'mardok']

Second XPath:
twitch = webContent.xpath('//span/@data-twitch-user')
Result:
['mardok_tv']

Third XPath:
lastOnline = webContent.xpath('//span/@data-time')
Result:
['2017-02-20T22:37:42Z', '2017-02-19T11:28:20Z']
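To try the three expressions without network access, the sample snippet can be parsed from a string instead of `r.content`. This is a minimal sketch: `SAMPLE` is simply the markup above pasted in, and the attribute is spelled `data-twitch-user` as in the markup. Because the `<a>` element may also contain whitespace-only text nodes around the `<img>`, the name results are stripped and empty strings dropped.

```python
from lxml import html

# The question's sample markup, used here in place of r.content
SAMPLE = """
<div class="members">
    <h2>Members</h2>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
          <span class="profile-link">
            <a href="/account/view-profile/KonterBolet">
              <img class="achievement" src="36.png" alt="Completed 36" title="Completed 36">KonterA</a>
          </span>
          <span class="memberType">Leader</span>
        </span>
    </div>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
          <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
            <a href="/account/view-profile/mardok">
              <img class="achievement" src="35.png" alt="Completed 35" title="Completed 35">mardok</a>
            <a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
          </span>
          <span class="memberType">Officer</span>
        </span>
    </div>
</div>
"""

web_content = html.fromstring(SAMPLE)

# text() also returns whitespace-only nodes, so strip them and drop the empties
acc = [t.strip() for t in web_content.xpath(
    "//span/a[contains(@href,'account/view-profile')]/text()") if t.strip()]
twitch = web_content.xpath("//span/@data-twitch-user")
last_online = web_content.xpath("//span/@data-time")

print(acc)
print(twitch)
print(last_online)
```

The names come out exactly as written in the markup, and only one Twitch user is found for the two members.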

How can I combine these three to get a result like this:
[['KonterA', '', '2017-02-20T22:37:42Z'], ['mardok', 'mardok_tv', '2017-02-19T11:28:20Z']]
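Zipping the three raw lists does not produce this. Only one of the two members has a Twitch attribute, so the lists have different lengths, and position in the list no longer identifies the member. A small illustration, with the lists typed in by hand:

```python
acc = ['KonterA', 'mardok']
twitch = ['mardok_tv']
last_online = ['2017-02-20T22:37:42Z', '2017-02-19T11:28:20Z']

# zip() stops at the shortest input, and the only Twitch name
# is paired with the first member instead of its owner
naive = list(zip(acc, twitch, last_online))
print(naive)  # [('KonterA', 'mardok_tv', '2017-02-20T22:37:42Z')]
```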

1 Answer

Answer 0 (score: 0)

Call them first_list, second_list and third_list, and modify second_list as follows:

# One entry per account, assuming the Twitch name is the account name plus "_tv"
second_list = [name + "_tv" if name + "_tv" in second_list else "" for name in first_list]
After that, do:

# zip() returns an iterator in Python 3, so wrap it in list()
list(zip(first_list, second_list, third_list))

This should give you a list of tuples with the same structure:

[('KonterA', '', '2017-02-20T22:37:42Z'), ('mardok', 'mardok_tv', '2017-02-19T11:28:20Z')]
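An alternative that avoids matching names altogether is to loop over each `div.member` block and run relative XPath expressions inside it, so the three values for a member are always read together. The sketch below parses the question's sample snippet rather than fetching `SITE_URL`; the helper names `members_table` and `first_or_empty` are hypothetical, chosen for illustration.

```python
from lxml import html

# The question's sample markup; in the real script this would be r.content
SAMPLE = """
<div class="members">
    <h2>Members</h2>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-20 22:37:42" data-time="2017-02-20T22:37:42Z">
          <span class="profile-link">
            <a href="/account/view-profile/KonterBolet">
              <img class="achievement" src="36.png" alt="Completed 36" title="Completed 36">KonterA</a>
          </span>
          <span class="memberType">Leader</span>
        </span>
    </div>
    <div class="member">
        <span title="Last Online:&nbsp;2017-02-19 11:28:20" data-time="2017-02-19T11:28:20Z">
          <span class="profile-link hasTwitch twitchOffline" data-twitch-user="mardok_tv">
            <a href="/account/view-profile/mardok">
              <img class="achievement" src="35.png" alt="Completed 35" title="Completed 35">mardok</a>
            <a class="twitch" href="//www.twitch.tv/mardok_tv" target="_blank" title="Offline"></a>
          </span>
          <span class="memberType">Officer</span>
        </span>
    </div>
</div>
"""


def first_or_empty(values):
    """Return the first XPath result as a plain str, or "" if there is none."""
    return str(values[0]) if values else ""


def members_table(root):
    """Build one [name, twitch_user, last_online] row per member block."""
    rows = []
    # Visit each member separately so its three values cannot drift apart
    for member in root.xpath("//div[@class='member']"):
        # Paths starting with "." are evaluated relative to the current member only;
        # normalize-space() collapses the whitespace around the <img> in the link
        name = member.xpath(
            "normalize-space(.//a[contains(@href,'account/view-profile')])")
        twitch = member.xpath(".//span/@data-twitch-user")
        last_online = member.xpath(".//span/@data-time")
        rows.append([str(name), first_or_empty(twitch), first_or_empty(last_online)])
    return rows


table = members_table(html.fromstring(SAMPLE))
print(table)
```

Because each row is built from a single member block, a member without a Twitch attribute simply gets an empty string, and no assumption about how Twitch names relate to account names is needed. Converting results with `str()` also gives plain strings rather than lxml's "smart string" results, which keep a reference to the parsed tree.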