Retrieving the name of a class attribute using lxml

Time: 2016-01-25 20:08:27

Tags: python html lxml

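I am writing a Python project that uses lxml to scrape a page, and I am having trouble retrieving the name of a span's class attribute.

How can I retrieve the value of the span's class attribute in the HTML fragment below?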

<tr class="nogrid">
  <td class="date">12th January 2016</td> 
  <td class="time">11:22pm</td> 
  <td class="category">Clothing</td>   
  <td class="product">
    <span class="brand">carlos santos</span>
  </td> 
  <td class="size">10</td> 
  <td class="name">polo</td> 
</tr>
....

2 answers:

Answer 0 (score: 4)

You can use the XPath //td[@class="product"]/span/@class to get the class attribute of the span element that is a direct child of the td element whose class is product:

from lxml import html
raw = '''<tr class="nogrid">
<td class="date">12th January 2016</td> 
<td class="time">11:22pm</td> 
<td class="category">Clothing</td>   
<td class="product">
<span class="brand">carlos santos</span>
</td> 
<td class="size">10</td> 
<td class="name">polo</td> 
</tr>'''

root = html.fromstring(raw)
# xpath() returns a list of matches; take the first class attribute value
span = root.xpath('//td[@class="product"]/span/@class')[0]
print(span)

Output:

brand

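The XPath approach above indexes the result with [0], which raises IndexError when no td/span pair matches. As a minimal sketch (the helper name span_classes, its td_class parameter, and the table wrapper around the row are assumptions for illustration, not part of the original answer), you can select the span element itself, read the attribute with .get(), and return an empty list when nothing matches:

```python
from lxml import html

# Shortened version of the question's row, wrapped in <table> for this sketch
raw = '''<table><tr class="nogrid">
<td class="product"><span class="brand">carlos santos</span></td>
<td class="name">polo</td>
</tr></table>'''

def span_classes(fragment, td_class="product"):
    """Return the class names of the first span directly under the matching td."""
    root = html.fromstring(fragment)
    # $cls is an XPath variable; lxml fills it from the keyword argument
    spans = root.xpath('//td[@class=$cls]/span', cls=td_class)
    if not spans:
        return []  # no matching td/span pair
    # .get() returns None when the attribute is missing, hence the fallback
    return (spans[0].get("class") or "").split()

print(span_classes(raw))
```

Splitting on whitespace also covers a span that carries several class names.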

Answer 1 (score: 1)

from bs4 import BeautifulSoup

# Named "markup" rather than "lxml" so it is not confused with the lxml library
markup = '''<tr class="nogrid">
          <td class="date">12th January 2016</td> 
          <td class="time">11:22pm</td> 
          <td class="category">Clothing</td>   
          <td class="product">
            <span class="brand">carlos santos</span>
          </td> 
          <td class="size">10</td> 
          <td class="name">polo</td> 
          </tr>'''
soup = BeautifulSoup(markup, 'lxml')
# BeautifulSoup treats class as a multi-valued attribute, so this is a list
result = soup.find('span')['class'] # result == ['brand']
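Because BeautifulSoup returns the class attribute as a list of tokens, it can help to restrict the search to the product cell and join the tokens when a single string is wanted. The following is a rough sketch rather than the answerer's method: the second class name featured is made up for illustration, and the built-in html.parser is used here only to avoid the extra lxml dependency.

```python
from bs4 import BeautifulSoup

# "featured" is a hypothetical extra class added to show the list behaviour
markup = '''<table><tr class="nogrid">
<td class="product"><span class="brand featured">carlos santos</span></td>
</tr></table>'''

soup = BeautifulSoup(markup, 'html.parser')
# CSS selector: a span that is a direct child of td.product
span = soup.select_one('td.product > span')
# select_one() returns None when nothing matches, so guard before reading
classes = span.get('class', []) if span is not None else []
print(classes)            # list of class tokens
print(' '.join(classes))  # the same tokens as one string
```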