How to get only the data of specific tags from a URL in Python with urllib2

Time: 2017-06-16 08:35:13

Tags: python-2.7 urllib2

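I am very new to Python 2.7, and my task is to read a table from a URL.

I can already get the table's data from the URL. The problem is that I only need the data, but I am getting the tags as well. Please help me. Thanks in advance.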

from bs4 import BeautifulSoup
import urllib2

response = urllib2.urlopen('https://www.somewebsite.com/')
html = response.read()
soup = BeautifulSoup(html)

tabulka = soup.find("table", {"class" : "defaultTableStyle tableFontMD tableNoBorder"})

records = []
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    # This prints the <td> elements themselves, tags included
    print col

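1 Answer:

Answer 0 (score: 3)

You have to use the .text attribute of each cell, which returns only the text inside the element and leaves out the surrounding tags: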

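from bs4 import BeautifulSoup
import urllib2

response = urllib2.urlopen('https://www.somewebsite.com/')
html = response.read()
# Naming the parser explicitly avoids the bs4 "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

tabulka = soup.find("table", {"class" : "defaultTableStyle tableFontMD tableNoBorder"})

records = []
for row in tabulka.findAll('tr'):
    col = row.findAll('td')
    # .text yields only the text of each cell, without the <td> markup
    records.append([coli.text for coli in col])
    print records[-1]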
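
The same idea can be tried without a network connection. The following is only a minimal sketch: SAMPLE_HTML and the helper name extract_rows are made up for illustration and are not part of the original question, and the sample table merely reuses the class string from the code above. It uses get_text(strip=True), which behaves like .text but also trims surrounding whitespace, and it should run on both Python 2.7 and Python 3.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<table class="defaultTableStyle tableFontMD tableNoBorder">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td> Apple </td><td><b>1.20</b></td></tr>
  <tr><td>Pear</td><td>0.80</td></tr>
</table>
"""


def extract_rows(html, css_classes):
    """Return the text of every <td> cell, one list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": css_classes})
    if table is None:
        # No matching table on the page: return an empty result
        return []
    records = []
    for row in table.find_all("tr"):
        # get_text(strip=True) drops the tags and trims whitespace
        cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if cells:  # rows that hold only <th> cells produce no <td> entries
            records.append(cells)
    return records


records = extract_rows(SAMPLE_HTML, "defaultTableStyle tableFontMD tableNoBorder")
print(records)
```

Checking for a missing table before looping is a design choice of this sketch: soup.find returns None when nothing matches, and calling findAll on None would raise an AttributeError.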