List comes back empty when retrieving data from a website; Python

Time: 2016-01-05 01:51:24

Tags: python list parsing urllib3

I am trying to parse data from a website by inserting it into a list, but the list comes back empty.

url =("http://www.releasechimps.org/resources/publication/whos-there-md-  anderson")
http = urllib3.PoolManager()
r = http.request('Get',url)
soup = BeautifulSoup(r.data,"html.parser")
#print(r.data)
loop = re.findall(r'<td>(.*?)</td>',str(r.data))
#print(str(loop))
newLoop = str(loop)
#print(newLoop)
for x in range(1229):
    if "\\n\\t\\t\\t\\t" in loop[x]:
        loop[x] = loop[x].replace("\\n\\t\\t\\t\\t","")
        list0_v2.append(str(loop[x]))
        print(loop[x])
print(str(list0_v2))

1 Answer:

Answer 0 (score: 0)

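Edit: Nothing else was going on, so I formatted your data into a nice list of dictionaries. Monkey 111 has a weird <td height="26">, so I had to change the regex slightly.

Hope this helps you. I did this because I care about monkeys, man.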
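To illustrate why the pattern had to change, here is a minimal sketch on a made-up table fragment (the sample HTML and the variable names `sample`, `strict` and `relaxed` are assumptions for illustration, not taken from the page). A strict `<td>` pattern skips any cell whose opening tag carries attributes, while `<td.*?>` accepts it:

```python
import re

# Hypothetical fragment: the second cell carries an attribute,
# like the <td height="26"> mentioned above.
sample = '<tr><td>110</td><td height="26">111</td></tr>'

# Matches only cells whose opening tag is exactly "<td>".
strict = re.findall(r'<td>(.*?)</td>', sample)

# Also matches opening tags with attributes.
relaxed = re.findall(r'<td.*?>(.*?)</td>', sample)

print(strict)   # ['110']
print(relaxed)  # ['110', '111']
```

A slightly tighter variant would be `<td[^>]*>`, which cannot run past the end of the opening tag.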

import html
import re
import urllib.request

list0_v2 = []
final_list = []

url = "http://www.releasechimps.org/resources/publication/whos-there-md-anderson"
data = urllib.request.urlopen(url).read()
loop = re.findall(r'<td.*?>(.*?)</td>', str(data))

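for item in loop:
    # str(data) is the repr of a bytes object, so the page's newlines and
    # tabs appear as literal backslash sequences; strip them and <em> tags.
    item = (item.replace("\\n\\t\\t\\t\\t", "")
                .replace("<em>", "")
                .replace("</em>", ""))
    # Skip cells that hold only a non-breaking space.
    if item == "&nbsp;":
        continue
    list0_v2.append(item)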

while list0_v2:
    form = {"n": 0, "name": "", "id": "", "gender": "", "birthdate": "", "notes": ""}

    # A record has five fields, plus an optional sixth "notes" field that is
    # recognised by its trailing period.
    if len(list0_v2) > 5 and list0_v2[5].endswith('.'):
        numb, name, ids, gender, birthdate, notes = list0_v2[0:6]
        form["notes"] = notes
        del list0_v2[0:6]
    else:
        numb, name, ids, gender, birthdate = list0_v2[0:5]
        del list0_v2[0:5]

    form["n"] = int(numb)
    form["name"] = html.unescape(name)
    form["id"] = ids
    form["gender"] = gender
    form["birthdate"] = birthdate

    final_list.append(form)

for li in final_list:
    print("{:3} {:10} {:10} {:3} {:10} {}".format(
        li["n"], li["name"], li["id"], li["gender"], li["birthdate"], li["notes"]))
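The doubled backslashes in the `replace` calls come from calling `str()` on a `bytes` object, which returns its printable representation (`b'...'` with escapes spelled out) rather than the decoded text. Below is a minimal sketch of the alternative, assuming the page is UTF-8 encoded (the source does not state the encoding) and using a made-up byte string in place of the downloaded data:

```python
import re

# Hypothetical stand-in for urllib.request.urlopen(url).read()
data = b"<td>\n\t\t\t\tBobby</td>"

as_repr = str(data)              # repr of the bytes object
as_text = data.decode("utf-8")   # decoded text; the encoding is an assumption

# In the repr, a newline is two characters: a backslash followed by "n".
print("\\n" in as_repr)  # True
# In the decoded text, it is a real newline character.
print("\n" in as_text)   # True

# With real newlines, re.DOTALL lets "." span line breaks and
# str.strip() removes the surrounding whitespace.
cells = [c.strip() for c in re.findall(r"<td.*?>(.*?)</td>", as_text, re.DOTALL)]
print(cells)  # ['Bobby']
```

Decoding first means the cleanup no longer depends on the exact run of `\n\t\t\t\t` that happens to appear in the page.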