Getting height data from a web scrape into a list with Beautiful Soup

Time: 2018-10-01 03:26:45

Tags: python web-scraping beautifulsoup web-crawler

I'm trying to get data from http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights using Beautiful Soup and requests. Here is my code:

import requests
from bs4 import BeautifulSoup
response = requests.get("http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights")
soup = BeautifulSoup(response.text, "html.parser")
list_table_data = soup.find(class_="wikitable").contents
list_tr_data = list_table_data[1::2]
print(list_tr_data)

When I print list_tr_data, the output is:

[<tr> <th>Index</th><th>Height(Inches)</th><th>Weight(Pounds) </th></tr>, <tr> <td>1</td><td>65.78</td><td>112.99 </td></tr>, <tr> <td>2</td><td>71.52</td><td>136.49 </td></tr>, <tr> <td>3</td><td>69.40</td><td>153.03 </td></tr>,...., <tr> <td>200</td><td>71.39</td><td>127.88 </td></tr>]

I want to put the Height(Inches) values into a list named list_height_data, but when I try to access them with this code:

list_height_data = []
for row in list_tr_data:
    list_height_data.append(row.find_all("tr"))
print(list_height_data)

the result is a list of empty lists:

[[], [], [], [], [], [], [], [], [], [], ... []]

How can I get the Height(Inches) data, so that printing list_height_data shows the height values and len(list_height_data) gives their count?

1 Answer:

Answer 0 (score: 2)

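You need to iterate over the td tags. Each element of list_tr_data is already a <tr> element, so row.find_all("tr") searches only inside that row, finds no nested rows, and returns an empty list. The values you want are in the row's <td> cells: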

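import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights').text, 'html.parser')
# Turn every <tr> into a list of floats taken from its <td> cells.
# The header row holds only <th> cells, so it becomes [] and is discarded into _.
_, *results = [[float(c.text.replace('\n', '')) for c in i.find_all('td')] for i in d.find('table', {'class':'wikitable'}).find_all('tr')]
# Column 1 of each data row is Height(Inches).
height = [i[1] for i in results]
print(height)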

Output:

[65.78, 71.52, 69.4, 68.22, 67.79, 68.7, 69.8, 70.01, 67.9, 66.78, 66.49, 67.62, 68.3, 67.12, 68.28, 71.09, 66.46, 68.65, 71.23, 67.13, 67.83, 68.88, 63.48, 68.42, 67.63, 67.21, 70.84, 67.49, 66.53, 65.44, 69.52, 65.81, 67.82, 70.6, 71.8, 69.21, 66.8, 67.66, 67.81, 64.05, 68.57, 65.18, 69.66, 67.97, 65.98, 68.67, 66.88, 67.7, 69.82, 69.09, 69.91, 67.33, 70.27, 69.1, 65.38, 70.18, 70.41, 66.54, 66.36, 67.54, 66.5, 69.0, 68.3, 67.01, 70.81, 68.22, 69.06, 67.73, 67.22, 67.37, 65.27, 70.84, 69.92, 64.29, 68.25, 66.36, 68.36, 65.48, 69.72, 67.73, 68.64, 66.78, 70.05, 66.28, 69.2, 69.13, 67.36, 70.09, 70.18, 68.23, 68.13, 70.24, 71.49, 69.2, 70.06, 70.56, 66.29, 63.43, 66.77, 68.89, 64.87, 67.09, 68.35, 65.61, 67.76, 68.02, 67.66, 66.31, 69.44, 63.84, 67.72, 70.05, 70.19, 65.95, 70.01, 68.61, 68.81, 69.76, 65.46, 68.83, 65.8, 67.21, 69.42, 68.94, 67.94, 65.63, 66.5, 67.93, 68.89, 70.24, 68.27, 71.23, 69.1, 64.4, 71.1, 68.22, 65.92, 67.44, 73.9, 69.98, 69.52, 65.18, 68.01, 68.34, 65.18, 68.26, 68.57, 64.5, 68.71, 68.89, 69.54, 67.4, 66.48, 66.01, 72.44, 64.13, 70.98, 67.5, 72.02, 65.31, 67.08, 64.39, 69.37, 68.38, 65.31, 67.14, 68.39, 66.29, 67.19, 65.99, 69.43, 67.97, 67.76, 65.28, 73.83, 66.81, 66.89, 65.74, 65.98, 66.58, 67.11, 65.87, 66.78, 68.74, 66.23, 65.96, 68.58, 66.59, 66.97, 68.08, 70.19, 65.52, 67.46, 67.41, 69.66, 65.8, 66.11, 68.24, 68.02, 71.39]
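The one-liner above depends on two details: the header row, made only of <th> cells, yields an empty list from find_all('td'), and extended unpacking (_, *results = ...) throws that first element away. The sketch below reproduces the same steps offline so they can be inspected without network access. It swaps BeautifulSoup for Python's standard-library html.parser.HTMLParser; TableCellParser and SAMPLE_HTML are hypothetical names, and the HTML is a hand-written fragment copied from the first rows printed in the question, not the live page.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect <td> text, grouped by <tr> row.

    Rows that contain only <th> cells (the header) end up as empty lists,
    mirroring what find_all('td') returns for them in BeautifulSoup.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self._in_td = False   # True while inside a <td> element
        self._cell = []       # text fragments of the current <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_td = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self.rows[-1].append("".join(self._cell))
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._cell.append(data)


# Hand-written fragment shaped like the first rows printed in the question.
SAMPLE_HTML = """
<table class="wikitable">
<tr> <th>Index</th><th>Height(Inches)</th><th>Weight(Pounds) </th></tr>
<tr> <td>1</td><td>65.78</td><td>112.99 </td></tr>
<tr> <td>2</td><td>71.52</td><td>136.49 </td></tr>
<tr> <td>3</td><td>69.40</td><td>153.03 </td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(SAMPLE_HTML)
parser.close()

# Convert each cell to float; float() ignores surrounding whitespace.
rows = [[float(cell) for cell in row] for row in parser.rows]

# The header row has no <td> cells, so it is [] and is discarded into _.
_, *results = rows
height = [row[1] for row in results]  # column 1 is Height(Inches)
print(height)  # [65.78, 71.52, 69.4]
```

To use this against the real page you would feed it response.text instead of SAMPLE_HTML. The sketch assumes well-formed <tr>/<td> markup and collects rows from every table it sees, so on pages with several tables the BeautifulSoup approach of first locating class="wikitable" is the more robust choice.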