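Here is my code and a sample of the results. I only want the first column of the table and want to ignore the rest. Please help. There are similar questions on Stack Overflow, but they did not help.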
<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>
I want JOHNSON only, as it is the first child.
My Python code is:
import requests
from bs4 import BeautifulSoup

def find_raw():
    url = 'http://names.mongabay.com/most_common_surnames.htm'
    r = requests.get(url)
    html = r.content
    soup = BeautifulSoup(html)
    # Prints the text of every cell in each row, not just the first one
    for n in soup.find_all('tr'):
        print(n.text)

find_raw()
What I get:
SMITH 2,501,922 1.0061
JOHNSON 2,014,470 0.812
Answer 0 (score: 3)
You can find all the tr tags with find_all, then for each tr use find (which returns only the first match) to get a td. If one exists, print it:
for tr in soup.find_all('tr'):
    td = tr.find('td')
    if td:
        print(td)
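To make the find_all / find approach easy to try without fetching the page, here is a minimal sketch that runs on an inline HTML string. The sample markup reuses the JOHNSON row from the question; the header row, the name `first_column`, and the choice of the built-in `html.parser` are assumptions added for illustration. It collects the cell text rather than the tag itself.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the JOHNSON row from the question plus an
# assumed header row that contains th cells instead of td cells.
SAMPLE_HTML = """
<table>
<tr><th>Surname</th><th>Count</th></tr>
<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>
</table>
"""

def first_column(html):
    """Return the text of the first td in every tr that has one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for tr in soup.find_all("tr"):
        td = tr.find("td")  # first td only; None when the row has no td
        if td is not None:
            names.append(td.get_text(strip=True))
    return names

print(first_column(SAMPLE_HTML))
```

The `is not None` check skips rows without any td, which is the same guard as `if td:` in the answer above.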
Answer 1 (score: 2)
Iterate over the tr elements, then print the text of the first td:
import bs4

# data is assumed to hold the page's HTML
for tr in bs4.BeautifulSoup(data).select('tr'):
    try:
        print(tr.select('td')[0].text)
    except IndexError:  # row without any td cells
        pass
Or shorter:
>>> [tr.td for tr in bs4.BeautifulSoup(data).select('tr') if tr.td]
[<td>SMITH</td>, <td>JOHNSON</td>, <td>WILLIAMS</td>, <td>JONES</td>, ...]
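The list above holds Tag objects, not strings. If plain names are wanted, one possible variation (a sketch, not part of the original answer) is to call `get_text` inside the comprehension. The inline HTML and the `html.parser` argument are assumptions made so the snippet runs on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample; the second row has no td cells.
html = "<table><tr><td> JOHNSON </td><td>2</td></tr><tr><th>Name</th></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# tr.td is shorthand for tr.find('td'): the first td, or None if absent.
names = [tr.td.get_text(strip=True) for tr in soup.select("tr") if tr.td]
print(names)
```

Passing `strip=True` trims the whitespace around each cell's text.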