Question

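So I've just started using Beautiful Soup 4 and I've run into a problem that I've been trying to solve for a few days without success. First, here is the HTML I want to parse: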

<table class="table table-condensed table-hover tenlaces tablesorter">
<thead>
<tr>
<th class="al">Language</th>
<th class="ac">Link</th>
</tr>
</thead>
<tbody>


            <tr>
            <td class="tdidioma"><span class="flag flag_0">0</span></td>
            <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="Ver..." href="LINK I WANT TO SAVE0"><i class="icon-play"></i>&nbsp;&nbsp;Ver</a></td>
            </tr>

            <tr>
            <td class="tdidioma"><span class="flag flag_1">1</span></td>
            <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="Ver..." href="LINK I WANT TO SAVE1"><i class="icon-play"></i>&nbsp;&nbsp;Ver</a></td>
            </tr>

            <tr>
            <td class="tdidioma"><span class="flag flag_2">2</span></td>
            <td class="tdenlace"><a class="btn btn-mini enlace_link" data-servidor="42" rel="nofollow" target="_blank" title="Ver..." href="LINK I WANT TO SAVE2"><i class="icon-play"></i>&nbsp;&nbsp;Ver</a></td>
            </tr>
</tbody>
</table>

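As you can see, each <tr> contains a <td> with the language and a <td> with the link. The problem is that I don't know how to relate the language to the link. What I mean is: I want to check, for example, whether the span in the language cell is 1 and, if so, return the link; if not, do nothing. But I can only get the <td> with the language, not the whole <tr>, which I think is the important part. I don't know if I've made my point clear, because I don't really know how to explain it.

The code I have now gets the <tbody> from my main URL, but I really don't know how to do what I'm asking.

Thanks, and sorry for my bad English!

Edit: here is a sample of my code, so you can see which libraries I'm using and so on: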

from bs4 import BeautifulSoup
import urllib2

url = raw_input("Introduce URL to analyse: ")
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
body = soup.tbody
#HERE SHOULD BE WHAT I DON'T KNOW HOW TO DO
page.close()

Answer 1

Try something like this:

result = None
for row in soup.tbody.find_all('tr'):
    lang, link = row.find_all('td')
    if lang.string == '1':
        result = link.a['href']
print result
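
For reference, here is a minimal self-contained sketch of the same idea. It assumes the table from the question is available as a string; the `SAMPLE_HTML` constant and the `links_for_language` helper are made up for illustration, and the built-in "html.parser" is just one possible parser choice. Unlike the loop above, it collects every matching link rather than keeping only the last one:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of the table from the question.
SAMPLE_HTML = """
<table>
<thead><tr><th class="al">Language</th><th class="ac">Link</th></tr></thead>
<tbody>
<tr>
<td class="tdidioma"><span class="flag flag_0">0</span></td>
<td class="tdenlace"><a class="enlace_link" href="LINK I WANT TO SAVE0">Ver</a></td>
</tr>
<tr>
<td class="tdidioma"><span class="flag flag_1">1</span></td>
<td class="tdenlace"><a class="enlace_link" href="LINK I WANT TO SAVE1">Ver</a></td>
</tr>
</tbody>
</table>
"""


def links_for_language(html, wanted):
    """Return the href of every row whose language cell text equals `wanted`."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for row in soup.tbody.find_all("tr"):
        # Each body row is assumed to hold exactly two cells: language, link.
        lang, link = row.find_all("td")
        if lang.get_text(strip=True) == wanted:
            found.append(link.a["href"])
    return found


print(links_for_language(SAMPLE_HTML, "1"))
```

Passing a language value that does not occur in the table simply gives back an empty list.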

Answer 2

Try using the soup like this; you may need some exception handling:

trs = soup.select('tr') # here trs is a list of bs4.element.Tag type element

Now iterate over the list:

for itm in trs:
    tds = itm.select('td')
    if tds:
        tdidoma, tdenlace = tds[0], tds[1] # assuming every tr tag has at least 2 td tags
        print tdidoma.string
        print tdenlace.a['href']
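
The exception handling mentioned above might look roughly like the following. This is only a sketch: the `iter_language_links` name and the sample markup are invented for illustration. Rather than catching exceptions, it skips rows that have fewer than two cells or whose second cell has no link:

```python
from bs4 import BeautifulSoup


def iter_language_links(html):
    """Yield (language, href) pairs, skipping rows that do not fit the layout."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.select("tr"):
        tds = row.select("td")
        if len(tds) < 2:
            continue  # e.g. header rows, which use <th> and have no <td> cells
        anchor = tds[1].find("a", href=True)
        if anchor is None:
            continue  # the second cell has no link with an href
        yield tds[0].get_text(strip=True), anchor["href"]


# Hypothetical markup: a header row, one good row, one row without a link.
sample = """
<table>
<tr><th>Language</th><th>Link</th></tr>
<tr><td><span>1</span></td><td><a href="LINK1">Ver</a></td></tr>
<tr><td><span>2</span></td><td>no link here</td></tr>
</table>
"""
pairs = list(iter_language_links(sample))
print(pairs)
```

Checking the row shape up front keeps the loop from raising an IndexError on short rows or a TypeError when a cell has no <a> tag.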

Answer 3

I assume you want to check whether the URL contains 1 and save it if it does. Is that what you want?

You can try this code:

soup = BeautifulSoup(YOUR_TEXT_HERE)
tbody_soup = soup.find('tbody')
links = tbody_soup.find_all('a')
links_to_save = []

for item in links:
    print item.attrs['href'] # prints the url
    print item.get_text() # prints the text of the link
    print item.attrs # prints a dictionary with all the attributes

    # check if 1 is in url?
    if '1' in item.attrs['href']:
        links_to_save.append(item.attrs['href'])

print links_to_save
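
If the intent is to filter on the language cell rather than on the URL text, one possible variation is to start from each link and walk up to its row with `find_parent`. This is a sketch under that assumption; the `sample` string is a made-up, shortened copy of the question's table:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened copy of the table from the question.
sample = """
<table>
<tbody>
<tr>
<td class="tdidioma"><span class="flag flag_0">0</span></td>
<td class="tdenlace"><a href="LINK I WANT TO SAVE0">Ver</a></td>
</tr>
<tr>
<td class="tdidioma"><span class="flag flag_1">1</span></td>
<td class="tdenlace"><a href="LINK I WANT TO SAVE1">Ver</a></td>
</tr>
</tbody>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")
links_to_save = []

for a in soup.find("tbody").find_all("a", href=True):
    row = a.find_parent("tr")  # the row that contains this link
    flag = row.find("span", class_="flag")  # the language marker in that row
    if flag is not None and flag.get_text(strip=True) == "1":
        links_to_save.append(a["href"])

print(links_to_save)
```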

Web scraping with Beautiful Soup 4 in Python

3 answers: