Question

I am using this code:

import urllib.request

# Download the page and decode it as UTF-8
fp = urllib.request.urlopen("https://english-thai-dictionary.com/dictionary/?sa=all")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)

# Print every whitespace-separated token that contains 'alt'
x = 'alt'
for item in mystr.split():
    if x in item:
        print(item.strip())

With this code I get the Thai words, but I don't know how to get the English words. Thanks.

Answer 1

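If you want to get the words from the table, you should use a parsing library such as BeautifulSoup4. Here is an example of how you could parse it (I'm using requests to fetch the page and BeautifulSoup to parse the data):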

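First, use your browser's developer tools to identify the table that contains the content you want to parse. The table with the translations has the class attribute servicesT, which appears only once in the whole document: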

import requests
from bs4 import BeautifulSoup

url = 'https://english-thai-dictionary.com/dictionary/?sa=all;ftlang=then'

response = requests.get(url)

soup = BeautifulSoup(response.text, 'lxml')


# Get table with translations
table = soup.find('table', {'class':'servicesT'})

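After that, you need to get all the rows that contain translations of Thai words. If you look at the page source, you will notice that the first 3 <tr> rows contain only headers, so we skip them. Then we take all the <td> elements from each row (in this table there are always 3 <td> elements) and extract the words from them (in this table the words are actually nested inside <span> and <a> tags).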

# Get all rows of the table
table_rows = table.find_all('tr')
# Skip the first 3 rows because they
# don't contain the information we need
for tr in table_rows[3:]:
    # Find all <td> elements in the row
    row_columns = tr.find_all('td')
    if len(row_columns) >= 2:
        # Get the tag with the Thai word
        thai_word_tag = row_columns[0].select_one('span > a')
        # Get the tag with the English word
        english_word_tag = row_columns[1].find('span')
        # Use None when a tag is missing, so a value from the previous row is never reused
        thai_word = thai_word_tag.text if thai_word_tag else None
        english_word = english_word_tag.text if english_word_tag else None
        # Print the fetched words
        print((thai_word, english_word))
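If you would rather stay with the standard library, as the code in the question does, roughly the same row/column extraction can be sketched with html.parser.HTMLParser instead of BeautifulSoup. This is a swapped-in technique, not what the answer above uses. It assumes the markup described above (a single <table class="servicesT"> whose rows hold the Thai word in the first <td> and the English word in the second) and explicit closing </td> and </tr> tags. The class TranslationTableParser, the function extract_pairs and the sample HTML are hypothetical; the real page may be structured differently.

```python
from html.parser import HTMLParser


class TranslationTableParser(HTMLParser):
    """Hypothetical helper: collect the text of each <td> in rows of
    <table class="servicesT">. Assumes no nested tables and explicit
    closing </td> and </tr> tags."""

    def __init__(self):
        super().__init__()
        self.in_table = False      # currently inside the servicesT table?
        self.current_row = None    # cell texts of the open <tr>
        self.current_cell = None   # text chunks of the open <td>
        self.rows = []             # finished rows (lists of cell texts)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'table' and 'servicesT' in classes:
            self.in_table = True
        elif self.in_table and tag == 'tr':
            self.current_row = []
        elif self.in_table and tag == 'td' and self.current_row is not None:
            self.current_cell = []

    def handle_endtag(self, tag):
        if tag == 'table' and self.in_table:
            self.in_table = False
        elif tag == 'td' and self.current_cell is not None:
            # Join text from nested <span>/<a> tags into one string
            self.current_row.append(''.join(self.current_cell).strip())
            self.current_cell = None
        elif tag == 'tr' and self.current_row is not None:
            self.rows.append(self.current_row)
            self.current_row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell
        if self.current_cell is not None:
            self.current_cell.append(data)


def extract_pairs(html):
    """Return (thai, english) tuples from rows that have at least two cells.
    Header rows made of <th> cells produce no <td> text and are dropped."""
    parser = TranslationTableParser()
    parser.feed(html)
    parser.close()
    return [(row[0], row[1]) for row in parser.rows if len(row) >= 2]


# Hypothetical sample markup, shaped like the table described above
sample = """
<table class="servicesT">
  <tr><th>Thai</th><th>English</th></tr>
  <tr><td><span><a href="#">แมว</a></span></td><td><span>cat</span></td><td></td></tr>
  <tr><td><span><a href="#">หมา</a></span></td><td><span>dog</span></td><td></td></tr>
</table>
"""

print(extract_pairs(sample))  # [('แมว', 'cat'), ('หมา', 'dog')]
```

Filtering by cell count, instead of slicing off a fixed number of rows, is one way to avoid depending on exactly how many header rows the page has.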

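Of course, this is a very basic example of what I parsed from the page; you should decide for yourself what you want to scrape. I also noticed that the data in the table is not always translated, so keep that in mind when scraping. You can also use the Requests-HTML library to parse the data (it supports pagination, which the page you want to scrape uses).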
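Because some rows have no translation, one option is to drop incomplete pairs before using them. A minimal sketch, assuming the rows have been collected as (thai_word, english_word) tuples in which a missing translation shows up as None or as an empty or whitespace-only string; keep_translated is a hypothetical helper name.

```python
def keep_translated(pairs):
    """Hypothetical helper: drop (thai, english) pairs where either side
    is None or blank, and strip surrounding whitespace from the rest."""
    return [
        (thai.strip(), english.strip())
        for thai, english in pairs
        if thai and thai.strip() and english and english.strip()
    ]


rows = [("แมว", "cat"), ("หมา", None), ("นก", "  "), ("ปลา", " fish ")]
print(keep_translated(rows))  # [('แมว', 'cat'), ('ปลา', 'fish')]
```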

Python3: How to get the English title from a URL?