Question

I have the following code, which fetches some data containing Chinese characters from a website.

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.hkcpast.net/cpast_homepage/xyzbforms/BetMatchDetails.asp?tBetDate=2016/9/11"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

for a in soup.find_all('html'):
    a.decompose()

list = []
for row in soup.find_all('tr'):
    cols = row.find_all('td')
    for col in cols:
        if len(col) > 0:
            list.append(col.text.encode('utf-8').strip())

The current result looks like this:

[1, x, y, z, 2, x, y, z, 3, x, y, z]

My problem is that I want to split this list into sub-lists, each one starting at a number (1, 2, 3, 4, 5, ...),

so that the result looks like this:

[1, x, y, z]
[2, x, y, z]
[3, x, y, z]

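The end goal is to write each sub-list as one row of a CSV file. Does it make sense to split the list into entries first and then write them to the CSV file?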

Answer 1

A literal adaptation of your code would be:

list = []
for row in soup.find_all('tr'):
    cols = row.find_all('td')
    for col in cols:
        if len(col) == 0:
            continue  # Skip empty cells; saves some indentation
        txt = col.text.encode('utf-8').strip()
        try:
            _ = int(txt)
            # txt is an int.  Append new sub-list
            list.append([txt])
        except ValueError:
            # txt is not an int, append it to the end of previous sub-list
            list[-1].append(txt)

(Note that this fails with an IndexError at list[-1] if the first entry is not an int!)
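If that edge case matters, the grouping step can be pulled out into a small helper that tolerates a leading non-integer entry. This is only a sketch: the name group_by_int_markers is made up for illustration, and it assumes the cells have already been collected as plain text strings.

```python
def group_by_int_markers(items):
    """Split a flat list into sub-lists, starting a new one at each integer-like item.

    Hypothetical helper, not part of the original code. Assumes every item is a str.
    """
    groups = []
    for item in items:
        try:
            int(item)
            is_marker = True
        except ValueError:
            is_marker = False
        # Start a new sub-list at a marker, or when nothing has been started yet
        # (this covers a first entry that is not an int).
        if is_marker or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


flat = ["1", "x", "y", "z", "2", "x", "y", "z", "3", "x", "y", "z"]
grouped = group_by_int_markers(flat)
```

Here a leading non-integer entry simply gets its own sub-list instead of raising an error; whether that is the right behaviour depends on what the page actually contains.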

However, I suspect you really want to create a new sub-list for each row of the table.
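A rough sketch of that per-row approach, including the CSV step the question asks about, might look like the following. The helper name write_rows and the sample data are made up for illustration. It assumes Python 3, where the csv module expects str values, so the cell text is left as text rather than encoded to UTF-8 bytes first.

```python
import csv
import io


def write_rows(rows, fh):
    """Write each non-empty sub-list as one CSV row (hypothetical helper)."""
    writer = csv.writer(fh)
    for row in rows:
        if row:  # skip table rows that had no <td> cells
            writer.writerow(row)


# With BeautifulSoup, one sub-list per table row could be built like this
# (not run here, since it needs the parsed page):
# rows = [[td.get_text(strip=True) for td in tr.find_all('td')]
#         for tr in soup.find_all('tr')]
rows = [["1", "沙田", "x"], [], ["2", "跑馬地", "y"]]  # made-up sample data

buf = io.StringIO()
write_rows(rows, buf)
csv_text = buf.getvalue()

# For a real file in Python 3, open it in text mode with an explicit encoding:
# with open("out.csv", "w", newline="", encoding="utf-8") as fh:
#     write_rows(rows, fh)
```

Building the sub-lists per row avoids having to detect the integer markers at all, and it means the flat list never needs to be split after the fact.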
