How can I import a CSS table with BeautifulSoup when the table is a JSON object?

Posted: 2018-05-08 23:54:42

Tags: python beautifulsoup

I'm trying to import a table with BS4 using a CSS selector. Here is my code:

import csv
from bs4 import BeautifulSoup
import urllib.request as ur

outfile = open(r"table_data.csv","w+",newline='')
writer = csv.writer(outfile)

html = ur.urlopen('url')

tree = BeautifulSoup(html,"lxml")

table_tag = tree.select(playersData)[0]
tab_data = [[item.text for item in row_data.select("th,td")]
                for row_data in table_tag.select("tr")]

for data in tab_data:
    writer.writerow(data)
    print(' '.join(data))

The relevant part of the tree variable looks like this:

<div class="block">
<div class="block-content">
<div class="players" data-countinpage="10" data-pagination="true" id="league-players"></div>
<script>
        var playersData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x221250\x22,\x22player_name\x22\x3A\x22Mohamed\x20Salah\x22,\x22games\x22\x3A\x2235\x22,\x22time\x22\x3A\x222869\x22\x7D\x5D');
</script> </div>
</div>

How can I export playersData to a CSV file?

1 answer:

Answer 0 (score: 0)

One way to export the table to CSV is:

import csv
import json
import re
from collections import OrderedDict

from bs4 import BeautifulSoup
# import urllib.request as ur

# HTML string (stand-in for the downloaded page)
# html = ur.urlopen('url')
# This is an ordinary (non-raw) Python string, so Python itself turns
# \x5B, \x22, ... into [, ", ... before BeautifulSoup parses it
html = """
<div class="block">
<div class="block-content">
<div class="players" data-countinpage="10" data-pagination="true" id="league-players"></div>
<script>
        var playersData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x221250\x22,\x22player_name\x22\x3A\x22Mohamed\x20Salah\x22,\x22games\x22\x3A\x2235\x22,\x22time\x22\x3A\x222869\x22\x7D\x5D');
</script> </div>
</div>
"""

# Extract the JSON string from the <script> tag
tree = BeautifulSoup(html, "html.parser")
data = tree.find_all("script")[0].string
pattern = re.compile(r"var playersData = JSON\.parse\('(.*)'\);")
match = pattern.search(data)
# OrderedDict keeps the columns in the order they appear in the JSON
json_obj = json.loads(match.group(1), object_pairs_hook=OrderedDict)

# Write the header, then one row per record; the csv module quotes every
# field (QUOTE_ALL) and escapes any quotes inside values
with open("table_data.csv", "w") as outfile:
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(list(json_obj[0].keys()))
    for element in json_obj:
        writer.writerow(list(element.values()))

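This code extracts the data from the script tag and parses the JSON string. It first writes the header row to the CSV file, then writes each record on its own line.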
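The HTML in the answer is an ordinary (non-raw) Python string, so Python itself turns \x5B, \x22 and the other escapes into [, " and so on before BeautifulSoup sees them. Text downloaded with urlopen keeps those escapes as literal backslash sequences, and json.loads rejects them. A minimal Python 3 sketch for that case, assuming every escape inside the JSON.parse('...') argument has the \xNN form seen in the question; the names decode_hex_escapes, extract_players and SCRIPT_TEXT are hypothetical, with SCRIPT_TEXT standing in for the .string of the script tag:

```python
import json
import re

# Raw string: the backslashes stay literal, as they would in a downloaded page
# (hypothetical stand-in for tree.find_all("script")[0].string)
SCRIPT_TEXT = r"""
        var playersData = JSON.parse('\x5B\x7B\x22id\x22\x3A\x221250\x22,\x22player_name\x22\x3A\x22Mohamed\x20Salah\x22,\x22games\x22\x3A\x2235\x22,\x22time\x22\x3A\x222869\x22\x7D\x5D');
"""

# A literal backslash, an "x", then exactly two hex digits
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")
# Captures the quoted argument of JSON.parse('...')
PARSE_CALL = re.compile(r"var playersData = JSON\.parse\('(.*?)'\);")


def decode_hex_escapes(text):
    """Turn each JavaScript \\xNN escape into the character it encodes."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def extract_players(script_text):
    """Return the list of player dicts stored in playersData."""
    match = PARSE_CALL.search(script_text)
    if match is None:
        raise ValueError("playersData assignment not found")
    return json.loads(decode_hex_escapes(match.group(1)))


players = extract_players(SCRIPT_TEXT)
print(players[0]["player_name"])  # prints: Mohamed Salah
```

A targeted regular expression is used rather than round-tripping through the unicode_escape codec, which would mis-decode any non-ASCII characters already present in the text.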

When the program runs, the CSV file contains the following:

"id","player_name","games","time"
"1250","Mohamed Salah","35","2869"

Note: the code above was tested with Python 2.7.10.
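The question's import of urllib.request implies Python 3. There, csv.DictWriter can map the JSON keys straight to columns; opening the file with newline="" lets the csv module control line endings, and an explicit encoding keeps non-ASCII player names intact. A minimal sketch, assuming players holds the parsed list of dicts (hard-coded here with the sample record from the question):

```python
import csv

# Parsed playersData (sample record from the question)
players = [
    {"id": "1250", "player_name": "Mohamed Salah", "games": "35", "time": "2869"},
]

# newline="" lets the csv module write its own line endings (Python 3)
with open("table_data.csv", "w", newline="", encoding="utf-8") as outfile:
    writer = csv.DictWriter(
        outfile,
        fieldnames=list(players[0]),  # column order follows the first record
        quoting=csv.QUOTE_ALL,        # quote every field, header included
    )
    writer.writeheader()
    writer.writerows(players)
```

With the default dialect each line ends in \r\n; otherwise the file matches the output shown above.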