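I'd be glad if anyone could help! I'm trying to build a web scraper for cashpoint.dk that fetches the football odds for a given URL. As part of this, I'm extracting the parsed data to JSON (I'm also considering an sqlite3 database), but the output of my JSON extraction is really "bugging" me!
How can I format my JSON so that it comes out like this?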
{
"bettext": "Hvem vinder kampen?",
"team1": "Rusland",
"team2": "Saudi Arabien",
"tip": "1",
"odds": "138",
"tip": "3",
"odds": "460",
"tip": "2",
"odds": "926"
}
In plain terms, this is what the data says:
- Russia vs. Saudi Arabia,
- Who will win?,
- 1 (Russia) at odds 1,38,
- 3 (Draw) at odds 4,60,
- 2 (Saudi Arabia) at odds 9,26
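The list gives the odds as 1,38, 4,60 and 9,26, while the scraped `odd` values are 138, 460 and 926, which suggests the site stores decimal odds multiplied by 100. Assuming that holds for every market (an inference from this one example, not something the site documents here), a small hypothetical helper `to_decimal_odds` could convert the value before it is stored:

```python
def to_decimal_odds(raw):
    """Convert odds given in hundredths (e.g. "138" or 138) to decimal odds.

    Assumption: the site encodes odds 1,38 as 138, as the example suggests.
    """
    return int(raw) / 100


for raw in ("138", "460", "926"):
    print(raw, "->", to_decimal_odds(raw))
```

Keeping the raw string as well would let you check the conversion later if the assumption turns out to be wrong for some markets.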
My current output looks like this:
{
"bettext": "Hvem vinder kampen?",
"odds": "138",
"team1": "Rusland",
"team2": "Saudi Arabien",
"tip": "1"
}
{
"bettext": "Hvem vinder kampen?",
"odds": "138",
"team1": "Rusland",
"team2": "Saudi Arabien",
"tip": "1"
}
{
"bettext": "Hvem vinder kampen?",
"odds": "460",
"team1": "Rusland",
"team2": "Saudi Arabien",
"tip": "3"
}
{
"bettext": "Hvem vinder kampen?",
"odds": "926",
"team1": "Rusland",
"team2": "Saudi Arabien",
"tip": "2"
}
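My other problem is that the output contains exact duplicate objects (the first object above appears twice). Below is the code I use to run it: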
import json
import re

import demjson
import requests
from bs4 import BeautifulSoup

url = "https://www.cashpoint.dk/en/?r=bets/xtra&group=461392&game=312004790"
print(url)
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')


class Scraper():
    def __init__(self):
        # Every bet table on the page
        self.tables = soup.select('table.sportbet_extra_list_table')
        for table in self.tables:
            # Each odds cell in the current table
            self.fields = table.select('.sportbet_extra_rate_content')
            for field in self.fields:
                # Pull the {...} object literal out of the onclick attribute and decode it
                self.js_obj = re.search('{.+}', field['onclick']).group()
                self.bet = demjson.decode(self.js_obj)
                # print(self.bet)
                # print((self.bet['team1'], self.bet['team2'], self.bet['bettext'], self.bet['tiptext'], self.bet['tip']))
                prettyjson = {
                    'tip': str(self.bet['tip']),
                    'team1': str(self.bet['team1']),
                    'team2': str(self.bet['team2']),
                    'bettext': str(self.bet['bettext']),
                    'odds': str(self.bet['odd']),
                }
                dumpit = json.dumps(prettyjson, ensure_ascii=True, sort_keys=True, indent=10, separators=(',', ': '))
                print(dumpit)
                with open('result.json', 'a') as outfile:
                    # Iterating over the dict self.bet yields its keys, so the same
                    # prettyjson is written once per key in self.bet
                    for sprettyjson in self.bet:
                        json.dump(prettyjson, outfile, ensure_ascii=True, sort_keys=True, indent=10, separators=(',', ': '))
                        outfile.write('\n')


# Run the scraper
Scraper()
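One likely source of the duplicates in the file: iterating over a dict yields its keys, so `for sprettyjson in self.bet:` writes the same `prettyjson` once for every key in `self.bet`. The sketch below reproduces this with a made-up `bet` dict (the field names mirror the ones the code reads, the values are illustrative) and an in-memory buffer in place of result.json:

```python
import io
import json

# Illustrative stand-in for one decoded self.bet (values are made up).
bet = {"tip": 1, "team1": "Rusland", "team2": "Saudi Arabien",
       "bettext": "Hvem vinder kampen?", "odd": 138}
pretty = {"tip": str(bet["tip"]), "odds": str(bet["odd"])}

# Same pattern as the question: the loop runs once per key of bet.
buf = io.StringIO()
for _key in bet:
    json.dump(pretty, buf, sort_keys=True)
    buf.write("\n")
written_loop = buf.getvalue().count("\n")

# Dumping once, without the inner loop, writes one object per bet.
buf = io.StringIO()
json.dump(pretty, buf, sort_keys=True)
buf.write("\n")
written_once = buf.getvalue().count("\n")

print(written_loop, written_once)
```

The real `self.bet` may have more keys than this stand-in, in which case the file gets correspondingly more copies per odds cell. Repeated cells on the page itself could be another source, so deduplicating the collected data is still worthwhile.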
Answer (score: 0)
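Please see my comment asking you to clarify your requirements.
My understanding is that you're trying to reduce several JSON objects to a single object structure so the shared data isn't repeated.
The first thing to keep in mind is that each key (tag) should appear only once at a given level of a JSON object. RFC 8259 says object names SHOULD be unique, and most parsers, including Python's json module, silently keep only the last value for a repeated key.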
This doesn't work:
{
"tag":"value",
"tag":"value"
}
This is fine:
{
"tag":"value",
"subtag":{
"tag":"value"
}
}
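A quick check of why repeated keys don't work in practice: Python's `json` module accepts the duplicate-key document but keeps only the last value for each key, whereas nesting the repeated data in an array preserves all of it:

```python
import json

flat = '{"tip": "1", "odds": "138", "tip": "3", "odds": "460"}'
nested = '{"tips": [{"tip": "1", "odds": "138"}, {"tip": "3", "odds": "460"}]}'

print(json.loads(flat))    # {'tip': '3', 'odds': '460'}: the first pair is lost
print(json.loads(nested))  # both tips survive inside the array
```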
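In your case, the "subtag" should be tips: an array of objects, which lets you repeat the tip and odds keys as often as you need (once per object).
Try rewriting your code to produce the following: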
{
"bettext": "Hvem vinder kampen?",
"team1": "Rusland",
"team2": "Saudi Arabien",
"tips": [{"tip": "1",
"odds": "138"},
{"tip": "3",
"odds": "460"},
{"tip": "2",
"odds": "926"}]
}
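One way to produce that structure from the scraper: collect every decoded bet first, group the bets by (bettext, team1, team2), drop repeated tips, and write the result with a single `json.dump`, so result.json holds one valid JSON document instead of a series of appended objects. This is a minimal sketch, assuming each decoded bet is a dict with the `tip`, `team1`, `team2`, `bettext` and `odd` fields the question's code reads; `group_bets` and `all_bets` are hypothetical names:

```python
import json


def group_bets(bets):
    """Group flat bet dicts into one object per (bettext, team1, team2).

    Each group gets a "tips" array; identical (tip, odds) pairs are kept once.
    """
    groups = {}
    for bet in bets:
        key = (str(bet["bettext"]), str(bet["team1"]), str(bet["team2"]))
        group = groups.setdefault(key, {
            "bettext": key[0],
            "team1": key[1],
            "team2": key[2],
            "tips": [],
        })
        tip = {"tip": str(bet["tip"]), "odds": str(bet["odd"])}
        if tip not in group["tips"]:  # skip exact duplicates
            group["tips"].append(tip)
    return list(groups.values())


# Example input: the rows from the question's output, including the repeat.
common = {"bettext": "Hvem vinder kampen?", "team1": "Rusland", "team2": "Saudi Arabien"}
rows = [
    dict(common, tip=1, odd=138),
    dict(common, tip=1, odd=138),
    dict(common, tip=3, odd=460),
    dict(common, tip=2, odd=926),
]

result = group_bets(rows)
print(json.dumps(result, ensure_ascii=False, indent=2))

# In the scraper: append each decoded self.bet to a list (all_bets) inside
# the loop, then after the loop:
# with open('result.json', 'w') as outfile:
#     json.dump(group_bets(all_bets), outfile, ensure_ascii=False, indent=2)
```

Opening the file in 'w' mode and dumping once also stops result.json from growing on every run, which is what 'a' mode does.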