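I'd like some help with a SQL query. I'm using SQLAlchemy, but I don't even know how to express the query in raw SQL.
I hash every frame of all the videos in a season and add the hashes to a database. My goal is to find the intro of these videos by checking whether the exact same frames appear across them.
My table looks like this: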
id | tvdbid | hash  | season | episode | offset
---+--------+-------+--------+---------+-------
1  | 1337   | a1a1a | 1      | 1       | 42
2  | 1337   | a1a1a | 1      | 1       | 68
3  | 1337   | a1a1b | 1      | 2       | 92
4  | 1337   | a1a1a | 1      | 2       | 116
5  | 1337   | a1a1a | 1      | 3       | 42
6  | 1337   | a1a1a | 1      | 3       | 42
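The result I'm looking for is a list of rows whose hash matches in n episodes (a hash should only count once per episode) and that share the same tvdbid and season number.
At the moment I'm doing this: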
import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Hashes(Base):
    __tablename__ = 'hashes'

    id = sa.Column(sa.Integer, primary_key=True)
    season = sa.Column(sa.Integer)
    episode = sa.Column(sa.Integer)
    tvdbid = sa.Column(sa.Text(length=100))
    hash = sa.Column(sa.Text(length=16))
    offset = sa.Column(sa.Integer)


h = Hashes.__table__


async def some_web_request(request):
    # I need to use raw sql or core as the db library requires it.
    # my cli tool uses a sync method to insert the rows in the db.
    query = h.select().where(
        sa.and_(h.c.tvdbid == request.path_params['tvdbid'],
                h.c.season == request.path_params['season'])
    ).group_by('hash', 'episode')
    result = await DB.fetch_all(query)
    return result
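This seems to work fine, but it isn't what I want, so I have to clean up the result in Python, which isn't viable in the long run. The table will have between 5 and 500 million rows.
My current "workaround":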
from collections import defaultdict


def clean_up(result):
    # Collect the set of episodes in which each hash was seen.
    d = defaultdict(set)
    for row in result:
        d[row.hash].add(row.episode)

    final_result = []
    for k, v in d.items():
        if len(v) > 4:  # 4 is the number of episodes.
            final_result.append(k)
    return final_result
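The cleanup above hard-codes the episode count. A minimal sketch of a variant that derives the threshold from the data instead, so that a hash is kept when it appears in at least 50% of the episodes present in the result. The `Row` namedtuple, the function name `hashes_in_most_episodes` and the `min_fraction` parameter are hypothetical and only stand in for the real database rows:

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the rows returned by the database query.
Row = namedtuple("Row", "hash episode")


def hashes_in_most_episodes(result, min_fraction=0.5):
    """Return hashes seen in at least `min_fraction` of the distinct episodes."""
    episodes_by_hash = defaultdict(set)
    all_episodes = set()
    for row in result:
        episodes_by_hash[row.hash].add(row.episode)
        all_episodes.add(row.episode)

    # A hash counts once per episode, however many frames matched in it.
    needed = min_fraction * len(all_episodes)
    return [h for h, eps in episodes_by_hash.items() if len(eps) >= needed]


# The (hash, episode) pairs from the sample table in this question.
sample = [
    Row("a1a1a", 1), Row("a1a1a", 1), Row("a1a1b", 2),
    Row("a1a1a", 2), Row("a1a1a", 3), Row("a1a1a", 3),
]
matches = hashes_in_most_episodes(sample)
```

This still pulls every row into Python, so it does not address the scaling concern; it only makes the threshold explicit.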
The desired output should be:
id | tvdbid | hash  | season | episode | offset
---+--------+-------+--------+---------+-------
1  | 1337   | a1a1a | 1      | 1       | 42
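This is because the hash needs to appear in at least 50% of the episodes. Alternatively, the output could just be a1a1a, since I don't really need the whole row right now. (Checking the hashes takes a while.)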
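For illustration, the 50% rule could probably be pushed into the database with GROUP BY and HAVING. The following is only a sketch under assumptions: it uses the standard-library sqlite3 module and an in-memory copy of the sample table rather than the real database, and it returns just the hash and the number of episodes it was seen in, not the whole row.

```python
import sqlite3

# In-memory copy of the sample table (illustrative only, not the real DB).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE hashes (id INTEGER PRIMARY KEY, tvdbid TEXT, hash TEXT, '
    'season INTEGER, episode INTEGER, "offset" INTEGER)'
)
conn.executemany(
    "INSERT INTO hashes VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "1337", "a1a1a", 1, 1, 42),
        (2, "1337", "a1a1a", 1, 1, 68),
        (3, "1337", "a1a1b", 1, 2, 92),
        (4, "1337", "a1a1a", 1, 2, 116),
        (5, "1337", "a1a1a", 1, 3, 42),
        (6, "1337", "a1a1a", 1, 3, 42),
    ],
)

# Keep hashes whose distinct-episode count is at least half of the
# distinct episodes stored for the same tvdbid and season.
sql = """
SELECT hash, COUNT(DISTINCT episode) AS episodes_seen
FROM hashes
WHERE tvdbid = :tvdbid AND season = :season
GROUP BY hash
HAVING COUNT(DISTINCT episode) * 2 >= (
    SELECT COUNT(DISTINCT episode)
    FROM hashes
    WHERE tvdbid = :tvdbid AND season = :season
)
"""
rows = conn.execute(sql, {"tvdbid": "1337", "season": 1}).fetchall()
```

With the sample data, a1a1a is seen in 3 of 3 episodes and is kept, while a1a1b is seen in only 1 and is dropped. Whether this performs acceptably on hundreds of millions of rows would likely depend on an index covering (tvdbid, season, hash, episode), which is an assumption and not something tested here.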