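I'm using Python to access a huge Oracle Exadata database. The table is rather poorly documented, and I need to understand some odd cases. Coming from the R / Python world, I ran the following query: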
import cx_Oracle
import pandas as pd

query = """
SELECT COUNT(counter) AS freq, counter
FROM (
    SELECT COUNT(*) AS counter
    FROM schema.table
    WHERE x = 1 AND y = 1
    GROUP BY a, b)
GROUP BY counter"""

# tsn holds the DSN for the database
with cx_Oracle.connect(dsn=tsn, encoding="UTF-8") as con:
    df = pd.read_sql(sql=query, con=con)
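This essentially counts the number of observations for each (a, b) pair. My prior belief was that every count would be 1 (they are not). So I would like to look at the observations that cause this: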
query = ("""
SELECT *
FROM schema.table
WHERE x = 1 and y = 1
AND (for each (a,b) there is more than one record)""")
I'm struggling to translate this into proper Oracle SQL.
In R (dplyr), this would be a combination of group_by and mutate (rather than summarise), and in Python pandas it can be done with transform.
I'm new to SQL and may be using the wrong terminology. I'd appreciate being corrected.
Answer 0: (score: 1)
You can use a window function:
SELECT ab.*
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY a, b) as cnt
FROM schema.table t
WHERE x = 1 AND y = 1
) ab
WHERE cnt > 1;
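To see what this query returns, here is a small sketch that runs the same shape of query against toy data. It makes several assumptions that are not in the question: it uses Python's built-in sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or newer), the table name obs is hypothetical, and the sample rows are made up purely for illustration.

```python
import sqlite3

# Hypothetical sample data with columns (x, y, a, b); not from the real table.
rows = [
    (1, 1, "p", "q"),
    (1, 1, "p", "q"),  # second record for the pair (p, q)
    (1, 1, "p", "r"),  # pair (p, r) occurs once
    (1, 0, "p", "q"),  # excluded by the filter y = 1
    (1, 1, "s", "t"),  # pair (s, t) occurs once
]

# In-memory SQLite database standing in for the Oracle table (assumption).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (x INTEGER, y INTEGER, a TEXT, b TEXT)")
con.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)

# Same structure as the query above: attach the per-(a, b) row count to
# every row, then keep only the rows whose pair occurs more than once.
query = """
SELECT ab.*
FROM (SELECT t.*, COUNT(*) OVER (PARTITION BY a, b) AS cnt
      FROM obs t
      WHERE x = 1 AND y = 1
     ) ab
WHERE cnt > 1
"""
result = con.execute(query).fetchall()
con.close()

print(result)
```

Note that the WHERE x = 1 AND y = 1 filter is applied before the window count, so cnt only counts rows that pass the filter. Unlike GROUP BY, the window function keeps every row, which is the same idea as mutate in dplyr or transform in pandas.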