Question

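I have a table with three columns A, B, C, where A is not a primary key. For each distinct A (grouping by A), we need to select the B, C pairs and append the results at the end of the final result set. Is this possible in SQL?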

A | B | C
a1| b1| c1
a1| b2| c2
a1| b3| c3
a2| b1| c2
a2| b2| c5

I need

a1 | (c1,b1) ; (c2,b2);(c3;b3) 
a2 | (c2,b1) ; (c5,b2)

as the rows appended at the end. I normally do this via sqlalchemy and then eventually transform the data in Python. Is there a way to do this directly in SQL?

Edit & pending question: What is the alternative to string_agg() in Redshift (Postgres 8.0.2)? More information about the use case is above.

When using string_agg, I get ERROR: function string_agg(text, "unknown") does not exist Hint: No function matches the given name and argument types. You may need to add explicit type casts

Edit 2: Adding the errors from using the custom aggregate function

An error occurred when executing the SQL command:
CREATE FUNCTION cut_semicolon(text) RETURNS text AS $$
BEGIN
  RETURN SUBSTRING($1 FROM 4)

ERROR: unterminated dollar-quoted string at or near "$$
BEGIN
  RETURN SUBSTRING($1 FROM 4)"
  Position: 53

CREATE FUNCTION cut_semicolon(text) RETURNS text AS $$
                                                    ^

Execution time: 0.24s
(Statement 1 of 7 finished)

0 rows affected
END executed successfully

Execution time: 0.22s
(Statement 2 of 7 finished)

An error occurred when executing the SQL command:
$$ LANGUAGE 'plpgsql' IMMUTABLE

ERROR: unterminated dollar-quoted string at or near "$$ LANGUAGE 'plpgsql' IMMUTABLE"
  Position: 1

$$ LANGUAGE 'plpgsql' IMMUTABLE
^

Execution time: 0.22s
(Statement 3 of 7 finished)

An error occurred when executing the SQL command:
CREATE FUNCTION concat_semicolon(text, text) RETURNS text AS $$
BEGIN
  RETURN $1 || ' ; ' || $2

ERROR: unterminated dollar-quoted string at or near "$$
BEGIN
  RETURN $1 || ' ; ' || $2"
  Position: 62

CREATE FUNCTION concat_semicolon(text, text) RETURNS text AS $$
                                                             ^

Execution time: 0.22s
(Statement 4 of 7 finished)

0 rows affected
END executed successfully

Execution time: 0.22s
(Statement 5 of 7 finished)

An error occurred when executing the SQL command:
$$ LANGUAGE 'plpgsql' IMMUTABLE

ERROR: unterminated dollar-quoted string at or near "$$ LANGUAGE 'plpgsql' IMMUTABLE"
  Position: 1

$$ LANGUAGE 'plpgsql' IMMUTABLE
^

Execution time: 0.22s
(Statement 6 of 7 finished)

An error occurred when executing the SQL command:
CREATE AGGREGATE concat_semicolon(
  BASETYPE=text,
  SFUNC=concat_semicolon,
  STYPE=text,
  FINALFUNC=cut_semicolon,
  INITCOND=''
)

ERROR: SQL command "CREATE AGGREGATE concat_semicolon(
  BASETYPE=text,
  SFUNC=concat_semicolon,
  STYPE=text,
  FINALFUNC=cut_semicolon,
  INITCOND=''
)" not supported.

Execution time: 0.23s
(Statement 7 of 7 finished)


5 statements failed.
Script execution finished
Total script execution time: 1.55s

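I also looked at a related answer in a Google group, and it looks like replacing the delimiter ";" might help? I am not sure, though, which ";" to replace in this function definition. Reference: https://groups.google.com/forum/#!topic/sql-workbench/5LHVUXTm3BI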

Edit 3: Perhaps Redshift does not support CREATE FUNCTION at all? "ERROR: CREATE FUNCTION is not supported". A 2013 thread says so: forums.aws.amazon.com/thread.jspa?threadID=121137

Edit 4:

select A, concat(concat(concat(C, ',' ) , cast(B as varchar)), ',')
from  my_table
group by A,B,C


-- Is it ok to group by all A,B, C - since I can't group by A alone, which removes the related "C" columns-- 

gives -:
a1 c1b1b2b3
a2 c2b1b2

but not all the entries of C (and the semicolons):

a1 c1,b1;c2,b2;c2,b3
a2 c2,b1;c5,b2

But I want the commas in between. I also need to know whether grouping by A, B, C is appropriate.

Answer 1

PostgreSQL

SELECT
  a,
  STRING_AGG('(' || c || ',' || b || ')', ' ; ')
FROM
  tbl
GROUP BY
  a;

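Edit: For PostgreSQL versions before 9.0 (when STRING_AGG was introduced), and even before 8.4 (when ARRAY_AGG was added), you can create your own custom aggregate function.

Edit 2: For versions before 8.0 (perhaps Amazon Redshift is somehow based on PostgreSQL 7.4), the $$ syntax is not supported, so the function body needs to be enclosed in quotes, and quotes inside the body need to be escaped.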

CREATE FUNCTION cut_semicolon(text) RETURNS text AS '
BEGIN
  RETURN SUBSTRING($1 FROM 4);
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

CREATE FUNCTION concat_semicolon(text, text) RETURNS text AS '
BEGIN
  RETURN $1 || '' ; '' || $2;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

CREATE AGGREGATE concat_semicolon(
  BASETYPE=text,
  SFUNC=concat_semicolon,
  STYPE=text,
  FINALFUNC=cut_semicolon,
  INITCOND=''
);

Then use that aggregate.

SELECT a, CONCAT_SEMICOLON('(' || c || ',' || b || ')') FROM tbl GROUP BY a;

MySQL

SELECT a, GROUP_CONCAT(CONCAT('(', c, ',', b, ')') SEPARATOR ' ; ') FROM tbl GROUP BY a;
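To try the shape of these aggregate queries without a PostgreSQL or MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is not part of the original question; its group_concat() is used here only as a stand-in for STRING_AGG / GROUP_CONCAT, and the function name aggregate_pairs is hypothetical. SQLite does not promise the order in which the pairs are concatenated.

```python
import sqlite3


def aggregate_pairs(rows):
    """Return {a: '(c,b) ; (c,b) ...'} computed by SQLite's group_concat.

    rows is an iterable of (a, b, c) tuples. SQLite stands in for
    PostgreSQL/MySQL here; that substitution is an assumption.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tbl (a TEXT, b TEXT, c TEXT)")
        conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT a, group_concat('(' || c || ',' || b || ')', ' ; ') "
            "FROM tbl GROUP BY a ORDER BY a"
        )
        # Each result row is (a, aggregated_string)
        return dict(cur.fetchall())
    finally:
        conn.close()


sample = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c2"),
    ("a1", "b3", "c3"),
    ("a2", "b1", "c2"),
    ("a2", "b2", "c5"),
]
for key, pairs in aggregate_pairs(sample).items():
    print(key, "|", pairs)
```

The query text is the same idea as the PostgreSQL and MySQL versions above: build one "(c,b)" string per row, then join the strings within each group of a.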

Answer 2

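Unless you have a very specific reason to do this kind of thing in the database itself, it should be done in your application. Otherwise, you will end up with result sets that return convoluted text fields, which you may then need to parse for post-processing, and so on.

In other words: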

select A, B, C from table

Then, something like (Ruby):

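res = {}
# rows is the result set of the query above, one hash per row
rows.each do |row|
  res[row['a']] ||= []
  # append the [b, c] pair to the list for this value of a
  res[row['a']] << [row['b'], row['c']]
end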

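If you insist on doing this in Postgres, you do not have many options, and in Redshift you have few, if any.

The array_agg() and string_agg() aggregate functions would both potentially be useful, but they were introduced in 8.4 and 9.0 respectively, and Redshift apparently supports neither.

As far as I know, Redshift doesn't support array constructors, so using the ARRAY((select ...)) construct, which might have worked, flies out the window as well.

Returning something built with the ROW() construct is not possible either. Even if it were, it would be ugly as sin, and impossible to manipulate in Python.

Custom aggregate functions seem impossible too, if the other answer, and the leads it had you follow, are anything to go by. That is not surprising: the docs seem clear that you cannot create a user-defined function, let alone create a pl/language.

In other words, as far as I know, your only option is to do this type of aggregation in your application. Which, incidentally, is where you should be doing this kind of thing anyway.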
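Since the question mentions post-processing in Python, the application-side aggregation can be sketched there as well. This is a minimal example, assuming the rows of "select A, B, C" arrive as (a, b, c) tuples; the function name group_pairs and the exact output format are illustrative choices, not something from the original post.

```python
from collections import defaultdict


def group_pairs(rows):
    """Group (a, b, c) rows by a and format each group as '(c,b) ; (c,b)'.

    Rows are kept in the order they arrive; sort them first if a
    specific order is needed.
    """
    groups = defaultdict(list)
    for a, b, c in rows:
        groups[a].append((c, b))
    return {
        a: " ; ".join(f"({c},{b})" for c, b in pairs)
        for a, pairs in groups.items()
    }


rows = [
    ("a1", "b1", "c1"),
    ("a1", "b2", "c2"),
    ("a1", "b3", "c3"),
    ("a2", "b1", "c2"),
    ("a2", "b2", "c5"),
]
for a, pairs in group_pairs(rows).items():
    print(a, "|", pairs)
```

Keeping the intermediate list of (c, b) tuples, instead of formatting a string right away, is usually the more convenient form if the data is processed further.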

Answer 3

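This is probably achievable in PostgreSQL, especially if B and C are of the same type. You can produce a two-column result and aggregate the data in the second column using ARRAY, or otherwise using JSON. I am not sure how to produce it in MySQL, but you would probably need to serialize to a string and reverse it in Python.

In my opinion, either way, the right answer is: don't do this. You will get a less readable, hacky, non-portable solution that is not necessarily any faster. There is nothing wrong with doing some post-processing of the data in Python to give it its final form; in fact, it is common practice. Especially if it is purely reformatting the output rather than producing aggregated results.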
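If a serialized string does come back from the database, reversing it in Python, as mentioned above, could look roughly like the following. It is a minimal sketch that assumes the "(c,b) ; (c,b)" format from the question and assumes the values themselves never contain commas, semicolons or parentheses; parse_pairs is a hypothetical helper name.

```python
def parse_pairs(serialized):
    """Turn '(c1,b1) ; (c2,b2)' back into [('c1', 'b1'), ('c2', 'b2')].

    Assumes values contain no ',', ';', '(' or ')' characters.
    """
    pairs = []
    for chunk in serialized.split(";"):
        chunk = chunk.strip()
        if not chunk:
            # Skip empty pieces, e.g. from an empty input string
            continue
        c, b = chunk.strip("()").split(",", 1)
        pairs.append((c.strip(), b.strip()))
    return pairs


print(parse_pairs("(c1,b1) ; (c2,b2) ; (c3,b3)"))
```

The fragility of this parsing step is part of why returning plain rows and grouping them in the application tends to be simpler.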

Answer 4

Try this to get

a1 | (c1,b1) ; (c2,b2);(c3;b3) 
a2 | (c2,b1) ; (c5,b2)

Here is the code:

Make a temp table with a running ID. This is an example for SQL Server; you can try it with other queries.

Select identity(int, 1, 1) as ID, A, '('+C+';'+B+')' as aa
Into #table2
From #table
Order BY A, aa

Build the query using a loop

Declare @sSql as Varchar(1000), @A as Varchar(2), @A2 as Varchar(2), @aa as Varchar(10)
Declare @iRec as int, @iL as int
Set @iRec  = (Select Count(*) From #table2)
Set @iL = 1
Set @sSql = ''

While @iL <= @iRec
Begin
    Set @A  = (Select A  From #table2 Where ID = @iL)
    Set @aa = (Select aa From #table2 Where ID = @iL)

    if @A = @A2
        Begin
            Set @sSql = Left(@sSql, Len(@sSql)-1)+';'+@aa+'`'
        End
    Else
        BEGIN
            Set @sSql = @sSql + ' Union Select `'+ @A+'`,`'+@aa+'`'
        END

    Set @A2 = @A
    Set @iL = @iL + 1
End
Set @sSql = Right(@sSql, Len(@sSql)-7)
Set @sSql = Replace(@sSql, '`', '''')
Exec(@sSql)

Does it help?

Answer 5

I came to the conclusion that it cannot be solved within the Postgres/Redshift stack. This is how I solved it.

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3],
                   'B': ['aaa', 'bbb', 'cc', 'gg', 'aaa', 'bbb', 'cc', 'gg']})

def f(x):
    # collect the B values of one group into a single list cell
    return [x['B'].values]

series = df.groupby('A').apply(f)
series.name = 'metric'
s = pd.DataFrame(series.reset_index())
print(s)

   A            metric
0  1  [[aaa, bbb, cc]]
1  2       [[gg, aaa]]
2  3   [[bbb, cc, gg]]

Appending query results to the same result row in PostgreSQL - Redshift
