SQL / Pandas equivalent

Time: 2016-09-07 15:41:52

Tags: python sql sql-server pandas dataframe

What is the Pandas equivalent of this SQL query:

select  column1,
         sum(column2) as A,
         count(distinct column3) as B,
         sum(column2) / count(distinct column3) as C
from     table1
group by column1

Thanks for any help!!

1 answer:

Answer 0 (score: 0)

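I'm not sure the sum(column2) / count(distinct column3) as C part can be done in the same step as the aggregation, but you can easily do it in two steps: aggregate first, then divide the two resulting columns.

Demo: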

In [45]: import numpy as np
In [46]: import pandas as pd
In [47]: df = pd.DataFrame(np.random.randint(0,5,size=(15, 3)), columns=['c1','c2','c3'])
In [48]: df
Out[48]:
    c1  c2  c3
0    4   0   3
1    2   3   2
2    1   2   3
3    3   3   0
4    1   0   4
5    1   1   1
6    2   3   3
7    2   2   2
8    4   0   0
9    1   1   0
10   1   3   0
11   4   3   1
12   0   0   3
13   3   1   0
14   4   3   1

In [49]: x = df.groupby('c1').agg({'c2':'sum', 'c3': 'nunique'}).reset_index().rename(columns={'c2':'A', 'c3':'B'})

In [50]: x
Out[50]:
   c1  A  B
0   0  0  1
1   1  7  4
2   2  8  2
3   3  4  1
4   4  6  3

In [51]: x['C'] = x.A / x.B

In [52]: x
Out[52]:
   c1  A  B     C
0   0  0  1  0.00
1   1  7  4  1.75
2   2  8  2  4.00
3   3  4  1  4.00
4   4  6  3  2.00
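
If you'd rather write it as a single expression, the two steps can be chained with assign. This is only a sketch: it assumes pandas 0.25 or newer (for the named-aggregation syntax), and the data is the 15 demo rows typed in by hand so the result is reproducible rather than random.

```python
import pandas as pd

# The same 15 rows as in the demo, entered by hand instead of generated randomly.
df = pd.DataFrame({
    'c1': [4, 2, 1, 3, 1, 1, 2, 2, 4, 1, 1, 4, 0, 3, 4],
    'c2': [0, 3, 2, 3, 0, 1, 3, 2, 0, 1, 3, 3, 0, 1, 3],
    'c3': [3, 2, 3, 0, 4, 1, 3, 2, 0, 0, 0, 1, 3, 0, 1],
})

result = (
    df.groupby('c1')
      # Named aggregation: A = sum(c2), B = count(distinct c3)
      .agg(A=('c2', 'sum'), B=('c3', 'nunique'))
      .reset_index()
      # C is computed from the aggregated columns, like sum(...) / count(distinct ...)
      .assign(C=lambda g: g['A'] / g['B'])
)

print(result)
```

Two things to keep in mind when comparing against the SQL result. nunique skips NaN by default, which lines up with count(distinct ...) ignoring NULLs. And / in pandas is always true division, whereas SQL Server does integer division when both operands are integers, so if column2 is an integer column the C values can differ unless you cast on the SQL side.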