Count distinct devices in a DataFrame

Time: 2017-10-28 17:04:40

Tags: python sql sqlite pandas dataframe

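I'd like to know how many distinct devices are in this list. Is my SQL statement sufficient for that, or do I have to do more?

Unfortunately, with such a large amount of data, I don't know which approach is right or whether my solution is correct.

Some devices appear more than once, so the number of rows is not equal to the number of devices.

Suggestions in either Python or SQL are welcome.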

import pandas as pd
from sqlalchemy import create_engine # database connection
from IPython.display import display


disk_engine = create_engine('sqlite:///gender-train-devices.db')

phones = pd.read_sql_query('SELECT device_id, COUNT(device_id) FROM phone_brand_device_model GROUP BY [device_id]', disk_engine)

print(phones)

The output is:

    device_id  COUNT(device_id)
0      -9223321966609553846                 1
1      -9223067244542181226                 1
2      -9223042152723782980                 1
3      -9222956879900151005                 1
4      -9222896629442493034                 1
5      -9222894989445037972                 1
6      -9222894319703307262                 1
7      -9222754701995937853                 1
8      -9222661944218806987                 1
9      -9222399302879214035                 1
10     -9222352239947207574                 1
11     -9222173362545970626                 1
12     -9221825537663503111                 1
13     -9221768839350705746                 1
14     -9221767098072603291                 1
15     -9221674814957667064                 1
16     -9221639938103564513                 1
17     -9221554258551357785                 1
18     -9221307795397202665                 1
19     -9221086586254644858                 1
20     -9221079146476055829                 1
21     -9221066489596332354                 1
22     -9221046405740900422                 1
23     -9221026417907250887                 1
24     -9221015678978880842                 1
25     -9220961720447724253                 1
26     -9220830859283101130                 1
27     -9220733369151052329                 1
28     -9220727250496861488                 1
29     -9220452176650064280                 1
...                     ...               ...
186686  9219686542557325817                 1
186687  9219842210460037807                 1
186688  9219926280825642237                 1
186689  9219937375310355234                 1
186690  9219958455132520777                 1
186691  9220025918063413114                 1
186692  9220160557900894171                 1
186693  9220562120895859549                 1
186694  9220807070557263555                 1
186695  9220814716773471568                 1
186696  9220880169487906579                 1
186697  9220914901466458680                 1
186698  9221114774124234731                 1
186699  9221149157342105139                 1
186700  9221152396628736959                 1
186701  9221297143137682579                 1
186702  9221586026451102237                 1
186703  9221608286127666096                 1
186704  9221693095468078153                 1
186705  9221768426357971629                 1
186706  9221843411551060582                 1
186707  9222110179000857683                 1
186708  9222172248989688166                 1
186709  9222214407720961524                 1
186710  9222355582733155698                 1
186711  9222539910510672930                 1
186712  9222779211060772275                 1
186713  9222784289318287993                 1
186714  9222849349208140841                 1
186715  9223069070668353002                 1

[186716 rows x 2 columns]

3 answers:

Answer 0 (score: 2)

If you want the number of distinct devices, you can simply query the database:

SELECT COUNT(distinct device_id)
FROM phone_brand_device_model ;

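Of course, if you have already loaded the data into a data frame for other purposes, you can count the rows there instead. Since the query in the question groups by device_id, its result has one row per device.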
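To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table and its six rows are made up for illustration; only the table and column names come from the question. It assumes device_id contains no NULLs (COUNT(DISTINCT ...) ignores NULLs, while GROUP BY would put them in a group of their own).

```python
import sqlite3

# Hypothetical in-memory stand-in for the real database file;
# the sample rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_brand_device_model (device_id INTEGER)")
conn.executemany(
    "INSERT INTO phone_brand_device_model VALUES (?)",
    [(1,), (2,), (2,), (3,), (3,), (3,)],
)

# Direct count of distinct devices, as suggested in this answer.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT device_id) FROM phone_brand_device_model"
).fetchone()[0]

# The question's GROUP BY query yields one row per distinct device_id,
# so counting its rows gives the same number (absent NULLs).
grouped_rows = conn.execute(
    "SELECT device_id, COUNT(device_id) "
    "FROM phone_brand_device_model GROUP BY device_id"
).fetchall()

# Total number of rows, duplicates included.
total_rows = conn.execute(
    "SELECT COUNT(*) FROM phone_brand_device_model"
).fetchone()[0]

print(distinct_count, len(grouped_rows), total_rows)
conn.close()
```

In this toy table the distinct count and the number of grouped rows agree, while the total row count is larger because of the duplicates.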

Answer 1 (score: 1)

If you already have the data in memory as a data frame, you can use:

df['device_id'].nunique()

Otherwise use Gordon's solution; it should be faster.

Answer 2 (score: 0)

If you want to do this in pandas, you can do it like this:

len(phones.device_id.unique())
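
The two pandas approaches in the answers above usually give the same number, but they can differ when the column has missing values. A small sketch with a made-up frame (the values are invented, not taken from the question's data):

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in the column.
df = pd.DataFrame({"device_id": [10, 20, 20, 30, None]})

# nunique() skips NaN by default.
n_nunique = df["device_id"].nunique()

# unique() keeps NaN as one of the values, so len() counts it.
n_unique = len(df["device_id"].unique())

# Passing dropna=False makes nunique() count NaN as well.
n_with_nan = df["device_id"].nunique(dropna=False)

print(n_nunique, n_unique, n_with_nan)
```

If device_id has no missing values, as appears to be the case in the output shown in the question, both expressions return the same count.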