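I want to know how many distinct devices are in this list. Is my SQL statement enough for that, or do I have to do more?
Unfortunately, with this much data I can't tell which approach is correct, or whether my solution is right.
Some devices appear more than once, so the number of rows is not the number of devices.
Suggestions in Python or SQL are welcome.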
import pandas as pd
from sqlalchemy import create_engine # database connection
from IPython.display import display
disk_engine = create_engine('sqlite:///gender-train-devices.db')
# One output row per distinct device_id, plus how often each one occurs
phones = pd.read_sql_query('SELECT device_id, COUNT(device_id) FROM phone_brand_device_model GROUP BY [device_id]', disk_engine)
print(phones)
The output is:
device_id COUNT(device_id)
0 -9223321966609553846 1
1 -9223067244542181226 1
2 -9223042152723782980 1
3 -9222956879900151005 1
4 -9222896629442493034 1
5 -9222894989445037972 1
6 -9222894319703307262 1
7 -9222754701995937853 1
8 -9222661944218806987 1
9 -9222399302879214035 1
10 -9222352239947207574 1
11 -9222173362545970626 1
12 -9221825537663503111 1
13 -9221768839350705746 1
14 -9221767098072603291 1
15 -9221674814957667064 1
16 -9221639938103564513 1
17 -9221554258551357785 1
18 -9221307795397202665 1
19 -9221086586254644858 1
20 -9221079146476055829 1
21 -9221066489596332354 1
22 -9221046405740900422 1
23 -9221026417907250887 1
24 -9221015678978880842 1
25 -9220961720447724253 1
26 -9220830859283101130 1
27 -9220733369151052329 1
28 -9220727250496861488 1
29 -9220452176650064280 1
... ... ...
186686 9219686542557325817 1
186687 9219842210460037807 1
186688 9219926280825642237 1
186689 9219937375310355234 1
186690 9219958455132520777 1
186691 9220025918063413114 1
186692 9220160557900894171 1
186693 9220562120895859549 1
186694 9220807070557263555 1
186695 9220814716773471568 1
186696 9220880169487906579 1
186697 9220914901466458680 1
186698 9221114774124234731 1
186699 9221149157342105139 1
186700 9221152396628736959 1
186701 9221297143137682579 1
186702 9221586026451102237 1
186703 9221608286127666096 1
186704 9221693095468078153 1
186705 9221768426357971629 1
186706 9221843411551060582 1
186707 9222110179000857683 1
186708 9222172248989688166 1
186709 9222214407720961524 1
186710 9222355582733155698 1
186711 9222539910510672930 1
186712 9222779211060772275 1
186713 9222784289318287993 1
186714 9222849349208140841 1
186715 9223069070668353002 1
[186716 rows x 2 columns]
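The visible rows all show a count of 1, so they don't reveal which devices occur more than once. A minimal sketch, assuming the same table name and a hypothetical toy dataset in an in-memory SQLite database, that keeps only the repeated devices by adding `HAVING` to the same `GROUP BY`:

```python
import sqlite3

# Hypothetical toy stand-in for phone_brand_device_model:
# device 101 appears twice, the others once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_brand_device_model (device_id INTEGER, phone_brand TEXT)")
conn.executemany(
    "INSERT INTO phone_brand_device_model VALUES (?, ?)",
    [(101, "a"), (101, "a"), (202, "b"), (303, "c")],
)

# Same grouping as the question, filtered to groups with more than one row.
duplicates = conn.execute(
    "SELECT device_id, COUNT(*) AS n "
    "FROM phone_brand_device_model "
    "GROUP BY device_id HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)  # [(101, 2)]
```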
Answer 0 (score: 2)
If you want the number of distinct devices, just query the database:
SELECT COUNT(distinct device_id)
FROM phone_brand_device_model ;
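To see the difference between the total row count and the distinct-device count, here is a small sketch using Python's built-in `sqlite3` with an in-memory database and hypothetical toy rows (not the question's real data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_brand_device_model (device_id INTEGER)")
# Four rows, but only three distinct devices (101 is duplicated).
conn.executemany(
    "INSERT INTO phone_brand_device_model VALUES (?)",
    [(101,), (101,), (202,), (303,)],
)

total_rows = conn.execute(
    "SELECT COUNT(*) FROM phone_brand_device_model"
).fetchone()[0]
distinct_devices = conn.execute(
    "SELECT COUNT(DISTINCT device_id) FROM phone_brand_device_model"
).fetchone()[0]
print(total_rows, distinct_devices)  # 4 3
```

With the question's `disk_engine`, the same statement can also be passed to `pd.read_sql_query`, which returns a one-row DataFrame holding the count.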
Of course, if you already have the data in a DataFrame for other purposes, you can simply count its rows.
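The question's own `GROUP BY` query already returns one row per distinct `device_id`, so the row count of that DataFrame is the number of distinct devices, assuming `device_id` has no NULLs (`GROUP BY` puts NULLs into a group of their own, while `COUNT(DISTINCT ...)` ignores them). A minimal sketch with hypothetical toy rows in an in-memory SQLite database:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_brand_device_model (device_id INTEGER)")
conn.executemany(
    "INSERT INTO phone_brand_device_model VALUES (?)",
    [(101,), (101,), (202,), (303,)],
)

# Same shape of query as in the question: one output row per distinct device_id.
phones = pd.read_sql_query(
    "SELECT device_id, COUNT(device_id) AS n "
    "FROM phone_brand_device_model GROUP BY device_id",
    conn,
)
print(len(phones))  # 3, the number of distinct devices
```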
Answer 1 (score: 1)
If you already have the data in memory as a DataFrame, you can use:
df['device_id'].nunique()
Otherwise, use Gordon's solution; it should be faster.
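A quick illustration of `nunique()` on a hypothetical toy DataFrame in which one device appears twice, so the row count and the distinct count differ:

```python
import pandas as pd

# Hypothetical stand-in for the question's data: 101 appears twice.
df = pd.DataFrame({"device_id": [101, 101, 202, 303]})

print(len(df))                    # 4 rows
print(df["device_id"].nunique())  # 3 distinct devices
```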
Answer 2 (score: 0)
If you want to do this in pandas, you can do:
len(phones.device_id.unique())
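One caveat when choosing between this and `nunique()`: `unique()` keeps missing values (NaN) as a value, while `nunique()` drops them by default. A small sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

s = pd.Series([101, 101, np.nan, 202])

print(len(s.unique()))          # 3: unique() keeps NaN
print(s.nunique())              # 2: nunique() drops NaN by default
print(s.nunique(dropna=False))  # 3: count NaN as its own value
```

If `device_id` never contains missing values, both approaches give the same count.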