ngroups in a groupby object does not match nunique() of the same column

Time: 2019-04-24 16:26:32

Tags: python pandas

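I have a DataFrame consisting of Ids and serial numbers. I want to create a new DataFrame with the Ids as the index and the serial numbers as the column values, padded with zeros where the lengths are unequal.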
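To make the target layout concrete, here is a minimal sketch of one way to build it, assuming plain string columns. The sample rows and the names `position` and `wide` are made up for illustration, and the padding value is the string '0' so that every cell stays a string.

```python
import pandas as pd

# Hypothetical sample: several serials per Id, with unequal group sizes
hu = pd.DataFrame({'Id': ['1', '1', '12', '123', '123', '123'],
                   'Serial': ['A', 'B', 'AB', 'ABC', 'ABD', 'ABE']})

# Number the serials within each Id: 0, 1, 2, ...
position = hu.groupby('Id').cumcount()

# One row per Id, one column per position, missing cells padded with '0'
wide = (hu.assign(position=position)
          .pivot(index='Id', columns='position', values='Serial')
          .fillna('0'))

print(wide)
```

Here the number of rows in `wide` equals the number of distinct Ids by construction, because `pivot` creates one row per distinct index value.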

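My problem is that when I group by Id, the number of groups in the groupby("Id") object does not match the number of unique Id values reported by nunique(), which is counterintuitive. In every example I have tried with smaller DataFrames, the two numbers match. Any suggestions?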

import pandas as pd
from io import StringIO
from csv import writer

# data example (real df has shape (188225, 2))
hu = pd.DataFrame({'Id': ['1', '12', '123', '1234', '12345'],
                   'Serial': ['A', 'AB', 'ABC', 'ABC', 'ABC']},
                  dtype='category')

grouped = hu.groupby('Id')
max_len = grouped['Serial'].size().max()  # find the max group length

output = StringIO()
csv_writer = writer(output)

for key, vals in grouped.groups.items():
    # vals holds the row labels of this group; look up the matching serials
    serials = hu.loc[vals, 'Serial'].tolist()
    # one row per Id: the Id, then its serials, zero-padded to max_len, e.g. [a, b, c, 0, 0, 0, ...]
    csv_writer.writerow([key] + serials + [0] * (max_len - len(serials)))

# read the buffer back once, after all rows have been written
output.seek(0)  # go back to the start of the in-memory file
dfdiscrete = pd.read_csv(output,
                         header=None,
                         index_col=0,
                         dtype=str)

print("Discrete Serials:", len(grouped.groups), "nunique ids", hu['Id'].nunique())

I expected the two numbers to be equal:
Shape discrete devices: (29840, 50) nunique citizen ids 29840
But the actual output is:
Shape discrete devices: (56674, 50) nunique citizen ids 29840
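
One possible cause, offered as a guess rather than a confirmed diagnosis: the columns are created with dtype='category'. A categorical column keeps its full list of categories even after rows are filtered out, and groupby with observed=False reports every category, including ones with no rows, while nunique() counts only the values actually present. The sketch below reproduces that effect on made-up data; `subset` and the filter are hypothetical, and `observed` is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

# Hypothetical data: a categorical Id column with five categories
df = pd.DataFrame({'Id': pd.Series(['1', '12', '123', '1234', '12345'],
                                   dtype='category'),
                   'Serial': ['A', 'AB', 'ABC', 'ABC', 'ABC']})

# Filtering rows does not remove the unused categories
subset = df[df['Id'].isin(['1', '12'])]

n_unique = subset['Id'].nunique()                               # values present
n_all = len(subset.groupby('Id', observed=False).size())        # all categories
n_observed = len(subset.groupby('Id', observed=True).size())    # present only

# Alternative: drop the unused categories before grouping
cleaned = subset['Id'].cat.remove_unused_categories()

print(n_unique, n_all, n_observed, len(cleaned.cat.categories))
```

If this is what happens in the real data, either passing observed=True to groupby or calling cat.remove_unused_categories() on the Id column first should bring the two counts back in line.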

0 answers:

No answers yet