Question

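I have a DataFrame made up of IDs and serial numbers. I want to build a new DataFrame with the Ids as the index and the serial numbers as the column values, padded with zeros where the lengths differ.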
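The reshaping described above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the toy data is hypothetical, the helper column `pos` is a name introduced here, and the padding value is the string '0' on the assumption that the result is meant to hold strings (as with `dtype=str` when reading a CSV).

```python
import pandas as pd

# Hypothetical toy data: each Id has a different number of serials.
df = pd.DataFrame({'Id': ['1', '1', '12', '12', '12', '123'],
                   'Serial': ['A', 'B', 'A', 'B', 'C', 'A']})

# Position of each serial within its Id (0, 1, 2, ...).
pos = df.groupby('Id').cumcount()

# One row per Id, one column per position; empty slots are padded with '0'.
wide = (df.assign(pos=pos)
          .pivot(index='Id', columns='pos', values='Serial')
          .fillna('0'))

print(wide)
```

With this approach the number of rows in `wide` follows directly from the distinct Ids present in the data, so it can be compared against `df['Id'].nunique()` without going through a CSV buffer.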

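My problem is that when I group by Id, the number of groups in my groupby("id") object does not match the nunique("id") count, which is counterintuitive. In every example I have tried with smaller DataFrames, the two match. Any suggestions?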

import pandas as pd
import numpy as np

# data example (the real df has shape (188225, 2))
df = pd.DataFrame({'Id': ['1','12','123','1234','12345'],
                   'Serial': ['A','AB','ABC','ABC','ABC']},
                  dtype='category')

max_len = df.groupby('Id')['Serial'].size().max()  # size of the largest group

grouped = df.groupby('Id')

from io import StringIO
from csv import writer

output = StringIO()
csv_writer = writer(output)

for key, vals in grouped.groups.items():
    # One row per Id, zero-padded to max_len: [a, b, c, 0, 0, 0, ...]
    # (grouped.groups maps each Id to the row labels of its group)
    csv_writer.writerow(np.append(np.append(key, vals.values), np.array([0] * (max_len - len(vals)))))

output.seek(0)  # rewind to the start of the buffer before reading
dfdiscrete = pd.read_csv(output,
                         header=None,
                         index_col=0,
                         dtype=str)

print("Discrete Serials:", len(grouped.groups), "nunique ids", df['Id'].nunique())

I expected these two numbers to match:
Shape discrete devices: (29840, 50) nunique citizen ids 29840,
but the actual output is
Shape discrete devices: (56674, 50) nunique citizen ids 29840
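One possible cause of a mismatch like this, which the question itself does not confirm, is the `category` dtype: after filtering, a categorical column still remembers categories that no longer occur, and grouping with `observed=False` reports those as empty groups, while `nunique()` only counts values that are actually present. The sketch below reproduces that effect on hypothetical toy data; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data with a categorical Id column.
df = pd.DataFrame({'Id': ['1', '1', '12', '123'],
                   'Serial': ['A', 'B', 'A', 'C']})
df['Id'] = df['Id'].astype('category')

# Filtering drops the rows for '123' but keeps '123' as a category.
sub = df[df['Id'] != '123']

n_unique = sub['Id'].nunique()                                  # values present
n_all = len(sub.groupby('Id', observed=False).size())           # all categories
n_observed = len(sub.groupby('Id', observed=True).size())       # present only

# Dropping unused categories is another way to bring the counts back in line.
cleaned = sub['Id'].cat.remove_unused_categories()
n_cleaned = len(cleaned.cat.categories)

print(n_unique, n_all, n_observed, n_cleaned)
```

If this is what is happening, passing `observed=True` to `groupby`, or calling `cat.remove_unused_categories()` first, would be worth trying on the real data.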

ngroups in a groupby object does not match nunique() on the same column

0 answers: