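I have a DataFrame made up of Ids and serial numbers. I want to create a new DataFrame with the Ids as the index and the serial numbers as column values, padding with zeros where the groups have unequal lengths.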
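For reference, here is a minimal sketch of the reshape described above. It rests on a few assumptions: it uses `groupby().cumcount()` to number each serial within its Id, `DataFrame.pivot` to spread those positions into columns, and `fillna(0)` for the padding. The sample data are hypothetical and use plain strings rather than the `category` dtype of the real frame. This is an alternative to the CSV round trip in the code below, not the original approach.

```python
import pandas as pd

# Hypothetical sample data with the same column names as the question
df = pd.DataFrame({'Id': ['1', '1', '12', '123', '123', '123'],
                   'Serial': ['A', 'B', 'AB', 'ABC', 'ABD', 'ABE']})

# Position of each serial within its Id group: 0, 1, 2, ...
df['pos'] = df.groupby('Id').cumcount()

# One row per Id, one column per position; positions an Id lacks become 0
wide = (df.pivot(index='Id', columns='pos', values='Serial')
          .fillna(0))
print(wide)
```

Because each (Id, position) pair is unique, `pivot` never hits duplicate entries. The number of columns equals the largest group size, which plays the role of `max_len` in the question's code.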
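My problem is that when I group by Id, the number of groups in my groupby('Id') object does not match the number of unique Id values (nunique), which is not what I would intuitively expect. With every smaller DataFrame I have tried, the two counts do match. Any suggestions?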
import pandas as pd
import numpy as np
from io import StringIO
from csv import writer
# data example (the real df has shape (188225, 2))
df = pd.DataFrame({'Id': ['1', '12', '123', '1234', '12345'],
                   'Serial': ['A', 'AB', 'ABC', 'ABC', 'ABC']},
                  dtype='category')
max_len = df.groupby('Id')['Serial'].size().max()  # find the max group length
grouped = df.groupby('Id')
output = StringIO()
csv_writer = writer(output)
for key, vals in grouped.groups.items():
    # vals holds the row labels of the group, so look up the serials themselves
    serials = df.loc[vals, 'Serial'].to_numpy()
    # Vector of serials padded with zeros up to max_len: [a, b, c, 0, 0, 0, ...]
    csv_writer.writerow(np.append(np.append(key, serials), np.array([0] * (max_len - len(vals)))))
output.seek(0)  # go back to the start of the in-memory file
dfdiscrete = pd.read_csv(output,
                         header=None,
                         index_col=0,
                         dtype=str)
print("\nDiscrete Serials:", len(grouped.groups), "nunique ids", df['Id'].nunique())
I expect the number of groups and the number of unique Ids to be equal:
Shape discrete devices: (29840, 50) nunique citizen ids 29840
but the actual output is:
Shape discrete devices: (56674, 50) nunique citizen ids 29840
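One possible cause, stated as an assumption since the real data are not shown: the frame is built with `dtype='category'`. If the real `Id` column carries categories that no longer occur in its rows (for example, after filtering a larger frame), `groupby` on a categorical key with `observed=False` produces one group per category, including the unused ones. That would push the group count above `nunique()`, while small, unfiltered examples still match. The sketch below uses hypothetical data to show the effect and two ways around it: passing `observed=True`, or calling `.cat.remove_unused_categories()` before grouping. The default for `observed` has changed across pandas versions, so the sketch passes it explicitly.

```python
import pandas as pd

# Hypothetical data: the full frame has three Ids
full = pd.DataFrame({'Id': ['1', '12', '123'],
                     'Serial': ['A', 'AB', 'ABC']},
                    dtype='category')

# Filtering rows keeps the categorical dtype, including the now-unused '123'
df = full[full['Id'] != '123']

n_categories = len(df['Id'].cat.categories)  # categories still listed
n_unique = df['Id'].nunique()                # Ids actually present in the rows

# observed=False: one group per category, including the empty '123' group
n_groups_all = df.groupby('Id', observed=False)['Serial'].size().shape[0]

# observed=True: only categories that occur in the rows
n_groups_observed = df.groupby('Id', observed=True)['Serial'].size().shape[0]

# Alternative: drop the unused categories before grouping
cleaned = df.assign(Id=df['Id'].cat.remove_unused_categories())
n_groups_cleaned = cleaned.groupby('Id', observed=False)['Serial'].size().shape[0]

print(n_categories, n_unique, n_groups_all, n_groups_observed, n_groups_cleaned)
```

If this is the cause, `len(df['Id'].cat.categories)` on the real frame should equal the larger group count rather than the `nunique()` value.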