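New error during set_labels in pandas 0.19.2: ValueError: Unequal label lengths

Time: 2017-01-18 20:51:39

Tags: pandas

After upgrading from pandas 0.18.1 to 0.19.2, I get the following error when I try to add new levels and labels to my DataFrame. Any idea what the problem is?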


I wonder whether this is a bug in the new pandas code. Perhaps self.labels[0] should be labels[0]?

print index

MultiIndex(levels=[[u'1', u'2'], [u'nextLevel']],
           labels=[[0, 1], [0, 0]],
           names=[u'segment..ASRinfo..supportedUtt', u'label'])

print levels

[['1', '2', 'Total'], ['nextLevel']]

print labels

[[0, 1, 2], [0, 0, 0]]

index = index.set_levels(levels)
print index

MultiIndex(levels=[[u'Supported', u'Unsupported', u'Total'], [u'nextLevel']],
           labels=[[0, 1], [0, 0]],
           names=[u'segment..ASRinfo..supportedUtt', u'label'])

index = index.set_labels(labels)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-f6fb11fbbb3a> in <module>()
    288 
    289 # Initialize dfplot
--> 290 slice_data()
    291 
    292 if len(resultList)==1:

<ipython-input-11-f6fb11fbbb3a> in slice_data(*args)
     71     index = index.set_levels(levels)
     72     print index
---> 73     index = index.set_labels(labels)
     74     data_slice = data_slice.reindex(index)
     75 

/Users/user1/anaconda/lib/python2.7/site-packages/pandas/indexes/multi.pyc in set_labels(self, labels, level, inplace, verify_integrity)
    350             idx = self._shallow_copy()
    351         idx._reset_identity()
--> 352         idx._set_labels(labels, level=level, verify_integrity=verify_integrity)
    353         if not inplace:
    354             return idx

/Users/user1/anaconda/lib/python2.7/site-packages/pandas/indexes/multi.pyc in _set_labels(self, labels, level, copy, validate, verify_integrity)
    285 
    286         if verify_integrity:
--> 287             self._verify_integrity(labels=new_labels)
    288 
    289         self._labels = new_labels

/Users/user1/anaconda/lib/python2.7/site-packages/pandas/indexes/multi.pyc in _verify_integrity(self, labels, levels)
    145             if len(label) != label_length:
    146                 raise ValueError("Unequal label lengths: %s" %
--> 147                                  ([len(lab) for lab in labels]))
    148             if len(label) and label.max() >= len(level):
    149                 raise ValueError("On level %d, label max (%d) >= length of"

ValueError: Unequal label lengths: [3, 3]
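
The message reports two equal lengths, [3, 3], which suggests the new labels agree with each other and are being compared against some other length. A minimal pure-Python sketch of that situation follows. It is a simplified stand-in, not the pandas source: `check_label_lengths` is a hypothetical helper, and it assumes (as the question suspects) that the reference length comes from the existing index rather than from the labels being set.

```python
def check_label_lengths(new_labels, reference_length):
    """Raise ValueError if any label list differs in length from reference_length."""
    lengths = [len(lab) for lab in new_labels]
    if any(n != reference_length for n in lengths):
        raise ValueError("Unequal label lengths: %s" % lengths)


old_labels = [[0, 1], [0, 0]]        # labels of the existing 2-row index
new_labels = [[0, 1, 2], [0, 0, 0]]  # labels passed to set_labels

# Reference taken from the OLD index (length 2): the new labels are rejected.
try:
    check_label_lengths(new_labels, len(old_labels[0]))
    old_check = "ok"
except ValueError as exc:
    old_check = str(exc)

# Reference taken from the NEW labels themselves (length 3): they pass.
check_label_lengths(new_labels, len(new_labels[0]))
new_check = "ok"

print(old_check)
print(new_check)
```

Under that assumption, any call that changes the number of rows fails, even when the new labels are internally consistent.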

3 Answers:

Answer 0 (score: 0)

I tested my fix and it works! I filed a bug report with pandas: https://github.com/pandas-dev/pandas/issues/15157

Answer 1 (score: 0)

I'm not sure it's a bug. I suppose pandas could fill all the extra index entries with missing values, but I think you should use reindex instead:

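import pandas as pd

# Note: in pandas 0.24 and later the `labels` argument is named `codes`.

# Existing index: two rows
index = pd.MultiIndex(levels=[[u'1', u'2'], [u'nextLevel']],
                      labels=[[0, 1], [0, 0]],
                      names=[u'segment..ASRinfo..supportedUtt', u'label'])

# Target index: adds a third row for 'Total'
index2 = pd.MultiIndex(levels=[['1', '2', 'Total'], ['nextLevel']],
                       labels=[[0, 1, 2], [0, 0, 0]],
                       names=[u'segment..ASRinfo..supportedUtt', u'label'])

# Rows that exist in index2 but not in df are filled with NaN
df.reindex(index2)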
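
For a version that runs end to end, here is a small sketch of the reindex approach. The frame `df`, its `count` column, and the values are made up for illustration. The index is built with `pd.MultiIndex.from_tuples` rather than the `labels` argument so that the snippet does not depend on the pandas version.

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame.
old_index = pd.MultiIndex.from_tuples(
    [("1", "nextLevel"), ("2", "nextLevel")],
    names=["segment..ASRinfo..supportedUtt", "label"],
)
df = pd.DataFrame({"count": [10, 20]}, index=old_index)

# Build the enlarged index from tuples instead of editing levels and labels by hand.
new_index = pd.MultiIndex.from_tuples(
    [("1", "nextLevel"), ("2", "nextLevel"), ("Total", "nextLevel")],
    names=old_index.names,
)

# reindex keeps the existing rows and adds the new row filled with NaN.
expanded = df.reindex(new_index)

# Fill the new row explicitly, here with the column sum.
expanded.loc[("Total", "nextLevel"), "count"] = df["count"].sum()
print(expanded)
```

Because reindex introduces NaN before the row is filled, the `count` column ends up as floating point.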

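Answer 2 (score: 0)

I'm new to pandas, and I found the MultiIndex documentation hard to adapt to my own problem. Basically, I want to add a few extra rows. This is the solution I came up with. There may be a better way to do it; if you know one, feel free to share.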

groupbyColumns = ['label0', 'label1']
data_slice = dataframe.groupby(by=groupbyColumns).sum()

index = data_slice.index

levels = list()
for levelIter in range(len(data_slice.index.levels)):
    levels.append([x for x in data_slice.index.levels[levelIter]])
levels[0].append('Total')
if len(resultList)==2:
    levels[-1].append('Difference')
    addIndexCountForDifferenceRow = 1
else:
    addIndexCountForDifferenceRow = 0

# Create new indexing sequence since we are adding Total (and Difference if doing comparison) rows
labels = list()
for labelIter in range(len(data_slice.index.labels)):
    labels.append(list())
if len(data_slice.index.labels)==2:
    labels0 = [x for x in data_slice.index.labels[0]]
    labels1 = [x for x in data_slice.index.labels[1]]
    for iter0 in range(max(labels0)+2):
        for iter1 in range(max(labels1)+1+addIndexCountForDifferenceRow):
            labels[0].append(iter0)
            labels[1].append(iter1)
if len(data_slice.index.labels)==3:
    labels0 = [x for x in data_slice.index.labels[0]]
    labels1 = [x for x in data_slice.index.labels[1]]
    labels2 = [x for x in data_slice.index.labels[2]]
    for iter0 in range(max(labels0)+2):
        for iter1 in range(max(labels1)+1):
            for iter2 in range(max(labels2)+1+addIndexCountForDifferenceRow):
                labels[0].append(iter0)
                labels[1].append(iter1)
                labels[2].append(iter2)

index = index.set_levels(levels)
index = index.set_labels(labels)
data_slice = data_slice.reindex(index)
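
The nested loops above enumerate every combination of level positions, which is a Cartesian product of the levels. One possible shorter route, sketched here with made-up data, is to extend the level values and let `pd.MultiIndex.from_product` build the index before reindexing. The frame `dataframe`, its columns, and its values are illustrative assumptions, and the optional 'Difference' entry is left out.

```python
import pandas as pd

# Hypothetical input in place of the asker's data.
dataframe = pd.DataFrame({
    "label0": ["a", "a", "b"],
    "label1": ["x", "y", "x"],
    "value": [1, 2, 3],
})
groupbyColumns = ["label0", "label1"]
data_slice = dataframe.groupby(by=groupbyColumns).sum()

# Copy the existing level values and add the extra 'Total' entry to the first level.
levels = [list(level) for level in data_slice.index.levels]
levels[0].append("Total")

# from_product generates every combination of the level values.
full_index = pd.MultiIndex.from_product(levels, names=data_slice.index.names)

# Existing rows keep their values; combinations that are new get NaN.
data_slice = data_slice.reindex(full_index)
print(data_slice)
```

This avoids calling set_levels and set_labels at all, so it does not depend on how those methods validate label lengths.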