Truncating column names raises a ValueError

Time: 2015-10-12 22:09:24

Tags: pandas

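I have a table whose column names I truncate to 128 characters with a function (I know that is really long, but there is nothing I can do about it) so that to_sql can create a database from it: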

def truncate_column_names(df, length):
    rename = {}
    for col in df.columns:
        if len(col) > length:
            new_col = col[:length-3]+"..."
            rename[col] = new_col
    result = df.rename(columns=rename)
    return result

This function runs fine and I get a table back, but the problem appears when I try to save the file. I get this error:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

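Before saving to a file I do some housekeeping, which includes dropping duplicates, and that is where the error is raised. I tested this by saving the original DataFrame, loading it back, running the truncation function, and then calling drop_duplicates on the result; I got the same error.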

Before truncation, the file's header looks like this:

http://pastebin.com/WXmvwHDg

I trimmed the file down to a single record and the problem persists.

1 Answer:

Answer 0: (score: 0)

This is caused by the truncation, which leaves some columns with non-unique names.

To confirm that this is the problem, I ran a short test:
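A minimal sketch of how the name collision could be checked. The helper find_duplicate_columns and the sample column names are hypothetical, made up for illustration; the sketch assumes a pandas version that provides Index.duplicated. It also shows why the error mentions dimensions: selecting a duplicated label returns a 2-D DataFrame rather than a 1-D Series.

```python
import pandas as pd


def find_duplicate_columns(df):
    """Return the column labels that occur more than once (hypothetical helper)."""
    cols = df.columns
    # Index.duplicated(keep=False) marks every occurrence of a repeated label.
    return list(cols[cols.duplicated(keep=False)].unique())


# Hypothetical names that share their first 10 characters.
df = pd.DataFrame([[1, 2, 3]], columns=["measurement_a", "measurement_b", "other"])
truncated = df.rename(columns=lambda c: c[:10])

duplicates = find_duplicate_columns(truncated)
print(duplicates)

# Selecting a duplicated label yields a DataFrame, so its values are 2-D.
selected_ndim = truncated[duplicates[0]].values.ndim
print(selected_ndim)
```

If the returned list is non-empty, the truncated names collide and need to be made unique before calling drop_duplicates.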

In [113]: df = pd.DataFrame(columns=["ab", "ac", "ad"])

In [114]: df
Out[114]:
Empty DataFrame
Columns: [ab, ac, ad]
Index: []

In [115]: df.drop_duplicates()
Out[115]:
Empty DataFrame
Columns: [ab, ac, ad]
Index: []

In [116]: df.columns
Out[116]: Index([u'ab', u'ac', u'ad'], dtype='object')

In [117]: df.columns = df.columns.str[:1]

In [118]: df
Out[118]:
Empty DataFrame
Columns: [a, a, a]
Index: []

In [119]: df.drop_duplicates()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-119-daf275b6788b> in <module>()
----> 1 df.drop_duplicates()

C:\Miniconda\lib\site-packages\pandas\util\decorators.pyc in wrapper(*args, **kwargs)
     86                 else:
     87                     kwargs[new_arg_name] = new_arg_value
---> 88             return func(*args, **kwargs)
     89         return wrapper
     90     return _deprecate_kwarg

C:\Miniconda\lib\site-packages\pandas\core\frame.pyc in drop_duplicates(self, subset, take_last, inplace)
   2826         deduplicated : DataFrame
   2827         """
-> 2828         duplicated = self.duplicated(subset, take_last=take_last)
   2829
   2830         if inplace:

C:\Miniconda\lib\site-packages\pandas\util\decorators.pyc in wrapper(*args, **kwargs)
     86                 else:
     87                     kwargs[new_arg_name] = new_arg_value
---> 88             return func(*args, **kwargs)
     89         return wrapper
     90     return _deprecate_kwarg

C:\Miniconda\lib\site-packages\pandas\core\frame.pyc in duplicated(self, subset, take_last)
   2871
   2872         vals = (self[col].values for col in subset)
-> 2873         labels, shape = map(list, zip( * map(f, vals)))
   2874
   2875         ids = get_group_index(labels, shape, sort=False, xnull=False)

C:\Miniconda\lib\site-packages\pandas\core\frame.pyc in f(vals)
   2860
   2861         def f(vals):
-> 2862             labels, shape = factorize(vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
   2863             return labels.astype('i8',copy=False), len(shape)
   2864

C:\Miniconda\lib\site-packages\pandas\core\algorithms.pyc in factorize(values, sort, order, na_sentinel, size_hint)
    133     table = hash_klass(size_hint or len(vals))
    134     uniques = vec_klass()
--> 135     labels = table.get_labels(vals, uniques, 0, na_sentinel)
    136
    137     labels = com._ensure_platform_int(labels)

pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_labels (pandas\hashtable.c:13946)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

and got the same result. Checking df.columns.unique() after truncation showed that I had about 200 duplicated columns.
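One possible way around this is to keep the truncated names unique. The sketch below is only an illustration and not part of the original code: truncate_column_names_unique and the "~1", "~2" suffix scheme are hypothetical choices, and it assumes length is comfortably larger than the suffix (as it is with 128).

```python
import pandas as pd


def truncate_column_names_unique(df, length):
    """Truncate column names to `length` characters while keeping them unique.

    Hypothetical variant of truncate_column_names: a name that collides with
    an earlier one gets a numeric suffix ("~1", "~2", ...) in place of its
    last characters.
    """
    used = set()
    new_names = []
    for col in df.columns:
        name = str(col)
        # Same rule as the original function: cut and mark with "...".
        candidate = name if len(name) <= length else name[:length - 3] + "..."
        counter = 0
        while candidate in used:
            counter += 1
            suffix = "~%d" % counter
            candidate = name[:length - len(suffix)] + suffix
        used.add(candidate)
        new_names.append(candidate)
    result = df.copy()
    result.columns = new_names
    return result


# Hypothetical data: two identical rows, two names that collide when cut to 10.
df = pd.DataFrame(
    [[1, 2, 3], [1, 2, 3]],
    columns=["measurement_alpha", "measurement_beta", "id"],
)
out = truncate_column_names_unique(df, 10)
print(list(out.columns))
deduplicated = out.drop_duplicates()
print(len(deduplicated))
```

Because every resulting label is distinct, each column selection is 1-D again and drop_duplicates can run on the truncated frame.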