Pandas HDF5 select with where on columns with non-natural names

Date: 2013-10-09 16:03:02

Tags: pandas hdf5 pytables

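In my continuing spree of exotic pandas/HDF5 issues, I ran into the following:

I have a series of columns with non-natural names (nb: for a good reason, the negative numbers are "system" IDs, etc.), which normally does not cause a problem: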

fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'])

However, my select statement falls over as soon as I add a where clause on such a column:

>>> fact_hdf.select('store_0_0', columns=['o', 'a-6', 'm-13'], where=[('a-6', '=', [0, 25, 28])])
blablabla
File "/srv/www/li/venv/local/lib/python2.7/site-packages/tables/table.py", line 1251, in _required_expr_vars
    raise NameError("name ``%s`` is not defined" % var)
NameError: name ``a`` is not defined

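Is there any way to work around this? I could rename my negative values from "a-1" to "a_1", but that means reloading all of the data in my system, which is rather a lot! :)

Suggestions are very welcome!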

1 answer:

Answer 0 (score: 3)

Here is a test table

In [1]: df = DataFrame({ 'a-6' : [1,2,3,np.nan] })

In [2]: df
Out[2]: 
   a-6
0    1
1    2
2    3
3  NaN

In [3]: df.to_hdf('test.h5','df',mode='w',table=True)

In [5]: df.to_hdf('test.h5','df',mode='w',table=True,data_columns=True)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_kind'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'a-6_dtype'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)

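There is a way, but it would have to be built into the code itself. You can do a variable substitution for the column names, as follows. Here is the existing routine (in master)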

    def select(self):
        """
        generate the selection
        """
        if self.condition is not None:
            return self.table.table.readWhere(self.condition.format(), start=self.start, stop=self.stop)
        elif self.coordinates is not None:
            return self.table.table.readCoordinates(self.coordinates)
        return self.table.table.read(start=self.start, stop=self.stop)

If you do this instead

(Pdb) self.table.table.readWhere("(x>2.0)",
      condvars={ 'x' : getattr(self.table.table.cols,'a-6')})
array([(2, 3.0)], 
      dtype=[('index', '<i8'), ('a-6', '<f8')])

i.e. by substituting the column reference for x, you can get the data.

This could be done automatically when an invalid column name is detected, but it is pretty tricky.
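To illustrate what such detection might look like: PyTables evaluates the condition as an expression, so `a-6` is read as `a` minus 6, which matches the NameError in the question. The sketch below is only an illustration, not what pandas does. The function name `build_condition`, the `_cN` placeholder scheme, and the `get_column` callback are all assumptions made up for this example, and it does not guard against a real column that happens to be called `_c0`. It only handles numeric comparison values.

```python
import re

# Pattern quoted in the NaturalNameWarning above.
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def build_condition(terms, get_column):
    """Turn (column, op, value) terms into a condition string plus condvars.

    get_column is a caller-supplied lookup (hypothetical); with PyTables it
    could be lambda name: getattr(table.cols, name), as in the pdb session.
    """
    aliases = {}   # invalid column name -> placeholder variable
    condvars = {}  # placeholder variable -> column reference
    parts = []
    for name, op, value in terms:
        if NATURAL_NAME.match(name):
            var = name  # valid identifier: usable in the expression as-is
        else:
            if name not in aliases:
                aliases[name] = "_c%d" % len(aliases)
                condvars[aliases[name]] = get_column(name)
            var = aliases[name]
        parts.append("(%s %s %r)" % (var, op, value))
    return " & ".join(parts), condvars


# Stand-in column objects so the sketch runs without an HDF5 file.
fake_cols = {"a-6": "<column a-6>", "o": "<column o>"}
condition, condvars = build_condition(
    [("a-6", ">", 2.0), ("o", "==", 1)], fake_cols.__getitem__
)
print(condition)  # (_c0 > 2.0) & (o == 1)
print(condvars)   # {'_c0': '<column a-6>'}
```

With a real table, the two results would be passed along as `table.read_where(condition, condvars=condvars)` (`readWhere` in the older PyTables version shown above).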

Unfortunately, I would recommend renaming your columns.
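A minimal sketch of how the renaming could be prepared, assuming the "a-1" to "a_1" scheme mentioned in the question. The helper names `natural_name` and `build_rename_map` are invented for this example. The collision check is an extra precaution, since "a-6" and "a_6" would otherwise end up with the same name.

```python
import re


def natural_name(name):
    """Replace every character outside [a-zA-Z0-9_] with an underscore."""
    cleaned = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    # Identifiers may not be empty or start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def build_rename_map(columns):
    """Map each original column name to a valid identifier, refusing clashes."""
    mapping = {}
    used = {}
    for col in columns:
        new = natural_name(col)
        if new in used:
            raise ValueError(
                "%r and %r both map to %r" % (used[new], col, new)
            )
        used[new] = col
        mapping[col] = new
    return mapping


print(build_rename_map(["o", "a-6", "m-13"]))
# {'o': 'o', 'a-6': 'a_6', 'm-13': 'm_13'}
```

The resulting dict can be applied with `df.rename(columns=build_rename_map(df.columns))` before the frame is written with `to_hdf`; the existing stores would still have to be rewritten once.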