HDFStore: select rows where a column is in an array

Asked: 2016-07-21 12:26:40

Tags: python pandas hdfstore

I have a table that includes the following columns:

>>> hdf.select('foo').columns
Out[22]: 
Index(['bar', 'units'],
      dtype='object')

Now I want to select the rows where bar has one of the following two values:

myBar = ['1500013010', '1500002071']
hdf.select('foo', 'bar in [{}]'.format(', '.join(myBar)))

But I get this exception, which suggests that I can't use "bar" as a variable:

  

    all of the variable refrences must be a reference to an axis (e.g. 'index' or 'columns'), or a data_column. The currently defined references are: index,columns

But isn't it a column?

Traceback (most recent call last):
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4593, in generate
    return Expr(where, queryables=q, encoding=self.table.encoding)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/pytables.py", line 516, in __init__
    self.terms = self.parse()
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 726, in parse
    return self._visitor.visit(self.expr)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
    return visitor(node, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 316, in visit_Module
    return self.visit(expr, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
    return visitor(node, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 319, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
    return visitor(node, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 627, in visit_Compare
    return self.visit(binop)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
    return visitor(node, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 400, in visit_BinOp
    op, op_class, left, right = self._possibly_transform_eq_ne(node)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 351, in _possibly_transform_eq_ne
    left = self.visit(node.left, side='left')
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
    return visitor(node, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 413, in visit_Name
    return self.term_type(node.id, self.env, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/pytables.py", line 38, in __init__
    super(Term, self).__init__(name, env, side=side, encoding=encoding)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/ops.py", line 57, in __init__
    self._value = self._resolve_name()
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/pytables.py", line 44, in _resolve_name
    raise NameError('name {0!r} is not defined'.format(self.name))
NameError: name 'bar' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-21-75c9827e34f0>", line 1, in <module>
    hdf.select('foo', 'bar in [{}]'.format(', '.join(bar)))
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 680, in select
    return it.get_result()
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 1364, in get_result
    results = self.func(self.start, self.stop, where)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 673, in func
    columns=columns, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4021, in read
    if not self.read_axes(where=where, **kwargs):
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 3222, in read_axes
    self.selection = Selection(self, where=where, **kwargs)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4580, in __init__
    self.terms = self.generate(where)
  File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4605, in generate
    .format(where, ','.join(q.keys()))
ValueError: The passed where expression: bar in [1500013010, 1500002071]
            contains an invalid variable reference
            all of the variable refrences must be a reference to
            an axis (e.g. 'index' or 'columns'), or a data_column
            The currently defined references are: index,columns

1 answer:

Answer 0: (score: 3)

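Your columns are not indexed (they were not stored as data columns), so they are not searchable and you can't use them in the where parameter.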

Demo:

In [131]: df = pd.DataFrame(np.random.randint(0,20,size=(5, 3)), columns=list('ABC'))

In [132]: df
Out[132]:
    A   B   C
0  19   4  18
1   4  14  16
2  17  13   9
3  19   9  13
4  16   8  10

In [133]: fn = 'C:/temp/test.h5'

In [134]: store = pd.HDFStore(fn)

In [135]: store.append('df', df)

In [136]: store.select('df', 'B > 10')
---------------------------------------------------------------------------
...
NameError: name 'B' is not defined

During handling of the above exception, another exception occurred:
...
ValueError: The passed where expression: B > 10
            contains an invalid variable reference
            all of the variable refrences must be a reference to
            an axis (e.g. 'index' or 'columns'), or a data_column
            The currently defined references are: index,columns

Now let's try it with indexed columns:

In [137]: store.append('df_indexed', df, data_columns=True)

In [139]: store.select('df_indexed', 'B > 10')
Out[139]:
    A   B   C
1   4  14  16
2  17  13   9
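
To come back to the original question (matching a column against a list of values), here is a minimal sketch, assuming the table is written with that column declared as a data column. It reuses the values of the demo frame; the key 'df_b', the temporary file and the names wanted, where and result are made up for illustration. As far as I know, comparing a data column to a list with == matches any of the listed values.

```python
import os
import tempfile

import pandas as pd

# Same values as the demo frame above.
df = pd.DataFrame({'A': [19, 4, 17, 19, 16],
                   'B': [4, 14, 13, 9, 8],
                   'C': [18, 16, 9, 13, 10]})

wanted = [13, 14]  # hypothetical list of values to look for in column B

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.h5')  # throwaway file for this sketch
    with pd.HDFStore(path, mode='w') as store:
        # Declare only B as a data column; A and C stay unsearchable.
        store.append('df_b', df, data_columns=['B'])
        # Build the where expression as text: "B == [13, 14]".
        where = 'B == {!r}'.format(wanted)
        result = store.select('df_b', where)

print(result)
```

With string values, the same formatting yields quoted literals such as bar == ['1500013010', '1500002071'], whereas the join in the question produces unquoted numbers.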

How to check whether the columns are indexed:

In [154]: store.get_storer('df_indexed').table.colindexes
Out[154]:
{
    "C": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "B": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "A": Index(6, medium, shuffle, zlib(1)).is_csi=False}

In [155]: store.get_storer('df').table.colindexes
Out[155]:
{
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}