I have a table that includes the following columns:
>>> hdf.select('foo').columns
Out[22]:
Index(['bar', 'units'],
dtype='object')
Now I want to select the rows where bar has one of the following two values:
myBar = ['1500013010', '1500002071']
hdf.select('foo', 'bar in [{}]'.format(', '.join(myBar)))
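But I get the exception below, which suggests that I can't use "bar" as a variable:
all of the variable refrences must be a reference to an axis (e.g. 'index' or 'columns'), or a data_column. The currently defined references are: index,columns
But isn't it a column?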
Traceback (most recent call last):
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4593, in generate
return Expr(where, queryables=q, encoding=self.table.encoding)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/pytables.py", line 516, in __init__
self.terms = self.parse()
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 726, in parse
return self._visitor.visit(self.expr)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
return visitor(node, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 316, in visit_Module
return self.visit(expr, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
return visitor(node, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 319, in visit_Expr
return self.visit(node.value, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
return visitor(node, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 627, in visit_Compare
return self.visit(binop)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
return visitor(node, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 400, in visit_BinOp
op, op_class, left, right = self._possibly_transform_eq_ne(node)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 351, in _possibly_transform_eq_ne
left = self.visit(node.left, side='left')
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 310, in visit
return visitor(node, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/expr.py", line 413, in visit_Name
return self.term_type(node.id, self.env, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/pytables.py", line 38, in __init__
super(Term, self).__init__(name, env, side=side, encoding=encoding)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/ops.py", line 57, in __init__
self._value = self._resolve_name()
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/computation/pytables.py", line 44, in _resolve_name
raise NameError('name {0!r} is not defined'.format(self.name))
NameError: name 'bar' is not defined
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-21-75c9827e34f0>", line 1, in <module>
hdf.select('foo', 'bar in [{}]'.format(', '.join(bar)))
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 680, in select
return it.get_result()
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 1364, in get_result
results = self.func(self.start, self.stop, where)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 673, in func
columns=columns, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4021, in read
if not self.read_axes(where=where, **kwargs):
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 3222, in read_axes
self.selection = Selection(self, where=where, **kwargs)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4580, in __init__
self.terms = self.generate(where)
File "/asdf/anaconda/envs/myenv3/lib/python3.5/site-packages/pandas/io/pytables.py", line 4605, in generate
.format(where, ','.join(q.keys()))
ValueError: The passed where expression: bar in [1500013010, 1500002071]
contains an invalid variable reference
all of the variable refrences must be a reference to
an axis (e.g. 'index' or 'columns'), or a data_column
The currently defined references are: index,columns
Answer 0 (score: 3)
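Your columns are not indexed (they were not stored as data columns), so they are not searchable and you cannot use them in the where parameter.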
Demo:
In [131]: df = pd.DataFrame(np.random.randint(0,20,size=(5, 3)), columns=list('ABC'))
In [132]: df
Out[132]:
A B C
0 19 4 18
1 4 14 16
2 17 13 9
3 19 9 13
4 16 8 10
In [133]: fn = 'C:/temp/test.h5'
In [134]: store = pd.HDFStore(fn)
In [135]: store.append('df', df)
In [136]: store.select('df', 'B > 10')
---------------------------------------------------------------------------
...
NameError: name 'B' is not defined
During handling of the above exception, another exception occurred:
...
ValueError: The passed where expression: B > 10
contains an invalid variable reference
all of the variable refrences must be a reference to
an axis (e.g. 'index' or 'columns'), or a data_column
The currently defined references are: index,columns
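If rewriting the store is not an option, one possible fallback is to read the whole table and filter it in memory. The following is only a sketch: it hard-codes the values of the demo frame above so the result is reproducible, writes to a temporary file instead of C:/temp/test.h5, and assumes PyTables is installed. The variable used_fallback is a made-up name for illustration.

```python
import os
import tempfile

import pandas as pd

# Same values as the demo frame above, hard-coded so the result is reproducible.
df = pd.DataFrame(
    {"A": [19, 4, 17, 19, 16], "B": [4, 14, 13, 9, 8], "C": [18, 16, 9, 13, 10]}
)

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(os.path.join(tmp, "test.h5")) as store:
        store.append("df", df)  # no data_columns, so 'B' is not queryable
        try:
            result = store.select("df", "B > 10")
            used_fallback = False
        except ValueError:
            # Fallback: load every row, then filter with ordinary pandas.
            full = store.select("df")
            result = full[full["B"] > 10]
            used_fallback = True

print(result)
```

This reads every row from disk, so it is only reasonable when the table fits comfortably in memory.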
Now let's try it with indexed columns (stored with data_columns=True):
In [137]: store.append('df_indexed', df, data_columns=True)
In [139]: store.select('df_indexed', 'B > 10')
Out[139]:
A B C
1 4 14 16
2 17 13 9
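Applied to the table from the question, the same idea might look like the sketch below. The frame contents, the temporary file, and the name my_bar are made up for illustration. It assumes that bar holds strings, that PyTables is installed, and that the table can be written again with bar declared as a data column. Each value is quoted with repr so that the where expression compares against string literals.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the question's 'foo' table.
df = pd.DataFrame(
    {"bar": ["1500013010", "1500002071", "1500000000"], "units": [1, 2, 3]}
)
my_bar = ["1500013010", "1500002071"]

with tempfile.TemporaryDirectory() as tmp:
    with pd.HDFStore(os.path.join(tmp, "foo.h5")) as store:
        # Declaring 'bar' as a data column makes it usable in where expressions.
        store.append("foo", df, data_columns=["bar"])
        # Builds: bar in ['1500013010', '1500002071']
        where = "bar in [{}]".format(", ".join(repr(v) for v in my_bar))
        result = store.select("foo", where)

print(result)
```

Listing only the columns you need in data_columns, rather than passing data_columns=True, keeps the extra columns and indexes PyTables has to maintain to a minimum.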
How to check whether the columns are indexed:
In [154]: store.get_storer('df_indexed').table.colindexes
Out[154]:
{
"C": Index(6, medium, shuffle, zlib(1)).is_csi=False,
"index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
"B": Index(6, medium, shuffle, zlib(1)).is_csi=False,
"A": Index(6, medium, shuffle, zlib(1)).is_csi=False}
In [155]: store.get_storer('df').table.colindexes
Out[155]:
{
"index": Index(6, medium, shuffle, zlib(1)).is_csi=False}