Question

Snippet 1

import pandas as pd  
df = pd.read_csv("filename.txt", sep='\t', header = 0, names = ['E', 'S', 'D'])  
Result = df.query(df.E.head(n=100) == 0)

Snippet 1 works as expected and returns a dataframe containing the rows where df.E equals 0. However,

Snippet 2

import pandas as pd  
df = pd.read_csv("filename.txt", sep='\t', header = 0, names = ['E', 'S', 'D'])  
Result = df.query(df.E.head(n=101) == 0)

Snippet 2 does not work and raises the error

"SyntaxError: ('invalid syntax', ('<unknown>', 1, 602, '[True ,True
,True ,True ,True ,True ,True ,True ,True ,True ,True ,True ,True
,True ,True ,True ,... ,True ,True ,True ,True ,True ,True ,True ,True
,True ,True ,True ,True ,True ,True ,True ,...]\n'))"

Note that the only difference between the two snippets is n=100 versus n=101.

The error persists when .head(n=101) is removed. I have tried several values greater than 100, and they all raise the same error.

Answer 1

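df.query takes the query as a string. You are not passing a valid Python expression string (query actually accepts a slight superset of Python), so I would not expect either of your snippets to work; hence the SyntaxError. The error message hints at why the cut-off is exactly 100: the boolean Series you pass appears to be converted to its printed form, and pandas abbreviates printed sequences longer than 100 items (the default of the display.max_seq_items option) with "...", which the parser then rejects. With 100 items or fewer, snippet 1 only works by accident.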
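A minimal sketch of the string-based form, assuming a made-up 1,000-row frame with the question's column names E, S and D standing in for filename.txt (the random data is purely illustrative). It compares query with plain boolean indexing and shows one way to limit the search to the first n rows, which is what head(n=...) seemed to be aiming for:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for filename.txt: 1,000 rows with columns E, S, D.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 3, size=(1000, 3)), columns=['E', 'S', 'D'])

# query() expects the condition as a string expression, not a boolean Series,
# so it works the same regardless of the number of rows.
result_query = df.query('E == 0')

# Equivalent plain boolean indexing, for comparison.
result_mask = df[df['E'] == 0]

# To restrict the search to the first n rows, slice first and then query.
n = 101
first_n = df.head(n).query('E == 0')

print(len(result_query), len(first_n))
```

If you already have a boolean mask, df[mask] (or df.loc[mask]) is the natural tool; query is for writing the condition as text.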

Straight from the docstring:

Parameters
----------
expr : string
    The query string to evaluate.  You can refer to variables
    in the environment by prefixing them with an '@' character like
    ``@a + b``.
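To illustrate the @ prefix described in the docstring, here is a small sketch; the helper rows_matching and its data are hypothetical. A local variable of the calling function is referenced inside the query string with @, and the row limit is applied by slicing before the query:

```python
import pandas as pd

df = pd.DataFrame({'E': [0, 1, 0, 2, 0], 'S': list('abcde')})

def rows_matching(frame, target, n):
    """Hypothetical helper: rows among the first n whose column E equals target."""
    # 'target' is a local Python variable; '@target' refers to it inside the query string.
    return frame.head(n).query('E == @target')

print(rows_matching(df, 0, 4))
```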


In [13]: import numpy as np; import pandas as pd

In [14]: pd.set_option('display.max_rows', 10)

In [15]: np.random.seed(1234)

In [16]: df = pd.DataFrame(np.random.randint(0, 10, size=100).reshape(-1, 1), columns=list('a'))

In [17]: df
Out[17]: 
    a
0   3
1   6
2   5
3   4
4   8
.. ..
95  9
96  2
97  9
98  1
99  3

[100 rows x 1 columns]

In [18]: df.query('a==3')
Out[18]: 
    a
0   3
21  3
26  3
28  3
30  3
32  3
51  3
60  3
99  3

In [19]: var = 3

In [20]: df.query('a==@var')
Out[20]: 
    a
0   3
21  3
26  3
28  3
30  3
32  3
51  3
60  3
99  3

Pandas DataFrame query() raises an error when the DataFrame has more than 100 rows
