Question

I am working with a large data set and want to extract a subset. In SQL terms, this is what I am trying to achieve, and I want to do it with pandas / numpy.

These two statements each work on their own:

#1.  unionX1 = data[data.cpty_type == 'INTERBRANCH']
#2.  unionX1 = data[data.settlementDate >= '2017-04-18 00:00:00.000']

My version, which combines the two, does not work:

unionX1 = data[data.cpty_type == 'INTERBRANCH' & (data.settlementDate >= '2017-04-18' | data.settlementDate == '2017-04-18')]

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 877, in na_op
    result = op(x, y)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 127, in <lambda>
    ror_=bool_method(lambda x, y: operator.or_(y, x),
TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 895, in na_op
        result = lib.scalar_binop(x, y, op)
      File "pandas\lib.pyx", line 912, in pandas.lib.scalar_binop (pandas\lib.c:16177)
    ValueError: cannot include dtype 'M' in a buffer

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "C:/Users/Karunyan/PycharmProjects/RECON/criteria/distinct_matched_trades.py", line 18, in <module>
        unionX1 = data[data.cpty_type == 'INTERBRANCH' & (data.settlementDate >= '2017-04-18' | data.settlementDate == '1899-12-30')]
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 929, in wrapper
        na_op(self.values, other),
      File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py", line 899, in na_op
        x.dtype, type(y).__name__))
    TypeError: cannot compare a dtyped [datetime64[ns]] array with a scalar of type [bool]

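I get the exception above when I run it, and I think it is caused by the bitwise comparison. Any suggestions on what I am doing wrong here?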

Answer 1

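In Python, bitwise operators such as |, & and ^ have higher precedence than comparison operators such as <, > and ==. You need parentheses in the expression to force the intended order of evaluation.

For example, if you write A < B & C < D, it is evaluated as A < (B & C) < D, which raises an error when the operands are Pandas Series. You need to write (A < B) & (C < D) explicitly for it to work as intended.

In your case, apply the same pattern to each condition: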

(A < B) & (C < D)
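
To make the precedence rule concrete, here is a minimal sketch. The integer values and the Series named a, b, c and d are made up for illustration and are not from the question.

```python
import pandas as pd

# Plain integers: & binds tighter than <, so the unparenthesized form
# becomes a chained comparison around (B & C).
A, B, C, D = 1, 2, 1, 5
unparenthesized = A < B & C < D      # parsed as A < (B & C) < D
explicit_chain = A < (B & C) < D     # 1 < 0 < 5
intended = (A < B) & (C < D)         # True & True

# Same shape with pandas Series (illustrative data).
a = pd.Series([1, 4])
b = pd.Series([2, 3])
c = pd.Series([1, 1])
d = pd.Series([5, 5])

try:
    # The chained comparison needs the truth value of a whole Series,
    # which pandas refuses to provide.
    a < b & c < d
except ValueError as exc:
    chained_error = exc
else:
    chained_error = None

# Parenthesizing each comparison gives an element-wise boolean mask.
mask = (a < b) & (c < d)
```

With these values the unparenthesized integer expression and the intended one disagree, which is why the parentheses matter even when no exception is raised.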

Answer 2

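Because of operator precedence, you need to wrap each condition in parentheses and combine them with the bitwise and (&) and or (|) operators: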

unionX1 = data[(data.cpty_type == 'INTERBRANCH') & 
               ((data.settlementDate >='2017-04-18') | (data.settlementDate =='2017-04-18'))]
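
As a runnable sketch of the same filter, assuming a small made-up DataFrame with the two columns from the question (the rows and the variable names cutoff, is_interbranch, on_or_after and exactly_on are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "cpty_type": ["INTERBRANCH", "INTERBRANCH", "EXTERNAL", "INTERBRANCH"],
    "settlementDate": pd.to_datetime(
        ["2017-04-17", "2017-04-18", "2017-04-19", "2017-04-20"]
    ),
})

cutoff = pd.Timestamp("2017-04-18")

# Build each condition as its own boolean Series, then combine them.
is_interbranch = data["cpty_type"] == "INTERBRANCH"
on_or_after = data["settlementDate"] >= cutoff
exactly_on = data["settlementDate"] == cutoff

unionX1 = data[is_interbranch & (on_or_after | exactly_on)]
```

Naming the intermediate masks avoids the precedence problem entirely, since each comparison is finished before & or | is applied. Note that when both dates are the same, as here, the >= test already covers the == case, so the | branch does not change the result.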

Answer 3

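If you want to sidestep the operator precedence quirks, you can use numpy's logical_and and logical_or functions.

The grouping is explicit, so you get the behavior you want without having to remember the precedence of the binary operators.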
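
A minimal sketch of this approach, assuming a small made-up DataFrame with the question's column names (the rows and the name cutoff are illustrative, not taken from the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
data = pd.DataFrame({
    "cpty_type": ["INTERBRANCH", "INTERBRANCH", "EXTERNAL", "INTERBRANCH"],
    "settlementDate": pd.to_datetime(
        ["2017-04-17", "2017-04-18", "2017-04-19", "2017-04-20"]
    ),
})

cutoff = pd.Timestamp("2017-04-18")

# Function-call syntax makes the grouping explicit: each argument is a
# complete comparison, so operator precedence never comes into play.
mask = np.logical_and(
    data["cpty_type"] == "INTERBRANCH",
    np.logical_or(
        data["settlementDate"] >= cutoff,
        data["settlementDate"] == cutoff,
    ),
)

unionX1 = data[mask]
```

np.logical_and and np.logical_or each combine exactly two inputs, so three or more conditions have to be nested as above or combined with np.logical_and.reduce on a list of masks.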

Pandas bitwise comparison raises an exception when using multiple conditions

3 answers: