Question

Context

I am working with a DataFrame df in which many columns are filled with numerical values:

df
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2
150.0        |     3.14    |  ...  | 1.008

In other words, I have a list of columns, list_cols:

list_cols = ['lorem ipsum', 'dolor sic', ... ]  # arbitrary length, of course len(list_cols ) <= len(df.columns), and contains valid columns of my df

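I would like to obtain 2 DataFrames:

1 with all rows where value < 0 for at least one of the columns in list_cols (i.e. an OR). Let's call it negative_values_matches
1 with the remaining rows of the DataFrame. Let's call it positive_values_matches

Expected result example

For list_cols = ['lorem ipsum', 'dolor sic'], I would obtain the following DataFrames, where negative_values_matches holds the rows in which at least one value in list_cols is strictly negative: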

negative_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
0.5          |     -6.2    |  ...  | 79.8
-26.1        |     6200.0  |  ...  | -65.2


positive_values_matches
lorem ipsum  |  dolor sic  |  ...  |  (hundreds of cols)
---------------------------------------------------------
150.0        |     3.14    |  ...  | 1.008

I do not want to write this kind of code myself:

negative_values_matches = df[ (criterion1 | criterion2 | ... | criterionn)]
positive_values_matches = df[~(criterion1 | criterion2 | ... | criterionn)]

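(where criterionk is a boolean evaluation for column k, for instance (df[col_k] < 0); the parentheses are needed because of Pandas syntax)

The idea is to have a programmatic approach. I am mainly looking for an array of booleans, so that I can use boolean indexing (see the Pandas documentation).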

As far as I can tell, the following posts are not exactly what I am asking about:

Filtering DataFrame on multiple conditions in Pandas
Drop rows on multiple conditions in pandas dataframe
Pandas: np.where with multiple conditions on dataframes
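Pandas DataFrame : How to select rows on multiple conditions? This one is fairly close to what I am looking for. However, it relies on generating a string, which may not work with "exotic" column names (containing spaces), or at least I do not know how to make it work.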
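On the string-based approach in the last post: as a hedged sketch, recent pandas versions (0.25 and later) accept backtick-quoted column names in DataFrame.query and DataFrame.eval, which should cover names containing spaces. The sample data below is made up to mirror the example above, and the column name "other col" is hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's example
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other col": [79.8, -65.2, 1.008],
})
list_cols = ["lorem ipsum", "dolor sic"]

# Build the expression "`lorem ipsum` < 0 or `dolor sic` < 0";
# backticks let the parser accept column names that contain spaces
expr = " or ".join(f"`{col}` < 0" for col in list_cols)

# DataFrame.eval returns the boolean Series, usable for boolean indexing
mask = df.eval(expr)

negative_values_matches = df[mask]
positive_values_matches = df[~mask]
```

This assumes the column names contain no backticks themselves.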

I cannot figure out how to chain the boolean evaluations on my DataFrame all together with the OR operator.

How can I do this?

Answer 1

After several attempts, I managed to achieve my goal.

Here is the code:

import numpy
import pandas

# assume the DataFrame and list_cols exist
df = ...
# initialize a 1-D array of False, one entry per row of df
resulting_bools = numpy.zeros(len(df.index), dtype=bool)

for col in list_cols:
    # array of booleans for this column: True where the [row, column] value is negative
    criterion = (df[col] < 0).to_numpy()  # same condition for each column, different conditions would have been more difficult (for me)

    # accumulate the boolean evaluation across columns with OR
    resulting_bools |= criterion

# use the array of booleans to build the required DataFrames
negative_values_matches = df[resulting_bools].copy()  # use .copy() to avoid possible later warnings from Pandas depending on what you do with your DataFrame
positive_values_matches = df[~resulting_bools].copy()
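A more compact variant of the loop above, as a sketch: compare all selected columns at once and reduce across columns with any(axis=1). The sample data is made up to mirror the question's example, and the column name "other col" is hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's example
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
    "other col": [79.8, -65.2, 1.008],
})
list_cols = ["lorem ipsum", "dolor sic"]

# Element-wise comparison on the selected columns, then OR across each row:
# True where at least one of the selected columns is strictly negative
mask = (df[list_cols] < 0).any(axis=1)

negative_values_matches = df[mask].copy()
positive_values_matches = df[~mask].copy()
```

Note that a comparison with NaN evaluates to False, so a row whose selected columns are all NaN would end up in positive_values_matches.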

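This way, I successfully obtained 2 DataFrames:

1 with all rows that have a value < 0 for at least 1 of the columns in list_cols
1 with all the other rows (value >= 0 for every column in list_cols)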

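(Initializing the array to False depends on the choice of boolean evaluation: False is the right starting value here because the criteria are combined with OR.)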
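To illustrate the remark on initialization, here is a small sketch: an OR accumulation starts from all False, while an AND accumulation would start from all True. The data is hypothetical (the question's example plus one extra row in which both columns are negative), and the names any_negative and all_negative are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's rows plus one row with both values negative
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0, -1.0],
    "dolor sic": [-6.2, 6200.0, 3.14, -2.0],
})
list_cols = ["lorem ipsum", "dolor sic"]

# One boolean array per column
criteria = [(df[col] < 0).to_numpy() for col in list_cols]

# OR: start from all False, a row becomes True as soon as one criterion holds
any_negative = np.zeros(len(df), dtype=bool)
for criterion in criteria:
    any_negative |= criterion

# AND: start from all True, a row stays True only if every criterion holds
all_negative = np.ones(len(df), dtype=bool)
for criterion in criteria:
    all_negative &= criterion

# The same reductions without an explicit loop
any_negative_reduced = np.logical_or.reduce(criteria)
all_negative_reduced = np.logical_and.reduce(criteria)
```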

Note: this approach can probably be combined with multiple conditions on dataframes. To be confirmed.
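One possible way to use a different condition per column, as an unconfirmed sketch rather than something tested in this answer: map each column name to a function that returns a boolean Series, then accumulate with OR as before. The conditions dictionary and its thresholds are hypothetical, as is the sample data.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's example
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
})

# Hypothetical per-column conditions: each value takes a column (Series)
# and returns a boolean Series of the same length
conditions = {
    "lorem ipsum": lambda s: s < 0,
    "dolor sic": lambda s: s > 1000,
}

# Start from all False because the conditions are combined with OR
mask = pd.Series(False, index=df.index)
for col, condition in conditions.items():
    mask |= condition(df[col])

matches = df[mask].copy()
others = df[~mask].copy()
```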

Pandas DataFrame : Programmatic rows split of a dataframe on multiple columns conditions
