Question

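I have a dataframe that I want to process in Jupyter. The dataframe was originally filled with NaN wherever blanks were found, but I decided to replace those with the string 'Null' (because I had trouble ignoring NaN).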

The following is a sample of the original file, mydata.txt:

##IGNORE THIS LINE
group2,"BLA","BLE","BLI","BLO","BLU","TAT","TET","TOT","TUT"
group0,"BLA","BLE","BLI","BLO","BLU"
group3,"BLA","BLE","BLI"

The idea is to build arrays in which no element is NaN (or, later, 'Null'), which I can then feed into filters elsewhere.

import rpy2.ipython
import rpy2.robjects as robjects
import pandas as pd
import numpy
import re #python for regex
%load_ext rpy2.ipython
%R

path='C:/MyPath/'

allgroups=pd.read_csv(path+'mydata.txt',sep=",",skiprows=1,header=None,index_col=0)
allgroups=allgroups.fillna("Null")

def groupdat(groupname):
    #Cleans group
    precleaned=numpy.array(allgroups.loc[[groupname]])
#     matching = [s for s in precleaned if s != "Null" ] #I tried this
    matching=filter(lambda elem: elem != "Null",precleaned) #I also tried this.
    print(matching)
    return

groupdat('group0')

Both versions of matching above (the commented-out one and the active one) produce the error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

The output of precleaned is

[['BLA' 'BLE' 'BLI' 'BLO' 'BLU' 'Null' 'Null' 'Null' 'Null']]

Printing allgroups.loc[[groupname]] gives

          1     2     3     4     5     6     7     8     9 
0                                                                  
group0    BLA   BLE   BLI   BLO   BLU   Null  Null  Null  Null

[1 rows x 9 columns]

I appreciate any feedback.

Answer 1

When you create the array, you end up with one dimension too many:

numpy.array(allgroups.loc[["group0"]])

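Passing a list of labels to .loc returns a one-row DataFrame, so the resulting array is two-dimensional. The list comprehension / filter therefore iterates over its only element, which is itself an array, and comparing that array to "Null" in a boolean context gives the message you see.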
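To make the extra dimension visible, here is a minimal sketch. It assumes a hand-built stand-in for `allgroups` (one row, already filled with "Null") instead of reading mydata.txt; the variable names `two_d`, `rows`, `mask`, and `error_message` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Assumption: a hand-built stand-in for the asker's table after fillna("Null").
allgroups = pd.DataFrame(
    [["BLA", "BLE", "BLI", "BLO", "BLU", "Null", "Null", "Null", "Null"]],
    index=["group0"],
    columns=range(1, 10),
)

# A list of labels gives a one-row DataFrame, hence a 2-D array.
two_d = np.array(allgroups.loc[["group0"]])
print(two_d.shape)

# Iterating a 2-D array yields whole rows, not individual strings.
rows = list(two_d)
mask = rows[0] != "Null"  # elementwise comparison, a boolean array

# Using that boolean array as a single condition is what raises the error.
try:
    bool(mask)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The `if s != "Null"` test in the list comprehension performs the same `bool(...)` conversion on each row, which is where the ValueError comes from.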

Create it like this instead, passing a single label so that .loc returns a one-dimensional Series:

numpy.array(allgroups.loc["group0"])

Then [s for s in precleaned if s != "Null" ] yields:

['BLA', 'BLE', 'BLI', 'BLO', 'BLU']

As expected.
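As a possible alternative, the same result can be obtained without a Python-level loop. The sketch below is not part of the original answer. It assumes a hand-built `allgroups` that still contains NaN (the state before `fillna`), and it reuses the asker's function name `groupdat` with a return value added. `masked` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumption: a hand-built stand-in for the table as read, before fillna("Null").
allgroups = pd.DataFrame(
    [
        ["BLA", "BLE", "BLI", "BLO", "BLU", np.nan, np.nan, np.nan, np.nan],
        ["BLA", "BLE", "BLI", np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    ],
    index=["group0", "group3"],
    columns=range(1, 10),
)

def groupdat(groupname):
    """Return the non-missing entries of one group as a plain list."""
    # A single label gives a 1-D Series; dropna() discards the NaN entries.
    return allgroups.loc[groupname].dropna().tolist()

print(groupdat("group0"))

# Variant that keeps the "Null" placeholder: boolean-mask a 1-D array.
filled = allgroups.fillna("Null")
row = np.array(filled.loc["group0"])
masked = row[row != "Null"]
print(masked.tolist())
```

With `dropna()` the "Null" placeholder is not needed at all. The boolean-mask variant is closer to the original code and works once the array is one-dimensional.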

Retrieving all elements of an array that match a condition in Python
