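I have a dataframe that I want to process in Jupyter. It was originally populated with NaN wherever blanks were found, but I decided to replace those with the string "Null" (because I had trouble ignoring NaN).
The following is the original file, mydata.txt: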
##IGNORE THIS LINE
group2,"BLA","BLE","BLI","BLO","BLU","TAT","TET","TOT","TUT"
group0,"BLA","BLE","BLI","BLO","BLU"
group3,"BLA","BLE","BLI"
The idea is to build an array containing only the elements that are not NaN (or, later, "Null"), which I can then pass on to filter data elsewhere.
import rpy2.ipython
import rpy2.robjects as robjects
import pandas as pd
import numpy
import re #python for regex
%load_ext rpy2.ipython
%R
path='C:/MyPath/'
allgroups=pd.read_csv(path+'mydata.txt',sep=",",skiprows=1,header=None,index_col=0)
allgroups=allgroups.fillna("Null")
def groupdat(groupname):
    #Cleans group
    precleaned=numpy.array(allgroups.loc[[groupname]])
    # matching = [s for s in precleaned if s != "Null" ] #I tried this
    matching=filter(lambda elem: elem != "Null",precleaned) #I also tried this.
    print(matching)
    return
groupdat('group0')
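Both attempts at matching shown above (the commented-out list comprehension and the filter call) raise this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The output of precleaned is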
[['BLA' 'BLE' 'BLI' 'BLO' 'BLU' 'Null' 'Null' 'Null' 'Null']]
Printing allgroups.loc[[groupname]] gives
          1    2    3    4    5     6     7     8     9
0
group0  BLA  BLE  BLI  BLO  BLU  Null  Null  Null  Null

[1 rows x 9 columns]
I appreciate any feedback.
Answer 0 (score: 1)
You have one dimension too many when you create the array with
numpy.array(allgroups.loc[["group0"]])
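so the list comprehension or filter iterates over its only element, which is itself an array (the whole row). Comparing that row with "Null" gives a boolean array rather than a single True or False, hence the message you get.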
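To see where the message comes from, here is a minimal sketch that uses a hand-built array as a stand-in for precleaned. The shortened values and the names rows, mask and raised are illustrative assumptions, not part of the original post.

```python
import numpy as np

# Hypothetical stand-in for precleaned: a 2-D array holding a single row.
precleaned = np.array([["BLA", "BLE", "Null"]], dtype=object)
print(precleaned.shape)  # one row, three columns

# Iterating over a 2-D array yields its rows, not its scalar elements.
rows = list(precleaned)

# Comparing a whole row with a string is done element by element,
# so the result is a boolean array rather than a single bool.
mask = rows[0] != "Null"

# `if s != "Null"` implicitly calls bool() on that array, which is ambiguous.
try:
    bool(mask)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The same thing happens inside the list comprehension and the filter lambda, because each needs a single truth value per item.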
Create it like this instead:
numpy.array(allgroups.loc[["group0"][0]])
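(["group0"][0] is just "group0", so .loc returns a one-dimensional Series instead of a one-row DataFrame.) Then [s for s in precleaned if s != "Null" ]
gives: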
['BLA', 'BLE', 'BLI', 'BLO', 'BLU']
as expected.
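For completeness, a self-contained sketch of the whole flow follows. It reads the sample data from an in-memory string instead of C:/MyPath/mydata.txt so that it runs anywhere. The names raw, two_d, one_d and filtered are introduced here for illustration, and the dropna() variant of groupdat is an alternative I am assuming would suit the question, not something the answer above proposes.

```python
import io

import numpy as np
import pandas as pd

# Sample contents of mydata.txt from the question, held in memory.
raw = '''##IGNORE THIS LINE
group2,"BLA","BLE","BLI","BLO","BLU","TAT","TET","TOT","TUT"
group0,"BLA","BLE","BLI","BLO","BLU"
group3,"BLA","BLE","BLI"
'''

allgroups = pd.read_csv(io.StringIO(raw), sep=",", skiprows=1,
                        header=None, index_col=0)

# A list of labels gives a one-row DataFrame, hence a 2-D array.
two_d = np.array(allgroups.loc[["group0"]].fillna("Null"))

# A scalar label gives a Series, hence a 1-D array.
one_d = np.array(allgroups.loc["group0"].fillna("Null"))

# With a 1-D array, each s is a single string and the comparison is unambiguous.
filtered = [s for s in one_d if s != "Null"]
print(filtered)


def groupdat(groupname):
    """Return the non-missing values of one group (assumed alternative).

    Skips the "Null" placeholder entirely by dropping NaN from the Series.
    """
    return allgroups.loc[groupname].dropna().tolist()


print(groupdat("group3"))
```

If the "Null" placeholder is only there to work around NaN, the dropna() version may make it unnecessary, but that depends on how the result is used elsewhere.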