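I have a pandas.DataFrame and I am iterating over its rows. For each row, I need to filter out some values that are of no interest while keeping the index association. This is where I am right now: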
for i, row in df.iterrows():
    my_values = row["first_interesting_column":]
    # here I need to filter the 'my_values' Series based on a function
    # what I'm doing right now is using the built-in Python filter function,
    # but what I get back is a list with no index anymore
    my_valuable_values = filter(lambda x: x != "-", my_values)
How can I do that?
Answer 0: (score: 1)
Someone on IRC suggested an answer to me. Here it is:
w = my_values != "-" # creates a boolean Series marking what to include/exclude
my_valuable_values = my_values[w]
...which can also be shortened to...
my_valuable_values = my_values[my_values != "-"]
...and of course, to skip the intermediate variable altogether...
row["first_interesting_column":][row["first_interesting_column":] != "-"]
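For reference, here is a minimal self-contained sketch of the boolean-mask approach applied to a single row. The sample row and its labels are made up for illustration; only the `!= "-"` comparison and the label slice come from the question.

```python
import pandas as pd

# Hypothetical row, standing in for one row yielded by df.iterrows()
row = pd.Series({"name": "a", "2009": 1, "2010": "-", "2011": 4})

# Label-based slice from the first column of interest onwards
my_values = row["2009":]

# Element-wise comparison gives a boolean Series aligned on the same index
mask = my_values != "-"

# Indexing with the mask keeps only the True entries, labels included
my_valuable_values = my_values[mask]
print(my_valuable_values)
```

Unlike the built-in `filter`, the result is still a Series, so each remaining value keeps its original label.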
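Answer 1: (score: 0)
Iterating over rows is usually bad practice (and slow). As @JohnE suggested, you can use applymap instead.
If I understand your question correctly, I think what you want to do is: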
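import numpy as np
import pandas as pd
from io import StringIO

datastring = StringIO("""\
2009 2010 2011 2012
1 4 - 4
3 - 2 3
4 - 8 7
""")
# raw string keeps the backslash literal; \s+ splits on any run of whitespace
df = pd.read_table(datastring, sep=r'\s+')
# cells equal to '-' become NaN, then everything is converted to float
# (np.float was removed from NumPy; the built-in float does the same job)
a = df[df.applymap(lambda x: x != '-')].astype(float).values
# flat array of the values that are left
a[~np.isnan(a)]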
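The flattened array no longer records which row and column each value came from. If that association matters, as the question asks, one possible variation (not part of the original answer) is to stack the frame into a Series indexed by (row, column). This sketch assumes that every non-numeric cell should be treated as missing, which is broader than matching "-" alone.

```python
import pandas as pd
from io import StringIO

datastring = StringIO("""\
2009 2010 2011 2012
1 4 - 4
3 - 2 3
4 - 8 7
""")
df = pd.read_table(datastring, sep=r"\s+")

# Assumption: anything that cannot be parsed as a number (here "-") becomes NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# stack() produces one Series indexed by (row label, column label);
# dropna() removes the placeholders whether or not stack() already did
valuable = numeric.stack().dropna()
print(valuable)
```

Individual values can then be looked up by their original position, for example `valuable.loc[(1, "2011")]`.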