Evaluate each cell and return the column header if it is not null (pandas DataFrame)

Date: 2017-09-29 09:51:32

Tags: python pandas iterator nested-loops

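I have a pandas DataFrame with 233 rows × 234 columns. I need to evaluate every cell and, if it is not NaN, return the corresponding column header. So far I have written the following: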

#First get a list of all column names (except column 0):

col_list=[]

for column in df.columns[1:]:
    col_list.append(column)

#Then I try to iterate through every cell and evaluate for Null
#Also a counter is initiated to take the next col_name from col_list
#when count reach 233

for index, row in df.iterrows():
    count = 0
    for x in row[1:]:
        count = count+1
        for col_name in col_list:
            if count >= 233: break
            elif str(x) != 'nan':
                print(col_name)

The code doesn't quite work. What do I need to change so that it breaks after 233 rows and moves on to the next col_name?

Example:

    Col_1   Col_2    Col_3
1    nan     13       nan
2    10      nan      nan
3    nan      2        5
4    nan     nan       4

Output:
1   Col_2
2   Col_1
3   Col_2
4   Col_3
5   Col_3
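The counter in the question's loop is not needed: each row yielded by iterrows() is a Series indexed by the column names, so iterating over its (name, value) pairs gives the header directly. A minimal loop-based sketch, assuming the row labels 1 to 4 are already the DataFrame index (if column 0 holds labels instead, iterate over row.iloc[1:].items() to skip it, as the question does):

```python
import numpy as np
import pandas as pd

# The example frame from the question; row labels 1-4 are the index here.
df = pd.DataFrame(
    {"Col_1": [np.nan, 10, np.nan, np.nan],
     "Col_2": [13, np.nan, 2, np.nan],
     "Col_3": [np.nan, np.nan, 5, 4]},
    index=[1, 2, 3, 4],
)

headers = []
for index, row in df.iterrows():
    # row is a Series whose index holds the column names, so each
    # (col_name, value) pair already carries its header: no counter needed.
    for col_name, value in row.items():
        if pd.notna(value):
            headers.append(col_name)

# Number the results the same way as the expected output.
for i, col_name in enumerate(headers, start=1):
    print(i, col_name)
```

This is fine for a 233 × 234 frame, but iterrows() is slow on large data; the vectorized answers that follow avoid the Python-level loop.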

3 Answers:

Answer 0 (score: 5)

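If the first column is the index, I think you need stack, which removes all NaNs. Then get the values of the second level of the resulting MultiIndex, either with reset_index and selecting the level_1 column, or with the Series constructor and Index.get_level_values: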

s = df.stack().reset_index()['level_1'].rename('a')
print (s)
0    Col_2
1    Col_1
2    Col_2
3    Col_3
4    Col_3
Name: a, dtype: object

Or:

s = pd.Series(df.stack().index.get_level_values(1))
print (s)
0    Col_2
1    Col_1
2    Col_2
3    Col_3
4    Col_3
dtype: object

If you need a list as output:

L = df.stack().index.get_level_values(1).tolist()
print (L)
['Col_2', 'Col_1', 'Col_2', 'Col_3', 'Col_3']

Detail:

print (df.stack())
1  Col_2    13.0
2  Col_1    10.0
3  Col_2     2.0
   Col_3     5.0
4  Col_3     4.0
dtype: float64
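The answer assumes the row labels are already the index. The question skips column 0, which suggests that column holds the labels; in that case a sketch like the following (the "id" column is a hypothetical stand-in) moves it into the index first and keeps each row label next to its header. It also calls dropna() explicitly, because recent pandas versions are moving to a stack() that keeps NaN values instead of dropping them:

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which column 0 ("id") holds the row labels,
# matching the question's "except column 0".
raw = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Col_1": [np.nan, 10, np.nan, np.nan],
    "Col_2": [13, np.nan, 2, np.nan],
    "Col_3": [np.nan, np.nan, 5, 4],
})

# Move column 0 into the index so only the value columns get stacked.
df = raw.set_index(raw.columns[0])

# Explicit dropna(): do not rely on stack() dropping NaN by itself.
stacked = df.stack().dropna()

# Each index entry is a (row label, column name) tuple.
pairs = stacked.index.tolist()
print(pairs)
```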

Answer 1 (score: 3)

I would use jezrael's stack solution.

However, if you are interested in a NumPy approach, it is usually faster:

In [4889]: np.tile(df.columns, df.shape[0])[~np.isnan(df.values.ravel())]
Out[4889]: array(['Col_2', 'Col_1', 'Col_2', 'Col_3', 'Col_3'], dtype=object)
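The np.tile trick works because df.values.ravel() flattens row by row, so repeating the column names once per row lines them up cell for cell. One caveat: np.isnan only accepts numeric arrays, so if the frame has object columns (strings, None), df.values becomes an object array and np.isnan raises a TypeError. A sketch of the same idea built on pandas' own missing-value mask, which also recovers the row labels (this variant is a suggestion, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Col_1": [np.nan, 10, np.nan, np.nan],
     "Col_2": [13, np.nan, 2, np.nan],
     "Col_3": [np.nan, np.nan, 5, 4]},
    index=[1, 2, 3, 4],
)

# Boolean mask of non-missing cells; notna() also handles object columns.
mask = df.notna().to_numpy()

# np.nonzero scans the mask in row-major (C) order, the same order as
# df.values.ravel(), so the headers come out in the np.tile order.
rows, cols = np.nonzero(mask)

headers = df.columns[cols].tolist()
row_labels = df.index[rows].tolist()
print(list(zip(row_labels, headers)))
```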

Timings:

In [4913]: df.shape
Out[4913]: (100, 3)

In [4914]: %timeit np.tile(df.columns, df.shape[0])[~np.isnan(df.values.ravel())]
10000 loops, best of 3: 35.8 µs per loop

In [4915]: %timeit df.stack().index.get_level_values(1)
1000 loops, best of 3: 335 µs per loop

In [4905]: df.shape
Out[4905]: (100000, 3)

In [4907]: %timeit np.tile(df.columns, df.shape[0])[~np.isnan(df.values.ravel())]
100 loops, best of 3: 5.98 ms per loop

In [4908]: %timeit df.stack().index.get_level_values(1)
100 loops, best of 3: 11.7 ms per loop
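The numbers above depend on hardware and on the pandas and NumPy versions, so it may be worth rerunning the comparison on your own data. A sketch using the standard timeit module; the random frame is hypothetical test data, and the stack version adds dropna() so both functions return the same result on pandas versions whose stack() keeps NaN:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: random floats with roughly half the cells set to NaN.
rng = np.random.default_rng(0)
data = rng.random((100_000, 3))
data[rng.random(data.shape) < 0.5] = np.nan
df = pd.DataFrame(data, columns=["Col_1", "Col_2", "Col_3"])


def with_numpy():
    # Repeat the headers once per row and keep those whose cell is not NaN.
    return np.tile(df.columns, df.shape[0])[~np.isnan(df.values.ravel())]


def with_stack():
    # Stack to (row, column) pairs, drop NaN, keep the column level.
    return df.stack().dropna().index.get_level_values(1)


# Sanity check: both approaches return the same headers in the same order.
assert list(with_numpy()) == list(with_stack())

for fn in (with_numpy, with_stack):
    seconds = timeit.timeit(fn, number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```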

Choose based on what matters to you (readability, speed, maintainability, etc.).

Answer 2 (score: 1)

You can use dropna:

df.dropna(axis=1).columns

axis : {0 or 'index', 1 or 'columns'}

how : {'any', 'all'}

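Basically, dropna removes nulls: axis=1 drops columns rather than rows, how='any' (the default) drops a column if at least one of its values is null, and .columns returns the headers of the columns that remain. Note that this works per column, not per cell, so it returns only the columns that contain no nulls at all.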
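On the question's example frame every column contains at least one NaN, so df.dropna(axis=1).columns is empty, while how='all' keeps every column that has at least one value. A quick check of both settings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Col_1": [np.nan, 10, np.nan, np.nan],
     "Col_2": [13, np.nan, 2, np.nan],
     "Col_3": [np.nan, np.nan, 5, 4]},
    index=[1, 2, 3, 4],
)

# how="any" (default): drop every column containing at least one NaN.
any_cols = df.dropna(axis=1).columns.tolist()
print(any_cols)  # -> []

# how="all": drop only columns that are entirely NaN.
all_cols = df.dropna(axis=1, how="all").columns.tolist()
print(all_cols)  # -> ['Col_1', 'Col_2', 'Col_3']
```

Neither setting reports which cells are non-null, which is why the per-cell answers above use stack or a boolean mask instead.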