Pandas DataFrame conditional replacement and column collapsing

Date: 2019-01-18 20:37:24

Tags: python pandas conditional

Current Pandas DataFrame

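import numpy as np
import pandas as pd

# Use np.nan (a real missing value) rather than the string 'NaN',
# otherwise the comparison fn1[col] > 0 raises a TypeError
fn1 = pd.DataFrame([['A', np.nan, np.nan, 9, 6],
                    ['B', np.nan, 2, np.nan, 7],
                    ['C', 3, 2, np.nan, 10],
                    ['D', np.nan, 7, np.nan, np.nan],
                    ['E', np.nan, np.nan, 3, 3],
                    ['F', np.nan, np.nan, 7, np.nan]],
                   columns=['Symbol', 'Condition1', 'Condition2', 'Condition3', 'Condition4'])

fn1.set_index('Symbol', inplace=True)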



         Condition1 Condition2 Condition3 Condition4
Symbol                                            
A             NaN        NaN          9          6
B             NaN          2        NaN          7
C               3          2        NaN         10
D             NaN          7        NaN        NaN
E             NaN        NaN          3          3
F             NaN        NaN          7        NaN

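I'm currently working with a Pandas DataFrame similar to the one linked above. Column by column, I'm trying to replace each value that is not 'NaN' with the 'Symbol' associated with that row, and then collapse each column (or write it to a new DataFrame) so that each column is the list of 'Symbol's that appear in that 'Condition', as shown in the desired output: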

Desired Output

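I've been able to collect the 'Symbol's that appear in each condition into a list of lists (see below), but I'd like to keep the same column names, and I'm having trouble adding them to a growing new DataFrame because the lengths vary and I'm iterating over the columns.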

ls2 = []
for col in fn1.columns:
    fn2 = fn1[fn1[col] > 0]
    ls2.append(list(fn2.index))

where fn1 is the DataFrame that looks like the first image, with the 'Symbol' column set as the index.
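A minimal sketch of the two steps described above (replace, then collapse), assuming the missing values are real np.nan and using the A to F data from the question. It keeps the column names by collecting the results in a dict keyed by column instead of a list of lists; the names `symbols` and `collapsed` are illustrative only.

```python
import numpy as np
import pandas as pd

# The question's data, with np.nan assumed for the missing values
fn1 = pd.DataFrame(
    {"Condition1": [np.nan, np.nan, 3, np.nan, np.nan, np.nan],
     "Condition2": [np.nan, 2, 2, 7, np.nan, np.nan],
     "Condition3": [9, np.nan, np.nan, np.nan, 3, 7],
     "Condition4": [6, 7, 10, np.nan, 3, np.nan]},
    index=pd.Index(list("ABCDEF"), name="Symbol"),
)

# Step 1: replace every non-NaN value with its row's Symbol; NaN stays NaN.
# The index is reshaped to a column vector so np.where broadcasts it across columns.
symbols = pd.DataFrame(
    np.where(fn1.notna(), fn1.index.to_numpy()[:, None], np.nan),
    index=fn1.index,
    columns=fn1.columns,
)

# Step 2: collapse each column to its list of Symbols, keyed by column name
collapsed = {col: symbols[col].dropna().tolist() for col in symbols.columns}
print(collapsed)
```

Because each list keeps its own length, nothing has to be padded until (or unless) the dict is turned back into a DataFrame.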

Thanks in advance for your help.

2 answers:

Answer 0 (score: 0)

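You can map the Symbol onto each column and then take the set of non-null values.

import pandas as pd

# Assumes fn1 has 'Symbol' as its index (as in the question) and real NaN values.
# Replace each non-null value with the Symbol of its row; nulls stay NaN.
symbols = pd.Series(fn1.index, index=fn1.index)
fn1_symbols = fn1.apply(lambda col: symbols.where(col.notna()))
condition_symbols = {col: sorted(set(fn1_symbols[col].dropna())) for col in fn1.columns}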

This gives you the dictionary:

{'Condition1': ['B', 'D'],
 'Condition2': ['C', 'H'],
 'Condition3': ['D', 'H', 'J'],
 'Condition4': ['D', 'G', 'H', 'K']}

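I know you asked for a DataFrame, but since each list has a different length, putting them into a DataFrame doesn't make much sense. If you do want a DataFrame, you can run the following code, which pads the shorter columns with NaN:

pd.DataFrame({k: pd.Series(v) for k, v in condition_symbols.items()})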

This gives you the following output:

            Condition1  Condition2  Condition3  Condition4
0           B           C           D           D
1           D           H           H           G
2           NaN         NaN         J           H
3           NaN         NaN         NaN         K
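If the varying lengths are the main concern, one alternative (a sketch, not part of the original answer) is to keep one list per condition in a Series of lists: the condition names become the index and no NaN padding is needed. This assumes the question's A to F data with real np.nan values; `condition_symbols` and `as_text` are illustrative names.

```python
import numpy as np
import pandas as pd

fn1 = pd.DataFrame(
    {"Condition1": [np.nan, np.nan, 3, np.nan, np.nan, np.nan],
     "Condition2": [np.nan, 2, 2, 7, np.nan, np.nan],
     "Condition3": [9, np.nan, np.nan, np.nan, 3, 7],
     "Condition4": [6, 7, 10, np.nan, 3, np.nan]},
    index=pd.Index(list("ABCDEF"), name="Symbol"),
)

# One list of Symbols per condition, indexed by the condition name
condition_symbols = pd.Series(
    {col: fn1.index[fn1[col].notna().to_numpy()].tolist() for col in fn1.columns}
)

# Series.str.join also works on list elements, e.g. for a compact display
as_text = condition_symbols.str.join(", ")
print(as_text)
```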

Answer 1 (score: 0)

Another approach is slicing, as shown below (explanations in the comments):

import numpy as np
import pandas as pd

df = pd.DataFrame.from_dict({
    "Symbol": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"],
    "Condition1": [1, np.nan, 3, np.nan, np.nan, np.nan, 7, np.nan, np.nan, 8, 12],
    "Condition2": [np.nan, 2, 2, 7, np.nan, np.nan, 5, 11, 14, np.nan, np.nan],
    }
)


new_df = pd.concat(
    [
        df["Symbol"][df[column].notnull()].reset_index(drop=True) # keep the Symbols whose value in this column is not null, and drop the original index (as your output suggests)
        for column in list(df)[1:] # iterate over all columns except "Symbol"
    ],
    axis=1, # Column-wise concatenation
)
# Rename columns
new_df.columns = list(df)[1:]
# You can leave NaNs or replace them with empty string, your choice
new_df.fillna("", inplace=True)

The output of this will be:

  Condition1 Condition2
0          a          b
1          c          c
2          g          d
3          j          g
4          k          h
5                     i
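A variant of the same slicing approach, sketched here on the question's A to F data with np.nan assumed for the missing values: passing `keys=` to `pd.concat` names the columns during concatenation, so the separate renaming step is not needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Symbol": list("ABCDEF"),
    "Condition1": [np.nan, np.nan, 3, np.nan, np.nan, np.nan],
    "Condition2": [np.nan, 2, 2, 7, np.nan, np.nan],
    "Condition3": [9, np.nan, np.nan, np.nan, 3, 7],
    "Condition4": [6, 7, 10, np.nan, 3, np.nan],
})

conditions = [c for c in df.columns if c != "Symbol"]

new_df = pd.concat(
    # Symbols with a non-null value in column c, renumbered from 0
    [df.loc[df[c].notna(), "Symbol"].reset_index(drop=True) for c in conditions],
    axis=1,          # column-wise; shorter columns are padded with NaN
    keys=conditions, # use the condition names as the new column labels
).fillna("")
print(new_df)
```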

If you need further clarification, leave a comment below.