Apply an if statement to replace data in 'Column Y' for different rows of 'Column X'

Time: 2018-02-07 18:14:10

Tags: python pandas

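Below is a snippet of the data I am working with; the real data has thousands of rows and additional columns. I need to change the values in 'Column Y' depending on the value in 'Column X', according to the following conditions.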

If Column X is "FIRST":

cell#1 = epithelial

cell#2 = nerve

If Column X is "SECOND":

cell#1 = endothelial

cell#2 = muscle

DataFrame:

 Column X            Column Y        
    FIRST               cell#1
    FIRST               A
    FIRST               cell#2
    FIRST               C
    SECOND              N
    SECOND              V
    SECOND              cell#1
    SECOND              cell#2

Code:

for row in df['Column X']:
    if row == "FIRST":
        df.loc[(df['Column Y']== "cell#1"), 'Column Y'] = "epithelial"
        df.loc[(df['Column Y']== "cell#2"), 'Column Y'] = "nerve"
    elif row == "SECOND":
        df.loc[(df['Column Y']== "cell#1"), 'Column Y'] = "endothelial"
        df.loc[(df['Column Y']== "cell#2"), 'Column Y'] = "muscle"
    else:
        pass

The code above does not work: the row == "FIRST" condition is applied to the entire DataFrame, and the row == "SECOND" condition is ignored. Please help.

Expected result:

Column X            Column Y        
    FIRST               epithelial
    FIRST               A
    FIRST               nerve
    FIRST               C
    SECOND              N
    SECOND              V
    SECOND              endothelial
    SECOND              muscle

Output of my code above (incorrect):

Column X            Column Y        
    FIRST               epithelial
    FIRST               A
    FIRST               nerve
    FIRST               C
    SECOND              N
    SECOND              V
    SECOND              epithelial
    SECOND              nerve

The last two rows of Column Y should be "endothelial" and "muscle", not "epithelial" and "nerve".
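A likely explanation for this output is that the masks inside the loop only test 'Column Y' and never reference 'Column X', so the first iteration (a "FIRST" row) already rewrites every cell#1 and cell#2 in the whole frame, leaving nothing for the "SECOND" branch to match. The sketch below rebuilds the sample data and compares the two masks; the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    'Column X': ['FIRST'] * 4 + ['SECOND'] * 4,
    'Column Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})

# This mask looks only at Column Y, so it selects rows from both groups.
mask = df['Column Y'] == 'cell#1'
matched_groups = df.loc[mask, 'Column X'].tolist()
print(matched_groups)  # ['FIRST', 'SECOND']

# Also requiring the Column X value narrows the selection to the intended rows.
mask_first = (df['Column X'] == 'FIRST') & (df['Column Y'] == 'cell#1')
first_only = df.loc[mask_first, 'Column X'].tolist()
print(first_only)  # ['FIRST']
```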

2 Answers:

Answer 0: (score: 1)

Here is one way. Note that no loop is needed: many pandas operations are vectorized, for both convenience and performance.

import pandas as pd

df = pd.DataFrame([['FIRST', 'cell#1'], ['FIRST', 'A'],
                   ['FIRST', 'cell#2'], ['FIRST', 'C'],
                   ['SECOND', 'N'], ['SECOND', 'V'],
                   ['SECOND', 'cell#1'], ['SECOND', 'cell#2']],
                  columns=['X', 'Y'])

df.loc[(df.X == 'FIRST') & (df.Y == 'cell#1'), 'Y'] = 'epithelial'
df.loc[(df.X == 'FIRST') & (df.Y == 'cell#2'), 'Y'] = 'nerve'
df.loc[(df.X == 'SECOND') & (df.Y == 'cell#1'), 'Y'] = 'endothelial'
df.loc[(df.X == 'SECOND') & (df.Y == 'cell#2'), 'Y'] = 'muscle'

#         X            Y
# 0   FIRST   epithelial
# 1   FIRST            A
# 2   FIRST        nerve
# 3   FIRST            C
# 4  SECOND            N
# 5  SECOND            V
# 6  SECOND  endothelial
# 7  SECOND       muscle
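If the number of (X, Y) combinations grows, writing one df.loc line per rule gets repetitive. One possible alternative, sketched here under the assumption that the rules fit in a small lookup table, is a left merge on both columns. The names lookup, new_Y and merged are made up for this illustration and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['FIRST', 'FIRST', 'FIRST', 'FIRST',
          'SECOND', 'SECOND', 'SECOND', 'SECOND'],
    'Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})

# Hypothetical lookup table: one row per (X, old Y) -> new Y rule.
lookup = pd.DataFrame(
    [('FIRST', 'cell#1', 'epithelial'),
     ('FIRST', 'cell#2', 'nerve'),
     ('SECOND', 'cell#1', 'endothelial'),
     ('SECOND', 'cell#2', 'muscle')],
    columns=['X', 'Y', 'new_Y'],
)

# A left merge keeps every row of df in its original order;
# rows without a matching rule get NaN in new_Y.
merged = df.merge(lookup, on=['X', 'Y'], how='left')

# Use the replacement where one exists, otherwise keep the old value.
# to_numpy() sidesteps index alignment in case df has a non-default index.
df['Y'] = merged['new_Y'].fillna(merged['Y']).to_numpy()

print(df)
```

This assumes each (X, Y) pair appears at most once in lookup; duplicate rules would duplicate rows in the merge.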

Answer 1: (score: 1)

I had a go at this myself to learn more about grouping in Pandas, and was surprised that I could not find an elegant way to do it.

What I came up with:

import pandas as pd

df = pd.DataFrame({'Column X': ['FIRST', 'FIRST', 'FIRST', 'FIRST', 'SECOND', 'SECOND', 'SECOND', 'SECOND'], 'Column Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2']})

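def f(group):
    # Copy the group so the assignments below do not write into a slice of df
    group = group.copy()
    key = group['Column X'].iloc[0]
    is_cell1 = group['Column Y'] == 'cell#1'
    is_cell2 = group['Column Y'] == 'cell#2'
    if key == 'FIRST':
        group.loc[is_cell1, 'Column Y'] = 'epithelial'
        group.loc[is_cell2, 'Column Y'] = 'nerve'
    elif key == 'SECOND':
        group.loc[is_cell1, 'Column Y'] = 'endothelial'
        group.loc[is_cell2, 'Column Y'] = 'muscle'
    return group

# apply returns a new DataFrame, so assign it to keep the result
df = df.groupby('Column X', group_keys=False).apply(f)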

However, this requires fetching the key again from the grouping column; it would be simpler if the key were passed to f.
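One way to get the key without reading it back from the group is to iterate over the groupby object directly, which yields (key, group) pairs. The sketch below is only one possible arrangement; the per_group dictionary and the result variable are names invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Column X': ['FIRST', 'FIRST', 'FIRST', 'FIRST',
                 'SECOND', 'SECOND', 'SECOND', 'SECOND'],
    'Column Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})

# Hypothetical per-group rules: group key -> {old value: new value}.
per_group = {
    'FIRST': {'cell#1': 'epithelial', 'cell#2': 'nerve'},
    'SECOND': {'cell#1': 'endothelial', 'cell#2': 'muscle'},
}

# Write into a copy so the input frame stays untouched.
result = df.copy()

# Iterating a groupby yields the key together with the sub-frame,
# so the key does not have to be recovered from the grouping column.
for key, group in df.groupby('Column X'):
    mapping = per_group.get(key)
    if not mapping:
        continue  # no rules for this group, leave its rows as they are
    result.loc[group.index, 'Column Y'] = group['Column Y'].replace(mapping)

print(result)
```

This still loops, but only once per group rather than once per row, so the cost depends on the number of distinct values in 'Column X'.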