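Below is a fragment of the data I am working with; the real data has thousands of rows and other columns. I have to change the values in 'Column Y' based on the value in 'Column X', according to the following conditions.
If Column X is "FIRST":
cell#1 = epithelial
cell#2 = nerve
If Column X is "SECOND":
cell#1 = endothelial
cell#2 = muscle
DataFrame: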
Column X Column Y
FIRST cell#1
FIRST A
FIRST cell#2
FIRST C
SECOND N
SECOND V
SECOND cell#1
SECOND cell#2
Code:
for row in df['Column X']:
    if row == "FIRST":
        df.loc[(df['Column Y']== "cell#1"), 'Column Y'] = "epithelial"
        df.loc[(df['Column Y']== "cell#2"), 'Column Y'] = "nerve"
    elif row == "SECOND":
        df.loc[(df['Column Y']== "cell#1"), 'Column Y'] = "endothelial"
        df.loc[(df['Column Y']== "cell#2"), 'Column Y'] = "muscle"
    else:
        pass
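The code above does not work: the row == "FIRST" condition is applied to the entire DataFrame, and the row == "SECOND" condition is ignored. Please help.
Expected result: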
Column X Column Y
FIRST epithelial
FIRST A
FIRST nerve
FIRST C
SECOND N
SECOND V
SECOND endothelial
SECOND muscle
Output of my code above (incorrect):
Column X Column Y
FIRST epithelial
FIRST A
FIRST nerve
FIRST C
SECOND N
SECOND V
SECOND epithelial
SECOND nerve
The last two rows of Column Y should be "endothelial" and "muscle", not "epithelial" and "nerve".
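A small sketch of the mechanism behind the incorrect output, assuming the sample frame shown above: the masks inside the loop test only Column Y and never refer to `row`, so the first iteration (where `row` is "FIRST") already rewrites every matching row in the frame. The names `mask`, `rows_hit` and `remaining` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Column X': ['FIRST'] * 4 + ['SECOND'] * 4,
    'Column Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})

# The mask looks only at Column Y, so it selects matching rows
# in both the FIRST and the SECOND block.
mask = df['Column Y'] == 'cell#1'
rows_hit = mask[mask].index.tolist()

# This is what the first loop iteration does: it overwrites all selected rows.
df.loc[mask, 'Column Y'] = 'epithelial'

# Nothing is left for the "SECOND" branch to replace afterwards.
remaining = int((df['Column Y'] == 'cell#1').sum())
print(rows_hit, remaining)
```

By the time the loop reaches a "SECOND" row, there are no "cell#1" or "cell#2" values left, so that branch has no effect.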
Answer 0 (score: 1)
Here is one way. Note that no loop is needed: many pandas operations are vectorized, for both convenience and performance.
import pandas as pd
df = pd.DataFrame([['FIRST', 'cell#1'], ['FIRST', 'A'],
                   ['FIRST', 'cell#2'], ['FIRST', 'C'],
                   ['SECOND', 'N'], ['SECOND', 'V'],
                   ['SECOND', 'cell#1'], ['SECOND', 'cell#2']],
                  columns=['X', 'Y'])
df.loc[(df.X == 'FIRST') & (df.Y == 'cell#1'), 'Y'] = 'epithelial'
df.loc[(df.X == 'FIRST') & (df.Y == 'cell#2'), 'Y'] = 'nerve'
df.loc[(df.X == 'SECOND') & (df.Y == 'cell#1'), 'Y'] = 'endothelial'
df.loc[(df.X == 'SECOND') & (df.Y == 'cell#2'), 'Y'] = 'muscle'
# X Y
# 0 FIRST epithelial
# 1 FIRST A
# 2 FIRST nerve
# 3 FIRST C
# 4 SECOND N
# 5 SECOND V
# 6 SECOND endothelial
# 7 SECOND muscle
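If the list of conditions grows, the same boolean-mask idea can be driven from a lookup table instead of one hand-written line per rule. This is only a sketch: the `rules` dictionary and its layout are assumptions made for illustration, not part of the original answer. The loop runs over the rules, not over the rows, so each step is still a vectorized assignment.

```python
import pandas as pd

df = pd.DataFrame({
    'X': ['FIRST', 'FIRST', 'FIRST', 'FIRST',
          'SECOND', 'SECOND', 'SECOND', 'SECOND'],
    'Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})

# Hypothetical lookup table: (X value, old Y value) -> new Y value.
rules = {
    ('FIRST', 'cell#1'): 'epithelial',
    ('FIRST', 'cell#2'): 'nerve',
    ('SECOND', 'cell#1'): 'endothelial',
    ('SECOND', 'cell#2'): 'muscle',
}

# One vectorized assignment per rule; both columns take part in the mask.
for (x_val, y_old), y_new in rules.items():
    df.loc[(df['X'] == x_val) & (df['Y'] == y_old), 'Y'] = y_new

result = df['Y'].tolist()
print(result)
```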
Answer 1 (score: 1)
I went off to learn more about grouping in Pandas myself, and was surprised that I could not find an elegant way to do this.
What I came up with is:
import pandas as pd
df = pd.DataFrame({
    'Column X': ['FIRST', 'FIRST', 'FIRST', 'FIRST', 'SECOND', 'SECOND', 'SECOND', 'SECOND'],
    'Column Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})
def f(group):
    # Work on a copy and assign through .loc to avoid chained assignment.
    group = group.copy()
    y = group['Column Y']
    key = group['Column X'].iloc[0]
    if key == 'FIRST':
        group.loc[y == 'cell#1', 'Column Y'] = 'epithelial'
        group.loc[y == 'cell#2', 'Column Y'] = 'nerve'
    elif key == 'SECOND':
        group.loc[y == 'cell#1', 'Column Y'] = 'endothelial'
        group.loc[y == 'cell#2', 'Column Y'] = 'muscle'
    return group
df.groupby('Column X').apply(f)
But this requires fetching the key from the grouping column again; it would be simpler if the key were passed to f.
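One way to get the key without reading it back from the grouping column is sketched below. It assumes the replacement rules can be kept in a dictionary (the name `replacements` is made up for illustration) and relies on the `groups` attribute of a GroupBy object, which maps each group key to the index labels of its rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Column X': ['FIRST', 'FIRST', 'FIRST', 'FIRST',
                 'SECOND', 'SECOND', 'SECOND', 'SECOND'],
    'Column Y': ['cell#1', 'A', 'cell#2', 'C', 'N', 'V', 'cell#1', 'cell#2'],
})

# Hypothetical per-group lookup: group key -> {old value: new value}.
replacements = {
    'FIRST': {'cell#1': 'epithelial', 'cell#2': 'nerve'},
    'SECOND': {'cell#1': 'endothelial', 'cell#2': 'muscle'},
}

# .groups hands over the key directly, together with the row labels
# that belong to that group.
for key, idx in df.groupby('Column X').groups.items():
    if key not in replacements:
        continue  # leave groups without rules untouched
    df.loc[idx, 'Column Y'] = df.loc[idx, 'Column Y'].replace(replacements[key])

print(df)
```

This loops once per group, not once per row, so it should stay cheap as long as the number of distinct keys is small.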