Replace specific values in a Pandas DataFrame based on the value of another column

Time: 2021-02-04 14:24:58

Tags: python pandas dataframe

I have a DataFrame similar to this:

Chr  Start_Position End_Position Type
1    10000          10001        SNP
5    45321          45327        INS
12   44700          44710        DEL

I need to change the values of certain cells depending on what Type is:

  • SNP needs Start_Position + 1
  • INS needs End_Position + 1
  • DEL needs Start_Position + 1

My problem is that my current solution is very verbose. Here is what I have tried (dataframe is the original data source):

snp_records = dataframe.loc[dataframe["Type"] == "SNP", :]
del_records = dataframe.loc[dataframe["Type"] == "DEL", :]
ins_records = dataframe.loc[dataframe["Type"] == "INS", :]

snp_records.loc[:, "Start_Position"] = snp_records["Start_Position"].add(1)
del_records.loc[:, "Start_Position"] = del_records["Start_Position"].add(1)
ins_records.loc[:, "End_Position"] = ins_records["End_Position"].add(1)

dataframe.loc[snp_records.index, "Start_Position"] = snp_records["Start_Position"]
dataframe.loc[del_records.index, "Start_Position"] = del_records["Start_Position"]
dataframe.loc[ins_records.index, "End_Position"] = ins_records["End_Position"]

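Because I have to do this for more columns than in the example (though the concept is the same), this gets very long and, with all the repeated lines, error-prone (in fact, I made several mistakes while typing up this example).

This question is similar to mine, but there the values are predefined, whereas I need to derive them from the data itself.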

3 answers:

Answer 0: (score: 4)

You can do this:

df.loc[df['Type'].isin(['SNP','DEL']), 'Start_Position'] += 1
df.loc[df['Type'].eq('INS'), 'End_Position'] += 1

Answer 1: (score: 3)

For a general solution, you can pass lists to Series.isin and use the resulting mask in DataFrame.loc to set the values:

start = ['SNP','DEL']
end = ['INS']

df.loc[df['Type'].isin(start), 'Start_Position'] += 1
df.loc[df['Type'].isin(end), 'End_Position'] += 1
print (df)
   Chr  Start_Position  End_Position Type
0    1           10001         10001  SNP
1    5           45321         45328  INS
2   12           44701         44710  DEL
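
Since the question mentions many more columns than the example, the same pattern can probably be driven from a small rule table instead of one line per column. The following is only a sketch under that assumption: the `increments` dictionary is a hypothetical name introduced here, and the sample frame is rebuilt from the data in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

# Hypothetical rule table: column name -> Type values whose rows get +1.
increments = {
    "Start_Position": ["SNP", "DEL"],
    "End_Position": ["INS"],
}

# Apply each rule with the same boolean-mask assignment used above.
for column, types in increments.items():
    df.loc[df["Type"].isin(types), column] += 1

print(df)
```

Adding another column then means adding one entry to the dictionary rather than another hand-written line.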

Another idea is to update both columns in a single assignment by adding a two-column boolean mask:

m = pd.concat([df['Type'].isin(start), df['Type'].isin(end)], axis=1)
df[[ 'Start_Position', 'End_Position']] += m.to_numpy()
print (df)
   Chr  Start_Position  End_Position Type
0    1           10001         10001  SNP
1    5           45321         45328  INS
2   12           44701         44710  DEL

Or:

m = np.vstack((df['Type'].isin(start), df['Type'].isin(end))).T
df[[ 'Start_Position', 'End_Position']] += m
print (df)
   Chr  Start_Position  End_Position Type
0    1           10001         10001  SNP
1    5           45321         45328  INS
2   12           44701         44710  DEL
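
The two variants above rely on booleans behaving as 0 and 1 in arithmetic: adding the mask bumps only the cells where it is True. A minimal self-contained sketch of that idea follows, assuming the sample data from the question; the explicit `astype(int)` is an addition made here for readability and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

start = ["SNP", "DEL"]
end = ["INS"]

# One boolean column per target column, in the same order as the column list below.
mask = np.column_stack((
    df["Type"].isin(start).to_numpy(),
    df["Type"].isin(end).to_numpy(),
))

# True becomes 1 and False becomes 0, so only the matching cells are incremented.
df[["Start_Position", "End_Position"]] += mask.astype(int)

print(df)
```

The order of the mask columns has to match the order of the selected columns, otherwise the increments land in the wrong place.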

Answer 2: (score: 2)

Try np.where:

start = ['SNP','DEL']
end = ['INS']

df['Start_Position'] = np.where(df['Type'].isin(start), df['Start_Position'] + 1, df['Start_Position'])

df['End_Position'] = np.where(df['Type'].isin(end), df['End_Position'] + 1, df['End_Position'])
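
For reference, here is a self-contained version of the np.where approach, again assuming the sample data from the question. The second assignment uses Series.mask, a pandas method that appears to do the same job here; it is shown as an alternative and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Chr": [1, 5, 12],
    "Start_Position": [10000, 45321, 44700],
    "End_Position": [10001, 45327, 44710],
    "Type": ["SNP", "INS", "DEL"],
})

is_start = df["Type"].isin(["SNP", "DEL"])
is_end = df["Type"].isin(["INS"])

# np.where(cond, a, b) takes a where cond is True and b everywhere else.
df["Start_Position"] = np.where(is_start, df["Start_Position"] + 1, df["Start_Position"])

# Series.mask replaces values where the condition is True and keeps the rest.
df["End_Position"] = df["End_Position"].mask(is_end, df["End_Position"] + 1)

print(df)
```

Unlike the .loc form, both calls rebuild the whole column, which matters little for small frames but is worth keeping in mind for large ones.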