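I want to populate a column with numbers from 1 to 16 based on the values in two other columns. I can either start from the existing column headers or create new columns (it doesn't matter to me).
I tried to write a function that iterates over the numbers 1-10 and assigns a value to the column named by z based on the values in the columns named by b and y. I then want to apply this function to every row of the dataframe.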
import pandas as pd
import numpy as np
data = pd.read_csv('Nuc.csv')
def write_Pcolumns(df):
    """populates a column in the given dataframe, df, based on the values in two other columns in the same dataframe"""
    #create string of numbers for each nucleotide position
    positions = ('1','2','3','4','5','6','7','8','9','10')
    a = "Po "
    x = "O.Po "
    #for each position create a variable for the nucleotide in the sequence (Po) and opposite to the sequence(o. Po)
    for each in positions:
        b = a + each
        y = x + each
        z = 'P' + each
        #assign a value to z based on the nucleotide identities in the sequence and opposite position
        if df[b] == 'A' and df[y]=='A':
            df[z]==1
        elif df[b] == 'A' and df[y]=='C':
            df[z]==2
        elif df[b] == 'A' and df[y]=='G':
            df[z]==3
        elif df[b] == 'A' and df[y]=='T':
            df[z]==4
        ...
        elif df[b] == 'T' and df[y]=='G':
            df[z]==15
        else:
            df[z]==16
    return(df)
data.apply(write_Pcolumns(data), axis=1)
I get the following error message: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer 0 (score: 0)
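This happens because df[index]=='value' returns a Series of booleans (one per row), not a single boolean. An `if` statement needs exactly one True/False value, so pandas raises the error instead of guessing how to collapse the Series.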
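To make that concrete, here is a minimal sketch that reproduces the behaviour on a toy Series. The names `s`, `mask` and `raised` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Toy data standing in for one nucleotide column (illustrative only).
s = pd.Series(["A", "C", "A"])

# The comparison is element-wise, so the result is a boolean Series.
mask = s == "A"
print(mask.tolist())

# Using that Series directly in `if` raises ValueError, because Python
# asks for a single truth value for the whole Series.
raised = False
try:
    if mask:
        pass
except ValueError as err:
    raised = True
    print(err)

# Explicit reductions say how the Series should be collapsed.
print(mask.any(), mask.all())
```

`any()` and `all()` are only appropriate when one answer for the whole column is wanted. For a per-row result, a vectorized expression is usually the better fit.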
See also: Pandas error when using if-else to create new column: The truth value of a Series is ambiguous
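For the task in the question, one possible approach is to drop the `if`/`elif` chain and map each nucleotide pair to its code for the whole column at once. The sketch below is a suggestion under stated assumptions, not the original poster's solution. `PAIR_CODES` and `write_p_columns` are hypothetical names, and the mapping contains only the pairs spelled out in the question. The pairs hidden behind the `...` are not guessed here and would need to be added. Anything missing from the mapping falls through to 16, mirroring the `else` branch. Two other points about the original code: `df[z]==1` is a comparison rather than an assignment, and `data.apply(write_Pcolumns(data), axis=1)` calls the function immediately instead of passing it to `apply`.

```python
import pandas as pd

# Only the pairs written out in the question; the elided ones must be added.
PAIR_CODES = {"AA": 1, "AC": 2, "AG": 3, "AT": 4, "TG": 15}


def write_p_columns(df, n_positions=10):
    """Return a copy of df with columns P1..Pn derived from 'Po i' and 'O.Po i'."""
    out = df.copy()
    for i in range(1, n_positions + 1):
        # Concatenate the two nucleotide columns row by row, e.g. "A" + "T" -> "AT".
        pair = out[f"Po {i}"] + out[f"O.Po {i}"]
        # Look up each pair; unmapped pairs become NaN and are then set to 16.
        out[f"P{i}"] = pair.map(PAIR_CODES).fillna(16).astype(int)
    return out
```

With the data from the question this would be called as `data = write_p_columns(data)`, with no `apply` needed, since each column is processed in a single vectorized step.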