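How do I create multiple new columns and populate them based on the values in 2 other columns using pandas / Python?

Time: 2019-08-21 14:56:11

Tags: pandas iterator rows

I want to populate columns with a number from 1 to 16 based on the values in 2 other columns. I can start with the column headers already provided or create new columns (it doesn't matter to me).

I tried to write a function that iterates over the numbers 1-10 and assigns a value to the z variable based on the values of b and y. I then want to apply this function to every row in the dataframe.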

import pandas as pd

import numpy as np

data = pd.read_csv('Nuc.csv')

def write_Pcolumns(df):

    """populates a column in the given dataframe, df, based on the values in two other columns in the same dataframe"""

    #create string of numbers for each nucleotide position 
    positions = ('1','2','3','4','5','6','7','8','9','10')
    a = "Po "
    x = "O.Po "
    #for each position create a variable for the nucleotide in the sequence (Po) and opposite to the sequence(o. Po)
    for each in positions:
        b = a + each
        y = x + each
        z = 'P' + each
        #assign a value to z based on the nucleotide identities in the sequence and opposite position
        if df[b] == 'A' and df[y]=='A':
            df[z]==1
        elif df[b] == 'A' and df[y]=='C':
            df[z]==2
        elif df[b] == 'A' and df[y]=='G':
            df[z]==3
        elif df[b] == 'A' and df[y]=='T':
            df[z]==4
        ...
        elif df[b] == 'T' and df[y]=='G':
            df[z]==15
        else:
            df[z]==16
    return(df)

data.apply(write_Pcolumns(data), axis=1)

I get the following error message: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

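1 answer:

Answer 0 (score: 0)

This happens because df[index]=='value' returns a Series of booleans (one per row) rather than a single boolean, so it cannot be used directly as the condition of an if statement.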
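
To make this concrete, here is a minimal sketch that reproduces the error on a tiny, made-up dataframe (the two rows are illustrative and not taken from Nuc.csv). It shows that comparing a whole column yields a Series, while comparing a single cell of one row yields a plain boolean.

```python
import pandas as pd

# Illustrative data only; the column names follow the question ("Po 1", "O.Po 1").
df = pd.DataFrame({"Po 1": ["A", "C"], "O.Po 1": ["A", "G"]})

# Comparing a whole column gives one boolean per row, i.e. a Series.
mask = df["Po 1"] == "A"
print(type(mask).__name__)

# Using that Series as an `if` condition raises the ValueError from the question.
try:
    if mask:
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Comparing one cell of a single row gives an ordinary boolean instead.
first_row = df.iloc[0]
is_a = first_row["Po 1"] == "A"
print(is_a)
```

Note that data.apply(write_Pcolumns(data), axis=1) calls the function once with the entire dataframe before apply ever runs, which is why df[b] is a whole column inside the function. Passing the function itself, as in data.apply(write_Pcolumns, axis=1), would hand it one row at a time.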

See also the related question "Pandas error when using if-else to create new column: The truth value of a Series is ambiguous".
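
One possible way around the if/elif chain is to work on whole columns at once. The sketch below is not the asker's code: the function name write_p_columns, the n_positions parameter and the PAIR_CODES lookup table are hypothetical names introduced for illustration. It also assumes that the branches elided in the question follow the same A, C, G, T ordering as the visible ones (AA is 1, AC is 2, ..., TG is 15), and that anything else falls through to 16 as in the original else branch. Note too that the question's df[z]==1 is a comparison; an assignment needs a single =.

```python
import itertools

import pandas as pd

# Assumption: the elided branches continue in A, C, G, T order,
# so "AA" -> 1, "AC" -> 2, ..., "TG" -> 15, "TT" -> 16.
BASES = "ACGT"
PAIR_CODES = {
    first + second: code
    for code, (first, second) in enumerate(
        itertools.product(BASES, repeat=2), start=1
    )
}


def write_p_columns(df, n_positions=10):
    """Return a copy of df with columns P1..Pn derived from 'Po i' and 'O.Po i'."""
    out = df.copy()
    for i in range(1, n_positions + 1):
        # Concatenate the two nucleotides of each row into a key such as "AC".
        pairs = out[f"Po {i}"] + out[f"O.Po {i}"]
        # Pairs missing from the table get 16, mirroring the original else branch.
        out[f"P{i}"] = pairs.map(PAIR_CODES).fillna(16).astype(int)
    return out


# Illustrative data only, with a single position.
example = pd.DataFrame(
    {"Po 1": ["A", "A", "T", "T", "N"], "O.Po 1": ["A", "C", "G", "T", "A"]}
)
result = write_p_columns(example, n_positions=1)
print(result)
```

Because each position is handled with column-wise operations, no row-by-row apply is needed, and the input dataframe is left unchanged.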