Comparing column values in pandas

Time: 2021-04-09 19:55:16

Tags: python pandas compare

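I have a DataFrame with two columns that both record a person's gender. I want to check whether the two columns disagree anywhere and also find missing values, but I'm not sure how to do this. Since I'm still new to pandas, any help or pointers to resources would be greatly appreciated. If possible, I'd also like to create a third column that states the result for each row.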

import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)

#possible output column
df['Third'] = ['match', 'discrepancy', 'missing', 'match', 'discrepancy', 'discrepancy']
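For the "find missing values" part on its own, one option is a boolean mask. This is a minimal sketch assuming missing entries are stored as empty strings `''`, as in the sample data; it also ORs in `isna()` in case some real rows hold `NaN` instead. The names `missing`, `missing_counts` and `rows_with_missing` are illustrative.

```python
import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)

# Assumption: an empty string or NaN both count as "missing"
missing = df.eq('') | df.isna()

# Number of missing entries in each column
missing_counts = missing.sum()

# Rows where at least one of the two columns is missing
rows_with_missing = df[missing.any(axis=1)]

print(missing_counts)
print(rows_with_missing)
```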

3 answers:

Answer 0 (score: 1)

You can use apply with axis=1 to run a function on each row.

You can adapt this function to return whatever values you want, depending on your conditions.

import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)


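def eval_row(r):
    # Both values are empty strings: nothing to compare
    if not r['First'] and not r['Second']:
        return 'missing'
    # Same non-empty value in both columns
    elif r['First'] == r['Second']:
        return 'match'
    # Different values, or only one side is empty
    else:
        return 'discrepancy'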

df['Third'] = df.apply(eval_row, axis=1)
print(df)

Output:

    First  Second        Third
0    Male    Male        match
1  Female    Male  discrepancy
2                      missing
3  Female  Female        match
4    Male          discrepancy
5            Male  discrepancy

Answer 1 (score: 1)

Define a function, apply it to each row, and save the result in the "Third" column:


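def compare(row):
    first = row["First"]
    second = row["Second"]
    # Treat both NaN and '' as missing; checking isna first avoids calling len() on a float NaN
    first_missing = pd.isna(first) or first == ""
    second_missing = pd.isna(second) or second == ""
    if first_missing and second_missing:
        return "missing"
    if first != second:
        return "discrepancy"
    return "match"

df["Third"] = df.apply(compare, axis=1)
print(df)

axis=1 applies the function row by row. The default is axis=0, which sends each column to the function instead.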
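To make the axis=0 / axis=1 difference concrete, here is a small sketch on a hypothetical two-row frame (not the question's data). The lambdas only report what they receive: with `axis=0` the function gets one column at a time, with `axis=1` one row at a time, which is what lets a row function compare `First` against `Second`.

```python
import pandas as pd

# Small hypothetical frame for illustration
df = pd.DataFrame({'First': ['Male', 'Female'],
                   'Second': ['Male', 'Male']})

# axis=0 (default): each call receives a whole column as a Series,
# so the result is indexed by column name
per_column = df.apply(lambda col: col.name, axis=0)

# axis=1: each call receives a whole row as a Series,
# so values from different columns of the same row can be compared
per_row = df.apply(lambda row: row['First'] == row['Second'], axis=1)

print(per_column)
print(per_row)
```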

Answer 2 (score: 1)

Solve it with a nested numpy np.where():

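import numpy as np

# Equal and non-empty -> 'match'; equal and empty -> 'missing'; otherwise 'discrepancy'
df['Third'] = np.where(df.First == df.Second, np.where(df.First.str.len() > 0, 'match', 'missing'), 'discrepancy')
print(df)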
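A self-contained run of the nested `np.where()` approach, rebuilding the question's sample frame. The inner `np.where` chooses between `'match'` and `'missing'` based on whether the value is non-empty; the outer one keeps that label only where the two columns are equal and uses `'discrepancy'` everywhere else. The helper names `both_equal` and `non_empty` are illustrative.

```python
import numpy as np
import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)

# Row-wise conditions as boolean Series
both_equal = df['First'] == df['Second']
non_empty = df['First'].str.len() > 0

# Outer: where the columns agree use the inner label, else 'discrepancy'
# Inner: non-empty agreement is 'match', empty agreement is 'missing'
df['Third'] = np.where(both_equal,
                       np.where(non_empty, 'match', 'missing'),
                       'discrepancy')
print(df)
```

This reproduces the question's desired Third column without a Python-level loop. One caveat: if the data holds NaN rather than '', then NaN == NaN is False, so such rows would be labelled 'discrepancy' instead of 'missing'.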