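I have a DataFrame with two columns that both record a person's gender. I want to check whether the two columns disagree anywhere and to find the missing values. I'm not familiar with how to do this, and since I'm still new, any help or resources you can offer would be much appreciated. If possible, I'd also like to create a third column that states the result.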
import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)
# Possible output column
df['Third'] = ['match', 'discrepancy', 'missing', 'match', 'discrepancy', 'discrepancy']
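Answer 0 (score: 1)
You can use apply with axis=1 to apply a function to each row.
You can adjust this function to return whatever values you want, based on your own conditions.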
import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)

def eval_row(r):
    # An empty string is falsy, so this catches rows where both values are missing.
    if not r['First'] and not r['Second']:
        return 'missing'
    elif r['First'] == r['Second']:
        return 'match'
    else:
        return 'discrepancy'

df['Third'] = df.apply(eval_row, axis=1)
print(df)
This produces:
    First  Second        Third
0    Male    Male        match
1  Female    Male  discrepancy
2                      missing
3  Female  Female        match
4    Male          discrepancy
5            Male  discrepancy
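As an illustration of adjusting the function, here is a minimal sketch that reports a separate label when only one of the two values is missing. The name eval_row_detailed and the label 'partially missing' are hypothetical and not part of the original answer; the question itself treats such rows as a discrepancy.

```python
import pandas as pd

d = {'First': ['Male', 'Female', '', 'Female', 'Male', ''],
     'Second': ['Male', 'Male', '', 'Female', '', 'Male']}
df = pd.DataFrame(data=d)

def eval_row_detailed(r):
    # Hypothetical variant: distinguishes "both missing" from "one missing".
    if not r['First'] and not r['Second']:
        return 'missing'
    if not r['First'] or not r['Second']:
        return 'partially missing'  # assumed label, not from the question
    if r['First'] == r['Second']:
        return 'match'
    return 'discrepancy'

df['Third'] = df.apply(eval_row_detailed, axis=1)
print(df)
```

The order of the checks matters: the "both missing" test has to come before the "one missing" test, otherwise rows with two empty values would be labelled as partially missing.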
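Answer 1 (score: 1)
Define a function, apply it to each row, and save the result in the "Third" column.

def compare(row):
    first = row["First"]
    second = row["Second"]
    # Check for NaN before calling len(): NaN is a float and has no length.
    first_missing = pd.isna(first) or len(first) == 0
    second_missing = pd.isna(second) or len(second) == 0
    if first_missing and second_missing:
        return "missing"
    if first != second:
        return "discrepancy"
    return "match"

df["Third"] = df.apply(compare, axis=1)
df

Passing axis=1 applies the function row by row. The default, axis=0, passes each column to the function instead.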
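To make the difference between the two axis values concrete, here is a small sketch that counts empty strings per column and per row. The shortened sample frame and the names per_column and per_row are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'First': ['Male', 'Female', ''],
                   'Second': ['Male', 'Male', '']})

# axis=0 (the default): the function receives each column as a Series,
# so the result is indexed by column name.
per_column = df.apply(lambda col: (col == '').sum())

# axis=1: the function receives each row as a Series,
# so the result is indexed by row label.
per_row = df.apply(lambda row: (row == '').sum(), axis=1)

print(per_column)
print(per_row)
```

Here per_column has one entry for 'First' and one for 'Second', while per_row has one entry per row of the frame.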
Answer 2 (score: 1)
Solve it with nested np.where() calls:

import numpy as np

df['Third'] = np.where(df.First == df.Second,
                       np.where(df.First.str.len() > 0, 'match', 'missing'),
                       'discrepancy')
print(df)
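One caveat with the comparison above: it only recognises empty strings as missing. If the real data also contains NaN or None (an assumption, since the question only shows empty strings), two NaN values compare as unequal and the row would be labelled a discrepancy. A minimal sketch using np.select instead of nested np.where, with the missing values normalised first:

```python
import numpy as np
import pandas as pd

# Sample data extended with None/NaN to show the normalisation (assumed, not from the question).
df = pd.DataFrame({'First': ['Male', 'Female', '', 'Female', 'Male', None],
                   'Second': ['Male', 'Male', np.nan, 'Female', '', 'Male']})

# Treat NaN/None the same as an empty string.
first = df['First'].fillna('')
second = df['Second'].fillna('')

# Conditions are evaluated in order; the first one that holds wins.
conditions = [
    (first == '') & (second == ''),  # both values missing
    first == second,                 # both present and equal
]
choices = ['missing', 'match']

df['Third'] = np.select(conditions, choices, default='discrepancy')
print(df)
```

np.select keeps the conditions flat, which tends to stay readable if more categories are added later.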