Question

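I have a dataframe with an education column and an education-num column. I want to check whether every row with value X in education corresponds to value Y in education-num.

I have been able to do this to some extent: I can tell how many rows match the condition.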

def Educ_to_num(educ, educ_num, name, num):
  # Rows where education contains `name` and education-num equals `num`
  result = educ.str.contains(name, na=False) & (educ_num == num)
  matches = result.sum()
  print(matches)
  # All rows where education contains `name` (uses the argument, not the global `other`)
  result_b = educ.str.contains(name, na=False)
  rows_name = result_b.sum()
  print(rows_name)
  if matches == rows_name:
    return name + ' equals ' + str(num)
  else:
    return name + ' does not equal ' + str(num)

I call the function like this:

Educ_to_num(other['education'].dropna(), other['education-num'].dropna(), 'Masters', 14.0)

I also tried to get the number of rows that do not match the condition:

mom = other['education'].str.contains('HS-grad')[other['education-num'] != 9.0]
mom[mom == True].sum()

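But I don't know how to display the rows where education is 'HS-grad' but education-num does not match the expected value of 9.0. I want to display those rows to see what is wrong in the data. Any help is appreciated.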

Answer 1

To filter on multiple columns, you can do:

other[(other['education'] == 'Masters') & (other['education-num'] == 14)].dropna()
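If the goal is a yes/no answer to "does every 'Masters' row have 14?", one possible sketch is to select the education-num values for that label and test them with `.all()`. The sample frame and the helper name `all_rows_match` below are made up for illustration; only the column names come from the question. Note that a missing education-num counts as a mismatch here.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `other` frame.
other = pd.DataFrame({
    "education": ["Masters", "HS-grad", "Masters", "HS-grad", "Bachelors"],
    "education-num": [14.0, 9.0, 14.0, 10.0, 13.0],
})

def all_rows_match(df, name, num):
    """Return True if every row whose education equals `name` has education-num `num`."""
    subset = df.loc[df["education"] == name, "education-num"]
    # .all() on an empty selection is True, so also require at least one row.
    return (not subset.empty) and bool((subset == num).all())

masters_ok = all_rows_match(other, "Masters", 14)
hs_grad_ok = all_rows_match(other, "HS-grad", 9)
print(masters_ok, hs_grad_ok)
```

This compares with `==` rather than `str.contains`, so it only matches the exact label; swap in `str.contains` if partial matches are wanted.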

For the second case, the code would be:

mom = other[(other['education'] == 'HS-grad') & (other['education-num'] != 9)].dropna()
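To see this on something concrete, here is a small sketch using made-up data (the real `other` frame is not shown in the question). It displays the mismatching 'HS-grad' rows, and then, as an optional extra that goes beyond what was asked, lists every education label that maps to more than one education-num.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
other = pd.DataFrame({
    "education": ["HS-grad", "HS-grad", "Masters", "HS-grad", "Masters"],
    "education-num": [9.0, 10.0, 14.0, 9.0, 14.0],
})

# Rows labelled 'HS-grad' whose education-num is not the expected 9.
mismatches = other[(other["education"] == "HS-grad") & (other["education-num"] != 9)]
print(mismatches)

# Optional generalization: count distinct education-num values per label;
# any label with more than one distinct value is inconsistent.
distinct_nums = other.groupby("education")["education-num"].nunique()
inconsistent = distinct_nums[distinct_nums > 1].index.tolist()
print(inconsistent)
```

The `.dropna()` is left out here so that rows with missing values in other columns are still shown; add it back if those rows should be hidden.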
