Using a pandas DataFrame as a lookup table

Time: 2018-04-24 12:59:29

Tags: python pandas dataframe

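Given a row from DataFrame X, what is the most efficient way to retrieve all rows from DataFrame Y that exactly match the query row?

Example: querying the row [0,1,0,1] against
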
[
 [0,1,0,1, 1.0],
 [0,1,0,1, 2.0],
 [0,1,0,0, 3.0],
 [1,1,0,0, 0.5],
]

should return

[
 [0,1,0,1, 1.0],
 [0,1,0,1, 2.0],
]
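Assume X and Y have the same schema, except that Y has an additional target-value column. There may be zero, one, or many matches. The solution should remain efficient even with thousands of columns.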

3 answers:

Answer 0 (score: 2)

Use boolean indexing:

import pandas as pd

L = [
 [0,1,0,1, 1.0],
 [0,1,0,1, 2.0],
 [0,1,0,0, 3.0],
 [1,1,0,0, 0.5],
]
df = pd.DataFrame(L)

Y = [0,1,0,1]
print(df[df.iloc[:, :len(Y)].eq(Y).all(axis=1)])
#    0  1  2  3    4
# 0  0  1  0  1  1.0
# 1  0  1  0  1  2.0

Explanation:

First select the first N columns, where N is the length of the query sequence:

print(df.iloc[:, :len(Y)])
#    0  1  2  3
# 0  0  1  0  1
# 1  0  1  0  1
# 2  0  1  0  0
# 3  1  1  0  0

Then compare every row of that selection with the query values using eq:

print(df.iloc[:, :len(Y)].eq(Y))
#        0     1     2      3
# 0   True  True  True   True
# 1   True  True  True   True
# 2   True  True  True  False
# 3  False  True  True  False

Finally, use DataFrame.all to check that all values in each row are True.
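
The final step, reducing the boolean frame to one flag per row and filtering with it, could be wrapped up roughly as follows. This is a minimal sketch: the helper name `match_rows` is hypothetical (not part of pandas), and it assumes the query values line up with the first `len(query)` columns of the frame.

```python
import pandas as pd

def match_rows(df, query):
    """Return the rows of df whose first len(query) columns equal query."""
    # Compare the leading columns with the query values, element by element.
    hits = df.iloc[:, :len(query)].eq(query)
    # all(axis=1) collapses each row to a single True/False flag.
    mask = hits.all(axis=1)
    return df[mask]

L = [
    [0, 1, 0, 1, 1.0],
    [0, 1, 0, 1, 2.0],
    [0, 1, 0, 0, 3.0],
    [1, 1, 0, 0, 0.5],
]
df = pd.DataFrame(L)

matched = match_rows(df, [0, 1, 0, 1])
print(matched)

# A query with no match yields an empty frame rather than an error.
empty = match_rows(df, [1, 0, 1, 0])
print(len(empty))
```

Because the mask is an ordinary boolean Series, the result keeps the original row labels of `df`.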

Answer 1 (score: 1)

I would go with merge:

import pandas as pd

y = pd.DataFrame({'A': [1, 1, 3],
                  'B': list('aac'),
                  'C': list('ddf'),
                  'D': [4, 5, 6]})

x = pd.DataFrame([[1, 'a', 'd']],
                 columns=list('ABC'))

match = x.merge(y, on=x.columns.tolist())

match
#   A  B  C  D
#0  1  a  d  4
#1  1  a  d  5
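
Since the question allows zero, one, or many matches, it may help to see how the merge behaves when nothing matches. A minimal sketch, assuming the query arrives as a single-row DataFrame; `lookup` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def lookup(x_row, y):
    """Inner-join a one-row query frame against y on all query columns."""
    # how="inner" (the default) keeps only rows of y that match the query.
    return x_row.merge(y, on=x_row.columns.tolist(), how="inner")

y = pd.DataFrame({"A": [1, 1, 3],
                  "B": list("aac"),
                  "C": list("ddf"),
                  "D": [4, 5, 6]})

hit = lookup(pd.DataFrame([[1, "a", "d"]], columns=list("ABC")), y)
miss = lookup(pd.DataFrame([[2, "a", "d"]], columns=list("ABC")), y)

print(hit["D"].tolist())   # target values of the matching rows
print(len(miss))           # no match gives an empty frame
```

Note that merge returns a fresh 0..n-1 index, so the original row labels of `y` are not preserved in the result.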

Answer 2 (score: 1)

One efficient approach is to drop down to numpy and query the individual columns (data from @jezrael):

import pandas as pd, numpy as np

df = pd.DataFrame({'A':list('abadef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,7,4,2,3],
                   'D':[1,3,1,7,1,0],
                   'E':[5,3,5,9,2,4],
                   'F':list('aaabbb')})

vals = df.values
arr = [4, 7, 1, 5]

mask = np.logical_and.reduce([vals[:, i+1]==arr[i] for i in range(len(arr))])
res = df.iloc[np.where(mask)[0]]

print(res)

#    A  B  C  D  E  F
# 0  a  4  7  1  5  a
# 2  a  4  7  1  5  a
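
The list comprehension in this answer builds one comparison per column. When the queried columns share a numeric dtype, the same mask can probably be obtained with a single broadcast comparison. The sketch below works under that assumption; selecting the columns `B` to `E` by name is my own substitution for the `i+1` offset used in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("abadef"),
                   "B": [4, 5, 4, 5, 5, 4],
                   "C": [7, 8, 7, 4, 2, 3],
                   "D": [1, 3, 1, 7, 1, 0],
                   "E": [5, 3, 5, 9, 2, 4],
                   "F": list("aaabbb")})

arr = np.array([4, 7, 1, 5])

# Take only the numeric key columns so the array has an integer dtype
# instead of the object dtype that df.values would produce here.
keys = df[["B", "C", "D", "E"]].to_numpy()

# Broadcasting compares every row with arr; all(axis=1) keeps full matches.
mask = (keys == arr).all(axis=1)
res = df[mask]
print(res)
```

Whether this is faster than the per-column reduction depends on the data; with mixed dtypes the per-column version avoids building one large object array.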
