Efficiently finding the indices of DataFrame values in an array

Time: 2018-12-18 15:06:51

Tags: python pandas numpy dataframe

I have a DataFrame like this:

x     y     z
--------------
0     A     10
0     D     13
1     X     20
...

For x and y, I have two sorted arrays holding every possible value of each column:

x_values = [0, 1, ...]
y_values = ['a', ..., 'A', ..., 'D', ..., 'X', ...]

So I wrote a function:

def lookup(record, lookup_list, lookup_attr):
    return np.searchsorted(lookup_list, getattr(record, lookup_attr))

Then I call it on each row:

df_x_indicies = df.apply(lambda r: lookup(r, x_values, 'x'), axis=1)
df_y_indicies = df.apply(lambda r: lookup(r, y_values, 'y'), axis=1)

# df_x_indicies: [0, 0, 1, ...]
# df_y_indicies: [26, ...]

Is there a more efficient way to do this? And is it possible to handle several columns at once, so that I get a DataFrame back rather than a Series?

I tried:

np.where(np.in1d(x_values, df.x))[0]

But this drops duplicate values, which is not what I want.
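To illustrate why the duplicates disappear, here is a minimal sketch that uses only the three sample rows shown above. It uses np.isin, which behaves like np.in1d for 1-D inputs. The mask has one entry per element of x_values rather than one per row, so each position can appear at most once. Searching in the other direction keeps one result per row.

```python
import numpy as np

x_values = np.array([0, 1])      # sorted lookup array (sample values only)
x_column = np.array([0, 0, 1])   # the x column of the sample rows

# One boolean per element of x_values: "does this value occur in the column?"
mask = np.isin(x_values, x_column)
positions_from_mask = np.where(mask)[0]

# One position per row of the column, duplicates preserved.
positions_per_row = np.searchsorted(x_values, x_column)

print(positions_from_mask)
print(positions_per_row)
```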

2 answers:

Answer 0: (score: 4)

You can convert the lookup arrays to pd.Index objects to speed up the lookups.

u, v = map(pd.Index, [x_values, y_values])
pd.DataFrame({'x': u.get_indexer(df.x), 'y': v.get_indexer(df.y)})

   x  y
0  0  1
1  0  2
2  1  3

where

x_values
# [0, 1]

y_values
# ['a', 'A', 'D', 'X']
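For anyone who wants to run this end to end, here is a self-contained sketch built from the sample rows in the question and the values listed above. The value 'Q' is made up for illustration: it shows that get_indexer returns -1 for a value that is not in the index, whereas np.searchsorted would still return an insertion position. As far as I know, get_indexer also expects the index values to be unique.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({'x': [0, 0, 1], 'y': ['A', 'D', 'X'], 'z': [10, 13, 20]})

y_index = pd.Index(['a', 'A', 'D', 'X'])

# Position of each row's y value inside y_index.
positions = y_index.get_indexer(df['y'])

# 'Q' is a hypothetical value that is absent from y_index.
with_missing = y_index.get_indexer(['A', 'Q'])

print(positions)
print(with_missing)
```

Checking for -1 in the result is a cheap way to catch values that are missing from the lookup array.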

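To do this for several columns, you will have to loop over each column. Here is a version of the code above that should generalize to N columns and indices.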

val_list = [x_values, y_values] # [x_values, y_values, z_values, ...]
idx_list = map(pd.Index, val_list)
pd.DataFrame({
    f'{c}': idx.get_indexer(df[c]) for idx, c in zip(idx_list, df)})

   x  y
0  0  1
1  0  2
2  1  3
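The loop above pairs each index with a column by position, so it relies on val_list being in the same order as the DataFrame's columns. A variant that names the columns explicitly might look like the sketch below. The helper name positions_frame and the dict argument are my own choices for illustration, not part of pandas.

```python
import pandas as pd


def positions_frame(df, values_by_column):
    """Return one column of positions per entry in values_by_column.

    values_by_column maps a column name to its array of possible values.
    Values missing from the array come back as -1.
    """
    return pd.DataFrame(
        {
            col: pd.Index(values).get_indexer(df[col])
            for col, values in values_by_column.items()
        },
        index=df.index,  # keep the original row labels
    )


# Sample rows from the question.
df = pd.DataFrame({'x': [0, 0, 1], 'y': ['A', 'D', 'X'], 'z': [10, 13, 20]})

result = positions_frame(df, {'x': [0, 1], 'y': ['a', 'A', 'D', 'X']})
print(result)
```

Columns that are not named in the dict, such as z here, are simply left out of the result.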

Answer 1: (score: 2)

Update: you can also try a Series with .loc
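The code for this answer did not survive, so the following is only one possible reading of the Series and .loc suggestion, not the answerer's original code. The idea is to build a Series that maps each possible value to its position and then look the column up by label. The name position_of is an assumption for illustration.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({'x': [0, 0, 1], 'y': ['A', 'D', 'X'], 'z': [10, 13, 20]})

y_values = ['a', 'A', 'D', 'X']

# Map each possible value (the label) to its position in y_values.
position_of = pd.Series(range(len(y_values)), index=y_values)

# Label-based lookup, one result per row of df.
y_positions = position_of.loc[df['y']].to_numpy()

print(y_positions)
```

Unlike get_indexer, a label lookup like this raises a KeyError in recent pandas versions when a value is missing from y_values, instead of returning -1.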