Question

I am trying to sample a pandas DataFrame based on a dictionary and a specific column. For each value of the y column, I know exactly how many observations I want to select.

I can do this with a combination of groupby and apply, like this:

import pandas as pd

df = pd.DataFrame({'y': [2,2,0,0,0,1,1,1,1,1], 'x': 1, 'z': 2})

   y  x  z
0  2  1  2
1  2  1  2
2  0  1  2
3  0  1  2
4  0  1  2
5  1  1  2

sizes = {0: 2, 1: 1, 2:1}
df.groupby('y').apply(lambda x: x.sample(sizes[x['y'].values[0]]))

     y  x  z
y
0 2  0  1  2
  4  0  1  2
1 5  1  1  2
2 0  2  1  2

However, if I use unique instead of values (which should be equivalent), I get a strange KeyError on the DataFrame:

df.groupby('y').apply(lambda x: x.sample(sizes[x.y.unique()[0]]))

KeyError: 'y'

Can someone explain why this happens?

Edit:

The error appears to depend on the pandas version (0.23.1 is one of the versions I compared), so this may be a bug.

Answer 1

I think you need the .name attribute, which holds the key of the current group:

df1 = df.groupby('y').apply(lambda x: x.sample(sizes[x.name]))
print (df1)

     y  x  z
y           
0 4  0  1  2
  2  0  1  2
1 6  1  1  2
2 0  2  1  2

If some values of y might be missing from the dictionary, use get with a default of 0 so that those groups contribute no rows:

df1 = df.groupby('y').apply(lambda x: x.sample(sizes.get(x.name, 0)))
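The same idea can also be written without apply, by looping over the groups explicitly. This is only a sketch of an alternative, not the approach from the answer above: sample_by_sizes is a hypothetical helper name, and random_state is passed only to make the draw reproducible.

```python
import pandas as pd

df = pd.DataFrame({'y': [2, 2, 0, 0, 0, 1, 1, 1, 1, 1], 'x': 1, 'z': 2})
sizes = {0: 2, 1: 1, 2: 1}


def sample_by_sizes(frame, column, sizes, random_state=None):
    """Hypothetical helper: draw sizes[key] rows from each group of `column`."""
    parts = []
    for key, group in frame.groupby(column):
        n = sizes.get(key, 0)  # groups missing from the dict contribute no rows
        if n > 0:
            parts.append(group.sample(n=n, random_state=random_state))
    # Return an empty frame with the same columns if nothing was sampled
    return pd.concat(parts) if parts else frame.iloc[0:0]


result = sample_by_sizes(df, 'y', sizes, random_state=0)
print(result)
```

Because the group key comes straight from the groupby iteration, this version does not need to read the key back out of the y column at all.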

Edit:

The problem is that unique returns a one-element NumPy array:

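def f(x):
    # x is the sub-DataFrame for one group
    print(x['y'].unique())                      # one-element array with the group key
    print(x['y'].unique()[0])                   # the group key as a scalar
    print(sizes[x['y'].unique()[0]])            # sample size looked up for this group
    print(x.sample(sizes[x['y'].unique()[0]]))  # the sampled rows

df1 = df.groupby('y').apply(f)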

[0]
0
2
   y  x  z
2  0  1  2
4  0  1  2
[0]
0
2
   y  x  z
4  0  1  2
2  0  1  2
[1]
1
1
   y  x  z
6  1  1  2
[2]
2
1
   y  x  z
0  2  1  2

df1 = df.groupby('y').apply(lambda x: x.sample(sizes[x.y.unique()[0]]))
print (df1)
     y  x  z
y           
0 4  0  1  2
  2  0  1  2
1 6  1  1  2
2 0  2  1  2
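To check outside of apply that unique()[0], values[0], and the group key all agree, a small sketch along these lines may help. It assumes the same df as in the question; the checks list is only an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'y': [2, 2, 0, 0, 0, 1, 1, 1, 1, 1], 'x': 1, 'z': 2})

checks = []
for key, group in df.groupby('y'):
    uniq = group['y'].unique()    # NumPy array with a single element
    first = group['y'].values[0]  # first value of the underlying array
    # Record: group key, number of unique values, and whether both lookups match the key
    checks.append((key, len(uniq), bool(uniq[0] == key), bool(first == key)))

for row in checks:
    print(row)
```

Within a single group every row has the same y, so both expressions reduce to the group key, which is also what .name provides inside apply.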
