Question

I have a dataframe df:

     A    B
0   28  abc
1   29  def
2   30  hij
3   31  hij
4   32  abc
5   28  abc
6   28  abc
7   29  def
8   30  hij
9   28  abc
10  29  klm
11  30  nop
12  28  abc
13  29  xyz

df.dtypes

A    object        # A is a string column as well
B    object
dtype: object

I want to use the values in this array for a groupby:

i = np.array([ 3,  5,  6,  9, 12, 14])

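Basically, all rows of df with index 0, 1, 2 belong to the first group, the rows with index 3, 4 to the second group, the row with index 5 to the third group, and so on.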
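Put differently, each value in i is an exclusive upper edge: a row belongs to group k when i[k-1] <= index < i[k]. As a small illustration (not part of the original question), np.searchsorted with side='right' computes that group number for every index; np.arange(14) stands in for df.index here.

```python
import numpy as np

# Group boundaries from the question: each value is an exclusive upper edge.
i = np.array([3, 5, 6, 9, 12, 14])
index = np.arange(14)  # stand-in for df.index (0..13)

# side="right" sends an index equal to a boundary into the *next* group,
# so 3 opens group 1 instead of closing group 0.
groups = np.searchsorted(i, index, side="right")
print(groups.tolist())
# [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5]
```

These labels could be passed straight to df.groupby as well; they describe the same grouping as the desired output.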

My end goal is:

A              B
28,29,30       abc,def,hij
31,32          hij,abc
28             abc
28,29,30       abc,def,hij
28,29,30       abc,klm,nop
28,29          abc,xyz

My solution so far, using groupby + pd.cut:

df.groupby(pd.cut(df.index, bins=np.append([0], i)), as_index=False).agg(','.join)

          A            B
0  29,30,31  def,hij,hij
1     32,28      abc,abc
2        28          abc
3  29,30,28  def,hij,abc
4  29,30,28  klm,nop,abc
5        29          xyz

The result is incorrect :-(

How can I do this correctly?
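To see why the attempt misbehaves, it helps to look at which interval each index lands in. A small check (an illustration added here, again using np.arange(14) as a stand-in for df.index):

```python
import numpy as np
import pandas as pd

i = np.array([3, 5, 6, 9, 12, 14])
bins = np.append([0], i)  # edges 0, 3, 5, 6, 9, 12, 14
index = np.arange(14)     # stand-in for df.index (0..13)

# pd.cut defaults to right=True, i.e. intervals (0, 3], (3, 5], ...
# .codes gives each row's interval number; -1 means "no interval" (NaN).
codes = pd.cut(index, bins=bins).codes
print(codes.tolist())
# [-1, 0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5]
```

Index 0 gets code -1, so groupby drops that row, and each boundary value (3, 5, 6, ...) closes the previous interval instead of opening the next one, which reproduces every row of the incorrect output above.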

Answer 1

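You're very close, but pass right=False together with include_lowest=True to pd.cut. With right=False the bins are closed on the left, [0, 3), [3, 5), ..., so index 0 falls into the first bin and each bin no longer includes its last element, i.e.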

idx = pd.cut(df.index, bins=np.append([0], i),
             include_lowest=True, right=False)
df.groupby(idx, as_index=False).agg(','.join)

A              B
28,29,30       abc,def,hij
31,32          hij,abc
28             abc
28,29,30       abc,def,hij
28,29,30       abc,klm,nop
28,29          abc,xyz
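A self-contained version of this answer, for checking it end to end. The frame is rebuilt by hand from the question (with A stored as strings, matching its object dtype). observed=True is an addition not in the answer: it keeps only intervals that actually contain rows and avoids the warning recent pandas versions emit about the observed default for categorical groupers. reset_index(drop=True) replaces as_index=False so the result gets a plain 0..5 index.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; A holds strings, as in the question's dtypes.
df = pd.DataFrame({
    "A": ["28", "29", "30", "31", "32", "28", "28",
          "29", "30", "28", "29", "30", "28", "29"],
    "B": ["abc", "def", "hij", "hij", "abc", "abc", "abc",
          "def", "hij", "abc", "klm", "nop", "abc", "xyz"],
})
i = np.array([3, 5, 6, 9, 12, 14])

# right=False makes every interval [left, right): 0 is included and each
# boundary value opens the next group instead of closing the current one.
idx = pd.cut(df.index, bins=np.append([0], i),
             include_lowest=True, right=False)

# Join the strings of each column within each interval.
out = df.groupby(idx, observed=True).agg(",".join).reset_index(drop=True)
print(out)
```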

Answer 2

I think this might be fast..


Custom pandas groupby on a list of intervals
