Question

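Given the following table

I am trying to find a clean solution in pandas to subtract a value, for example 30, so that I end up with the following result.

I would like to know whether pandas has a way to do this without looping over every row of the DataFrame, ideally one that takes advantage of pandas' vectorized (bulk) operations.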

Answer 1

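Identify where the cumsum is greater than or equal to 30
Mask the rows where it is not (set them to 0)
Reassign the one boundary row to be the cumsum less 30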

c = df.vals.cumsum()
m = c.ge(30)
i = m.idxmax()
n = df.vals.where(m, 0)
n.loc[i] = c.loc[i] - 30
df.assign(vals=n)

   vals
0     0
1     0
2     0
3     5
4    20

The same thing, but NumPy-fied

import numpy as np
v = df.vals.values
c = v.cumsum()
m = c >= 30
i = m.argmax()
n = np.where(m, v, 0)
n[i] = c[i] - 30
df.assign(vals=n)

   vals
0     0
1     0
2     0
3     5
4    20

Timing

%%timeit 
v = df.vals.values
c = v.cumsum()
m = c >= 30
i = m.argmax()
n = np.where(m, v, 0)
n[i] = c[i] - 30
df.assign(vals=n)
10000 loops, best of 3: 168 µs per loop

%%timeit
c = df.vals.cumsum()
m = c.ge(30)
i = m.idxmax()
n = df.vals.where(m, 0)
n.loc[i] = c.loc[i] - 30
df.assign(vals=n)
1000 loops, best of 3: 853 µs per loop
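
Both versions above assume the cumulative sum reaches 30 somewhere; if it never does, the mask is all False and the first row gets a negative value. As a hedged sketch (not the answerer's code), the same result can be obtained with a clip-and-minimum variant that also covers that case. The function name `subtract_from_front` is hypothetical, and the sketch assumes all values are non-negative.

```python
import numpy as np
import pandas as pd


def subtract_from_front(vals: pd.Series, amount: float) -> pd.Series:
    """Use up `amount` starting from the first row (hypothetical helper).

    Assumes every value in `vals` is non-negative.
    """
    v = vals.to_numpy()
    c = v.cumsum()
    # Cumulative total left after removing `amount`, floored at zero.
    remaining = np.clip(c - amount, 0, None)
    # A row keeps what is left over, but never more than its own value.
    out = np.minimum(v, remaining)
    return pd.Series(out, index=vals.index, name=vals.name)


df = pd.DataFrame({"vals": [20, 3, 2, 10, 20]})
result = df.assign(vals=subtract_from_front(df["vals"], 30))
```

With the sample frame this gives the same `0, 0, 0, 5, 20` column as the output shown earlier, and a column whose total is below 30 comes back as all zeros.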

Answer 2

Here's a four-liner using NumPy -

v = df.vals.values
a = v.cumsum()-30
idx = (a>0).argmax()+1
v[:idx] = a.clip(min=0)[:idx]

Sample run -

In [274]: df     # Original df
Out[274]: 
   vals
0    20
1     3
2     2
3    10
4    20

In [275]: df.iloc[3,0] = 7  # Bringing in some variety

In [276]: df
Out[276]: 
   vals
0    20
1     3
2     2
3     7
4    20

In [277]: v = df.vals.values
     ...: a = v.cumsum()-30
     ...: idx = (a>0).argmax()+1
     ...: v[:idx] = a.clip(min=0)[:idx]
     ...: 

In [278]: df
Out[278]: 
   vals
0     0
1     0
2     0
3     2
4    20
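
Note that the sample run changes `df` itself, because it writes into the array returned by `df.vals.values`. Whether that write reaches the DataFrame (or is allowed at all) may depend on the pandas version and its copy-on-write setting, so the following is a minimal sketch of the same four steps applied to an explicit copy. The `.copy()` call and the `result` name are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"vals": [20, 3, 2, 7, 20]})

v = df["vals"].to_numpy().copy()   # explicit copy, so df stays untouched
a = v.cumsum() - 30                # cumulative total minus the amount
idx = (a > 0).argmax() + 1         # one past the first row with a positive remainder
v[:idx] = a.clip(min=0)[:idx]      # zero the used-up rows, keep the remainder
result = df.assign(vals=v)         # new frame with the adjusted column
```

Like the original, this assumes the cumulative sum actually exceeds 30 at some row.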

Answer 3

String[] cats = {"childrens", "signed"};

// Combining the optional categories arrays
 BasicDBList theMegaArray = new BasicDBList();
 for (int i = 1; i <= 5; i++) {
      String identifier = "categories.category" + i;
      String cleanIdentifier = "$" + identifier;
      theMegaArray.add(new BasicDBObject("$ifNull", Arrays.asList(cleanIdentifier, Collections.EMPTY_LIST)));
 }
 BasicDBObject theData = new BasicDBObject("$setUnion", theMegaArray);

// Add equals filter - Compare the arrays and output boolean filter field
 BasicDBObject theFilter = new BasicDBObject("$eq", Arrays.asList(theData, cats));

// Add projections to keep the output fields
 BasicDBObject theProjections = new BasicDBObject();
 theProjections.put("filter", theFilter);
 theProjections.put("pid", 1);
 theProjections.put("categories", 1);

// Add $project stage
 BasicDBObject theProject = new BasicDBObject("$project", theProjections);

// Add $match stage to compare the boolean filter field to true to keep matching documents
 BasicDBObject theMatch = new BasicDBObject("$match", new BasicDBObject("filter", true));

// Add stages to pipeline
BasicDBList pipeline = new BasicDBList();
pipeline.add(theProject);
pipeline.add(theMatch);

// Run aggregation
AggregateIterable iterable = collection.aggregate(pipeline);

Subtracting an aggregate from a pandas Series / DataFrame

3 answers: