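For Python Pandas:
I want to simplify my code so that it ends up as a one-liner (reason: performance optimization).
How can I write it so that I have just a single line containing the groupby statement?
Something like:

    dfResult = df2.groupby("a").something().I()Do()Not()Understand()Yet()
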
Here is my code (I want to filter out those groups of column a where the standard deviation of the differences between consecutive values of column b is too large):

    import numpy as np
    import pandas as pd

    dfResult = pd.DataFrame()
    df2 = pd.DataFrame({'a': ("w", "w", "w", "w", "x", "x", "x"), 'b': (30, 42, 54, 68, 7, 8, 65)})
    print('input data:')
    print(df2)
    dfGroupBy = df2.groupby("a")
    for key, item in dfGroupBy:
        # copy so that adding the helper column does not touch df2
        innerDf = dfGroupBy.get_group(key).copy()
        # calculate delta between two rows for column 'b'
        innerDf['delta'] = innerDf['b'] - innerDf['b'].shift(1)
        # calculate standard deviation (without the first row)
        standardDeviation = np.std(innerDf['delta'][1:])
        if standardDeviation < 15:
            print("so my standard deviation is small enough!")
            print(innerDf['delta'][1:])
            print("standard deviation:", standardDeviation)
            # remove column 'delta', as I needed it only in between
            innerDf = innerDf.drop('delta', axis=1)
            dfResult = pd.concat([dfResult, innerDf])
    print("result:")
    print(dfResult)

This is the console output:

    input data:
       a   b
    0  w  30
    1  w  42
    2  w  54
    3  w  68
    4  x   7
    5  x   8
    6  x  65
    so my standard deviation is small enough!
    1    12.0
    2    12.0
    3    14.0
    Name: delta, dtype: float64
    standard deviation: 0.942809041582
    result:
       a   b
    0  w  30
    1  w  42
    2  w  54
    3  w  68
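
For reference, one possible shape of the one-liner I am after might be the sketch below. It assumes that `groupby(...).filter` together with `Series.diff()` does the same as my loop; `ddof=0` is passed because `np.std` computes the population standard deviation, while pandas' `std` defaults to `ddof=1`. I have not measured whether this is actually faster, since the lambda is still called once per group.

```python
import pandas as pd

df2 = pd.DataFrame({'a': ("w", "w", "w", "w", "x", "x", "x"),
                    'b': (30, 42, 54, 68, 7, 8, 65)})

# Keep only the groups whose row-to-row deltas in 'b' have a small
# population standard deviation. diff() leaves NaN in the first row of
# each group, and std() skips NaN, which mirrors dropping the first row.
dfResult = df2.groupby("a").filter(lambda g: g["b"].diff().std(ddof=0) < 15)

print(dfResult)
```

A group with only one row has no deltas, so its standard deviation is NaN and the comparison is False; such a group would be dropped, which as far as I can tell matches what the loop does.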