I have a pandas DataFrame that looks like this:
+-----+---+---+
|     | A | B |
+-----+---+---+
| 288 | 1 | 4 |
+-----+---+---+
| 245 | 2 | 3 |
+-----+---+---+
| 543 | 3 | 6 |
+-----+---+---+
| 867 | 1 | 9 |
+-----+---+---+
| 345 | 2 | 7 |
+-----+---+---+
| 122 | 3 | 8 |
+-----+---+---+
| 233 | 1 | 1 |
+-----+---+---+
| 346 | 2 | 6 |
+-----+---+---+
| 765 | 3 | 3 |
+-----+---+---+
What I want is: for each run of values 1 to 3 in column 'A', get the maximum and minimum of column 'B'.
For example:
loop on A in range 1 to 3:
    get max and min values from column 'B'
    max = 6
    min = 3
loop on the next range of A from 1 to 3:
    get max and min values from column 'B'
    max = 9
    min = 7
loop on the next range of A from 1 to 3:
    get max and min values from column 'B'
    max = 6
    min = 1
and add the min and max values as new columns, like this:
+-----+---+---+-----+-----+
|     | A | B | min | max |
+-----+---+---+-----+-----+
| 288 | 1 | 4 |  3  |  6  |
+-----+---+---+-----+-----+
| 245 | 2 | 3 |     |     |
+-----+---+---+-----+-----+
| 543 | 3 | 6 |     |     |
+-----+---+---+-----+-----+
| 867 | 1 | 9 |  7  |  9  |
+-----+---+---+-----+-----+
| 345 | 2 | 7 |     |     |
+-----+---+---+-----+-----+
| 122 | 3 | 8 |     |     |
+-----+---+---+-----+-----+
| 233 | 1 | 1 |  1  |  6  |
+-----+---+---+-----+-----+
| 346 | 2 | 6 |     |     |
+-----+---+---+-----+-----+
| 765 | 3 | 3 |     |     |
+-----+---+---+-----+-----+
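For reference, here is a minimal sketch that rebuilds the sample frame and follows the loop described above literally. It assumes a new block starts every time A is 1, and (unlike the desired output) it leaves the non-first rows as NaN rather than blank so the columns stay numeric. The vectorized answers below are the better choice for large frames.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the unnamed first column is the index
df = pd.DataFrame(
    {'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
     'B': [4, 3, 6, 9, 7, 8, 1, 6, 3]},
    index=[288, 245, 543, 867, 345, 122, 233, 346, 765],
)

# Assumption: every A == 1 starts a new block that runs until the next A == 1
starts = [pos for pos, a in enumerate(df['A']) if a == 1]
ends = starts[1:] + [len(df)]

df['min'] = np.nan
df['max'] = np.nan
for start, end in zip(starts, ends):
    block = df.iloc[start:end]   # rows belonging to one block
    first = df.index[start]      # index label of the block's first row
    df.loc[first, 'min'] = block['B'].min()
    df.loc[first, 'max'] = block['B'].max()

print(df)
```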
Answer 0 (score: 4)
If you don't need empty cells (every row gets its block's min and max):
import numpy as np

# Label rows 0,0,0,1,1,1,2,2,2 so each consecutive block of 3 rows is one group
g = df.groupby(np.arange(len(df.index)) // 3)
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)
     A  B  min  max
288  1  4    3    6
245  2  3    3    6
543  3  6    3    6
867  1  9    7    9
345  2  7    7    9
122  3  8    7    9
233  1  1    1    6
346  2  6    1    6
765  3  3    1    6
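The grouping key here is the row position floor-divided by 3, so it silently assumes every block is exactly 3 rows long. The sketch below prints the key for the 9-row sample and then, on a made-up frame whose middle block has only two rows, shows how that key assigns rows to the wrong block.

```python
import numpy as np
import pandas as pd

# Key for the 9-row sample: positions 0..8 floor-divided by the block size 3
key = np.arange(9) // 3
print(key)  # [0 0 0 1 1 1 2 2 2]

# Hypothetical data whose middle block has only two rows (A = 1, 2)
uneven = pd.DataFrame({'A': [1, 2, 3, 1, 2, 1, 2, 3],
                       'B': [4, 3, 6, 9, 7, 1, 6, 3]})
uneven_key = np.arange(len(uneven.index)) // 3
print(uneven_key)  # [0 0 0 1 1 1 2 2]

# Row 5 starts a new block (A == 1) but still lands in group 1,
# so the second block's min is contaminated by B = 1
uneven['min'] = uneven.groupby(uneven_key)['B'].transform('min')
print(uneven.loc[3:5, 'min'].tolist())  # [1, 1, 1]
```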
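For empty cells you can assign empty strings, but then the `min` and `max` columns become `object` dtype (a mix of numbers and strings), so they are no longer numeric: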
import numpy as np

g = df.groupby(np.arange(len(df.index)) // 3)
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
# Blank out min/max on every row except the first row of each block (A == 1)
df.loc[df.A != 1, ['min','max']] = ''
print(df)
     A  B  min  max
288  1  4    3    6
245  2  3
543  3  6
867  1  9    7    9
345  2  7
122  3  8
233  1  1    1    6
346  2  6
765  3  3
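If the columns should stay numeric, one option (a swap from the empty strings used above, not part of the original answer) is to keep the block value only on rows where A is 1 with `Series.where`, which puts NaN everywhere else. Because NaN is a float, the `min`/`max` columns become `float64` instead of `object`, and the empty cells print as `NaN`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
     'B': [4, 3, 6, 9, 7, 8, 1, 6, 3]},
    index=[288, 245, 543, 867, 345, 122, 233, 346, 765],
)

g = df.groupby(np.arange(len(df.index)) // 3)
first = df['A'] == 1  # True on the first row of each block

# Series.where keeps values where `first` is True and puts NaN elsewhere,
# so the columns end up float64 instead of object
df['min'] = g['B'].transform('min').where(first)
df['max'] = g['B'].transform('max').where(first)
print(df)
```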
EDIT1: to also label each block, add a `range` column and group on it:
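import numpy as np
import pandas as pd

# Label each block of 3 consecutive rows as 'range1', 'range2', ...
df['range'] = 'range' + pd.Series(np.arange(len(df.index)) // 3 + 1, index=df.index).astype(str)
g = df.groupby('range')
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)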
     A  B   range  min  max
288  1  4  range1    3    6
245  2  3  range1    3    6
543  3  6  range1    3    6
867  1  9  range2    7    9
345  2  7  range2    7    9
122  3  8  range2    7    9
233  1  1  range3    1    6
346  2  6  range3    1    6
765  3  3  range3    1    6
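If you only need one row per block rather than the values repeated on every row, `agg` on the same `range` grouping returns a compact summary. A minimal sketch, rebuilding the sample frame and the `range` label the same way as in the edit:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 1, 2, 3, 1, 2, 3],
     'B': [4, 3, 6, 9, 7, 8, 1, 6, 3]},
    index=[288, 245, 543, 867, 345, 122, 233, 346, 765],
)
df['range'] = 'range' + pd.Series(np.arange(len(df.index)) // 3 + 1,
                                  index=df.index).astype(str)

# agg returns one row per group (indexed by range) instead of one per input row
summary = df.groupby('range')['B'].agg(['min', 'max'])
print(summary)
```

Note that `groupby` sorts the string labels lexicographically by default, so with ten or more blocks `range10` would sort before `range2`; passing `sort=False` keeps the order in which the blocks first appear.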
Another solution, using `cumsum` on a boolean mask:
# True marks the first row of each block (A == 1); cumsum turns that into block ids 1, 2, 3, ...
df['range'] = 'range' + (df.A == 1).cumsum().astype(str)
g = df.groupby('range')
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)
     A  B   range  min  max
288  1  4  range1    3    6
245  2  3  range1    3    6
543  3  6  range1    3    6
867  1  9  range2    7    9
345  2  7  range2    7    9
122  3  8  range2    7    9
233  1  1  range3    1    6
346  2  6  range3    1    6
765  3  3  range3    1    6
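Unlike the fixed `// 3` key, the `cumsum` key starts a new block at every `A == 1`, so it also copes with blocks of different lengths. A sketch on made-up data whose middle block has only two rows:

```python
import pandas as pd

# Hypothetical data: blocks of length 3, 2 and 3
df = pd.DataFrame({'A': [1, 2, 3, 1, 2, 1, 2, 3],
                   'B': [4, 3, 6, 9, 7, 1, 6, 3]})

# Each True (A == 1) increments the running count, giving block ids 1, 1, 1, 2, 2, 3, 3, 3
block = (df['A'] == 1).cumsum()
g = df.groupby(block)
df['min'] = g['B'].transform('min')
df['max'] = g['B'].transform('max')
print(df)
```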
Answer 1 (score: 2)
A general solution that does not hard-code the block size:
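# cumcount numbers the occurrences of each A value (0, 1, 2, ...), so the
# n-th 1, 2 and 3 share a key and form one group
g = df.groupby(df.groupby('A').cumcount())
df['min'] = g.B.transform('min')
df['max'] = g.B.transform('max')
print(df)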
     A  B  min  max
288  1  4    3    6
245  2  3    3    6
543  3  6    3    6
867  1  9    7    9
345  2  7    7    9
122  3  8    7    9
233  1  1    1    6
346  2  6    1    6
765  3  3    1    6
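`groupby('A').cumcount()` numbers the occurrences of each A value, so the n-th 1, the n-th 2 and the n-th 3 all get key n-1. That lines up with the blocks only when every block contains each A value exactly once. The sketch below prints the key for the sample's A column and then, on made-up data where the middle block has no 3, shows the key drifting.

```python
import pandas as pd

# A column from the question's sample
sample = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
print(sample.groupby('A').cumcount().tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Hypothetical data: the middle block is only A = 1, 2
uneven = pd.DataFrame({'A': [1, 2, 3, 1, 2, 1, 2, 3]})
key = uneven.groupby('A').cumcount()
# The final 3 is only the second 3 seen, so it gets key 1
# and would be grouped with the second block instead of the third
print(key.tolist())  # [0, 0, 0, 1, 1, 2, 2, 1]
```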