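I want to take a pandas DataFrame, group it by one column, sort it by another column, take the first element from a third column, and then fill that value back into the original DataFrame.
This is my original df. I want to group by col_1, sort by col_2 (ascending), take the first element of col_3 within each group, and put the result into col_4.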
import pandas as pd
df_in = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'], 'col_2': [5, 9, 2, 3, 7, 1],
'col_3': ['c', 'd', 'k', 'n', 'l', 'f']})
Here is what the output df should look like:
df_out = pd.DataFrame({'col_1':['A', 'A', 'A', 'B', 'B', 'B'], 'col_2': [5,9,2, 3,7,1],
'col_3': ['c','d','k','n','l','f'], 'col_4': ['k','k','k','f','f','f'], })
I can do the grouping and sorting with groupby and transform, but it is not clear to me how to extract the first element.
Answer 0: (score: 2)
df_in['col_4'] = df_in.sort_values(['col_1', 'col_2']).groupby('col_1')['col_3'].transform(lambda x: x.iloc[0])
Output:
col_1 col_2 col_3 col_4
0 A 5 c k
1 A 9 d k
2 A 2 k k
3 B 3 n f
4 B 7 l f
5 B 1 f f
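To check the one-liner against the sample data from the question, here is a minimal sketch that wraps it in a helper. The function name `add_first_by_group` and its parameters are hypothetical, introduced only for illustration. It relies on the transformed Series keeping the original index labels, so assigning it back realigns the values with the unsorted rows.

```python
import pandas as pd


def add_first_by_group(df, group_col, sort_col, value_col, new_col):
    """Return a copy of df where new_col holds, for every row, the value_col
    entry of the row with the smallest sort_col in that row's group.

    Hypothetical helper; not part of pandas.
    """
    out = df.copy()
    # Sort so the smallest sort_col comes first inside each group, then
    # broadcast that first value to every row of the group. The result still
    # carries the original index labels, so the assignment realigns it.
    out[new_col] = (
        df.sort_values([group_col, sort_col])
          .groupby(group_col)[value_col]
          .transform(lambda s: s.iloc[0])
    )
    return out


df_in = pd.DataFrame({
    'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col_2': [5, 9, 2, 3, 7, 1],
    'col_3': ['c', 'd', 'k', 'n', 'l', 'f'],
})

result = add_first_by_group(df_in, 'col_1', 'col_2', 'col_3', 'col_4')
print(result)
```

Because the helper works on a copy, df_in itself is left unchanged; assign the return value back to df_in if in-place behavior is wanted.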
Answer 1: (score: 1)
Try this, assuming your index is as shown:
Answer 2: (score: 1)
You can use
first_values = df_in.sort_values(['col_1','col_2']).groupby('col_1')['col_3'].first().rename('col_4')
df_in = df_in.join(first_values, on='col_1')
Output:
col_1 col_2 col_3 col_4
0 A 5 c k
1 A 9 d k
2 A 2 k k
3 B 3 n f
4 B 7 l f
5 B 1 f f
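It may help to look at the intermediate result of this approach. The sketch below, which reuses the sample data from the question, shows that first_values is a small per-group lookup Series indexed by col_1, and that join(..., on='col_1') maps it back onto every row. The names `lookup` and `joined` are introduced here only for illustration. Note that, by default, groupby first() returns the first non-null entry in each group, which can differ from iloc[0] when col_3 contains missing values.

```python
import pandas as pd

df_in = pd.DataFrame({
    'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col_2': [5, 9, 2, 3, 7, 1],
    'col_3': ['c', 'd', 'k', 'n', 'l', 'f'],
})

# One value per group: the col_3 entry of the row with the smallest col_2.
first_values = (
    df_in.sort_values(['col_1', 'col_2'])
         .groupby('col_1')['col_3']
         .first()
         .rename('col_4')
)

# The Series is indexed by the group key, so it acts as a lookup table.
lookup = first_values.to_dict()
print(lookup)

# join matches the 'col_1' column of df_in against the index of first_values.
joined = df_in.join(first_values, on='col_1')
print(joined)
```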
Answer 3: (score: 1)
Try using idxmin:
s=df_in.groupby(['col_1']).col_2.transform('idxmin')
df_in['New']=df_in.col_3.reindex(s).values
df_in
Out[469]:
col_1 col_2 col_3 New
0 A 5 c k
1 A 9 d k
2 A 2 k k
3 B 3 n f
4 B 7 l f
5 B 1 f f
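The idxmin approach avoids sorting altogether: for each row it finds the index label of the row holding the group's minimum col_2, then looks up col_3 at that label. A minimal sketch on the question's sample data follows; the variable names `min_labels` and `out` are assumptions made for illustration, and .loc is used here as an equivalent alternative to reindex. It assumes the index labels are unique, as they are in the sample.

```python
import pandas as pd

df_in = pd.DataFrame({
    'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'col_2': [5, 9, 2, 3, 7, 1],
    'col_3': ['c', 'd', 'k', 'n', 'l', 'f'],
})

# For every row, the index label of the row with the smallest col_2
# in the same col_1 group.
min_labels = df_in.groupby('col_1')['col_2'].transform('idxmin')
print(min_labels.tolist())

# Look up col_3 at those labels; to_numpy() drops the looked-up index so the
# values are assigned positionally, row by row.
out = df_in.assign(New=df_in.loc[min_labels, 'col_3'].to_numpy())
print(out)
```

If a group contains ties for the minimum, idxmin returns the label of the first occurrence.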