Pandas - group by one column, sort by another, get the value from a third column

Date: 2019-05-31 16:58:15

Tags: python pandas sorting group-by

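I want to take a pandas dataframe, group it by one column, sort it by another column, take the first element of a third column, and then populate the original dataframe with that value.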

Here is my original df. I want to group by col_1, sort by col_2 (ascending), take the first element of col_3, and put the result into col_4.

import pandas as pd

df_in = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'], 'col_2': [5, 9, 2, 3, 7, 1],
                      'col_3': ['c', 'd', 'k', 'n', 'l', 'f']})


Here is what the output df should look like:

df_out = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'], 'col_2': [5, 9, 2, 3, 7, 1],
                       'col_3': ['c', 'd', 'k', 'n', 'l', 'f'], 'col_4': ['k', 'k', 'k', 'f', 'f', 'f']})


I can do the grouping and sorting with groupby and transform, but it's not clear to me how to extract the first element.

Sorry, I couldn't get the images to display properly ;-(
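The "take the first element" step can also be done without a transform. A minimal sketch, assuming the df_in frame from the question: sort so the smallest col_2 comes first within each col_1 group, keep only the first row per group with drop_duplicates, then map that per-group value back onto every row. The names sorted_df, first_rows and lookup are illustrative only, not part of the question.

```python
import pandas as pd

df_in = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
                      'col_2': [5, 9, 2, 3, 7, 1],
                      'col_3': ['c', 'd', 'k', 'n', 'l', 'f']})

# Sort so that, inside each col_1 group, the smallest col_2 comes first.
sorted_df = df_in.sort_values(['col_1', 'col_2'])

# drop_duplicates keeps the first row it sees for each col_1 value,
# i.e. the row with the smallest col_2 in that group.
first_rows = sorted_df.drop_duplicates('col_1')

# Build a col_1 -> col_3 lookup and broadcast it back onto every row.
lookup = first_rows.set_index('col_1')['col_3']
df_in['col_4'] = df_in['col_1'].map(lookup)
print(df_in)
```

Because map works on the col_1 values rather than on index labels, this variant does not depend on the frame having a unique index.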

4 answers:

Answer 0 (score: 2)

df_in['col_4'] = df_in.sort_values(['col_1', 'col_2']).groupby('col_1')['col_3'].transform(lambda x: x.iloc[0])

Output:

  col_1  col_2 col_3 col_4
0     A      5     c     k
1     A      9     d     k
2     A      2     k     k
3     B      3     n     f
4     B      7     l     f
5     B      1     f     f
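This works even though transform runs on the sorted frame, because its result keeps the original index labels and column assignment aligns on the index. The sketch below makes that visible; it uses the built-in 'first' aggregation in place of lambda x: x.iloc[0], which gives the same result here (note that 'first' skips NaN values, while iloc[0] does not). The name sorted_first is illustrative only.

```python
import pandas as pd

df_in = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
                      'col_2': [5, 9, 2, 3, 7, 1],
                      'col_3': ['c', 'd', 'k', 'n', 'l', 'f']})

# transform returns one value per input row, in the (sorted) input order,
# but each value still carries its row's original index label.
sorted_first = (df_in.sort_values(['col_1', 'col_2'])
                     .groupby('col_1')['col_3']
                     .transform('first'))
print(sorted_first.index.tolist())  # [2, 0, 1, 5, 3, 4]

# Assigning a Series to a column aligns on index labels,
# so each value lands back on its original row.
df_in['col_4'] = sorted_first
print(df_in)
```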

Answer 1 (score: 1)

Try this, assuming your index is as shown:


Answer 2 (score: 1)

You can use:

first_values = df_in.sort_values(['col_1','col_2']).groupby('col_1')['col_3'].first().rename('col_4')
df_in = df_in.join(first_values, on='col_1')

Output:

  col_1  col_2 col_3 col_4
0     A      5     c     k
1     A      9     d     k
2     A      2     k     k
3     B      3     n     f
4     B      7     l     f
5     B      1     f     f
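Since first_values is a Series indexed by col_1, Series.map can stand in for the join if you prefer to assign the column in place. A minimal sketch, assuming the same df_in as in the question:

```python
import pandas as pd

df_in = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
                      'col_2': [5, 9, 2, 3, 7, 1],
                      'col_3': ['c', 'd', 'k', 'n', 'l', 'f']})

# One col_3 value per col_1 group: the one on the row with the smallest col_2.
first_values = (df_in.sort_values(['col_1', 'col_2'])
                     .groupby('col_1')['col_3']
                     .first())

# map looks each row's col_1 up in first_values' index (A, B).
df_in['col_4'] = df_in['col_1'].map(first_values)
print(df_in)
```

The join version needs the Series to be named (hence the rename('col_4')); map does not, because the target column name comes from the assignment.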

Answer 3 (score: 1)

Try a different approach, using idxmin:

s=df_in.groupby(['col_1']).col_2.transform('idxmin')
df_in['New']=df_in.col_3.reindex(s).values
df_in
Out[469]: 
  col_1  col_2 col_3 New
0     A      5     c   k
1     A      9     d   k
2     A      2     k   k
3     B      3     n   f
4     B      7     l   f
5     B      1     f   f
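The idxmin approach returns index labels, so it relies on every label identifying exactly one row; on ties in col_2, idxmin picks the first occurrence. A minimal sketch of the same idea, assuming the question's df_in, that checks the uniqueness assumption explicitly and uses Series.map for the label lookup instead of reindex (the name min_label is illustrative only):

```python
import pandas as pd

df_in = pd.DataFrame({'col_1': ['A', 'A', 'A', 'B', 'B', 'B'],
                      'col_2': [5, 9, 2, 3, 7, 1],
                      'col_3': ['c', 'd', 'k', 'n', 'l', 'f']})

# idxmin returns index *labels*, so the lookup below is only meaningful
# when the index has no duplicate labels.
assert df_in.index.is_unique, "idxmin-based lookup needs unique index labels"

# For each row: the label of the row holding its group's minimum col_2.
min_label = df_in.groupby('col_1')['col_2'].transform('idxmin')

# Series.map with a Series argument looks each label up in df_in['col_3'].
df_in['col_4'] = min_label.map(df_in['col_3'])
print(df_in)
```

If the real frame may have duplicate labels, calling reset_index first gives it unique labels, at the cost of replacing the original index.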