Question

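I am creating a new column and trying to concatenate rows that share the same column value. The first row should hold its own initial value, and the second row should hold the values of both the first and second rows. I have this working when a value appears in two rows, but when a value appears in three or more rows, the last row only concatenates two values.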

import numpy as np
import pandas as pd

data = {
    'Fruit': ['Apple', 'Apple', 'Mango', 'Mango', 'Mango', 'Watermelon'],
    'Color': ['Red', 'Green', 'Yellow', 'Green', 'Orange', 'Green'],
}
df = pd.DataFrame(data)
df['length'] = df['Fruit'].str.len()
df['Fruit_color'] = df['Fruit'] + df['length'].map(lambda x: ' ' * x)
# shift(1) only looks one row back, so at most two values are ever joined
df['same_fruit'] = np.where(df['Fruit'] != df['Fruit'].shift(1), df['Fruit_color'],
                            df['Fruit_color'].shift(1) + " " + df['Fruit_color'])

Current output:

How can I get the expected output?

Below is my expected output:

Regards, Ren.

Answer 1

Here is one answer:

In [1]:
import pandas as pd
data={ 'Fruit':['Apple','Apple','Mango','Mango','Mango','Watermelon'],
'Color':['Red','Green','Yellow','Green','Orange','Green']
}
df = pd.DataFrame(data)
df['length']=df['Fruit'].str.len()
df['Fruit_color']=df['Fruit'] + ' ' + df['Color']
df.sort_values(by=['Fruit_color'], inplace=True)


## Get the highest number of occurrences of any single fruit
maximum = df[['Fruit', 'Color']].groupby(['Fruit']).count().max().tolist()[0]

## Shift once per occurrence, up to the highest occurrence count
new_cols = []
for i in range(maximum):
    temporary_col = 'Fruit_' + str(i)
    df[temporary_col] = df['Fruit'].shift(i+1)

    new_col = 'new_col_' + str(i)
    df[new_col] = df['Fruit_color'].shift(i+1)

    df.loc[df[temporary_col] != df['Fruit'], new_col] = ''

    df.drop(columns=[temporary_col], inplace=True)
    new_cols.append(new_col)

## Use these shifted columns to build `same_fruit`, then drop the helper columns
df['same_fruit'] = df['Fruit_color']
for col in new_cols:
    df['same_fruit'] = df['same_fruit'] + ' ' + df[col]
    df.drop(columns=[col], inplace=True)

Out [1]:

    Fruit       Color   length  Fruit_color         same_fruit
1   Apple       Green   5       Apple Green         Apple Green
0   Apple       Red     5       Apple Red           Apple Red Apple Green
3   Mango       Green   5       Mango Green         Mango Green
4   Mango       Orange  5       Mango Orange        Mango Orange Mango Green
2   Mango       Yellow  5       Mango Yellow        Mango Yellow Mango Orange Mango Green
5   Watermelon  Green   10      Watermelon Green    Watermelon Green
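
As an alternative, here is a minimal sketch that builds the running concatenation per fruit without the helper shift columns. It is not part of the answer above. It assumes the goal is for each row to accumulate all earlier rows of the same fruit in their original order (the expected output is not shown in the question, so this is a guess), and the helper name `running_join` is made up for illustration. Note that `groupby` collects every row with the same `Fruit`, not only consecutive ones; in the sample data the two coincide.

```python
from itertools import accumulate

import pandas as pd

data = {
    'Fruit': ['Apple', 'Apple', 'Mango', 'Mango', 'Mango', 'Watermelon'],
    'Color': ['Red', 'Green', 'Yellow', 'Green', 'Orange', 'Green'],
}
df = pd.DataFrame(data)
df['Fruit_color'] = df['Fruit'] + ' ' + df['Color']


def running_join(values: pd.Series) -> pd.Series:
    """Hypothetical helper: cumulative space-separated join within one group."""
    joined = accumulate(values, lambda acc, cur: f"{acc} {cur}")
    # Return a Series aligned to the group's index so transform can place it back
    return pd.Series(list(joined), index=values.index)


# transform keeps the original row order and index of df
df['same_fruit'] = df.groupby('Fruit')['Fruit_color'].transform(running_join)
print(df[['Fruit', 'Color', 'same_fruit']])
```

Unlike the shift-based loop, this keeps the rows in their original order and joins earlier values first (for example, the last Mango row becomes `Mango Yellow Mango Green Mango Orange`), and it leaves no trailing spaces.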

Concatenate two rows based on the same value in the next row of a new column