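I'm creating a new column and trying to concatenate the rows that share the same value in a column. The first row should hold that row's own value, the second row should hold the values of the first and second rows, and so on. I've got this working where a value appears twice, but when it appears 3 or more times, only two values get concatenated in the last row.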
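import numpy as np
import pandas as pd

data = {'Fruit': ['Apple', 'Apple', 'Mango', 'Mango', 'Mango', 'Watermelon'],
        'Color': ['Red', 'Green', 'Yellow', 'Green', 'Orange', 'Green']}
df = pd.DataFrame(data)
df['length'] = df['Fruit'].str.len()
df['Fruit_color'] = df['Fruit'] + df['length'].map(lambda x: ' ' * x)
# New fruit: start a fresh string; same fruit: prepend the previous row's value.
# shift(1) only looks one row back, so at most two values are ever joined.
df['same_fruit'] = np.where(df['Fruit'] != df['Fruit'].shift(1),
                            df['Fruit_color'],
                            df['Fruit_color'].shift(1) + " " + df['Fruit_color'])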
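To see why only two values are ever joined, here is a minimal sketch of the same `np.where`/`shift(1)` pattern on three Mango rows. It assumes `Fruit_color` is built as `Fruit + ' ' + Color` (as in the answer) instead of the space padding used above, and fills the first row's missing predecessor with an empty string.

```python
import numpy as np
import pandas as pd

# Assumption: "Fruit Color" labels, as in the answer, not space padding.
df = pd.DataFrame({'Fruit': ['Mango', 'Mango', 'Mango'],
                   'Color': ['Yellow', 'Green', 'Orange']})
df['Fruit_color'] = df['Fruit'] + ' ' + df['Color']

# shift(1) exposes only the immediately preceding row, so row 2 sees
# "Mango Green" but never "Mango Yellow" from two rows up.
prev = df['Fruit_color'].shift(1).fillna('')
df['same_fruit'] = np.where(df['Fruit'] != df['Fruit'].shift(1),
                            df['Fruit_color'],
                            prev + ' ' + df['Fruit_color'])
print(df['same_fruit'].tolist())
# ['Mango Yellow', 'Mango Yellow Mango Green', 'Mango Green Mango Orange']
```

The third row holds only the second and third values, which is exactly the symptom described in the question.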
Current output:
How can I get the expected output?
Below is my expected output:
Regards, Ren.
Answer 0 (score: 0)
Here is an answer:
In [1]:
import pandas as pd

data = {'Fruit': ['Apple', 'Apple', 'Mango', 'Mango', 'Mango', 'Watermelon'],
        'Color': ['Red', 'Green', 'Yellow', 'Green', 'Orange', 'Green']}
df = pd.DataFrame(data)
df['length'] = df['Fruit'].str.len()
df['Fruit_color'] = df['Fruit'] + ' ' + df['Color']
df.sort_values(by=['Fruit_color'], inplace=True)

## Get the maximum number of occurrences of any fruit
maximum = df[['Fruit', 'Color']].groupby(['Fruit']).count().max().tolist()[0]

## Shift as many times as the highest occurrence
new_cols = []
for i in range(maximum):
    # Fruit and Fruit_color taken from i+1 rows above the current row
    temporary_col = 'Fruit_' + str(i)
    df[temporary_col] = df['Fruit'].shift(i + 1)
    new_col = 'new_col_' + str(i)
    df[new_col] = df['Fruit_color'].shift(i + 1)
    # Blank out values that belong to a different fruit (or fall off the top)
    df.loc[df[temporary_col] != df['Fruit'], new_col] = ''
    df.drop(columns=[temporary_col], inplace=True)
    new_cols.append(new_col)

## Use the shifted columns to build `same_fruit`, then drop the helper columns
df['same_fruit'] = df['Fruit_color']
for col in new_cols:
    df['same_fruit'] = df['same_fruit'] + ' ' + df[col]
    df.drop(columns=[col], inplace=True)
# Remove the trailing spaces left by the blanked-out shifted values
df['same_fruit'] = df['same_fruit'].str.strip()
df
Out [1]:
   Fruit       Color   length  Fruit_color       same_fruit
1  Apple       Green   5       Apple Green       Apple Green
0  Apple       Red     5       Apple Red         Apple Red Apple Green
3  Mango       Green   5       Mango Green       Mango Green
4  Mango       Orange  5       Mango Orange      Mango Orange Mango Green
2  Mango       Yellow  5       Mango Yellow      Mango Yellow Mango Orange Mango Green
5  Watermelon  Green   10      Watermelon Green  Watermelon Green
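A possible alternative, sketched here and not taken from the answer: group by `Fruit` and build a running concatenation with `itertools.accumulate`, so neither the sort nor the loop over the maximum count is needed. It reuses the answer's `Fruit + ' ' + Color` label; `running_join` is a hypothetical helper name introduced for illustration.

```python
from itertools import accumulate

import pandas as pd

data = {'Fruit': ['Apple', 'Apple', 'Mango', 'Mango', 'Mango', 'Watermelon'],
        'Color': ['Red', 'Green', 'Yellow', 'Green', 'Orange', 'Green']}
df = pd.DataFrame(data)
# Assumption: same "Fruit Color" label as in the answer.
df['Fruit_color'] = df['Fruit'] + ' ' + df['Color']


def running_join(s: pd.Series) -> pd.Series:
    """Hypothetical helper: running space-joined concatenation of a group."""
    joined = accumulate(s, lambda acc, value: acc + ' ' + value)
    return pd.Series(list(joined), index=s.index)


# transform calls running_join once per Fruit group and aligns the result
# back to the original row index, so the original row order is preserved.
df['same_fruit'] = df.groupby('Fruit')['Fruit_color'].transform(running_join)
print(df[['Fruit_color', 'same_fruit']])
```

Here earlier rows come first (the third Mango row becomes `Mango Yellow Mango Green Mango Orange`), matching the order described in the question. Note that `groupby('Fruit')` also merges rows of the same fruit that are not adjacent; if only consecutive runs should be joined, grouping by `(df['Fruit'] != df['Fruit'].shift()).cumsum()` instead is one option.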