I have a DataFrame named df, and I want to remove the '|' separators in the fuel column:
id  car       fuel
1   Mercedes  petrol|diesel|gas
2   Audi      gas|petrol
so that each fuel value ends up on its own row, like this:
id  car       fuel
1   Mercedes  petrol
1   Mercedes  diesel
1   Mercedes  gas
2   Audi      gas
2   Audi      petrol
Here is the code I tried:
df_1 = df.copy()
df_2 = df.copy()
df_3 = df.copy()
df_1['fuel'] = df_1['fuel'].apply(lambda x:x.split('|')[0])
df_2['fuel'] = df_2['fuel'].apply(lambda x:x.split('|')[1])
df_3['fuel'] = df_3['fuel'].apply(lambda x:x.split('|')[2])
This raises IndexError: list index out of range
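The error can be reproduced on a single value. The following is a minimal sketch, assuming the Audi row from the sample data: 'gas|petrol' splits into only two parts, so asking for index 2 fails.

```python
# 'gas|petrol' yields only two parts, so index 2 does not exist.
parts = "gas|petrol".split("|")
print(parts)

try:
    parts[2]
except IndexError as exc:
    # Same error as in the apply() call on the third copy of the frame.
    print("IndexError:", exc)
```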
Answer 0 (score: 1)
Try this:
import pandas as pd

df = pd.DataFrame({'car': ['Mercedes', 'Audi'], 'fuel': ['petrol|diesel|gas', 'gas|petrol']})  # your dataframe
rows = []  # collect one dict per (car, fuel) combination
for i in range(len(df)):  # iterate over the rows of df
    fuels = df.iloc[i, 1].split('|')  # split each value of 'fuel' and store it in a list
    for fuel in fuels:  # iterate over the split values
        rows.append({'car': df.iloc[i, 0], 'fuel': fuel})  # one dict per combination of 'car' and fuel
df2 = pd.DataFrame(rows)  # build the new dataframe once (DataFrame.append was removed in pandas 2.0)
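If a reasonably recent pandas is available, the same reshaping can probably be done without an explicit loop. The sketch below assumes pandas 0.25 or later (where `DataFrame.explode` was introduced) and reuses the sample data from the question, including the id column; the name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

# Split each 'fuel' string into a list, then emit one row per list element.
# The other columns (id, car) are repeated automatically.
result = (
    df.assign(fuel=df["fuel"].str.split("|"))
      .explode("fuel")
      .reset_index(drop=True)
)
print(result)
```

Because `explode` repeats the remaining columns for every list element, rows with different numbers of fuels need no special handling.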
Answer 1 (score: 1)
Here is one approach.
For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"]
})
df["fuel"] = df["fuel"].str.split("|")  # each 'fuel' cell becomes a list
# Ref https://stackoverflow.com/a/48532692/532312
lst_col = 'fuel'
df = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())  # repeat each value once per list element
    for col in df.columns.drop(lst_col)}
).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]  # flatten the lists, restore column order
print(df)
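Output (the column order shown here comes from an older pandas version; newer versions keep id, car, fuel):
        car    fuel  id
0  Mercedes  petrol   1
1  Mercedes  diesel   1
2  Mercedes     gas   1
3      Audi     gas   2
4      Audi  petrol   2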
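The two NumPy calls do most of the work in this answer, so it may help to look at them in isolation. The sketch below uses plain lists in place of the DataFrame columns; the variable names are made up for illustration.

```python
import numpy as np

ids = np.array([1, 2])
fuel_lists = [["petrol", "diesel", "gas"], ["gas", "petrol"]]
lengths = [len(fuels) for fuels in fuel_lists]  # number of fuels per row

# Repeat each id once per fuel in its row.
repeated_ids = np.repeat(ids, lengths)
# Flatten all fuel lists into a single array, preserving order.
flat_fuels = np.concatenate(fuel_lists)

print(repeated_ids)
print(flat_fuels)
```

The repeated values and the flattened values have the same length, which is why they can be combined column by column into the new DataFrame.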
Answer 2 (score: 0)
You can try the following:
import pandas as pd

# Create the dataframe
df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"]
})
# Create a new Series from the split values, with car as the index;
# dropna() removes the padding from shorter lists in case stack() keeps it
new_df = pd.DataFrame(df.fuel.str.split('|').tolist(), index=df.car).stack().dropna()
# Get rid of the secondary index and turn 'car' back into a column
new_df = new_df.reset_index(level=1, drop=True).reset_index()
# Add the 'id' back to the dataframe (assumes each car appears only once in df)
# Note: There is probably a much more elegant way of doing this
new_df['id'] = new_df.car.apply(lambda x: df[df.loc[:, 'car'] == x].id.values[0])
# Rename the columns
new_df.columns = ["car", "fuel", "id"]
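One possible way to avoid the separate id lookup is to carry both id and car through the index while stacking. This is only a sketch of that idea, not the answerer's method; it assumes the same sample data, and the explicit `dropna()` is a guard in case the installed pandas version keeps the padding values after `stack()`.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

new_df = (
    df.set_index(["id", "car"])["fuel"]
      .str.split("|", expand=True)       # one column per fuel, padded with missing values
      .stack()                           # long format: one row per (id, car, position)
      .dropna()                          # drop padding if stack() kept it
      .reset_index(level=-1, drop=True)  # discard the position level
      .rename("fuel")
      .reset_index()                     # id and car become ordinary columns again
)
print(new_df)
```

Since id travels with car in the index, this variant should also cope with the same car name appearing under several ids.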