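Splitting values in a DataFrame column

Time: 2019-07-02 07:29:19

Tags: python dataframe split index-error

I have a DataFrame named df, and I want to get rid of the '|' separators in the fuel column: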

id  car       fuel
1   Mercedes  petrol|diesel|gas
2   Audi      gas|petrol   

so that each fuel ends up on its own row, like this:

id  car        fuel
1   Mercedes   petrol
1   Mercedes   diesel
1   Mercedes   gas
2   Audi       gas
2   Audi       petrol

Here is the code I tried:

df_1 = hb.copy()
df_2 = hb.copy()
df_3 = hb.copy()

df_1['fuel'] = df_1['fuel'].apply(lambda x:x.split('|')[0])
df_2['fuel'] = df_2['fuel'].apply(lambda x:x.split('|')[1])
df_3['fuel'] = df_3['fuel'].apply(lambda x:x.split('|')[2])

This raises IndexError: list index out of range
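
The error most likely comes from the second row: 'gas|petrol' splits into only two parts, so position 2 does not exist for it. A minimal sketch that reproduces this with plain Python lists, using the sample values from the question (the variable names here are illustrative, not from the original code):

```python
# Sample values from the question's fuel column
fuels = ["petrol|diesel|gas", "gas|petrol"]

# Number of parts each value splits into
part_counts = [len(value.split("|")) for value in fuels]
print(part_counts)

# Indexing position 2 fails for any value with fewer than three parts
try:
    third_parts = [value.split("|")[2] for value in fuels]
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)
```

Any approach that indexes a fixed position after splitting will hit the same problem whenever rows hold a different number of fuels.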

3 answers:

Answer 0: (score: 1)

Try this:

    import pandas as pd

    df=pd.DataFrame({'car':['Mercedes','Audi'],'fuel':['petrol|diesel|gas','gas|petrol']}) #your dataframe
    rows=[]                                                  #one dict per (car, fuel) pair
    for i in range(len(df)):                                 #iterate over the rows of df
        fuels=df.iloc[i,1].split('|')                        #split the 'fuel' value into a list
        for fuel in fuels:                                   #iterate over the individual fuels
            rows.append({'car':df.iloc[i,0],'fuel':fuel})    #pair the car with each of its fuels
    df2=pd.DataFrame(rows)                                   #build the new dataframe (DataFrame.append was removed in pandas 2.0)

Answer 1: (score: 1)

Here is one approach.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
        "id":[1,2],
        "car":["Mercedes","Audi"],
        "fuel":["petrol|diesel|gas","gas|petrol"]
        })
df["fuel"] = df["fuel"].str.split("|")
#Ref https://stackoverflow.com/a/48532692/532312
lst_col = 'fuel'
df = pd.DataFrame({
      col:np.repeat(df[col].values, df[lst_col].str.len())
      for col in df.columns.drop(lst_col)}
    ).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns]
print(df)

Output (the column order may differ depending on the pandas version):

        car    fuel  id
0  Mercedes  petrol   1
1  Mercedes  diesel   1
2  Mercedes     gas   1
3      Audi     gas   2
4      Audi  petrol   2
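
On pandas 0.25 or later, the same reshaping can probably be written more briefly with DataFrame.explode. This is a sketch of an alternative, not the method used in the answer above; the name `exploded` is chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

# Split each fuel string into a list, then emit one row per list element
exploded = (
    df.assign(fuel=df["fuel"].str.split("|"))
      .explode("fuel")
      .reset_index(drop=True)  # explode repeats the original index, so renumber
)
print(exploded)
```

Because explode repeats the other columns automatically, id and car need no separate handling.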

Answer 2: (score: 0)

You can try the following:

import pandas as pd

#Create the dataframe
df = pd.DataFrame({
        "id":[1,2],
        "car":["Mercedes","Audi"],
        "fuel":["petrol|diesel|gas","gas|petrol"]
        })

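#Create a new dataframe from the series, with car as the index
#(dropna discards the padding added for rows with fewer fuels)
new_df = pd.DataFrame(df.fuel.str.split('|').tolist(), index=df.car).stack().dropna()

#Get rid of the secondary index and turn 'car' back into a column
new_df = new_df.reset_index(level=1, drop=True).reset_index()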

#Add the 'id' back to the dataframe
#Note: There is probably a much more elegant way of doing this
new_df.loc[:,'id'] = new_df.car.apply(lambda x: df[df.loc[:,'car'] == x].id.values[0])

#Rename the columns
new_df.columns = ["car","fuel","id"]
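
One way to avoid looking the id up again might be to keep both id and car in the index while splitting, so they travel with every fuel value. The following is a sketch of that idea under the same sample data, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "car": ["Mercedes", "Audi"],
    "fuel": ["petrol|diesel|gas", "gas|petrol"],
})

new_df = (
    df.set_index(["id", "car"])["fuel"]
      .str.split("|", expand=True)       # one column per fuel, padded with missing values
      .stack()                           # reshape to long format, one fuel per row
      .dropna()                          # discard the padding, whatever the pandas version
      .reset_index(level=-1, drop=True)  # drop the position-within-row index level
      .rename("fuel")
      .reset_index()                     # turn id and car back into columns
)
print(new_df)
```

This also keeps working when two rows share the same car name, which the lookup by car would not handle.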