Split a string column on a delimiter and convert it to a dict in Pandas without looping

Posted: 2020-01-20 19:23:21

Tags: python pandas dataframe

I have the dataframe below:

clm1, clm2, clm3
10, a, clm4=1|clm5=5
11, b, clm4=2

The result I want is:

clm1, clm2, clm4, clm5
10, a, 1, 5
11, b, 2, NaN

I tried the following:

import pandas as pd

rows = list(df.index)

dictlist = []

for index in rows:  # loop through each row to convert clm3 to a dict
    i = df.at[index, "clm3"]
    # "clm4=1|clm5=5" -> ["clm4=1", "clm5=5"] -> {"clm4": "1", "clm5": "5"}
    mydict = dict(map(lambda x: x.split('='), [x for x in i.split('|') if '=' in x]))
    dictlist.append(mydict)

l = pd.json_normalize(dictlist)  # convert the list of dicts to a flat dataframe

resultdf = df.join(l).drop('clm3', axis=1)

This gives me the result I want, but I'm looking for a more efficient way to convert clm3 to a dict that doesn't involve looping over every row.
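If a Python-level loop is acceptable as long as it is tighter, here is a minimal sketch (not from the original thread) that builds all the dicts in one comprehension and passes them straight to the DataFrame constructor instead of json_normalize. It assumes the clm3 strings have no leading spaces and that df has its default integer index; split("=", 1) keeps any "=" inside a value intact. It still visits every row in Python, so it is a tidier version of the same approach rather than a vectorized one.

```python
import pandas as pd

df = pd.DataFrame({
    "clm1": [10, 11],
    "clm2": ["a", "b"],
    "clm3": ["clm4=1|clm5=5", "clm4=2"],
})

# One dict per row: split the cell on "|" into pairs, then split each
# pair once on "=" into (key, value)
dicts = [
    dict(pair.split("=", 1) for pair in cell.split("|") if "=" in pair)
    for cell in df["clm3"]
]

# The DataFrame constructor accepts a list of flat dicts directly; keys
# missing from a row become NaN. Reusing df.index keeps the join aligned.
parsed = pd.DataFrame(dicts, index=df.index)

resultdf = df.drop(columns="clm3").join(parsed)
```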

2 answers:

Answer 0 (score: 1)

Use str.extractall to get your values and unstack to pivot them into one column per match.

Then use str.get_dummies to get a column for each unique clm name.
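To see what the extractall and unstack steps produce, here is a minimal sketch on just the sample clm3 column (assuming the values have no leading spaces). It assumes the pattern r"=(\d+)", which leaves the "=" out of the capture group and keeps multi-digit values whole.

```python
import pandas as pd

df = pd.DataFrame({"clm3": ["clm4=1|clm5=5", "clm4=2"]})

# extractall returns one row per regex match; the index gains an extra
# level named "match" that numbers the matches within each original row
matches = df["clm3"].str.extractall(r"=(\d+)")[0]

# unstack moves the "match" level into columns 0, 1, ...; a row with
# fewer matches gets NaN in the trailing columns
wide = matches.unstack()
```

Note that the resulting columns are match positions (0, 1), not key names, which is why a separate step is needed to attach the clm labels.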

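import pandas as pd

# Capture the digits after each "="; extractall gives one row per match,
# indexed by (original row, match number), and unstack pivots the matches
# into columns 0, 1, ...
values = (
    df['clm3'].str.extractall(r'=(\d+)')[0]
              .unstack()
              .rename_axis(None, axis=1)
)

# Strip the "=<digits>" parts ("clm4|clm5") and let get_dummies collect the
# key names; they come out sorted and are assigned to the match columns in order
columns = df['clm3'].str.replace(r'=\d+', '', regex=True).str.get_dummies(sep='|').columns
values.columns = columns
dfnew = pd.concat([df[['clm1', 'clm2']], values], axis=1)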
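One caveat with this approach: it relies on the n-th match in each row lining up with the n-th key name from get_dummies (which sorts them), so a row containing only "clm5=7" would have its value placed under clm4. A minimal sketch of a variant (not the answer author's code) that captures the key and the value together with named groups, so each value stays attached to its own key. It assumes keys contain no "=" or "|" and that each key appears at most once per row (unstack raises on duplicate entries).

```python
import pandas as pd

df = pd.DataFrame({
    "clm1": [10, 11, 12],
    "clm2": ["a", "b", "c"],
    "clm3": ["clm4=1|clm5=5", "clm4=2", "clm5=7"],
})

# Capture each key and its value as two named groups; extractall returns one
# row per match, indexed by (original row, match number)
pairs = df["clm3"].str.extractall(r"(?P<key>[^|=]+)=(?P<val>[^|]*)")

# Drop the match-number level, move the key into the index, and pivot the
# keys into columns so every value lands under its own key
wide = (
    pairs.droplevel("match")
         .set_index("key", append=True)["val"]
         .unstack("key")
         .rename_axis(None, axis=1)
)

result = df.drop(columns="clm3").join(wide)
```

Here the third row's value ends up under clm5, with NaN under clm4.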

Answer 1 (score: 1)

Two steps:

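The idea is to split twice (first on "|", then on "="), then group by the original row index and key, and unstack the values into columns.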
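To make the two splits concrete, here is a minimal sketch of the intermediate long-format frame s on the sample clm3 column (assuming no leading spaces in the values). The explicit dropna() after stack() is an addition: depending on the pandas version, stack() may or may not drop the empty slot left by the row with only one pair, so dropping it explicitly gives the same s either way.

```python
import pandas as pd

df = pd.DataFrame({"clm3": ["clm4=1|clm5=5", "clm4=2"]})

# First split: one column per "key=value" pair; stack() moves them into a
# single long Series indexed by (row, position)
pairs = df["clm3"].str.split("|", expand=True).stack().dropna()

# Second split: key goes to column 0 and value to column 1; dropping the
# position level leaves an index that points back at the original row
s = pairs.str.split("=", expand=True).reset_index(level=1, drop=True)
```

Each row of s is one key/value pair whose index still refers to the original row, which is what grouping by [s.index, s[0]] relies on.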

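import pandas as pd

# Split "clm4=1|clm5=5" into pairs, stack them into one long Series,
# split each pair into key (column 0) and value (column 1), and drop the
# pair-position level so the index refers back to the original row
s = (
    df["clm3"]
    .str.split("|", expand=True)
    .stack()
    .str.split("=", expand=True)
    .reset_index(level=1, drop=True)
)

# Group by (row, key), take the value, and unstack the keys into columns
final = pd.concat([df, s.groupby([s.index, s[0]])[1].sum().unstack()], axis=1).drop(
    "clm3", axis=1
)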

print(final)
   clm1 clm2  clm4 clm5
0    10    a     1    5
1    11    b     2  NaN