Reshaping an entire DataFrame

Time: 2019-12-21 15:39:12

Tags: python pandas dataframe

I have a DataFrame like this:

id|sub1 |sub2  (header)
1|Rating:2,Grade:C,Semester:3   |Rating:1,Grade:A,Semester:2    
2|Rating:3,Grade:A,Semester:2   |Rating:2,Grade:B,Semester:1

I want it to look like this:

id|sem|sub|grade|rating
1|3|sub1|C|2
1|2|sub2|A|1
2|2|sub1|A|3
2|1|sub2|B|2

I tried:

df.transpose()

Can you suggest a better approach?

2 answers:

Answer 0: (score: 2)

Here is my solution using 'melt' and 'extractall':

df:
    id                         sub1                         sub2
 0   1  Rating:2,Grade:C,Semester:3  Rating:1,Grade:A,Semester:2
 1   2  Rating:3,Grade:A,Semester:2  Rating:2,Grade:B,Semester:1


df = df.melt(id_vars="id", var_name="sub")

   id   sub                        value
0   1  sub1  Rating:2,Grade:C,Semester:3
1   2  sub1  Rating:3,Grade:A,Semester:2
2   1  sub2  Rating:1,Grade:A,Semester:2
3   2  sub2  Rating:2,Grade:B,Semester:1

df2= df["value"].str.extractall(r":(\d+|\w)").unstack()

0      
match  0  1  2
0      2  C  3
1      3  A  2
2      1  A  2
3      2  B  1
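
To make the intermediate step easier to follow, here is a minimal sketch of what 'extractall' followed by 'unstack' does on two of the sample strings. The variable names `s`, `matches` and `wide` are only illustrative. Note that the pattern captures a single word character for the grade, so it assumes one-letter grades as in the sample data.

```python
import pandas as pd

s = pd.Series(["Rating:2,Grade:C,Semester:3", "Rating:1,Grade:A,Semester:2"])

# extractall returns one row per regex match; the index has two levels:
# the original row label and a level named 'match' (the match number).
matches = s.str.extractall(r":(\d+|\w)")

# unstack pivots the 'match' level into columns, giving one row per
# original string and one column per captured value.
wide = matches.unstack()
print(wide)
```

All captured values are strings at this point; convert them with `astype` if numeric columns are needed.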

df2.columns = ["rating", "grade", "sem"]

# pd is pandas (import pandas as pd)
dfrslt = pd.concat([df.drop(columns="value"), df2], axis=1) \
        .reindex(["id", "sem", "sub", "grade", "rating"], axis=1) \
        .sort_values("id")

dfrslt:

   id sem   sub grade rating
0   1   3  sub1     C      2
2   1   2  sub2     A      1
1   2   2  sub1     A      3
3   2   1  sub2     B      2
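
The same idea can be put together as one self-contained script. This is only a sketch under a few assumptions: the sample frame is rebuilt by hand, the regex uses named groups (a variation, not what the answer above uses) so the extracted columns come out already named, and the names `long_df`, `pattern`, `parts` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "sub1": ["Rating:2,Grade:C,Semester:3", "Rating:3,Grade:A,Semester:2"],
    "sub2": ["Rating:1,Grade:A,Semester:2", "Rating:2,Grade:B,Semester:1"],
})

# Wide -> long: one row per (id, subject) pair.
long_df = df.melt(id_vars="id", var_name="sub")

# Named groups become column names, so no manual renaming is needed.
pattern = r"Rating:(?P<rating>\d+),Grade:(?P<grade>\w+),Semester:(?P<sem>\d+)"
parts = long_df["value"].str.extract(pattern)

result = (
    pd.concat([long_df.drop(columns="value"), parts], axis=1)
    .astype({"rating": int, "sem": int})   # extracted values are strings
    .sort_values(["id", "sub"])
    .reset_index(drop=True)
    .loc[:, ["id", "sem", "sub", "grade", "rating"]]
)
print(result)
```

Because each cell holds exactly one Rating/Grade/Semester triple, 'extract' (one match per row) is enough here; 'extractall' plus 'unstack' is the more general tool when the number of matches per row varies.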

Answer 1: (score: 1)

We can make use of some regex and assignment:

pat = (r'Rating:(\d{1})\W+Grade:(\w{1})\W+Semester:(\d{1})')

df.set_index('id',inplace=True)

a = df.sub1.str.extract(pat)
# Use the exact column label from your frame; in the sample data it is 'sub2'
b = df['sub2'].str.extract(pat)

a['sub'] = 'sub1'
b['sub'] = 'sub2'

df_new = pd.concat([a,b])

df_new.rename(columns={0 : 'Rating', 1 : 'Grade', 2 : 'Semester'},inplace=True)
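
For completeness, here is a runnable sketch of this approach that also restores `id` as a column and orders the columns as in the desired output. It assumes the columns are labelled 'sub1' and 'sub2' and that every value is a single digit or letter, as the pattern requires; the loop and the names `indexed`, `frames` and `names` are illustrative additions, not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "sub1": ["Rating:2,Grade:C,Semester:3", "Rating:3,Grade:A,Semester:2"],
    "sub2": ["Rating:1,Grade:A,Semester:2", "Rating:2,Grade:B,Semester:1"],
})

pat = r"Rating:(\d)\W+Grade:(\w)\W+Semester:(\d)"
names = {0: "rating", 1: "grade", 2: "sem"}

indexed = df.set_index("id")
frames = []
for col in ["sub1", "sub2"]:
    # One extracted frame per subject column, tagged with the column name.
    part = indexed[col].str.extract(pat).rename(columns=names)
    part["sub"] = col
    frames.append(part)

df_new = (
    pd.concat(frames)
    .reset_index()                      # bring 'id' back as a column
    .sort_values(["id", "sub"])
    .reset_index(drop=True)
    .loc[:, ["id", "sem", "sub", "grade", "rating"]]
)
print(df_new)
```

The extracted columns are strings here; add an `astype` call if integer ratings and semesters are needed.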