I have a data frame like this:
id|sub1 |sub2 (header)
1|Rating:2,Grade:C,Semester:3 |Rating:1,Grade:A,Semester:2
2|Rating:3,Grade:A,Semester:2 |Rating:2,Grade:B,Semester:1
I would like it to look like this:
id|sem|sub|grade|rating
1|3|sub1|C|2
1|2|sub2|A|1
2|2|sub1|A|3
2|1|sub2|B|2
I tried:
df.transpose()
Could you suggest a better approach?
Answer 0 (score: 2)
Here is my solution using 'melt' and 'extractall':
df:
id sub1 sub2
0 1 Rating:2,Grade:C,Semester:3 Rating:1,Grade:A,Semester:2
1 2 Rating:3,Grade:A,Semester:2 Rating:2,Grade:B,Semester:1
df = df.melt(id_vars="id", var_name="sub")
id sub value
0 1 sub1 Rating:2,Grade:C,Semester:3
1 2 sub1 Rating:3,Grade:A,Semester:2
2 1 sub2 Rating:1,Grade:A,Semester:2
3 2 sub2 Rating:2,Grade:B,Semester:1
df2 = df["value"].str.extractall(r":(\d+|\w)").unstack()
0
match 0 1 2
0 2 C 3
1 3 A 2
2 1 A 2
3 2 B 1
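# The matches come back in the order they appear in each string
df2.columns = ["rating", "grade", "sem"]
# Requires: import pandas as pd
dfrslt = pd.concat([df.drop(columns="value"), df2], axis=1) \
    .reindex(["id", "sem", "sub", "grade", "rating"], axis=1) \
    .sort_values("id")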
dfrslt:
id sem sub grade rating
0 1 3 sub1 C 2
2 1 2 sub2 A 1
1 2 2 sub1 A 3
3 2 1 sub2 B 2
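For convenience, the steps above can be collected into one script. This is only a sketch: the sample frame is rebuilt by hand from the question (assuming the columns are named `id`, `sub1` and `sub2`), the names `long_df`, `parts` and `result` are made up for illustration, and the final integer conversion is an extra step not in the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question (column names are an assumption)
df = pd.DataFrame({
    "id": [1, 2],
    "sub1": ["Rating:2,Grade:C,Semester:3", "Rating:3,Grade:A,Semester:2"],
    "sub2": ["Rating:1,Grade:A,Semester:2", "Rating:2,Grade:B,Semester:1"],
})

# Wide to long: one row per (id, subject) pair
long_df = df.melt(id_vars="id", var_name="sub")

# Pull out every value that follows a colon, then spread the matches into columns
parts = long_df["value"].str.extractall(r":(\d+|\w)").unstack()
parts.columns = ["rating", "grade", "sem"]  # relies on the key order in the strings

result = (
    pd.concat([long_df.drop(columns="value"), parts], axis=1)
    .reindex(["id", "sem", "sub", "grade", "rating"], axis=1)
    .sort_values(["id", "sub"])
    .reset_index(drop=True)
    .astype({"sem": int, "rating": int})  # extracted values are strings by default
)
print(result)
```

Note that the column names are assigned by position, so this only works while every cell lists Rating, Grade and Semester in that order.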
Answer 1 (score: 1)
We can use a regular expression with capture groups and then assign the source column name:
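import pandas as pd

# One capture group per field: rating digit, grade letter, semester digit
pat = r'Rating:(\d{1})\W+Grade:(\w{1})\W+Semester:(\d{1})'
df.set_index('id', inplace=True)
a = df['sub1'].str.extract(pat)
b = df['sub2 (header)'].str.extract(pat)  # column name as written in the question's header row
# Record which subject column each row came from
a['sub'] = 'sub1'
b['sub'] = 'sub2'
df_new = pd.concat([a, b])
df_new.rename(columns={0: 'Rating', 1: 'Grade', 2: 'Semester'}, inplace=True)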
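The snippet above leaves `id` in the index and does not order the rows or columns as the question asks. A possible way to finish it is sketched below, under a few assumptions: the second column is simply named `sub2`, named capture groups replace the `rename` call, `\d+` and `\w+` are used instead of the single-character patterns, and the loop over column names is an illustrative rewrite rather than the answer's own code.

```python
import pandas as pd

# Sample data rebuilt from the question (column names are an assumption)
df = pd.DataFrame({
    "id": [1, 2],
    "sub1": ["Rating:2,Grade:C,Semester:3", "Rating:3,Grade:A,Semester:2"],
    "sub2": ["Rating:1,Grade:A,Semester:2", "Rating:2,Grade:B,Semester:1"],
}).set_index("id")

# Named groups become the column names of the extracted frame
pat = r"Rating:(?P<rating>\d+)\W+Grade:(?P<grade>\w+)\W+Semester:(?P<sem>\d+)"

frames = []
for col in ["sub1", "sub2"]:
    part = df[col].str.extract(pat)  # one row per id, one column per group
    part["sub"] = col                # remember the source column
    frames.append(part)

df_new = (
    pd.concat(frames)
    .reset_index()                   # bring id back as a regular column
    .sort_values(["id", "sub"])
    .reset_index(drop=True)
    [["id", "sem", "sub", "grade", "rating"]]
)
print(df_new)
```

The extracted values stay as strings here; convert them with `astype` if numeric columns are needed.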