I have a DataFrame like this:
id|c1|c2|c3|c4...
0|s:1,g:B,r:2|s:2,g:A,r:3|s:1,g:C,r:4|s:3,g:D,r:2.....
1|NaN|s:2;g:E,r:4|s:3;g:C,r:3|s:3;g:F,r:3.....
I would like to rearrange the DataFrame like this:
id|c|s|g|r
0|c1|1|B|2
0|c2|2|A|3
0|c3|1|C|4
0|c4|3|D|2
1|c1|NaN|NaN|NaN
1|c2|2|E|4
1|c3|3|C|3
1|c4|3|F|3
I tried the following:
df.melt()
Answer 0 (score: 5)
Use DataFrame.set_index with DataFrame.stack to reshape, replacing missing values with the bare key names s,g,r. Then split with Series.str.split on , or ;, reshape again with stack, split on :, and finally reshape with Series.unstack:
df1 = (df.set_index('id')
.fillna('s,g,r')
.stack()
.str.split(',|;', expand=True)
.stack()
.str.split(':', expand=True)
.reset_index(level=2, drop=True)
.set_index(0, append=True)[1]
.unstack()
.rename_axis(('id','c'))
.rename_axis(None, axis=1)
.reset_index()
)
print (df1)
   id   c     g     r     s
0   0  c1     B     2     1
1   0  c2     A     3     2
2   0  c3     C     4     1
3   0  c4     D     2     3
4   1  c1  None  None  None
5   1  c2     E     4     2
6   1  c3     C     3     3
7   1  c4     F     3     3
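To try the chain end to end, here is a minimal self-contained sketch. It assumes the frame contains only the four columns shown in the question (the real frame has more), and the comments describe what each step appears to do on this sample.

```python
import numpy as np
import pandas as pd

# Sample data: only the four columns shown in the question (an assumption;
# the original frame has more columns).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

df1 = (
    df.set_index("id")
      .fillna("s,g,r")                  # placeholder keys for missing cells
      .stack()                          # one row per (id, column)
      .str.split(",|;", expand=True)    # split items on ',' or ';'
      .stack()                          # one row per 'key:value' item
      .str.split(":", expand=True)      # column 0 = key, column 1 = value
      .reset_index(level=2, drop=True)  # drop the item-position level
      .set_index(0, append=True)[1]     # keys become the last index level
      .unstack()                        # keys become columns
      .rename_axis(["id", "c"])
      .rename_axis(None, axis=1)
      .reset_index()
)
print(df1)
```

The values stay strings after the split; if numeric s and r are needed, a conversion such as pd.to_numeric on those columns would be a separate step.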
EDIT: The first step reshapes with stack, using the id column as the index:
print (df.set_index('id')
.fillna('s,g,r')
.stack())
id
0   c1    s:1,g:B,r:2
    c2    s:2,g:A,r:3
    c3    s:1,g:C,r:4
    c4    s:3,g:D,r:2
1   c1          s,g,r
    c2    s:2;g:E,r:4
    c3    s:3;g:C,r:3
    c4    s:3;g:F,r:3
dtype: object
The next step splits on the separators , and ; and then reshapes again with stack:
print (df.set_index('id')
.fillna('s,g,r')
.stack()
.str.split(',|;', expand=True)
.stack())
id
0   c1  0    s:1
        1    g:B
        2    r:2
    c2  0    s:2
        1    g:A
        2    r:3
    c3  0    s:1
        1    g:C
        2    r:4
    c4  0    s:3
        1    g:D
        2    r:2
1   c1  0      s
        1      g
        2      r
    c2  0    s:2
        1    g:E
        2    r:4
    c3  0    s:3
        1    g:C
        2    r:3
    c4  0    s:3
        1    g:F
        2    r:3
dtype: object
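Then each item is split on : into two columns, and the first column (the keys) becomes the last level of the MultiIndex, ready for the final reshape with unstack: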
print (df.set_index('id')
.fillna('s,g,r')
.stack()
.str.split(',|;', expand=True)
.stack()
.str.split(':', expand=True)
.reset_index(level=2, drop=True)
.set_index(0, append=True)[1])
id      0
0   c1  s       1
        g       B
        r       2
    c2  s       2
        g       A
        r       3
    c3  s       1
        g       C
        r       4
    c4  s       3
        g       D
        r       2
1   c1  s    None
        g    None
        r    None
    c2  s       2
        g       E
        r       4
    c3  s       3
        g       C
        r       3
    c4  s       3
        g       F
        r       3
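The splitting logic in these steps can also be looked at one cell at a time. The sketch below uses plain Python rather than pandas, and parse_cell is a hypothetical helper introduced only for illustration. It suggests why the s,g,r placeholder ends up as missing values: each key is present, but nothing follows it.

```python
import re

def parse_cell(cell):
    """Split a cell such as 's:2;g:E,r:4' into (key, value) pairs."""
    pairs = []
    for item in re.split(r"[,;]", cell):   # same separators as ',|;'
        key, sep, value = item.partition(":")
        # No ':' in the item means a key without a value.
        pairs.append((key, value if sep else None))
    return pairs

print(parse_cell("s:2;g:E,r:4"))  # [('s', '2'), ('g', 'E'), ('r', '4')]
print(parse_cell("s,g,r"))        # [('s', None), ('g', None), ('r', None)]
```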
Answer 1 (score: 2)
Use explode and stack together with Series.str.split:
df = df.set_index('id')
(df.stack(dropna=False).str.split(',|;').explode().str.split(':',expand=True)
.set_index(0,append=True)[1].unstack().dropna(how='all',axis=1)
.rename_axis(['id','C']).reset_index())
0  id   C    g    r    s
0   0  c1    B    2    1
1   0  c2    A    3    2
2   0  c3    C    4    1
3   0  c4    D    2    3
4   1  c1  NaN  NaN  NaN
5   1  c2    E    4    2
6   1  c3    C    3    3
7   1  c4    F    3    3
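A variant of the same explode idea, sketched here with melt swapped in for stack(dropna=False) so that missing cells are kept without relying on the dropna argument. This is not the answer's exact code; the two-column sample frame and the names long, items, kv, wide and out are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Reduced sample (an assumption): two of the question's columns.
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
})

# One row per (id, column) cell; melt keeps missing cells as NaN.
long = df.melt("id", var_name="c")

# One row per 'key:value' item; the row label of `long` is repeated.
items = long["value"].str.split(",|;").explode()

# Column 0 holds the key, column 1 the value; drop rows from missing cells.
kv = items.str.split(":", expand=True).dropna(subset=[0])

# Keys become columns, indexed by the row label of `long`.
wide = kv.set_index(0, append=True)[1].unstack()[["s", "g", "r"]]

# Joining back on the row label restores the missing cells as NaN rows.
out = (long[["id", "c"]].join(wide)
       .sort_values(["id", "c"])
       .reset_index(drop=True))
print(out)
```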
Answer 2 (score: 1)
I would suggest:
s=df.melt('id')
s.loc[s.value.notna(),'value']=[dict(item.split(":") for item in x.replace(';',',').split(",")) for x in s.value.dropna()]
s=s.join(pd.DataFrame(s.value.dropna().tolist(),index=s.dropna().index))
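The same dict-per-cell idea can be written as a small self-contained sketch that ends in the id|c|s|g|r layout from the question. The parse helper, the reduced two-column sample and the variable names are assumptions for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

# Reduced sample (an assumption): two of the question's columns.
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
})

def parse(cell):
    """Turn 's:2;g:E,r:4' into {'s': '2', 'g': 'E', 'r': '4'}; NaN gives {}."""
    if pd.isna(cell):
        return {}
    return dict(item.split(":") for item in cell.replace(";", ",").split(","))

# One row per (id, column) cell, then one dict per cell.
long = df.melt("id", var_name="c")
parsed = pd.DataFrame([parse(x) for x in long["value"]], index=long.index)

# Fix the column order and attach the parsed keys to id and c.
out = long[["id", "c"]].join(parsed.reindex(columns=["s", "g", "r"]))
print(out)
```

Rows come out in melt order (all of c1, then all of c2); sorting by id and c afterwards would match the layout asked for in the question.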