Creating a dataframe structure

Time: 2020-01-17 12:08:37

Tags: python python-3.x pandas dataframe

I have a dataframe like this:

id|c1|c2|c3|c4...
0|s:1,g:B,r:2|s:2,g:A,r:3|s:1,g:C,r:4|s:3,g:D,r:2.....
1|NaN|s:2;g:E,r:4|s:3;g:C,r:3|s:3;g:F,r:3.....

I want to rearrange the dataframe like this:

id|c|s|g|r
0|c1|1|B|2
0|c2|2|A|3
0|c3|1|C|4
0|c4|3|D|2
1|c1|NaN|NaN|NaN
1|c2|2|E|4
1|c3|3|C|3
1|c4|3|F|3

I tried the following:

df.melt()
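
For reference, here is a minimal sketch of what the `df.melt()` attempt gives on the sample data. It assumes only the four columns c1..c4 shown in the question (the elided columns are left out), and the names `long_df`, `c` and `value` are illustrative choices, not part of the original post. melt already produces one row per (id, column) pair, but the `s`, `g`, `r` values are still packed into a single string.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; only c1..c4 are included (assumption).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

# melt turns the c1..c4 columns into rows, one row per (id, column) pair.
long_df = df.melt(id_vars="id", var_name="c", value_name="value")

# Sort so the rows follow the order of the desired output (id, then column).
long_df = long_df.sort_values(["id", "c"], ignore_index=True)
print(long_df)
```

The remaining work, which the answers address, is splitting each `value` string into separate `s`, `g` and `r` columns.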

3 answers:

Answer 0: (score: 5)

The idea is to reshape with DataFrame.set_index and DataFrame.stack, replacing missing values with the empty column names s,g,r, then split with Series.str.split on , or ;, reshape again, then split on :, and finally reshape with Series.unstack:

df1 = (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1]
         .unstack()
         .rename_axis(('id','c'))
         .rename_axis(None, axis=1)
         .reset_index()
         )
print (df1)
   id   c     g     r     s
0   0  c1     B     2     1
1   0  c2     A     3     2
2   0  c3     C     4     1
3   0  c4     D     2     3
4   1  c1  None  None  None
5   1  c2     E     4     2
6   1  c3     C     3     3
7   1  c4     F     3     3

Edit: The first step reshapes with stack, using id as the index:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack())
id    
0   c1    s:1,g:B,r:2
    c2    s:2,g:A,r:3
    c3    s:1,g:C,r:4
    c4    s:3,g:D,r:2
1   c1          s,g,r
    c2    s:2;g:E,r:4
    c3    s:3;g:C,r:3
    c4    s:3;g:F,r:3
dtype: object

The next step splits on the separators , and ;, then reshapes again with stack:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack())
id       
0   c1  0    s:1
        1    g:B
        2    r:2
    c2  0    s:2
        1    g:A
        2    r:3
    c3  0    s:1
        1    g:C
        2    r:4
    c4  0    s:3
        1    g:D
        2    r:2
1   c1  0      s
        1      g
        2      r
    c2  0    s:2
        1    g:E
        2    r:4
    c3  0    s:3
        1    g:C
        2    r:3
    c4  0    s:3
        1    g:F
        2    r:3
dtype: object

Then split on : into 2 columns and convert the first column into the last level of the MultiIndex. A final reshape of this result with unstack gives the dataframe shown in the complete solution:

print (df.set_index('id')
         .fillna('s,g,r')
         .stack()
         .str.split(',|;', expand=True)
         .stack()
         .str.split(':', expand=True)
         .reset_index(level=2, drop=True)
         .set_index(0, append=True)[1])
id      0
0   c1  s       1
        g       B
        r       2
    c2  s       2
        g       A
        r       3
    c3  s       1
        g       C
        r       4
    c4  s       3
        g       D
        r       2
1   c1  s    None
        g    None
        r    None
    c2  s       2
        g       E
        r       4
    c3  s       3
        g       C
        r       3
    c4  s       3
        g       F
        r       3
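
The final `unstack` step can be tried end to end with the sketch below. It rebuilds the question's sample with only c1..c4 (an assumption, since the remaining columns are elided), keeps the intermediate Series under the illustrative name `pairs`, and passes the index names as a list rather than a tuple. The last line reorders the columns to match the layout asked for in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; only c1..c4 are included (assumption).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

# Series indexed by (id, column, key) holding the value after the colon.
pairs = (df.set_index("id")
           .fillna("s,g,r")                   # placeholder keys for missing cells
           .stack()                           # one row per (id, column)
           .str.split(",|;", expand=True)     # split on ',' or ';'
           .stack()                           # one row per key:value string
           .str.split(":", expand=True)       # column 0 = key, column 1 = value
           .reset_index(level=2, drop=True)   # drop the position counter
           .set_index(0, append=True)[1])     # move the key into the index

# unstack pivots the key level (g, r, s) into columns.
df1 = (pairs.unstack()
            .rename_axis(["id", "c"])         # name the two index levels
            .rename_axis(None, axis=1)        # remove the leftover columns name
            .reset_index())

# Reorder to the column order requested in the question.
df1 = df1[["id", "c", "s", "g", "r"]]
print(df1)
```

All values stay strings here; converting `s` and `r` to numbers would be a separate step, for example with `pd.to_numeric`.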

Answer 1: (score: 2)

Use explode together with stack and Series.str.split:

df = df.set_index('id')
(df.stack(dropna=False).str.split(',|;').explode().str.split(':',expand=True)
.set_index(0,append=True)[1].unstack().dropna(how='all',axis=1)
.rename_axis(['id','C']).reset_index())

0  id   C    g    r    s
0   0  c1    B    2    1
1   0  c2    A    3    2
2   0  c3    C    4    1
3   0  c4    D    2    3
4   1  c1  NaN  NaN  NaN
5   1  c2    E    4    2
6   1  c3    C    3    3
7   1  c4    F    3    3
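
As a variation on the same explode idea, the sketch below starts from `melt` and finishes with `pivot` plus a left merge, so it does not depend on how `stack` treats missing values. This is not the answer's code: the names `long_df`, `pairs`, `kv`, `wide` and `result` are illustrative, the sample keeps only c1..c4, and it assumes pandas 1.1 or later (for `explode(..., ignore_index=True)` and a list passed as `index` to `pivot`).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; only c1..c4 are included (assumption).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

# Long format: one row per (id, column), sorted like the desired output.
long_df = (df.melt(id_vars="id", var_name="c")
             .sort_values(["id", "c"], ignore_index=True))

# Split each cell into a list of "key:value" strings, then one row per string.
pairs = (long_df.assign(value=long_df["value"].str.split(",|;"))
                .explode("value", ignore_index=True))

# Split "key:value" into two columns; rows that were NaN stay NaN.
kv = pairs["value"].str.split(":", expand=True)
kv.columns = ["key", "val"]
pairs = pd.concat([pairs[["id", "c"]], kv], axis=1)

# Pivot the keys into columns, ignoring the rows that had no data.
wide = (pairs.dropna(subset=["key"])
             .pivot(index=["id", "c"], columns="key", values="val"))

# Left merge restores the (id, column) pairs whose cell was missing.
result = long_df[["id", "c"]].merge(wide.reset_index(), on=["id", "c"], how="left")
print(result)
```

The left merge is what brings back the `1 | c1` row with missing values, playing the role that `dropna=False` plays in the answer's version.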

Answer 2: (score: 1)

I would suggest:

s=df.melt('id')
s.loc[s.value.notna(),'value']=[dict(item.split(":") for item in x.replace(';',',').split(",")) for x in s.value.dropna()]
s=s.join(pd.DataFrame(s.value.dropna().tolist(),index=s.dropna().index))
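
The snippet above stops at the join, so here is a hedged sketch of the same dict-based idea carried through to the requested layout. The helper `parse_cell` and the names `parsed` and `out` are hypothetical, and the sample again keeps only c1..c4. Instead of assigning dicts back into the `value` column, it maps every cell to a dict (an empty one for missing cells) and builds the new columns from that.

```python
import numpy as np
import pandas as pd


def parse_cell(cell):
    """Turn 's:1,g:B,r:2' (',' or ';' separated) into a dict; NaN gives {}."""
    if pd.isna(cell):
        return {}
    return dict(item.split(":", 1) for item in cell.replace(";", ",").split(","))


# Sample data copied from the question; only c1..c4 are included (assumption).
df = pd.DataFrame({
    "id": [0, 1],
    "c1": ["s:1,g:B,r:2", np.nan],
    "c2": ["s:2,g:A,r:3", "s:2;g:E,r:4"],
    "c3": ["s:1,g:C,r:4", "s:3;g:C,r:3"],
    "c4": ["s:3,g:D,r:2", "s:3;g:F,r:3"],
})

# Long format: one row per (id, column).
s = df.melt("id", var_name="c")

# One dict per row becomes one row of s/g/r columns; empty dicts become NaN.
parsed = pd.DataFrame(s["value"].map(parse_cell).tolist(), index=s.index)

# Attach the parsed columns and sort like the desired output.
out = (s[["id", "c"]].join(parsed[["s", "g", "r"]])
         .sort_values(["id", "c"], ignore_index=True))
print(out)
```

Mapping missing cells to an empty dict keeps the row index aligned, so no separate `dropna` bookkeeping is needed before the join.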