How do I split Python source code into tokens?

Time: 2019-05-20 20:03:57

Tags: python tokenize

My source code looks like this:

model_stateless.fit(x_train,
                    y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test),
                    shuffle=False)

print('Predicting')
predicted_stateless = model_stateless.predict(x_test, batch_size=batch_size)

def create_model(stateful):
    model = Sequential()
    model.add(LSTM(20,
                   input_shape=(lahead, 1),
                   batch_size=batch_size,
                   stateful=stateful))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    return model

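I want to tokenize this. If I simply call filecontents.split(' '), commas, parentheses, and similar characters stay attached to the neighboring text instead of becoming separate tokens.

Ideally, I would like it tokenized like this: model_stateless.fit(x_train, y_train,batch_size=batch_size,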

1 Answer:

Answer 0: (score: 2)

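Use the standard library's tokenize module. Run from the command line, it prints one token per line; the -e flag shows exact operator type names (LPAR, COMMA, and so on) instead of the generic OP: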

❯ python3 -m tokenize -e file.py
0,0-0,0:            ENCODING       'utf-8'        
1,0-1,3:            NAME           'def'          
1,4-1,16:           NAME           'create_model' 
1,16-1,17:          LPAR           '('            
1,17-1,25:          NAME           'stateful'     
1,25-1,26:          RPAR           ')'            
1,26-1,27:          COLON          ':'            
1,27-1,28:          NEWLINE        '\n'           
2,0-2,4:            INDENT         '    '         
2,4-2,9:            NAME           'model'        
2,10-2,11:          EQUAL          '='            
2,12-2,22:          NAME           'Sequential'   
2,22-2,23:          LPAR           '('            
2,23-2,24:          RPAR           ')'            
2,24-2,25:          NEWLINE        '\n'           
3,4-3,9:            NAME           'model'        
3,9-3,10:           DOT            '.'            
3,10-3,13:          NAME           'add'          
3,13-3,14:          LPAR           '('            
3,14-3,18:          NAME           'LSTM'         
3,18-3,19:          LPAR           '('            
3,19-3,21:          NUMBER         '20'           
3,21-3,22:          COMMA          ','            
3,22-3,23:          NL             '\n'           
4,19-4,30:          NAME           'input_shape'  
4,30-4,31:          EQUAL          '='            
4,31-4,32:          LPAR           '('            
4,32-4,38:          NAME           'lahead'       
4,38-4,39:          COMMA          ','            
4,40-4,41:          NUMBER         '1'            
4,41-4,42:          RPAR           ')'            
4,42-4,43:          COMMA          ','            
4,43-4,44:          NL             '\n'           
5,19-5,29:          NAME           'batch_size'   
5,29-5,30:          EQUAL          '='            
5,30-5,40:          NAME           'batch_size'   
5,40-5,41:          COMMA          ','            
5,41-5,42:          NL             '\n'           
6,19-6,27:          NAME           'stateful'     
6,27-6,28:          EQUAL          '='            
6,28-6,36:          NAME           'stateful'     
6,36-6,37:          RPAR           ')'            
6,37-6,38:          RPAR           ')'            
6,38-6,39:          NEWLINE        '\n'           
7,4-7,9:            NAME           'model'        
7,9-7,10:           DOT            '.'            
7,10-7,13:          NAME           'add'          
7,13-7,14:          LPAR           '('            
7,14-7,19:          NAME           'Dense'        
7,19-7,20:          LPAR           '('            
7,20-7,21:          NUMBER         '1'            
7,21-7,22:          RPAR           ')'            
7,22-7,23:          RPAR           ')'            
7,23-7,24:          NEWLINE        '\n'           
8,4-8,9:            NAME           'model'        
8,9-8,10:           DOT            '.'            
8,10-8,17:          NAME           'compile'      
8,17-8,18:          LPAR           '('            
8,18-8,22:          NAME           'loss'         
8,22-8,23:          EQUAL          '='            
8,23-8,28:          STRING         "'mse'"        
8,28-8,29:          COMMA          ','            
8,30-8,39:          NAME           'optimizer'    
8,39-8,40:          EQUAL          '='            
8,40-8,46:          STRING         "'adam'"       
8,46-8,47:          RPAR           ')'            
8,47-8,48:          NEWLINE        '\n'           
9,4-9,10:           NAME           'return'       
9,11-9,16:          NAME           'model'        
9,16-9,17:          NEWLINE        '\n'           
10,0-10,0:          DEDENT         ''             
10,0-10,0:          ENDMARKER      ''
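
The same module can also be called from Python if you want the tokens as a list rather than printed output. Below is a minimal sketch of that, not part of the original answer: the helper names tokenize_source and describe_tokens are made up for illustration, and dropping the layout-only tokens (NEWLINE, NL, INDENT, DEDENT, ENDMARKER, comments) is an assumption about what the question is after.

```python
import io
import tokenize

# Token types that describe layout rather than content (an illustrative choice).
LAYOUT_TYPES = {
    tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
    tokenize.COMMENT,
}


def tokenize_source(source):
    """Return the token strings of Python source, skipping layout tokens."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in LAYOUT_TYPES]


def describe_tokens(source):
    """Return (exact type name, string) pairs, similar to the -e output."""
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.exact_type], tok.string)
            for tok in tokenize.generate_tokens(readline)]


if __name__ == "__main__":
    print(tokenize_source("model.add(Dense(1))\n"))
    # ['model', '.', 'add', '(', 'Dense', '(', '1', ')', ')']
```

To tokenize a whole file, you could pass the text read from it (the filecontents string in the question) to tokenize_source. Note that tokenize expects syntactically complete Python, so a truncated snippet with unclosed parentheses raises tokenize.TokenError.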