Python - RegEx - Modifying text files

Asked: 2009-10-26 10:20:33

Tags: python regex

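I'm new to Python and would appreciate help with the following task :-)

I have a tree of various files, some of which are C source files. I would like to modify these C files with a Python script.

The C code has 4 defines: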

#define ZR_LOG0(Id, Class, Seveity, Format)
#define ZR_LOG1(Id, Class, Seveity, Format, Attr0)
#define ZR_LOG2(Id, Class, Seveity, Format, Attr0, Attr1)
#define ZR_LOG3(Id, Class, Seveity, Format, Attr0, Attr1, Attr2)

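Various ZR_LOGn lines appear throughout the C source code.

Example: ZR_LOG1(1, LOG_CLASS_3, LOG_INFO, "hello world %d", 76);

Whitespace (spaces, tabs) may appear anywhere in the line (between fields).

The Python script's task is as follows: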

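  1. Replace every 'Id' field with a sequential counter. (The field is an integer whose original value we don't care about. The 'Id' of the first ZR_LOG line we encounter gets the value 0, the next one gets 1, and so on.)
  2. In a separate output file, create an index line of the form { NewId, Format } for each ZR_LOG line. For the example above we would get: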

    { 0, "hello world %d" },
    
Thanks for your help...


I have started with the following code; you can look at it or ignore it altogether.

    '''
    Created on Oct 25, 2009
    
    @author: Uri Shkolnik
    
    The following version does find & replace LOG Ids for all 
    C source files in a dir (and below) with sequential counter, 
    The files are assumed to be UTF-8 encoded. 
    (which works fine if they are ASCII, because ASCII is a 
    subset of UTF-8)
    It also assemble new index file, composed from all new IDs and format fields
    
    '''
    
    import os, sys, re, shutil
    
    mydir= '/home/uri/proj1'
    searched_pattern0 = 'ZR_LOG0'
    
    def search_and_replace(filepath):
        ''' replaces all string by a regex substitution '''
        backupName=filepath+'~re~'
    
        print 'reading:', filepath
        input = open(filepath,'rb')
        s=unicode(input.read(),'utf-8')
        input.close()
    
        m = re.match(ur'''[:space:]ZR_LOG[0-3].*\(.*[0-9]{0,10},LOG_''', s)
        print m
    
    def c_files_search(dummy, dirr, filess):
        ''' search directories for file pattern *.c '''
        for child in filess:
            if '.c' == os.path.splitext(child)[1] and os.path.isfile(dirr+'/'+child):
                filepath = dirr+'/'+child
                search_and_replace(filepath)
    
    os.path.walk(mydir, c_files_search, 3)
    

1 Answer:

Answer 0: (score: 1)

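A few points:

  • You can match whitespace with '\s'. (Python's re module does not support the POSIX class '[:space:]'; it is read as a set of literal characters.)
  • Regex 'capture groups' are useful here.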
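
To make both points concrete, here is a minimal sketch in Python 3 syntax. The sample line and the name `LOG_CALL` are made up for the demo and are not taken from the asker's code:

```python
import re

# Hypothetical sample line with irregular spacing between the fields.
sample = 'ZR_LOG1 (\t1 ,LOG_CLASS_3, LOG_INFO, "hello world %d", 76);'

# \s* tolerates any run of spaces or tabs; each pair of parentheses
# (apart from the escaped \( ) creates a capture group.
LOG_CALL = re.compile(r'(ZR_LOG[0-3]\s*\(\s*)(\d+)(\s*,)')

m = LOG_CALL.search(sample)
prefix, old_id, suffix = m.groups()

# Rebuild the line with a new Id, keeping the original spacing intact.
new_line = sample[:m.start()] + prefix + '0' + suffix + sample[m.end():]
print(new_line)
```

Because the text before and after the Id is captured rather than retyped, whatever whitespace the C author used survives the substitution.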

So, I would do something like this:

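    import re

    # Match only ZR_LOG calls and capture everything surrounding the Id field.
    # The #define lines are skipped because their Id is not a number.
    pattern = re.compile(r'^(.*\bZR_LOG[0-3]\s*\(\s*)'  # group(1), before Id
                         r'\d+'                          # the old Id (discarded)
                         r'(\s*,.*)$',                   # group(2), after Id
                         re.DOTALL)                      # keeps the trailing newline

    output = ''
    counter = 0  # the question asks for Ids starting at 0
    for line in lines:  # lines: the lines of one C file, e.g. from readlines()
        match = pattern.match(line)
        if match:
            # Add everything before Id, the counter value and everything after Id
            output += match.group(1) + str(counter) + match.group(2)
            counter += 1
            # And do extra logging etc.
        else:
            output += line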
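
The second half of the task (the index file) is not covered above. One way it might be handled is sketched below in Python 3 syntax. The names `LOG_RE` and `renumber` are hypothetical, and the sketch assumes that the Format argument is a single double-quoted C string literal and that the Class and Severity fields contain no commas:

```python
import re

LOG_RE = re.compile(
    r'(\bZR_LOG[0-3]\s*\(\s*)'   # group 1: macro name up to the Id
    r'\d+'                       # the old Id, discarded
    r'(\s*,[^,]*,[^,]*,\s*)'     # group 2: Class and Severity fields
    r'("(?:[^"\\]|\\.)*")'       # group 3: the Format string literal
)


def renumber(text, start=0):
    """Return (new_text, index_lines, next_id) for one file's contents."""
    index_lines = []
    counter = start

    def replace(match):
        nonlocal counter
        # One index entry per ZR_LOG call: { NewId, Format },
        index_lines.append('{ %d, %s },' % (counter, match.group(3)))
        new_call = (match.group(1) + str(counter)
                    + match.group(2) + match.group(3))
        counter += 1
        return new_call

    new_text = LOG_RE.sub(replace, text)
    return new_text, index_lines, counter


# Made-up input, used only to exercise the function.
source = (
    'ZR_LOG1(1, LOG_CLASS_3, LOG_INFO, "hello world %d", 76);\n'
    'int x = 0;\n'
    'ZR_LOG0 ( 42 ,LOG_CLASS_1,LOG_ERR,"a, b");\n'
)
new_text, index_lines, next_id = renumber(source)
print(new_text)
print('\n'.join(index_lines))
```

Passing the returned `next_id` as `start` for the next file keeps the numbering sequential across the whole tree. In Python 3 the tree itself would be traversed with `os.walk`, since `os.path.walk` was removed.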