Replacing special patterns in a string read from a file

Asked: 2014-07-30 16:25:01

Tags: python regex string

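I am trying to replace special patterns in a string with a tab. The string (if I may call it that) is the result of reading a file that contains accented characters (I am Portuguese, so the encoding is UTF-8 or LATIN-1). So imagine my input is: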

Aubrecht, Christoph; Özceylan, Aubrecht Dilek; Klerx, Joachim; Freire, Sérgio (2013) “Future-oriented activities as a concept for improved disaster risk management. Disaster Advances”, 6(12), 1-10. (IF = 2.272) E-ISSN 2278-4543. REVISTA INDEXADA NO WEB OF SCIENCE 

Aubrecht, Christoph; Özceylan, Dilek; Steinnocher, Klaus; Freire, Sérgio (2013), “Multi-level geospatial modeling of human exposure patterns and vulnerability indicators”. Natural Hazards, 68:147-163. (IF = 1.639).. ISSN: 0921-030X (print version). ISSN: 1573-0840 (electronic version. Accession Number: WOS:000322724000008 

Some of the special patterns are:

') "'    --> '\t'
'), "'   --> '\t'
'),"'    --> '\t'
') "'    --> '\t'
'),«'    --> '\t'
'), «'   --> '\t'
') "'    --> '\t'
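The variants listed above differ only in an optional comma, optional whitespace, and the kind of opening quote, so a single regular expression may be able to cover all of them. The following is a minimal sketch rather than a definitive solution. It assumes Python 3 (where every `str` is Unicode) and assumes the opening quote is one of `"`, `“` or `«`; the names `CITATION_BREAK` and `split_year_from_title` are hypothetical.

```python
import re

# Assumed pattern: ")" + optional comma + optional whitespace + an opening quote.
# The set of quote characters is a guess based on the sample input.
CITATION_BREAK = re.compile(r'\),?\s*["“«]')


def split_year_from_title(text):
    """Replace every ') "'-style separator in text with a single tab."""
    return CITATION_BREAK.sub('\t', text)


sample = 'Freire, Sérgio (2013), “Multi-level geospatial modeling”. Natural Hazards'
print(split_year_from_title(sample))
```

On Python 2 the pattern and the input would both need to be `unicode` objects (for example `ur'...'` literals) for the non-ASCII quotes to match.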

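So far I have tried replacing all of these with a dictionary, but the dictionary fails to recognize some of the patterns. I know the re.sub function is "the man" for this (see "python replace space with special characters between strings"), and that works fine when you have a predefined string, but how do you do it when the string is read from a file?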
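Once a file's contents have been decoded, they are an ordinary string, so `re.sub` should work on them exactly as it does on a literal. One possible reason a plain-text replacement misses some patterns is an encoding mismatch: if the text is re-encoded to Latin-1 bytes while the patterns in the source file are UTF-8 bytes, a character such as `«` will not compare equal. The sketch below assumes Python 3 and keeps the text decoded throughout; `replace_in_file` and its parameters are hypothetical names, not part of any library.

```python
import io
import re


def replace_in_file(path, pattern, replacement='\t', encoding='latin-1'):
    """Read path as text and return it with every match of pattern replaced.

    pattern is a compiled regular expression; encoding must match the
    file's real encoding (an assumption the caller has to get right).
    """
    with io.open(path, encoding=encoding) as handle:
        text = handle.read()  # decoded text, not raw bytes
    return pattern.sub(replacement, text)
```

A call might look like `replace_in_file('INPUT TEXT', re.compile(r'\),«'))`, with the result then written out using `io.open(..., 'w', encoding=...)`.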

My code:

# -*- coding: utf-8 -*-


import Tkinter as tk
import codecs, string, sys, re

root = tk.Tk()
root.title("Final?")

f = open('INPUT TEXT', 'r')
with codecs.open('INPUT TEXT', encoding='latin1') as f:
    sentence = f.read()
    if isinstance(sentence, unicode):
     sentence = sentence.encode('latin1')


def results1():
 print '\n', sentence

print results1, '\n'

key = {0:') "', 1:'replace'}
regx = re.compile('\t\t{[0]}\t\t'.format(key))
print( regx.sub(key[1],results1) )




def replace_all(text, dic):
 for i, j in dic.iteritems():
    text = text.replace(i,j)
 return text

reps = {' (':'\t', ') "':'\t', '), "':'\t', '),"':'\t', ') "':'\t', '),«':'\t', '), «':'\t', ') "':'\t', 'p.':'\t', ',':' '}
converts = replace_all(sentence, reps)

def converts():
 sys.stdout = open('output.txt', 'w')
 converts = replace_all(sentence, reps)
 print '\n', converts



 results = tk.Button(root, text='Resultados', width=25, command=resultadosnormais)
 results.pack()
 txt = tk.Button(root, text='Conversor resultados', width=25, command=conversortexto)
 txt.pack()

 root.mainloop()

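I have also looked at this post, but I can't seem to apply it to my code: "Re.sub not working for me".

Somehow it stores the function somewhere, but after that it raises this error: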

File "C:\Users\Joao\Desktop\Tryout2.py", line 30, in <module>
regx = re.compile('\t\t{[0]}\t\t'.format(key))
error: unbalanced parenthesis
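The error appears to come from the pattern itself rather than from the file: `'\t\t{[0]}\t\t'.format(key)` expands to a string containing `) "`, and an unescaped `)` is a group-closing metacharacter in a regular expression. Passing the literal text through `re.escape` is one way around this. The sketch below assumes Python 3 and drops the surrounding tabs from the pattern, on the assumption that the input contains none. It also passes the text itself to `sub`, not the function `results1`.

```python
import re

key = {0: ') "', 1: 'replace'}

# Unescaped, the ')' in key[0] is read as closing a group that was never opened.
try:
    re.compile('\t\t{[0]}\t\t'.format(key))
except re.error as exc:
    print('compile failed:', exc)

# re.escape backslash-escapes the metacharacters so the text is matched literally.
regx = re.compile(re.escape(key[0]))

sentence = 'Freire, Sérgio (2013) "Future-oriented activities'
print(regx.sub(key[1], sentence))
```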

0 answers:

No answers yet