Replacing a certain text in SQL using Python

Date: 2015-06-30 14:00:18

Tags: python sql regex

I am using Python to replace a certain string in an SQL file that I have. The string looks like this:

<img title="\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\dpi{50}&amp;space;\fn_phv&amp;space;\frac{3}{8}" alt="" />

Basically, it contains the HTML code for a fraction. I would now like to replace it with:

<sup>3</sup>&frasl;<sub>8</sub>

To replace it in the SQL file, I am using this code in Python:

for line in filedata:
    re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", line)

This doesn't change the data, so I also tried this:

filedata1 = re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", filedata)

This didn't help either. I need some help with this.

My full code:

import re
with open('/Users/cnnlakshmen/Downloads/qz_question.sql', 'r') as fin:
    filedata = fin.read()

for line in filedata:
    re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", line)

filedata1 = re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>&frasl;<sub>\g<2></sub>", filedata)
print filedata1

# Write the file out again
with open('/Users/cnnlakshmen/Downloads/qz_question1.sql', 'w') as fin:
  fin.write(filedata1)

Each data line looks like this:

(163, 'S001', 'T005', 'ST015', 'Medium', '1', 9, '1', '<p>The ratio of the number of children to the number of adults at a funfair was 2 : 5.​&nbsp;&nbsp;<sup>1</sup>&frasl;<sub>5</sub>of the children were boys. If there were 120 more adults than children, how many girls were there at the funfair?</p>\n<p>&nbsp;</p>', 'without_image', '[{"value":"16","answer":"0"},{"value":"40","answer":"0"},{"value":"64","answer":"1"},{"value":"120","answer":"0"}]', '<p>5 -2 = 3</p>\n<p>3 units --&gt; 120</p>\n<p>1 unit --&gt; 120 &divide; 3 = 40</p>\n<p>2 units --&gt; 40 x 2 = 80</p>\n<p>1 - <img title="\\small \\frac{1}{5}" src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{1}{5}" alt="" width="5" height="20" />&nbsp;=&nbsp;<img title="\\small \\frac{4}{5}" src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{4}{5}" alt="" width="4" height="16" /></p>\n<p><img title="\\small \\frac{4}{5}" src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{4}{5}" alt="" width="4" height="16" />&nbsp;x 80 = 64</p>', 'lakshmen K', NULL, '1', '0', '2015-05-03 15:54:19', '0000-00-00 00:00:00'),

2 answers:

Answer 0 (score: 0)

Your regex doesn't work the way you probably think it does.

>>> a = '<img title="\\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\\dpi{50}&amp;space;\\fn_phv&amp;space;\\frac{3}{8}" alt="" />'
>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>'
>>> re.findall( pattern, a)
[('3', '8')]

This extracts the numbers of the fraction. To find the whole string instead, this works:

>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{\d+\}\{\d+\}"[^<]*>'
>>> re.findall( pattern, a)
['<img title="\\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\\dpi{50}&amp;space;\\fn_phv&amp;space;\\frac{3}{8}" alt="" />']

Also, change the replacement string to make your approach work.

>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>'
>>> sub = r"<sup>\1</sup>&frasl;<sub>\2</sub>"
>>> re.sub(pattern, sub, a)
'<sup>3</sup>&frasl;<sub>8</sub>'
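Two more details in the question's code matter as well: `re.sub` returns a new string and never modifies its argument, so the result computed inside the `for` loop is simply discarded; and because `fin.read()` returns one string, `for line in filedata` iterates over individual characters, not lines. A minimal Python 3 sketch that keeps the result; the helper name `replace_fractions` is hypothetical, and the pattern is the one from the question, so it only matches a single-backslash title such as the example at the top:

```python
import re

# Pattern from the question: captures numerator and denominator from a
# title attribute of the form \frac{N}{D} (one literal backslash).
PATTERN = re.compile(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>')

# Raw string, so \1 and \2 reach re.sub as group references.
REPLACEMENT = r"<sup>\1</sup>&frasl;<sub>\2</sub>"


def replace_fractions(text):
    """Hypothetical helper: return a copy of text with each fraction <img> rewritten."""
    # re.sub does not change text in place; the caller must keep the return value.
    return PATTERN.sub(REPLACEMENT, text)


sample = r'<img title="\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\dpi{50}&amp;space;\fn_phv&amp;space;\frac{3}{8}" alt="" />'
print(replace_fractions(sample))  # <sup>3</sup>&frasl;<sub>8</sub>
```

Because `[^<]*` cannot cross a `<`, each match stays inside a single `<img>` tag, so the function can be applied to the whole text at once instead of line by line.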

Answer 1 (score: 0)

Your approach with r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>' fails for two reasons:

  1. To match a \, the pattern has to contain an escaped \; i.e. the part \\frac of the data line is matched by the pattern r'\\\\frac'.

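  2. Unlike the string you wrote at the top (title="\frac{3}{8}"), the data line you provided at the bottom of the question has title="\\small \\frac{1}{5}"; your pattern does not account for the \\small.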

Incorporating both points into your pattern gives

    r'<img\b[^<]*(?<=title=")(?:\\\\small )?\\\\frac\{(\d+)}\{(\d+)}"[^<]*>'

which matches your data.
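Both points can be checked with a short Python 3 sketch. It assumes, as in the sample row, that the SQL dump stores two literal backslashes before `small` and `frac`, and that the file is UTF-8 encoded (an assumption, not stated in the question); the names `SQL_PATTERN`, `fix_sql_text` and `convert_file` are hypothetical, and the fragment is copied from the data line in the question:

```python
import re

# The dump contains two literal backslashes: \\frac (6 characters).
data_text = r'\\frac'
print(re.fullmatch(r'\\frac', data_text) is None)        # True: this pattern expects one backslash
print(re.fullmatch(r'\\\\frac', data_text) is not None)  # True: this pattern matches both

# Combined pattern: optional "\\small " prefix, then \\frac{N}{D} in the title.
SQL_PATTERN = re.compile(
    r'<img\b[^<]*(?<=title=")(?:\\\\small )?\\\\frac\{(\d+)}\{(\d+)}"[^<]*>'
)


def fix_sql_text(text):
    """Hypothetical helper: replace each fraction <img> with <sup>N</sup>&frasl;<sub>D</sub>."""
    return SQL_PATTERN.sub(r"<sup>\1</sup>&frasl;<sub>\2</sub>", text)


def convert_file(src_path, dst_path):
    """Hypothetical helper: read the whole dump (assumed UTF-8), rewrite it, save a new file."""
    with open(src_path, encoding="utf-8") as fin:
        text = fin.read()
    with open(dst_path, "w", encoding="utf-8") as fout:
        fout.write(fix_sql_text(text))


fragment = (
    r'<p>1 - <img title="\\small \\frac{1}{5}" '
    r'src="http://latex.codecogs.com/gif.latex?\\small&amp;space;\\frac{1}{5}" '
    r'alt="" width="5" height="20" />&nbsp;=&nbsp;</p>'
)
print(fix_sql_text(fragment))  # <p>1 - <sup>1</sup>&frasl;<sub>5</sub>&nbsp;=&nbsp;</p>
```

For the file in the question, `convert_file` would be called with the input and output `.sql` paths; the whole dump is processed as one string, so no per-line loop is needed.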