I am using Python to replace a certain string in an SQL file that I have. The string looks like this:
<img title="\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\dpi{50}&space;\fn_phv&space;\frac{3}{8}" alt="" />
Basically, it contains the HTML code for a fraction. I would like to replace it with:
<sup>3</sup>⁄<sub>8</sub>
To replace it in the SQL file, I am using this code in Python:
for line in filedata:
    re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>⁄<sub>\g<2></sub>", line)
This doesn't change the data, so I have tried this as well.
filedata1 = re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>⁄<sub>\g<2></sub>", filedata)
This did not work either. How can I make the replacement work?
My full code:
import re
with open('/Users/cnnlakshmen/Downloads/qz_question.sql', 'r') as fin:
    filedata = fin.read()
for line in filedata:
    re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>⁄<sub>\g<2></sub>", line)
filedata1 = re.sub(r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>', "<sup>\g<1></sup>⁄<sub>\g<2></sub>", filedata)
print filedata1
# Write the file out again
with open('/Users/cnnlakshmen/Downloads/qz_question1.sql', 'w') as fin:
    fin.write(filedata1)
Each data line looks like this:
(163, 'S001', 'T005', 'ST015', 'Medium', '1', 9, '1', '<p>The ratio of the number of children to the number of adults at a funfair was 2 : 5. <sup>1</sup>⁄<sub>5</sub>of the children were boys. If there were 120 more adults than children, how many girls were there at the funfair?</p>\n<p> </p>', 'without_image', '[{"value":"16","answer":"0"},{"value":"40","answer":"0"},{"value":"64","answer":"1"},{"value":"120","answer":"0"}]', '<p>5 -2 = 3</p>\n<p>3 units --> 120</p>\n<p>1 unit --> 120 ÷ 3 = 40</p>\n<p>2 units --> 40 x 2 = 80</p>\n<p>1 - <img title="\\small \\frac{1}{5}" src="http://latex.codecogs.com/gif.latex?\\small&space;\\frac{1}{5}" alt="" width="5" height="20" /> = <img title="\\small \\frac{4}{5}" src="http://latex.codecogs.com/gif.latex?\\small&space;\\frac{4}{5}" alt="" width="4" height="16" /></p>\n<p><img title="\\small \\frac{4}{5}" src="http://latex.codecogs.com/gif.latex?\\small&space;\\frac{4}{5}" alt="" width="4" height="16" /> x 80 = 64</p>', 'lakshmen K', NULL, '1', '0', '2015-05-03 15:54:19', '0000-00-00 00:00:00'),
Answer 0 (score: 0)
Your regex does not work the way you may think it does.
>>> a = '<img title="\\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\\dpi{50}&space;\\fn_phv&space;\\frac{3}{8}" alt="" />'
>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>'
>>> re.findall( pattern, a)
[('3', '8')]
This extracts the digits of the fraction. Without the capture groups, the same pattern finds the whole tag:
>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{\d+\}\{\d+\}"[^<]*>'
>>> re.findall( pattern, a)
['<img title="\\frac{3}{8}" src="http://latex.codecogs.com/gif.latex?\\dpi{50}&space;\\fn_phv&space;\\frac{3}{8}" alt="" />']
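Also, change the replacement string to make your approach work. The replacement needs backreferences, so use the pattern with the capture groups:
>>> pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>'
>>> sub = r"<sup>\1</sup>⁄<sub>\2</sub>"
>>> re.sub(pattern, sub, a)
'<sup>3</sup>⁄<sub>8</sub>'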
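A side note on the loop in the question, as a minimal sketch written for Python 3: `re.sub` returns a new string and never modifies its argument, so a call whose result is discarded has no effect. Iterating over the string returned by `fin.read()` also yields single characters, not lines. The variable names below are only illustrative.

```python
import re

# Sample tag from the question; in Python source, "\\" is one literal backslash.
a = ('<img title="\\frac{3}{8}" '
     'src="http://latex.codecogs.com/gif.latex?\\dpi{50}&space;\\fn_phv&space;\\frac{3}{8}" '
     'alt="" />')
pattern = r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>'
replacement = r'<sup>\g<1></sup>⁄<sub>\g<2></sub>'

# Discarding the return value leaves the original string untouched.
re.sub(pattern, replacement, a)
still_original = a

# Assigning the return value keeps the substitution.
converted = re.sub(pattern, replacement, a)

# Looping over a string visits it one character at a time.
pieces = list('<img')

print(converted)
```

This is why the `for line in filedata` loop cannot change anything, independent of whether the pattern matches.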
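Answer 1 (score: 0)
Your approach with r'<img\b[^<]*(?<=title=")\\frac\{(\d+)\}\{(\d+)\}"[^<]*>' fails for two reasons:
1. To match a literal \, the pattern must contain an escaped \ (that is, \\), so the data-line portion \\frac is matched by the pattern r'\\\\frac'.
2. Unlike the string you wrote at the top (title="\frac{3}{8}"), the data line you give at the bottom of the question has title="\\small \\frac{1}{5}", and your pattern does not account for the \\small.
Incorporating both into your pattern yields
r'<img\b[^<]*(?<=title=")(?:\\\\small )?\\\\frac\{(\d+)}\{(\d+)}"[^<]*>'
which matches your data.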
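To see the pattern working end to end, here is a minimal sketch written for Python 3. The helper name `replace_fractions` and the constant `FRACTION_IMG` are hypothetical, and the sample is an excerpt of the data line from the question, kept in raw strings so the doubled backslashes stay exactly as they appear in the SQL file.

```python
import re

# Pattern from the answer: each literal backslash in the data is matched
# by an escaped backslash (\\) inside the raw-string pattern.
FRACTION_IMG = re.compile(
    r'<img\b[^<]*(?<=title=")(?:\\\\small )?\\\\frac\{(\d+)}\{(\d+)}"[^<]*>'
)

def replace_fractions(text):
    """Hypothetical helper: swap each matching <img> tag for sup/sub markup."""
    return FRACTION_IMG.sub(r'<sup>\g<1></sup>⁄<sub>\g<2></sub>', text)

# Excerpt of the data line; r'...' keeps both backslashes of every "\\".
sample = (
    r'<p>1 - <img title="\\small \\frac{1}{5}" '
    r'src="http://latex.codecogs.com/gif.latex?\\small&space;\\frac{1}{5}" '
    r'alt="" width="5" height="20" /> = <img title="\\small \\frac{4}{5}" '
    r'src="http://latex.codecogs.com/gif.latex?\\small&space;\\frac{4}{5}" '
    r'alt="" width="4" height="16" /></p>'
)

result = replace_fractions(sample)
print(result)
```

The same function can be applied once to the whole string returned by `fin.read()`, and the returned string written to the output file; no per-line loop is needed.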