Python: strip characters, then join the strings

Asked: 2013-07-07 18:39:11

Tags: python text-processing

I am writing a program that converts standard SVG paths into a Raphael.js-friendly format.

The path data looks like this:

d="M 62.678745,
   259.31235 L 63.560745,
   258.43135 L 64.220745,
   257.99135 L 64.439745,
   258.43135 L 64.000745
     ...
     ...
   "

What I want to do is strip the decimal places first and then remove the whitespace. The final result should have this format:
d="M62,
   259L63,
   258L64,
   257L64,
   258L64
     ...
     ...
   "

I have roughly 2000 paths to parse and convert into a JSON file.

This is what I have so far:

from bs4 import BeautifulSoup

svg = open("/path/to/file.svg", "r").read()
soup = BeautifulSoup(svg)
paths = soup.findAll("path")

raphael = []

for p in paths:
    splitData = p['d'].split(",")
    tempList = []

    for s in splitData:
        # strip decimals from string
        # don't know how to do this

        # remove whitespace (str.replace returns a new string)
        s = s.replace(" ", "")

        # add to tempList
        tempList.append(s + ", ")

    # drop the trailing separator from the last item
    tempList[-1] = tempList[-1].replace(", ", "")
    raphael.append(tempList)

3 answers:

Answer 0 (score: 3)

You can use a regex:

>>> import re
>>> d = """M 62.678745,
...    259.31235 L 63.560745,
...    258.43135 L 64.220745,
...    257.99135 L 64.439745,
...    258.43135 L 64.000745"""
>>> for strs in d.splitlines():
...     print(re.sub(r'(\s+)|(\.\d+)', '', strs))
...
M62,
259L63,
258L64,
257L64,
258L64
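
The same substitution can also be applied to the whole attribute in one call, without splitting into lines first. The following is a minimal sketch, not a definitive implementation: the name `simplify_path` is made up for illustration, and it assumes that every pair of numbers is separated by a comma or a command letter, as in the sample data. Coordinates separated only by spaces (for example `l 1.5 2.5`) would be merged into one number, so this would not be safe for such paths. Note that it also removes the line breaks.

```python
import re

# Matches either a run of whitespace or a decimal point followed by digits.
_STRIP = re.compile(r"\s+|\.\d+")

def simplify_path(d):
    """Drop fractional digits and all whitespace from an SVG path string."""
    return _STRIP.sub("", d)

d = "M 62.678745,\n   259.31235 L 63.560745,\n   258.43135 L 64.220745"
print(simplify_path(d))  # M62,259L63,258L64
```

Precompiling the pattern is only a convenience when the function is called for a couple of thousand paths; `re.sub` with the raw pattern string behaves the same.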

Answer 1 (score: 1)

Try this:

import re
from bs4 import BeautifulSoup

svg = open("/path/to/file.svg", "r").read()
soup = BeautifulSoup(svg)
paths = soup.findAll("path")

raphael = []

for p in paths:
    splitData = p['d'].split(",")
    for line in splitData:
        # Remove ".000000" part
        line = re.sub(r"\.\d*", "", line)
        # Drop all whitespace, including the newlines inside the attribute
        line = "".join(line.split())
        raphael.append(line)

d = ",\n".join(raphael)
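
The loop above collects the pieces of every path into one list, so the final join produces a single string. Since the question asks for a JSON file, it may be more useful to keep one cleaned string per path. The sketch below shows one possible way to do that with the standard `json` module. The helper names `clean` and `paths_to_json` are hypothetical, the flat JSON array layout is an assumption (the question does not say what structure Raphael.js should receive), and `sample` stands in for `[p["d"] for p in paths]`.

```python
import json
import re

def clean(d):
    # Drop fractional digits, then all whitespace (spaces and newlines).
    d = re.sub(r"\.\d+", "", d)
    return "".join(d.split())

def paths_to_json(d_values):
    """Serialize cleaned path strings as a JSON array, one entry per path."""
    return json.dumps([clean(d) for d in d_values])

# Hypothetical sample data standing in for [p["d"] for p in paths].
sample = ["M 62.678745,\n 259.31235 L 63.560745", "M 1.5,\n 2.25 L 3.0,\n 4.75"]
print(paths_to_json(sample))  # ["M62,259L63", "M1,2L3,4"]
```

To write a file instead of a string, `json.dump(obj, fp)` takes the same list plus an open file object.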

Answer 2 (score: 1)

You can build a brute-force parser:

def isint(x):
    try:
        int(float(x))
        return True
    except (ValueError, OverflowError):
        return False

def parser(s):
    mystr = lambda x: str(int(float(x)))
    s = s.replace('\n','##')
    tmp = ','.join( [''.join([mystr(x) if isint(x) else x \
                         for x in j.split()]) \
                         for j in s.split(',')] )
    return tmp.replace('##', '\n')

Test:

d="M 62.678745,\n 259.31235 L 63.560745,\n 258.43135 L 64.220745, \n 257.99135 L 64.439745, \n 258.43135 L 64.000745 "
print(parser(d))
# M62,
# 259L63,
# 258L64,
# 257L64,
# 258L64
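
Note that `int(float(x))` truncates toward zero, which is why `257.99135` becomes `257` above; that matches the output requested in the question. If rounding to the nearest integer were preferred instead, a variation could look like the sketch below. This is an illustration only: `round_path` is a made-up name, and the pattern assumes plain decimal numbers with no exponent notation.

```python
import re

# A plain decimal number with an optional leading minus sign.
NUMBER = re.compile(r"-?\d+\.\d+")

def round_path(d):
    """Round each decimal coordinate to the nearest integer, then drop whitespace."""
    rounded = NUMBER.sub(lambda m: str(int(round(float(m.group())))), d)
    return "".join(rounded.split())

d = "M 62.678745,\n 259.31235 L 63.560745,\n 257.99135"
print(round_path(d))  # M63,259L64,258
```

Values that end in exactly `.5` round differently in Python 2 and Python 3, so the two versions can disagree on such inputs.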