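Problem removing newlines in Python

Time: 2014-03-28 15:32:19

Tags: python newline

I am currently pulling company profile information from Crunchbase. The API documentation is available here.

In a few simple steps, I want to fetch the name, permalink, description, and overview, and insert them into a MySQL database. For that, I have the following code: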

url = "http://api.crunchbase.com/v/1/company/%s.js?api_key=<insert_api_key>" %  permalink

i = 1
TIME = 5
TRYS = 3
while True:
    try:
        fh = urllib2.urlopen(url)
        cont = fh.read()
        fh.close()
        data = json.loads(cont)

    except Exception as ex:
        print ex
        print "Sleep %d seconds to try again" % (TIME * i)
        time.sleep(TIME * i)
    i += 1
    if i > TRYS:
        INVALID.append(url)
        data = None

overview = data.get("overview")
overview = strip_tags(overview).replace('\n','')
sql_data = {
    "name": data.get("name").replace('"', "'"),
    "permalink": data.get("permalink", ""),
    "description": data.get("description","").replace('\n',''),
    "overview": overview
}

keys = sql_data.keys()
#print keys
sql = """insert into %s(%s) values (""" % (TABLE, "`".join(keys))

for index, k in enumerate(keys):
    if index < len(keys)-1:
        sql += '''"%s",''' % sql_data.get(k, "")
    else:
        sql += sql_data.get(k,'')
sql += """)"""

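Note that I will add the strip_tags function at the end of this code.

Anyway, I have hit a stumbling block. I tried to remove the newlines (\n) with .replace('\n',''), as you can see for overview and description. I also tried removing all newlines with the regex [\n]+. But I still get an error for every company. One such error is: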

(1064, '[34816] syntax error: syntax error near "Management"\nLINE: ...agement     software.","adventnet","AdventNet",Server Management...\n                                                               ^')
3: downloading adventnet failed
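
Reading the error above closely, the parser stops at `Server Management...`, a value that is not wrapped in quotes, and the column list in the statement printed further down is joined with backticks rather than commas. That suggests the statement itself may be malformed, independent of any newlines. Below is a minimal sketch of building the statement with placeholders instead. It assumes a DB-API driver that uses the `%s` paramstyle (MySQLdb does). `build_insert` is a hypothetical helper name, not part of the original code.

```python
def build_insert(table, row):
    """Return (sql, params) for a parameterized INSERT.

    Hypothetical helper: values are passed separately so the driver
    handles quoting; table and column names must come from trusted code.
    """
    columns = sorted(row)  # fixed order so columns and params line up
    column_list = ", ".join("`%s`" % c for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (table, column_list, placeholders)
    params = [row[c] for c in columns]
    return sql, params


sql, params = build_insert(
    "crunchbase_overview_company",
    {"name": "AdventNet", "permalink": "adventnet"},
)
# With a real connection this would be passed as: cursor.execute(sql, params)
```

Because the values never become part of the SQL text, quotes or line breaks inside an overview should no longer be able to break the statement.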

The company overview, when printed, is:

AdventNet is now Zoho ManageEngine.

Founded in 1996, AdventNet has served a diverse range of enterprise IT, networking and telecom customers.

AdventNet supplies server and network management software.
insert into crunchbase_overview_company(overview`permalink`name`description) values     ("AdventNet is now Zoho ManageEngine.

 Founded in 1996, AdventNet has served a diverse range of enterprise IT, networking and     telecom customers.

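This clearly still contains newlines, even though I have apparently done something that should strip them!

Does anyone have any suggestions, hints, or tips on how to deal with this?

The strip_tags function: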

from HTMLParser import HTMLParser

class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []
    def handle_data(self, d):
        self.fed.append(d)
    def get_data(self):
        return ''.join(self.fed)

def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()

1 Answer:

Answer 0: (score: 0)

Could you also try replacing carriage returns?

overview = strip_tags(overview).replace('\n','').replace('\r','')

Windows line endings usually consist of a carriage return followed by a newline (\r\n), so removing only \n leaves the \r behind.
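
A minimal sketch of stripping every kind of line break in one pass, assuming the text may contain `\r\n`, bare `\r`, or bare `\n`. `remove_line_breaks` is a hypothetical helper name, not something from the question.

```python
import re


def remove_line_breaks(text):
    """Replace each run of CR/LF characters with one space and trim the ends."""
    return re.sub(r"[\r\n]+", " ", text).strip()


sample = "AdventNet is now Zoho ManageEngine.\r\n\r\nFounded in 1996."
cleaned = remove_line_breaks(sample)
```

Replacing with a space rather than an empty string keeps the words on either side of a break from running together. Swap in `""` if the exact behavior of the `.replace` calls above is wanted.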