Question

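I have a large number of strings in a text file, and I want to add quotation marks around each string, as shown below.

The text file contains many lines, for example:

{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}

I want to insert quotation marks around the date and the article content, like this: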

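{created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}

I am trying to do this with index-based string methods in Python.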

But the result I get is {created_at : "July 07", 2014, article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey} so the quotation marks end up in the wrong places.

Here is my code:

f = open("textfile.txt", "r")
for item in f:
    first_comma_pos = item.find(",")
    print first_comma_pos
    first_colon_pos = item.find(" : ")
    print first_colon_pos
    second_comma_pos = item.find(",", first_comma_pos)
    second_colon_pos = item.find(" : ", second_comma_pos)
    print second_colon_pos
    item = (item[:first_colon_pos+3] + 
        '"' + item[first_colon_pos+3:second_comma_pos] + '"' +
        item[second_comma_pos:second_colon_pos+3] +
        '"' + item[second_colon_pos+3:-1] + '"\n')
    print item
    saveFile= open("result.txt", "a")
    saveFile.write(item)
    saveFile.write('\n')
    saveFile.close()

Answer 1

Very hacky, but:

fix_json.py

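import json
import re

s = """{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}"""
parts = s.strip().strip("{}").split(":")
data = {}
for i, (lhs, rhs) in enumerate(zip(parts, parts[1:])):
    # Assume the word directly preceding each ":" is the key;
    # the regex below strips everything that is not part of that word.
    key = re.sub("[^a-zA-Z_]", "", lhs.rsplit(",", 1)[-1])
    # Every value except the last is followed by ", next_key", so cut at the
    # last comma. The last value may itself contain commas, so keep it whole.
    is_last = i == len(parts) - 2
    value = rhs if is_last else rhs.rsplit(",", 1)[0]
    data[key] = value.strip()

print(json.dumps(data))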

Of course, this leaves reading and writing the files to you, and it makes some assumptions about your data based on your example.
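The file handling that is left open above could be sketched roughly as follows. This is only an illustration: `convert_file` and its `transform` parameter are hypothetical names, and it assumes one record per line, with blank lines skipped.

```python
def convert_file(src_path, dst_path, transform):
    """Apply transform to each non-empty line of src_path and write the results.

    transform is any function that takes one record (a str without the
    trailing newline) and returns the converted str.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines
                dst.write(transform(line) + "\n")
```

With the file names from the question, the call would be `convert_file("textfile.txt", "result.txt", some_function)`, where `some_function` is whichever per-line conversion you settle on.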

Answer 2

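You were very close, but there were two flaws:

Your find for the second comma returned the position of the first comma itself, because you did not add one to the start index.
Your closing " landed after the }. Hence -2 is used to drop the } (and the newline) before the quote is added.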
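The first flaw can be seen in isolation with a short check. The sample line below is a shortened, made-up record in the same format as the question's data.

```python
line = "{created_at : July 07, 2014, article : Some text, with a comma.}"

first_comma_pos = line.find(",")
# Searching from the comma itself returns that same comma again.
same_comma_pos = line.find(",", first_comma_pos)
# Searching from one character later finds the next comma.
second_comma_pos = line.find(",", first_comma_pos + 1)

first_colon_pos = line.find(" : ")
date_text = line[first_colon_pos + 3:second_comma_pos]
print(date_text)  # July 07, 2014
```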

Edited code

with open("textfile.txt", "r") as f, open("result.txt", "a") as save_file:
    for item in f:
        first_comma_pos = item.find(",")
        print(item)
        print(first_comma_pos)
        first_colon_pos = item.find(" : ")
        print(first_colon_pos)
        # Start one past the first comma; otherwise find() returns it again.
        second_comma_pos = item.find(",", first_comma_pos + 1)  # Note change
        second_colon_pos = item.find(" : ", second_comma_pos)
        print(second_colon_pos)
        # The -2 drops the trailing "}\n"; the "}" is re-added after the quote.
        item = (item[:first_colon_pos+3] +
            '"' + item[first_colon_pos+3:second_comma_pos] + '"' +
            item[second_comma_pos:second_colon_pos+3] +
            '"' + item[second_colon_pos+3:-2] + '"}\n')  # Note change
        print(item)
        save_file.write(item)
        save_file.write('\n')

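Output

{created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}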

Answer 3

If the data is always in that format, you can partition it piece by piece from the right, for example:

s = """{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}"""

created_at, a_sep, article_text = s.strip('{}').rpartition('article :')
start, c_sep, created_date = created_at.rpartition('created_at :')
new_string = '{{{} "{}", {} "{}"}}'.format(
    c_sep,
    created_date.strip(' ,'),
    a_sep,
    article_text.strip()
)

# {created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}
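To apply the same idea to every line of a file, the partitioning can be wrapped in a function. The sketch below uses a hypothetical name, `quote_fields`, and made-up sample records. It strips surrounding whitespace first, because a line read from a file ends with a newline and `strip('{}')` alone would then leave the closing brace in place. It also assumes the article text never contains the literal `article :`.

```python
def quote_fields(line):
    """Wrap the created_at and article values of one record in double quotes."""
    body = line.strip().strip('{}')
    created_at, a_sep, article_text = body.rpartition('article :')
    _, c_sep, created_date = created_at.rpartition('created_at :')
    return '{{{} "{}", {} "{}"}}'.format(
        c_sep, created_date.strip(' ,'), a_sep, article_text.strip())


lines = [
    "{created_at : July 07, 2014, article : First article, with a comma.}\n",
    "{created_at : July 08, 2014, article : Second article.}\n",
]
for line in lines:
    print(quote_fields(line))
```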

Put quotation marks around strings using index, Python