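I have a large number of strings in a text file, and I want to add double quotes around each value, as shown below.
The text file contains many lines such as:
{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}
I want to insert quotes around the date and around the article text, like this:
{created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}
I am doing this in Python by looking up string index positions, but the result I get is {created_at : "July 07", 2014, article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey}
So the quotes end up in the wrong places.
Here is my code: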
f = open("textfile.txt", "r")
for item in f:
    first_comma_pos = item.find(",")
    print(first_comma_pos)
    first_colon_pos = item.find(" : ")
    print(first_colon_pos)
    second_comma_pos = item.find(",", first_comma_pos)
    second_colon_pos = item.find(" : ", second_comma_pos)
    print(second_colon_pos)
    item = (item[:first_colon_pos+3] +
            '"' + item[first_colon_pos+3:second_comma_pos] + '"' +
            item[second_comma_pos:second_colon_pos+3] +
            '"' + item[second_colon_pos+3:-1] + '"\n')
    print(item)
    saveFile = open("result.txt", "a")
    saveFile.write(item)
    saveFile.write('\n')
    saveFile.close()
Answer 0 (score: 2)
Very hacky, but:
fix_json.py
import re, json

s = """{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}"""
parts0 = s.strip().strip("{}").split(":")
data = {}
for i, (lhs, rhs) in enumerate(zip(parts0, parts0[1:])):
    #: assume that the word directly preceding the ":" is the key
    #: word defined by regex below
    key = re.sub("[^a-zA-Z_]", "", lhs.rsplit(",", 1)[-1])
    #: every value except the last is followed by ", next_key"; cut that tail off
    is_last = i == len(parts0) - 2
    value = rhs if is_last else rhs.rsplit(",", 1)[0]
    data[key] = value.strip()
print(json.dumps(data))
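Of course, this leaves reading and writing the file to you, and it makes some assumptions about your data based on your example (for instance, that each key is a single word and that no value contains a colon).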
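The file handling that this answer leaves out could look roughly like the sketch below. The function names `parse_line` and `convert_file` are hypothetical, and the code assumes one record per line, single-word keys, and no colon inside any value. The last value on a line is kept whole, so commas inside the article text survive.

```python
import json
import re


def parse_line(line):
    """Split one '{key : value, key : value}' line into a dict.

    Assumes values never contain ':' and keys are single words.
    """
    parts = line.strip().strip("{}").split(":")
    data = {}
    for i, (lhs, rhs) in enumerate(zip(parts, parts[1:])):
        # The key is the word right before the colon.
        key = re.sub(r"[^a-zA-Z_]", "", lhs.rsplit(",", 1)[-1])
        # Only non-final values are followed by ", next_key".
        is_last = i == len(parts) - 2
        value = rhs if is_last else rhs.rsplit(",", 1)[0]
        data[key] = value.strip()
    return data


def convert_file(src_path, dst_path):
    """Write one JSON object per non-empty input line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(parse_line(line)) + "\n")
```

Calling `convert_file("textfile.txt", "result.txt")` would then produce one JSON object per line, with both keys and values quoted.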
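Answer 1 (score: 2)
You were nearly there, but your code had two flaws:
1. The search for the second comma started at first_comma_pos itself, so it found the first comma again.
2. The closing " was placed beyond the }.
Edited code: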
f = open("textfile.txt", "r")
for item in f:
    first_comma_pos = item.find(",")
    print(item)
    print(first_comma_pos)
    first_colon_pos = item.find(" : ")
    print(first_colon_pos)
    second_comma_pos = item.find(",", first_comma_pos+1)  # Note change
    second_colon_pos = item.find(" : ", second_comma_pos)
    print(second_colon_pos)
    item = (item[:first_colon_pos+3] +
            '"' + item[first_colon_pos+3:second_comma_pos] + '"' +
            item[second_comma_pos:second_colon_pos+3] +
            '"' + item[second_colon_pos+3:-2] + '"}\n')  # Note change
    print(item)
    saveFile = open("result.txt", "a")
    saveFile.write(item)
    saveFile.write('\n')
    saveFile.close()
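Output:
{created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}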
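The effect of the `+1` can be seen in isolation. This is only a small illustration on a shortened sample line (the sample text is made up): `str.find(sub, start)` includes the character at `start` in its search, so starting at the first comma returns that same comma.

```python
line = "{created_at : July 07, 2014, article : Some text.}"

first_comma = line.find(",")

# Starting the search AT the first comma finds that same comma again.
same_comma = line.find(",", first_comma)

# Starting one character later moves on to the next comma.
second_comma = line.find(",", first_comma + 1)

print(first_comma, same_comma, second_comma)
print(line[:second_comma])  # everything up to and including the year
```

With the original code, `second_comma_pos` was therefore equal to `first_comma_pos`, which is why the closing quote landed right after "July 07".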
Answer 2 (score: 2)
If the data is always in that format, you can split it apart piece by piece from the right, for example:
s = """{created_at : July 07, 2014, article : The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey.}"""
created_at, a_sep, article_text = s.strip('{}').rpartition('article :')
start, c_sep, created_date = created_at.rpartition('created_at :')
new_string = '{{{} "{}", {} "{}"}}'.format(
    c_sep,
    created_date.strip(' ,'),
    a_sep,
    article_text.strip()
)
# {created_at : "July 07, 2014", article : "The Turkish government has drawn a roadmap for the return of militants of the banned PKK, who took up arms against the Turkish state in order to carve out a separate state in southeastern Turkey."}
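To apply the same idea to every line of a file, the `rpartition` steps could be wrapped in a function. This is a sketch only: `quote_fields` is a hypothetical name, and it assumes each line has exactly the `{created_at : <date>, article : <text>}` shape, raising an error otherwise.

```python
def quote_fields(line):
    """Return the line with the date and the article text double-quoted.

    Assumes the line looks like '{created_at : <date>, article : <text>}'.
    """
    # Drop the trailing newline first, then the surrounding braces.
    body = line.strip().strip("{}")
    head, a_sep, article_text = body.rpartition("article :")
    _, c_sep, created_date = head.rpartition("created_at :")
    if not a_sep or not c_sep:
        raise ValueError("unexpected line format: %r" % line)
    return '{{{} "{}", {} "{}"}}'.format(
        c_sep,
        created_date.strip(" ,"),
        a_sep,
        article_text.strip(),
    )


# Lines as they would come from iterating over an open text file.
sample_lines = [
    "{created_at : July 07, 2014, article : First text, with a comma.}\n",
    "{created_at : July 08, 2014, article : Second text.}\n",
]
for raw in sample_lines:
    print(quote_fields(raw))
```

Because the split works from the right-hand side on the literal key names, commas inside the article text need no special handling.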