I want to insert quotation marks (`""`) around the date and the text in each string in the file `input.txt`. Here is my input file:
created_at : October 9, article : ISTANBUL — Turkey is playing a risky game of chicken in its negotiations with NATO partners who want it to join combat operations against the Islamic State group — and it’s blowing back with violence in Turkish cities. As the Islamic militants rampage through Kurdish-held Syrian territory on Turkey’s border, Turkey says it won’t join the fight unless the U.S.-led coalition also goes after the government of Syrian President Bashar Assad.
created_at : October 9, article : President Obama chairs a special meeting of the U.N. Security Council last month. (Timothy A. Clary/AFP/Getty Images) When it comes to President Obama’s domestic agenda and his maneuvers to (try to) get things done, I get it. I understand what he’s up to, what he’s trying to accomplish, his ultimate endgame. But when it comes to his foreign policy, I have to admit to sometimes thinking “whut?” and agreeing with my colleague Ed Rogers’s assessment on the spate of books criticizing Obama’s foreign policy stewardship.
I want to put quotes around the date and the text, like this:
created_at : "October 9", article : "ISTANBUL — Turkey is playing a risky game of chicken in its negotiations with NATO partners who want it to join combat operations against the Islamic State group — and it’s blowing back with violence in Turkish cities. As the Islamic militants rampage through Kurdish-held Syrian territory on Turkey’s border, Turkey says it won’t join the fight unless the U.S.-led coalition also goes after the government of Syrian President Bashar Assad".
created_at : "October 9", article : "President Obama chairs a special meeting of the U.N. Security Council last month. (Timothy A. Clary/AFP/Getty Images) When it comes to President Obama’s domestic agenda and his maneuvers to (try to) get things done, I get it. I understand what he’s up to, what he’s trying to accomplish, his ultimate endgame. But when it comes to his foreign policy, I have to admit to sometimes thinking “whut?” and agreeing with my colleague Ed Rogers’s assessment on the spate of books criticizing Obama’s foreign policy stewardship".
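Here is my code. It finds the index of the comma (the `,` after the date) and the index of `article`; using those, I want to insert quotes around the date. I also want to insert quotes around the text, but how do I do that?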
f = open("input.txt", "r")
for line in f:
    article_pos = line.find("article")
    print(article_pos)
    comma_pos = line.find(",")
    print(comma_pos)
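Answer 0: (score: 1)
While you can do this with low-level operations like `find` and slicing, that is not the simple or idiomatic way to do it.
First, I'll show you how to do it your way: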
comma_pos = line.find(", ")
first_colon_pos = line.find(" : ")
second_colon_pos = line.find(" : ", comma_pos)
line = (line[:first_colon_pos+3] +
        '"' + line[first_colon_pos+3:comma_pos] + '"' +
        line[comma_pos:second_colon_pos+3] +
        '"' + line[second_colon_pos+3:] + '"')
But it is easier to split the line into pieces, munge those pieces, and join them back together:
dateline, article = line.split(', ', 1)
key, value = dateline.split(' : ')
dateline = '{} : "{}"'.format(key, value)
key, value = article.split(' : ', 1)
article = '{} : "{}"'.format(key, value)
line = '{}, {}'.format(dateline, article)
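Then you can refactor the repeated part into a simple function, so you don't have to write the same thing twice (which comes in handy if you later need to write it four times).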
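The refactoring described above might look like the following sketch. The helper names `quote_value` and `quote_line` are made up for illustration, and the code assumes every line has exactly the `key : value, key : value` shape from the question. It also strips the trailing newline first, since lines read from a file keep it and the closing quote would otherwise end up after the line break.

```python
def quote_value(pair):
    """Turn 'key : value' into 'key : "value"' (hypothetical helper)."""
    # Split only on the first ' : ' so the value may itself contain ' : '.
    key, value = pair.split(' : ', 1)
    return '{} : "{}"'.format(key, value)


def quote_line(line):
    """Quote both values in 'key : value, key : value' (hypothetical helper)."""
    # Drop the trailing newline so the closing quote stays on the same line.
    line = line.rstrip('\n')
    # Split only on the first ', ' because the article text contains commas.
    dateline, article = line.split(', ', 1)
    return '{}, {}'.format(quote_value(dateline), quote_value(article))


print(quote_line('created_at : October 9, article : ISTANBUL, Turkey says no\n'))
```

Adding a third or fourth field would then mean one more `quote_value` call rather than another copy of the split-and-format lines.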
Using a regular expression is even easier, but may be harder for a novice to understand:
import re
line = re.sub(r'(.*?:\s*)(.*?)(\s*,.*?:\s*)(.*)', r'\1"\2"\3"\4"', line)
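This works by capturing everything up to the first `:` (and any whitespace after it) in one group, then everything from there up to the first comma in a second group, and so on: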
(.*?:\s*)(.*?)(\s*,.*?:\s*)(.*)
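Note that the advantage of the regex is that I can say "any whitespace after it" very simply. With `find` or `split`, I had to specify explicitly that there is exactly one space on each side of the colon and exactly one space after the comma, because there is no way to search for "zero or more spaces" without something like `\s*` to express it.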
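To make the whitespace point concrete, here is a small sketch that runs the same pattern over lines with different spacing. The names `QUOTE_RE` and `quote` and the sample strings are invented for illustration; only the pattern and the replacement string come from the answer.

```python
import re

# Pattern from the answer above; QUOTE_RE is just an illustrative name.
QUOTE_RE = re.compile(r'(.*?:\s*)(.*?)(\s*,.*?:\s*)(.*)')


def quote(line):
    """Wrap the two captured values in double quotes (hypothetical helper)."""
    return QUOTE_RE.sub(r'\1"\2"\3"\4"', line)


samples = [
    'created_at : October 9, article : Some text',      # spacing as in the question
    'created_at:October 9,article:Some text',           # no spaces at all
    'created_at :   October 9 ,  article:  Some text',  # irregular spacing
]

for s in samples:
    print(quote(s))
```

Each sample keeps its own spacing and only gains the quotes, whereas the `split(' : ')` version would raise an error on the second and third lines.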
Answer 1: (score: 0)
You could also look at the regular expression library, `re`.
E.g.
>>> import re
>>> print(re.sub(r'created_at:\s(.*), article:\s(.*)',
... r'created_at: "\1", article: "\2"',
... 'created_at: October 9, article: ...'))
created_at: "October 9", article: "..."
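The first argument to `re.sub` is the pattern you want to match. The parens `()` capture the matched text, which can then be used in the second argument as `\1`, `\2`, and so on. The third argument is the line of text.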