I have a file containing many lines like the following:
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
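I want to replace 1371078139195 (in this case) with another number. The value I want to replace is always in the first comma-separated field, and it is always the second-to-last underscore-separated value in that field. Here is how I do it. It works, but it seems unpythonic and clumsy.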
>>> line="'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> l1=",".join(line.split(",")[1:])
>>> print l1
{'cf:rv': '0'}
>>> l2=line.split(",")[0]
>>> print l2
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442'
>>> print "_".join(l2.split('_')[:-2])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight
>>>
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442'
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1]) + "," + l1
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
>>>
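Is there an easier way (possibly using a regex) to replace the value? I can't imagine this is the best way.
I have received a few answers, so I must stress that it is the second-to-last underscore-separated value. The following are all valid strings: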
line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"
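In the examples above, there is a string of digits that does not come after the second-to-last underscore. The last part may or may not be all digits (it could be +14155186442 or 14155186442). Sorry I didn't mention this earlier.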
Answer 0 (score: 4)
Using a regular expression:
import re

# s is the input line; new_number() stands for whatever computes the replacement
m = re.match("([^,]*_)([+]?[0-9]+)(_.*)", s)
if m:
    before = m.group(1)
    number = m.group(2)
    after = m.group(3)
    s = before + new_number(number) + after
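Meaning:
[^,]*_ = as many characters as you like, but no commas, followed by an underscore
[+]?[0-9]+ = digits, optionally preceded by a +
_.* = an underscore, followed by anything
This works because regexp matching is "greedy" by default, so [^,]* will consume as many underscores as it can, backing off only as far as the second-to-last one so that the rest of the pattern can still match.
If, for example, you need the third-to-last underscore-separated value instead of the second-to-last, the expression can be changed to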
m = re.match("([^,]*_)([+]?[0-9]+)(_[^,]*_.*)", s)
which requires at least two underscores after the number and before the comma.
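The second-to-last pattern from this answer can be wrapped in a small function. This is only a sketch, written for Python 3 rather than the Python 2 used in the sessions here. The name replace_second_last and the choice to return non-matching lines unchanged are assumptions for illustration, not part of the original answer.

```python
import re

# Same pattern as in the answer: group 1 is everything up to the target
# value, group 2 is the value itself, group 3 is the rest of the line.
PATTERN = re.compile(r"([^,]*_)([+]?[0-9]+)(_.*)")


def replace_second_last(line, new_value):
    """Hypothetical helper: swap the matched number for new_value."""
    m = PATTERN.match(line)
    if m is None:
        # Assumption: a line that does not fit the pattern is left as-is.
        return line
    return m.group(1) + new_value + m.group(3)


samples = [
    "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
    "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}",
]
for sample in samples:
    print(replace_second_last(sample, "1234567"))
```

Building the result from the groups by hand, instead of passing a template string to re.sub, avoids any question of how digits in new_value would interact with backreferences such as \1.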
Answer 1 (score: 3)
A non-regex solution:
>>> strs = " 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> first, sep, rest = strs.partition(',')
>>> lis = first.rsplit('_', 2)
>>> lis[1] = "1111111"
>>> "_".join(lis) + sep + rest
" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1111111_+14155186442', {'cf:rv': '0'}"
Function:
>>> def solve(strs, rep):
...     first, sep, rest = strs.partition(',')
...     lis = first.rsplit('_', 2)
...     lis[1] = rep
...     return "_".join(lis) + sep + rest
...
>>> solve(" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}", "1111")
" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1111_+14155186442', {'cf:rv': '0'}"
>>> solve("'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}", "2222")
"'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_2222_14155186442', {'cf:rv': '0'}"
>>> solve("'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}", "2222")
"'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_2222_1371078139195', {'cf:rv': '0'}"
Answer 2 (score: 1)
Like this?
>>> line = "'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> re.subn('_(\d+)_', '_mynewnumber_', line, count=1)
("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_mynewnumber_+14155186442', {'cf:rv': '0'}",
1)
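One thing worth checking with this approach is how it behaves on the additional strings listed in the question, where a run of digits appears earlier in the first field. The following Python 3 sketch runs the same substitution on one of those strings. It is a quick check, not a change to the answer's method.

```python
import re

line = ("'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_"
        "1371078139195_+14155186442', {'cf:rv': '0'}")

# count=1 stops after the first run of digits enclosed by underscores,
# which in this string is 23456, not the second-to-last value.
result, n = re.subn(r'_(\d+)_', '_mynewnumber_', line, count=1)
print(result)
print(n)
```

Here the replacement lands on 23456 and leaves 1371078139195 in place, so this pattern seems suitable only when the target is the first underscore-delimited number on the line.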
Answer 3 (score: 0)
import re
r = re.compile(r'([^,]*_)(\d+)(?=_[^_,]+,)(_.*)')
for line in ("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
             "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"):
    print line
    print r.sub(r'\1ABCDEFG\3', line)
    print r.sub(r'\g<1>1234567\3', line)
Result:
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
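\g<1> means "group 1".
See the docs:
In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn't ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.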
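The ambiguity described in that passage can be seen in a small Python 3 sketch. The pattern and the text 'item-7' are made up purely for illustration and have nothing to do with the lines in the question.

```python
import re

# Illustrative pattern with exactly two groups.
pattern = re.compile(r'(\w+)-(\d+)')
text = 'item-7'

# \g<2>0 is unambiguous: the contents of group 2 followed by a literal '0'.
unambiguous = pattern.sub(r'\g<2>0', text)
print(unambiguous)

# \20 is read as a reference to group 20, which this pattern does not have,
# so the replacement template is rejected.
raised = False
try:
    pattern.sub(r'\20', text)
except re.error as exc:
    raised = True
    print('error:', exc)
```

This is why the answer writes \g<1>1234567 when the replacement text itself starts with digits.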
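Answer 4 (score: 0)
Not as sophisticated as a regex, but relatively simple to code, understand, debug, and change in the future. It makes no assumptions about which characters make up the "words", other than the delimiters.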
def replace_term(line, replacement):
    csep = line.split(',')
    usep = csep[0].split('_')
    return ','.join(['_'.join(usep[:-2] + [replacement] + usep[-1:])] + csep[1:])

lines = ["'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
         "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}",
         "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"]
for line in lines:
    print replace_term(line, 'XXX')
Output:
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_XXX_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_XXX_14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_XXX_1371078139195', {'cf:rv': '0'}