Question

我有一个包含多行的文件，如下所示：

 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}

我想用另一个号码替换1371078139195（在这种情况下）。我想要替换的值总是在第一个逗号分隔的单词中，并且始终是该单词中的第二个下划线分隔值。以下是我这样做的方式并且它有效，但这似乎不合时宜且笨拙。

>>> line="'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> l1=",".join(line.split(",")[1:])
>>> print l1
 {'cf:rv': '0'}
>>> l2=line.split(",")[0]
>>> print l2
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442'
>>> print "_".join(l2.split('_')[:-2])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight
>>>
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1])
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442'
>>> print "_".join(l2.split('_')[:-2])+ "_1234567_"+(l2.split('_')[-1]) + "," + l1
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}
>>>

是否有更简单的方法来替换（可能使用正则表达式）值？我无法想象这是最好的方式

我有几个答案，我必须强调它是第二个强调的价值。以下是有效的字符串：

line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}"
line = "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"

在上面的例子中，字符串中有一个数字字符串，它不在第二个最后一个下划线之后。最后一部分可能是也可能不是所有数字（可能是+14155186442，也可能是14155186442）。对不起，我上面没有提到这一点。

A

Answer 1

使用正则表达式：

m = re.match("([^,]*_)([+]?[0-9]+)(_.*)", s)
if m:
    before = m.group(1)
    number = m.group(2)
    after = m.group(3)
    s = before + new_number(number) + after

意思是

[^,]*_ =您想要多少个字符，但不是逗号，后跟下划线
[+]?[0-9]+ =数字，可选地以+
_.* =一个下划线，后跟任何

这是有效的，因为regexp匹配默认为“贪婪”，因此[^,]*实际上将使用所有下划线，在倒数第二个之前停止以使匹配成功。

例如，如果您需要代替倒数第二个下划线，则需要第三个最后一个表达式可以更改为

m = re.match("([^,]*_)([+]?[0-9]+)(_[^,]*_.*)", s)

因此要求在数字之后在逗号之前至少有两个下划线。

Answer 2

非正则表达式解决方案：

>>> strs = " 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> first, sep, rest = strs.partition(',')
>>> lis = first.rsplit('_', 2)
>>> lis[1] = "1111111"
>>> "_".join(lis) + sep + rest
" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1111111_+14155186442', {'cf:rv': '0'}"

<强>功能：

def solve(strs, rep):                                                                                                   first, sep, rest = strs.partition(',')
    lis = first.rsplit('_', 2)
    lis[1] = rep
    return "_".join(lis) + sep + rest
... 
>>> solve(" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}", "1111")
" 'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1111_+14155186442', {'cf:rv': '0'}"
>>> solve("'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}", "2222")
"'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_2222_14155186442', {'cf:rv': '0'}"
>>> solve("'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}", "2222")
"'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_2222_1371078139195', {'cf:rv': '0'}"

Answer 3

喜欢这个吗？

>>> line = "'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"
>>> re.subn('_(\d+)_', '_mynewnumber_', line, count=1) 
("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_mynewnumber_+14155186442', {'cf:rv': '0'}",
1)

Answer 4

import re

r = re.compile('([^,]*_)(\d+)(?=_[^_,]+,)(_.*)')

for line in ("'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
             "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}"):
    print line
    print r.sub('\\1ABCDEFG\\3',line)
    print r.sub('\g<1>1234567\\3',line)

结果

'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}

'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_ABCDEFG_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1234567_+14155186442', {'cf:rv': '0'}

\g<1>表示“第1组”。请参阅doc：

除了描述的字符转义和反向引用在上面，\ g将使用由名为group的组匹配的子字符串 name，由（？P ...）语法定义。 \ g使用了相应的组号; \克LT 2 - ;因此相当于\ 2，但是在诸如\ g＆lt; 2＆gt; 0的替换中不是模糊的。 \ 20会解释为对第20组的引用，而不是对第2组的引用后跟字面字符'0'。反向引用\ g＆lt; 0＆gt; 在RE匹配的整个子字符串中替换。

Answer 5

不像正则表达式那样复杂，但在将来编码，理解，调试和更改相对简单。除了分隔符之外，它不会假设哪些字母构成“单词”。

def replace_term(line, replacement):
    csep = line.split(',')
    usep = csep[0].split('_')
    return ','.join(['_'.join(usep[:-2] + [replacement] + usep[-1:])] + csep[1:])

lines = ["'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_+14155186442', {'cf:rv': '0'}",
         "'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_1371078139195_14155186442', {'cf:rv': '0'}",
         "'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_1371078139195_1371078139195', {'cf:rv': '0'}"]

for line in lines:
    print replace_term(line, 'XXX')

输出：

'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_XXX_+14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_23456_BigtittedBlondOtherNight_XXX_14155186442', {'cf:rv': '0'}
'AMS_Investigation|txtt.co_1371078139195_BigtittedBlondOtherNight_XXX_1371078139195', {'cf:rv': '0'}

在逗号分隔的字符串中间替换下划线分隔的子字符串

5 个答案: