How to remove a variable-length part of a string

Time: 2019-04-10 12:47:19

Tags: python regex pandas

I have a DataFrame in which one column contains strings that look like this:

Received value 126;AOC;H3498XX from 602
Received value 101;KYL;0IMMM0432 from 229

I want to remove (that is, replace with nothing) the part that follows the second semicolon, so that the row looks like

Received value 126;AOC; from 602

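However, the part I want to remove has a varying and unpredictable length (it is always a combination of A-Z and 0-9). The semicolons and the word "from" will always be present as reference points.

I have been trying to use a regular expression, working from this documentation: https://docs.python.org/3/library/re.html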

import re
for row in df['column']:
    row = re.sub(';[A-Z0-9] from', '; from', row)

I think [A-Z0-9] does not capture the variable-length aspect I need.
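
That suspicion looks right: a character class such as [A-Z0-9] with no quantifier matches exactly one character, so the pattern only matches when a single character sits between the semicolon and " from". A minimal sketch, assuming the unwanted part is always one or more characters from A-Z and 0-9 as described above, adds the + quantifier. The list `rows` is illustrative sample data, not the real DataFrame.

```python
import re

# Illustrative sample rows taken from the question.
rows = [
    "Received value 126;AOC;H3498XX from 602",
    "Received value 101;KYL;0IMMM0432 from 229",
]

# Without a quantifier, [A-Z0-9] matches exactly one character,
# so the pattern never matches these rows and nothing is replaced.
unchanged = [re.sub(r";[A-Z0-9] from", "; from", row) for row in rows]

# With +, the class matches one or more characters, which covers the variable length.
cleaned = [re.sub(r";[A-Z0-9]+ from", "; from", row) for row in rows]

for row in cleaned:
    print(row)
```

The first semicolon is not affected, because the text after it ("AOC") is followed by another semicolon rather than by " from".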

2 answers:

Answer 0 (score: 2)

An example that combines str.replace() with str.split():

s = ['126;AOC;H3498XX from 602', '101;KYL;0IMMM0432 from 229']

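for elem in s:
    # Take the text after the second semicolon, isolate its first whitespace-delimited token, and remove it
    print(elem.replace(elem.split(";", 2)[-1].split()[0], ''))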

Output

126;AOC; from 602
101;KYL; from 229

Edit

The same approach also works for the following example:

s = ['Received value 126;AOC;H3498XX from 602', 'Received value 101;KYL;0IMMM0432 from 229']

for elem in s:
    print(elem.replace(elem.split(";",2)[-1].split()[0],''))

Output

Received value 126;AOC; from 602
Received value 101;KYL; from 229
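
One caveat with this approach: str.replace() removes every occurrence of the token, so a token that also appears earlier in the string (for example "126") would be removed there too. A minimal sketch of a variant that rebuilds the string from its pieces instead follows. The helper name `drop_third_field` is hypothetical, and the sketch assumes every input contains at least two semicolons followed by a space, as stated in the question.

```python
def drop_third_field(text):
    """Remove the token that directly follows the second semicolon (hypothetical helper)."""
    # Split at the first two semicolons only: [before, middle, rest].
    head, middle, rest = text.split(";", 2)
    # rest looks like "H3498XX from 602"; drop everything before the first space.
    _token, sep, tail = rest.partition(" ")
    return f"{head};{middle};{sep}{tail}"


print(drop_third_field("Received value 126;AOC;H3498XX from 602"))
# The token "126" also appears before the first semicolon; only the third field is removed.
print(drop_third_field("Received value 126;AOC;126 from 602"))
```

Both calls print "Received value 126;AOC; from 602". An input with fewer than two semicolons raises ValueError in this sketch, which may or may not be the behaviour you want.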

Answer 1 (score: 1)

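Use the pattern (Received value \d+;[A-Z]+;)\w+(\s.*?) and put both groups back in the replacement, so that the space before "from" is kept.

For example:

import re

s = ["Received value 126;AOC;H3498XX from 602", "Received value 101;KYL;0IMMM0432 from 229"]

for i in s:
    # \1 is the prefix up to the second semicolon; \2 is the whitespace before "from"
    print(re.sub(r"(Received value \d+;[A-Z]+;)\w+(\s.*?)", r"\1\2", i))

Output:

Received value 126;AOC; from 602
Received value 101;KYL; from 229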
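
Since the question is about a DataFrame column, note that rebinding `row` inside a for loop does not change the DataFrame itself. A minimal sketch of applying the substitution to the whole column with pandas' Series.str.replace follows. The column name 'column' comes from the question, the sample data is illustrative, and the pattern assumes the unwanted part consists only of A-Z and 0-9.

```python
import pandas as pd

# Illustrative frame; the real data would come from the asker's DataFrame.
df = pd.DataFrame({
    "column": [
        "Received value 126;AOC;H3498XX from 602",
        "Received value 101;KYL;0IMMM0432 from 229",
    ]
})

# Series.str.replace returns a new Series, so assign it back to the column.
# regex=True makes pandas treat the pattern as a regular expression.
df["column"] = df["column"].str.replace(r";[A-Z0-9]+ from", "; from", regex=True)

print(df["column"].tolist())
```

This avoids the explicit Python loop and writes the cleaned values back in one step.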