Question

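I need to build a regular expression that negates the following pattern: three @ symbols at the start, followed by a number of variable length between 1 and 12 digits, and ending with three @ symbols. Anything other than this should be selected.

Basically, my challenge is that I have a data frame containing a text corpus with values in the pattern @@@0-9@@@, and I want to remove everything except this pattern. I have been able to develop the regex [@][@][@]\d{1,12}[@][@][@], but I want to negate this pattern because I want to find and replace. For example: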

my name is x and i work at @@@12354@@@ and i am happy with my job. what is your company name? is it @@@42334@@@? you look happy as well!!

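should return @@@12354@@@ @@@42334@@@, so a space separator between the elements would be nice. Any help?

I will be using this regex with the str.replace function on a Python pandas DataFrame.

I have tried regexr.com and regex101.com, and this is as far as I have got.

**Edit:** Below is the data: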

SNo details
1   account @@@0000082569@@@ / department stores uk & ie credit control operations
2   academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator
3   account executive, financial @@@0000060910@@@ , enterprise and partner group
4   2015-nasa summer internship- space power system @@@0000129849@@@ and testing
5   account technical @@@0000185187@@@ , technical presales, systems engineer
6   account @@@0000082569@@@ for car, van & 4x4 products in the east of england
7   account @@@0000082569@@@ for mikro segment and owners of the enterprises
8   account @@@0000082569@@@ - affinity digital display, mobile & publishing
9   account @@@0000082569@@@ @@@0000060905@@@ -energy and commodities @@@0000086889@@@ candidate
10  account @@@0000082569@@@ for companies department of external relevance

Answer 1

Instead of using replace with a complex regex, you can use join together with findall and a simpler regex:

>>> import re
>>> text = 'my name is x and i work at @@@12354@@@ and i am happy with my job. what is your company name? is it @@@42334@@@? you look happy as well!!'
>>> ' '.join(re.findall(r'@{3}\d{1,12}@{3}', text))
'@@@12354@@@ @@@42334@@@'
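
For comparison, the replace-based approach the question asks about can be sketched with re.sub and a replacement function. This is only an illustration under the question's stated assumptions (tokens of 1 to 12 digits); the helper name keep_tokens and the shortened sample string are hypothetical and not part of the original answer.

```python
import re

# Match either a token (captured in group 1) or any other single character.
TOKEN_OR_CHAR = re.compile(r'(@{3}\d{1,12}@{3})|.', re.DOTALL)

def keep_tokens(text):
    """Drop every character that is not part of an @@@digits@@@ token."""
    def repl(match):
        token = match.group(1)
        # Keep tokens (plus a separating space); delete everything else.
        return token + ' ' if token else ''
    return TOKEN_OR_CHAR.sub(repl, text).strip()

sample = 'i work at @@@12354@@@ and is it @@@42334@@@? you look happy!!'
result = keep_tokens(sample)
print(result)
```

The alternation tries the token first at every position, so ordinary characters are only consumed when no token starts there. The findall version above is shorter and probably easier to maintain.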

Answer 2

Here is what I meant:

>>> import pandas as pd
>>> df = pd.DataFrame({'col1':['at @@@12354@@@ and i am happy with my job. what is your company name? is it @@@42334@@@? you look happy as well!!', 'at @@@222@@@ and t @@@888888@@@?' ]})
>>> df['col1'].str.findall(r'@{3}\d+@{3}').apply(' '.join)
0    @@@12354@@@ @@@42334@@@
1     @@@222@@@ @@@888888@@@

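@{3}\d+@{3} matches one or more digits enclosed by three @ symbols on each side, and .findall extracts all matches. .apply(' '.join) then joins each row's matches with a space. Note that \d+ places no upper bound on the number of digits; use \d{1,12} if the 1-to-12 limit from the question matters.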
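
Applied to the sample data from the question, the same idea might look like the sketch below. It assumes the text column is named details, as in the posted table, copies only three of the ten rows, and writes the result to a new column whose name, tokens, is made up for illustration. It uses Series.str.join in place of .apply(' '.join), which should behave the same way for lists of strings.

```python
import pandas as pd

# A few rows taken from the sample data in the question.
df = pd.DataFrame({'details': [
    'account @@@0000082569@@@ / department stores uk & ie credit control operations',
    'academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator',
    'account @@@0000082569@@@ @@@0000060905@@@ -energy and commodities @@@0000086889@@@ candidate',
]})

# \d{1,12} enforces the 1-to-12-digit limit from the question (\d+ would not).
pattern = r'@{3}\d{1,12}@{3}'

# findall returns a list of matches per row; str.join glues each list with spaces.
df['tokens'] = df['details'].str.findall(pattern).str.join(' ')
print(df['tokens'].tolist())
```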

Regex to negate three @ followed by digits and ending with three @
