Question

I have a list of strings in the following format:

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate', ' 0.04M bis-tris propane', ' pH 8.0 ']

I want to remove the x.xxM part but keep the number after pH. I tried the following:

import re
for i in range(len(d)):
    d[i] = d[i].translate(None,'[1-9]+\.*[0-9]*M')

which produces the following:

>>> d
['4 sodium propionate', ' 2 sodium cacodylate', ' 4 bistris propane', ' pH 8 ']

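It also removes the .0 from the pH value. I think translate() doesn't take the order of the characters into account, right? Also, I don't understand why the 4, 2, etc. are still left in the elements. How can I remove strictly the strings of the form [1-9]+\.*[0-9]*M (meaning a digit, followed by a . and zero or more digits, and then M)?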

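Edit: I understand that regular expressions don't work with translate(). It matched 0, . and M and removed them. I suppose I could try re.search() to find the exact string and then use sub().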
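The output above can be reproduced with a small Python 3 sketch. The question's `translate(None, ...)` call is the Python 2 signature; in Python 3 the equivalent is a table built with `str.maketrans('', '', chars)`. Either way, the second argument is treated as a plain set of characters to delete, not as a pattern, so the characters `[`, `1`, `-`, `9`, `]`, `+`, `\`, `.`, `*`, `0` and `M` are deleted wherever they appear. Digits 2 to 8 are not in that set, which is why `4` and `2` survive, and `-` is, which is why `bis-tris` becomes `bistris`.

```python
# Sketch (Python 3): str.translate deletes individual characters; it does not
# understand regular expressions or the order of characters.
d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']

# The "pattern" from the question, used here as a plain set of characters.
deletechars = r'[1-9]+\.*[0-9]*M'
table = str.maketrans('', '', deletechars)  # maps each listed character to None

result = [s.translate(table) for s in d]
print(sorted(set(deletechars)))  # the characters that actually get deleted
print(result)  # ['4 sodium propionate', ' 2 sodium cacodylate', ' 4 bistris propane', ' pH 8 ']
```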

Answer 1

I think your regex is almost right; you just need to use re.sub instead:

import re
for i in range(len(d)):
    d[i] = re.sub(r'[0-9]+\.[0-9]*M *', '', d[i])


so that d becomes:

['sodium propionate', ' sodium cacodylate', ' bis-tris propane', ' pH 8.0 ']

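I made only minimal changes to your regex (widening [1-9] to [0-9] so the leading 0 can match, requiring exactly one dot with \. instead of \.*, and letting "M *" also remove the spaces after M), but here is what each part means: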

[0-9]+   # Match one or more digits (0 through 9 inclusive)
\.       # Match a literal dot
[0-9]*   # Match zero or more digits (0 through 9 inclusive)
M *      # Match the character 'M' and any spaces following it
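A sketch that runs the question's original pattern and the corrected one side by side, with the corrected one written using `re.VERBOSE` so the per-part comments can live inside the pattern. In verbose mode unescaped whitespace is ignored, so the literal space has to be written as `[ ]`; a plain `M *` would silently become `M*`.

```python
import re

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']

# Pattern from the question: [1-9]+ cannot match the leading 0 in "0.04M",
# so the leftmost match it finds is only "4M", leaving "0.0" behind.
original = re.compile(r'[1-9]+\.*[0-9]*M')

# The corrected pattern from this answer, in verbose form.
fixed = re.compile(r"""
    [0-9]+   # one or more digits
    \.       # a literal dot
    [0-9]*   # zero or more digits
    M[ ]*    # the letter M and any spaces after it
""", re.VERBOSE)

print([original.sub('', s) for s in d])
print([fixed.sub('', s) for s in d])
```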

Answer 2

Why use re.search and then re.sub? re.sub alone is enough. You also want to do two quite different things, so it makes sense to split them into two steps.

In [8]: d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate', ' 0.04M bis-tris propane', ' pH 8.0 ']

In [9]: d1 = [ re.sub(r"\d\.\d\dM", "",x) for x in d ]

In [10]: d1
Out[10]: [' sodium propionate', '  sodium cacodylate', '  bis-tris propane', ' pH 8.0 ']

In [11]: d2 = [ re.sub(r"pH (\d+)\.\d+",r"pH \1", x) for x in d1 ]

In [12]: d2
Out[12]: [' sodium propionate', '  sodium cacodylate', '  bis-tris propane', ' pH 8 ']

Note that I use \d, which is shorthand for any digit.
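One caveat with `\d\.\d\dM`: it matches exactly one digit, a dot and exactly two digits, which fits every concentration in the question but not, say, `0.1M` or `1.5M`. A sketch comparing it with a looser `\d+\.\d*M`; the two extra sample strings are hypothetical inputs, not from the question. The last line shows the capture group in the pH step: `(\d+)` keeps the integer part, and `\1` in the replacement puts it back.

```python
import re

# '0.1M HEPES' and '1.5M ammonium sulfate' are hypothetical extra inputs.
samples = ['0.04M sodium propionate', '0.1M HEPES', '1.5M ammonium sulfate']

strict = [re.sub(r"\d\.\d\dM", "", s) for s in samples]  # exactly two decimals
loose = [re.sub(r"\d+\.\d*M", "", s) for s in samples]   # any number of digits

# Group 1 captures the digits before the dot; \1 re-inserts them.
ph = re.sub(r"pH (\d+)\.\d+", r"pH \1", " pH 8.0 ")

print(strict)
print(loose)
print(repr(ph))
```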

Answer 3

Consider re.sub:

re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl.
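A short sketch of what that signature means in practice: with the default `count=0` every non-overlapping match is replaced, while `count=1` replaces only the leftmost one. The pattern `\d+\.\d*M\s*` and the joined test string are illustrative choices, not taken from the answer.

```python
import re

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']
pattern = r'\d+\.\d*M\s*'  # digits, a dot, optional digits, M, trailing whitespace

# Default count=0: every match in each string is replaced.
cleaned = [re.sub(pattern, '', s) for s in d]

# Illustrative string with two concentrations in it.
joined = '0.04M sodium propionate 0.02M sodium cacodylate'
first_only = re.sub(pattern, '', joined, count=1)  # only the leftmost match
all_of_them = re.sub(pattern, '', joined)          # both matches

print(cleaned)
print(first_only)
print(all_of_them)
```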


Answer 4

How about something quick and dirty:

[re.sub(r'\b[.\d]+M\b', '', a).strip() for a in d]

gives

['sodium propionate', 'sodium cacodylate', 'bis-tris propane', 'pH 8.0']

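Here [.\d]+ matches any run of digits and dots, and M stands for molar. The two \b anchors make sure the match is a whole word, and strip() removes the leftover whitespace!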
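The `\b` anchors are what stop the pattern from biting into longer tokens. A sketch comparing the pattern with and without them; `'clock 12MHz'` is a hypothetical input, not from the question, chosen because its `M` is followed by more word characters.

```python
import re

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']

with_b = r'\b[.\d]+M\b'   # the answer's pattern
without_b = r'[.\d]+M'    # same pattern without word boundaries

cleaned = [re.sub(with_b, '', a).strip() for a in d]

# Hypothetical input: "12M" is only the start of the word "12MHz".
text = 'clock 12MHz'
kept = re.sub(with_b, '', text)        # no boundary between M and H, so no match
chopped = re.sub(without_b, '', text)  # "12M" is removed, leaving "clock Hz"

print(cleaned)
print(kept)
print(chopped)
```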

Answer 5

Here is a regex pattern that filters out x.xxM:

[\d|.]+M

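It means a run of one or more (+) characters, each of which is a digit (\d) or a dot (.), ending with M. Note that inside square brackets | is not an "or" operator but a literal pipe character, so [\d.]+M does the same job on this data.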

Here is the code:

result = [re.sub(r'[\d|.]+M',r'',i) for i in d]
# re.sub(A,B,Str) replaces all A with B in Str.

which produces this result:

[' sodium propionate', '  sodium cacodylate', '  bis-tris propane', ' pH 8.0 ']
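Because `|` is literal inside a character class, `[\d|.]` also accepts a pipe character. A sketch comparing it with `[\d.]`; the string `'a|M b'` is a contrived hypothetical input chosen only to expose the difference.

```python
import re

d = ['0.04M sodium propionate', ' 0.02M sodium cacodylate',
     ' 0.04M bis-tris propane', ' pH 8.0 ']

with_pipe = r'[\d|.]+M'  # the answer's pattern: a digit, '|' or '.'
no_pipe = r'[\d.]+M'     # a digit or '.' only

# On the question's data both patterns give identical results.
same = [re.sub(with_pipe, '', i) for i in d] == [re.sub(no_pipe, '', i) for i in d]

# Contrived input with a pipe right before M.
pipe_result = re.sub(with_pipe, '', 'a|M b')  # "|M" is matched and removed
plain_result = re.sub(no_pipe, '', 'a|M b')   # no digit or dot before M, unchanged

print(same)
print(pipe_result)
print(plain_result)
```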

Regex and Python: removing strings of a certain format
