Question

I have a problem similar to this question, with one extra requirement. I took the same table from that question and added a few rows.

A,B,C,D
RNA,lung cancer,15,biotin
RNA,lung cancer,15,biotin
RNA,breast cancer,15,biotin
RNA,breast cancer,15,biotin
RNA,lung cancer,15,biotin
65 y 4m,prostate cancer,biotin
m,lung cancer,biotin

It refers to the same sample dictionary, with three more rows added:

rna,ribonucleic acid
rnd,radical neck dissection
rni,recommended nutrient intake
rnp,ribonucleoprotein
m,months
m,male
y,years

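I want the replacement to depend on context. For example, an 'm' that follows a digit (with or without a space between the digit and the 'm'; the same applies to 'y' for years) should become months, whereas an 'm' that follows a letter, or an 'm' on its own, should become male (and not months, even though the months entry for m comes first in the dictionary). I want my final output to be: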

A,B,C,D
ribonucleic acid,lung cancer,15,biotin
ribonucleic acid,lung cancer,15,biotin
ribonucleic acid,breast cancer,15,biotin
ribonucleic acid,breast cancer,15,biotin
ribonucleic acid,lung cancer,15,biotin
65 years 4months,prostate cancer,biotin
male,lung cancer,biotin

Answer 1

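For each replacement you want to make, define a pattern and a replacement string. Have the pattern capture the text that immediately precedes the text to be replaced, so that you can reuse that text when making the substitution. Like this: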

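import re

# Each pair is (compiled pattern, replacement string).
# Group 1 captures the digit (plus any spaces) before the abbreviation,
# and \1 in the replacement puts that captured text back.
# \b keeps the pattern from matching inside longer words such as "4mg".
month_pair = (re.compile(r'(\d\s*)m\b'), r'\1months')
year_pair = (re.compile(r'(\d\s*)y\b'), r'\1years')

def substitute(s, pairs):
    # Apply every (pattern, replacement) pair in turn.
    for pattern, replacement in pairs:
        s = pattern.sub(replacement, s)
    return s

pairs = [month_pair, year_pair]
print(substitute('65 y 4m', pairs))  # prints: 65 years 4months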
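
The snippet above only covers months and years. The following is a minimal sketch of how the same idea might be extended to the whole table from the question: the digit-aware rules run first, and whatever is left is looked up as a whole cell in the abbreviation dictionary. The names `CONTEXT_RULES`, `ABBREVIATIONS`, `expand_cell` and `expand_table` are hypothetical, and the sketch assumes that a standalone `m` means a cell that contains nothing but `m`; it is one possible reading of the requirement, not the only one.

```python
import csv
import io
import re

# Context-sensitive rules (assumption: these run before the plain lookup).
# Group 1 keeps the digit and any spaces that precede the abbreviation.
CONTEXT_RULES = [
    (re.compile(r'(\d\s*)m\b'), r'\1months'),
    (re.compile(r'(\d\s*)y\b'), r'\1years'),
]

# Plain abbreviations from the sample dictionary, matched on the whole cell.
# 'm' maps to 'male' here because the months case is handled above.
ABBREVIATIONS = {
    'rna': 'ribonucleic acid',
    'rnd': 'radical neck dissection',
    'rni': 'recommended nutrient intake',
    'rnp': 'ribonucleoprotein',
    'm': 'male',
}

def expand_cell(cell):
    """Expand abbreviations in a single table cell."""
    for pattern, replacement in CONTEXT_RULES:
        cell = pattern.sub(replacement, cell)
    # Case-insensitive whole-cell lookup; unknown cells pass through unchanged.
    return ABBREVIATIONS.get(cell.strip().lower(), cell)

def expand_table(text):
    """Expand every cell of a comma-separated table, leaving the header as is."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(next(reader))  # header row is copied untouched
    for row in reader:
        writer.writerow([expand_cell(cell) for cell in row])
    return out.getvalue()

if __name__ == '__main__':
    sample = (
        'A,B,C,D\n'
        'RNA,lung cancer,15,biotin\n'
        '65 y 4m,prostate cancer,biotin\n'
        'm,lung cancer,biotin\n'
    )
    print(expand_table(sample), end='')
```

Because the lookup is done on the whole cell, values such as `lung cancer` or `biotin` are never touched, even though they contain the letters `m` or `y`.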
