Question

I have a tokenized sentence that needs to be modified, for example:

    paper and board — determination of the ink absorb@@ ency

The modification is:

strip the @ (it only appears at the end of a token)
join the token with the following word
replace the subsequent tokens with the newly created token

Right now, I have implemented several `if`s to cover each condition, one per case. But as you can see from the code, it is very repetitive. Can anyone suggest a more concise way to write it?

Code snapshot:

    # str2_tokens is the tokenized sentence
    for i in range(len(str2_tokens)):
        if "@" in str2_tokens[i] and "@" in str2_tokens[i+1] and "@" in str2_tokens[i+2]:
            str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
                             str2_tokens[i+2].strip("@") + str2_tokens[i+3].strip("@")
            str2_tokens[i+1] = str2_tokens[i]
            str2_tokens[i+2] = str2_tokens[i]
            str2_tokens[i+3] = str2_tokens[i]
        if "@" in str2_tokens[i] and "@" in str2_tokens[i+1]:
            str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
                             str2_tokens[i+2].strip("@")
            str2_tokens[i+1] = str2_tokens[i]
            str2_tokens[i+2] = str2_tokens[i]
        if "@" in str2_tokens[i]:
            str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@")
            str2_tokens[i+1] = str2_tokens[i]

For example:

Case 1: the input is `paper and board — determination of the ink absorb@@ ency`, and the desired output is `paper and board — determination of the ink absorbency absorbency`. `absorbency` is repeated twice because two tokens were merged.

Case 2: the input is `related substance in f@@ ti@@ bam@@ zone can be determined with this method`, and the desired output is `related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method`. `ftibamzone` is repeated 4 times because 4 tokens were merged.

The number of consecutive tokens ending in @ can be arbitrary.

Answer 1

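Well, this is iterative, but it should do the job:

    # here str2_tokens is the sentence as a single string
    result = ''
    c = 0  # number of '@'-marked tokens seen in the current run
    for i in str2_tokens.split():
        if '@' in i:
            c += 1
            # append the token without its '@' marker and without a space
            result += ''.join(i.split('@'))
        else:
            # this token closes the current word
            result += (i + ' ')
            # repeat the merged word once for every marked token before it
            result += (result.split(' ')[-2] + ' ') * c
            c = 0
    result = result[:-1]  # drop the trailing space

Result

related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method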
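
The question describes `str2_tokens` as a list of tokens, whereas the loop above splits a string. Below is a minimal sketch of the same idea for a list input, assuming the `@` marker only ever appears at the end of a token. The function name `merge_marked_tokens` is hypothetical and is not part of the original answer.

```python
def merge_marked_tokens(tokens):
    """Merge each run of '@'-terminated tokens with the token that follows,
    repeating the merged word once per original token."""
    merged = []
    run = []  # pieces of the word currently being assembled
    for token in tokens:
        run.append(token.rstrip("@"))
        if not token.endswith("@"):
            # the run is complete: join the pieces, repeat once per piece
            merged.extend(["".join(run)] * len(run))
            run = []
    if run:
        # assumption: if the sentence ends on a marked token, merge what is left
        merged.extend(["".join(run)] * len(run))
    return merged


tokens = "related substance in f@@ ti@@ bam@@ zone can be determined with this method".split()
print(" ".join(merge_marked_tokens(tokens)))
# related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method
```

Collecting pieces in a list keeps the token count unchanged and handles runs of any length without one `if` per case.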

Answer 2

You can get what you need with this list comprehension and a join:

    "".join([token.strip("@") for token in str2_tokens])

E.g.

    >>> x = ["@a", "b@", "c"]
    >>> "".join([y.strip("@") for y in x])
    'abc'
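
Note that joining with an empty string concatenates every token into one string. To get the repeated-word output the question asks for, the same `strip("@")` idea would have to be applied to each run of marked tokens separately. Here is a minimal sketch using `re.sub` on the sentence string, assuming tokens are separated by single spaces and marked with a trailing `@@`. The names `expand_run` and `merge_marked_runs` are hypothetical.

```python
import re


def expand_run(match):
    # split the matched run back into its pieces, e.g. ["absorb@@", "ency"]
    pieces = match.group(0).split()
    word = "".join(piece.strip("@") for piece in pieces)
    # repeat the merged word once per original piece
    return " ".join([word] * len(pieces))


def merge_marked_runs(sentence):
    # one or more "xxx@@ " pieces followed by the closing piece of the word
    return re.sub(r"(?:\S+@@ )+\S+", expand_run, sentence)


print(merge_marked_runs("determination of the ink absorb@@ ency"))
# determination of the ink absorbency absorbency
```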

Join contiguous tokens if a token contains the "@" char
