So far, I have implemented several `if` statements to handle each case once it is identified:
# str2_tokens is the tokenized sentence
for i in range(len(str2_tokens)):
    if "@" in str2_tokens[i] and "@" in str2_tokens[i+1] and "@" in str2_tokens[i+2]:
        str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
                         str2_tokens[i+2].strip("@") + str2_tokens[i+3].strip("@")
        str2_tokens[i+1] = str2_tokens[i]
        str2_tokens[i+2] = str2_tokens[i]
        str2_tokens[i+3] = str2_tokens[i]
    if "@" in str2_tokens[i] and "@" in str2_tokens[i+1]:
        str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@") +\
                         str2_tokens[i+2].strip("@")
        str2_tokens[i+1] = str2_tokens[i]
        str2_tokens[i+2] = str2_tokens[i]
    if "@" in str2_tokens[i]:
        str2_tokens[i] = str2_tokens[i].strip("@") + str2_tokens[i+1].strip("@")
        str2_tokens[i+1] = str2_tokens[i]
(`@@` appears only at the end of a token.) As you can see from the code, it is very repetitive. Can anyone suggest a more concise way to write it?
Edit, for example:
Case 1: the input is
paper and board — determination of the ink absorb@@ ency
and I would like to get the output
paper and board — determination of the ink absorbency absorbency
where `absorbency` is repeated twice because two tokens were merged.
Case 2: the input is
related substance in f@@ ti@@ bam@@ zone can be determined with this method
and I would like to get the output
related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method
where `ftibamzone` is repeated 4 times because 4 tokens were merged.
The number of tokens ending in `@@` can be arbitrary.
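Answer 0 (score: 0)
Well, it is repetitive, but this should do it (here `str2_tokens` is the sentence as a single string, not a list):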
result = ''
c = 0
for i in str2_tokens.split():
    if '@' in i:
        c += 1
        result += ''.join(i.split('@'))
    else:
        result += (i + ' ')
        result += (result.split(' ')[-2] + ' ') * c
        c = 0
result = result[:-1]
Result:
related substance in ftibamzone ftibamzone ftibamzone ftibamzone can be determined with this method
Answer 1 (score: 0)
You can strip the `@` markers and concatenate the tokens with a list comprehension and `join`:
"".join([token.strip("@") for token in str2_tokens])
E.g.:
>>> x = ["@a", "b@", "c"]
>>> "".join([y.strip("@") for y in x])
'abc'
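Note that joining with an empty string concatenates every token into one word. To keep the other words separate and repeat each merged word once per merged piece, as the question describes, a loop that collects pieces until a token without the marker appears is one option. The following is only a sketch under some assumptions: the function name `merge_and_repeat` and the `marker` parameter are made up for illustration, the input is a list of tokens, and `@@` appears only at the end of a token.

```python
def merge_and_repeat(tokens, marker="@@"):
    """Merge runs of marker-terminated tokens and repeat the merged
    word once for every piece that went into it."""
    result = []
    pieces = []  # pieces of the word currently being assembled
    for token in tokens:
        if token.endswith(marker):
            # Not the last piece yet: drop the marker and keep collecting.
            pieces.append(token[:-len(marker)])
        else:
            # Last piece (or an ordinary word): emit the merged word.
            pieces.append(token)
            result.extend(["".join(pieces)] * len(pieces))
            pieces = []
    # Assumption: the sentence never ends with a marker-terminated token.
    # If it does, flush whatever was collected so nothing is lost.
    if pieces:
        result.extend(["".join(pieces)] * len(pieces))
    return result


sentence = "related substance in f@@ ti@@ bam@@ zone can be determined with this method"
print(" ".join(merge_and_repeat(sentence.split())))
```

Because an ordinary word is treated as a run of length one, the same branch handles any number of `@@` pieces, so no separate case per run length is needed.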