How can I make Google Translate ignore specific words in a text?

Time: 2017-10-03 07:02:16

Tags: google-translate

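I am using the Java version of the Translate API on App Engine. Is there a way to exclude specific words from translation? For example, in "Translate IGNORED_TEXT this", IGNORED_TEXT comes out malformed for some languages, and there is no guarantee that the Translate API will leave it unchanged.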

3 answers:

Answer 0 (score: 0)

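After many attempts I finally got something that works reliably: it replaces the text I want to ignore with special characters. In my case that text is format parameters (%d, %s, etc.). Maybe this will help someone: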

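import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils;

// Swaps %s / %d format specifiers for "magic" placeholder strings before translation
// and swaps them back afterwards. Positional indices such as %1$s are not preserved:
// they are reverted to plain %s / %d.
public class Parser {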

public static final String[] MAGIC_PARAMETER_STRING = {"975313579", "*****", "˨", "இ", "⏲"};
public static final String[] MAGIC_PARAMETER_NUMBER = {"975323579", "*******", "Ω", "˧", "\u23FA"};
private static final String formatSpecifier
        = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";
private static final Pattern formatToken = Pattern.compile(formatSpecifier);
private final int maxStringParameterCount = Parser.MAGIC_PARAMETER_STRING.length;
private final int maxNumberParameterCount = Parser.MAGIC_PARAMETER_NUMBER.length;
private int stringPos = 0;
private int numberPos = 0;

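// Maps one format specifier to its placeholder and counts it; only %s and %d are supported
private String convertToken(ConvertedString result, String index, String flags, String width, String precision, String temporal, String conversion, String numberReplacement, String stringReplacement) {
    if (conversion.equals("s")) {
        result.stringArgCount++;
        return stringReplacement;
    } else if (conversion.equals("d")) {
        result.numberArgCount++;
        return numberReplacement;
    } else if (conversion.equals("%")) {
        // "%%" is an escaped literal percent sign, not a parameter: keep it unchanged
        return "%%";
    }
    // Optional groups that did not match are null, so print them as empty strings
    throw new IllegalArgumentException("Unsupported format specifier: %" + Objects.toString(index, "")
            + Objects.toString(flags, "") + Objects.toString(width, "") + Objects.toString(precision, "")
            + Objects.toString(temporal, "") + conversion);
}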

private String getReplacementNumber(boolean bumpUp) throws RetryExceededException {
    if (bumpUp) {
        ++numberPos;
    }
    if (numberPos >= maxNumberParameterCount) {
        throw new RetryExceededException();
    }
    return MAGIC_PARAMETER_NUMBER[numberPos];
}

private String getReplacementString(boolean bumpUp) throws RetryExceededException {
    if (bumpUp) {
        ++stringPos;
    }
    if (stringPos >= maxStringParameterCount) {
        throw new RetryExceededException();
    }
    return MAGIC_PARAMETER_STRING[stringPos];
}

// Turns the current placeholders in the translated text back into %s / %d and counts them
public ConvertedString revert(String text) throws RetryExceededException {
    ConvertedString convertedString = new ConvertedString();
    String replacementString = getReplacementString(false);
    String replacementNumber = getReplacementNumber(false);
    // Restore the number placeholder first: "*******" contains "*****", so replacing
    // the string placeholder first would corrupt the number placeholder
    convertedString.numberArgCount = StringUtils.countMatches(text, replacementNumber);
    String result = text.replace(replacementNumber, "%d");
    convertedString.stringArgCount = StringUtils.countMatches(result, replacementString);
    result = result.replace(replacementString, "%s");
    convertedString.result = result;
    return convertedString;
}

// Converts using the first placeholder of each kind
public ConvertedString convert(final String format) {
    return convert(format, MAGIC_PARAMETER_NUMBER[0], MAGIC_PARAMETER_STRING[0]);
}

// Replaces every %s / %d in format with the given placeholders and records how many were replaced
public ConvertedString convert(final String format, String numberReplacement, String stringReplacement) {
    ConvertedString result = new ConvertedString();
    final StringBuilder out = new StringBuilder();
    final Matcher matcher = formatToken.matcher(format);
    int lastIndex = 0;
    while (matcher.find()) {
        out.append(format, lastIndex, matcher.start());
        out.append(convertToken(result, matcher.group(1), matcher.group(2), matcher.group(3),
                matcher.group(4), matcher.group(5), matcher.group(6), numberReplacement, stringReplacement));
        lastIndex = matcher.end();
    }
    out.append(format.substring(lastIndex));
    result.result = out.toString();
    return result;
}

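// Moves to the next string and/or number placeholder and converts again;
// throws RetryExceededException once the candidate arrays are exhausted
public ConvertedString retryConvert(String originalText, boolean bumpUpString, boolean bumpUpNumber) throws RetryExceededException {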
    String replacementNumber = getReplacementNumber(bumpUpNumber);
    String replacementString = getReplacementString(bumpUpString);
    return convert(originalText, replacementNumber, replacementString);
}

public static class ConvertedString {
    public int stringArgCount;
    public int numberArgCount;
    public String result;

}

public static class RetryExceededException extends Exception {

}
}
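The answer does not show how the Parser is driven. One plausible round trip, written here as a standalone sketch rather than against the class above, is: swap %s/%d for placeholders, translate, check that every placeholder came back, and otherwise retry with the next candidate pair, which appears to be what retryConvert and RetryExceededException are for. The translator is a hypothetical UnaryOperator<String> standing in for the real API call, and PlaceholderRoundTrip / translateProtected are illustrative names; the placeholder values are modelled on the answer's MAGIC_PARAMETER arrays.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical helper: protect %s / %d, translate, verify, retry with the next pair
final class PlaceholderRoundTrip {
    // Candidate {placeholder for %s, placeholder for %d} pairs, tried in order
    private static final List<String[]> CANDIDATES = List.of(
            new String[] {"975313579", "975323579"},
            new String[] {"\u02E8", "\u03A9"},   // modifier letter tone bar, Greek capital omega
            new String[] {"\u0B87", "\u02E7"});  // Tamil letter I, another tone bar

    // Only plain %s and %d are handled, as in the Parser class
    private static final Pattern SPEC = Pattern.compile("%[sd]");

    // Counts non-overlapping occurrences of needle (needle must be non-empty)
    static int count(String text, String needle) {
        int n = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }

    static String translateProtected(String format, UnaryOperator<String> translator) {
        int expectedS = count(format, "%s");
        int expectedD = count(format, "%d");
        for (String[] pair : CANDIDATES) {
            String s = pair[0];
            String d = pair[1];
            String masked = SPEC.matcher(format).replaceAll(r -> r.group().equals("%s") ? s : d);
            String translated = translator.apply(masked);
            // Accept this pair only if each placeholder came back exactly as often as it went out
            if (count(translated, s) == expectedS && count(translated, d) == expectedD) {
                return translated.replace(s, "%s").replace(d, "%d");
            }
        }
        throw new IllegalStateException("no placeholder pair survived translation");
    }
}
```

Usage would look like translateProtected("Hello %s, you have %d messages", api::translate), where api::translate is whatever method wraps the real service. Counting the placeholders after translation also catches the case where the original text already happened to contain a placeholder value, because the count then comes back too high and the next pair is tried.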

Answer 1 (score: 0)

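Thanks @ViliusL, your answer led me to the solution. I had been struggling to keep parts of the text out of translation, and since I couldn't find any hints elsewhere (Stack Overflow, Google), I'm leaving my answer in this thread.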

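In my case the problem was the wrong MIME type. If you use the Google Cloud Translation API (v2 or v3, it doesn't matter which), you must set the MIME type to "text/html" instead of "text/plain". With the text/plain MIME type, Google ignores HTML tags, including class="notranslate". Example: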

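import com.google.cloud.translate.v3.TranslateTextRequest;
import com.google.cloud.translate.v3.TranslateTextResponse;
import com.google.cloud.translate.v3.TranslationServiceClient;
import java.io.IOException;

    TranslateTextResponse requestForTranslation() throws IOException {
        // TranslationServiceClient holds network resources, so close it when done
        try (TranslationServiceClient client = TranslationServiceClient.create()) {
            return client.translateText(buildRequest());
        }
    }

    TranslateTextRequest buildRequest() {
        return TranslateTextRequest.newBuilder()
                .setParent("YOUR_PARENT") // e.g. "projects/YOUR_PROJECT_ID/locations/global"
                .setMimeType("text/html") // HERE: must be text/html, not text/plain
                .setSourceLanguageCode("de")
                .setTargetLanguageCode("en")
                .addContents("<span class=\"notranslate\">etwas auf deutsch</span>")
                .build();
    }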

Reference: https://cloud.google.com/translate/docs/supported-formats
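Because text/html makes the service parse the input as HTML, plain text sent this way should have its own <, > and & escaped, and the protected terms wrapped, before the request is built. Below is a minimal sketch of that preparation and of undoing it afterwards. It assumes (hypothetically) that the service returns the <span class="notranslate">…</span> wrappers exactly as sent; NoTranslateMarkup, protect and unprotect are illustrative names, not part of the Google client library.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: turns plain text into HTML for a text/html translation request
final class NoTranslateMarkup {
    private static final String OPEN = "<span class=\"notranslate\">";
    private static final String CLOSE = "</span>";
    // Assumes the service echoes the wrapper spans back unchanged
    private static final Pattern WRAPPED =
            Pattern.compile(Pattern.quote(OPEN) + "(.*?)" + Pattern.quote(CLOSE), Pattern.DOTALL);

    static String escapeHtml(String s) {
        // & first, so the entities produced below are not escaped twice
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String unescapeHtml(String s) {
        // &amp; last, so "&amp;lt;" correctly becomes "&lt;" rather than "<"
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&amp;", "&");
    }

    // Escapes the text and wraps each occurrence of a protected term in a notranslate span
    static String protect(String text, List<String> terms) {
        List<String> nonEmpty = terms.stream().filter(t -> !t.isEmpty()).collect(Collectors.toList());
        if (nonEmpty.isEmpty()) {
            return escapeHtml(text);
        }
        // Longest terms first, so a term that contains another one is wrapped as a whole
        String alternation = nonEmpty.stream()
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Matcher m = Pattern.compile(alternation).matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(text.substring(last, m.start())))
               .append(OPEN).append(escapeHtml(m.group())).append(CLOSE);
            last = m.end();
        }
        out.append(escapeHtml(text.substring(last)));
        return out.toString();
    }

    // Removes the wrapper spans from the translated HTML and decodes the entities again
    static String unprotect(String html) {
        return unescapeHtml(WRAPPED.matcher(html).replaceAll("$1"));
    }
}
```

The output of protect would go into addContents(...) of a request built with setMimeType("text/html"), and unprotect would be applied to the translated text of the response. The unescape step only covers the five common entities; if the service returns others, they would need handling too.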

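P.S. I noticed that you can also exclude text from translation by putting it inside angle brackets "<>", presumably because with text/html the bracketed text is parsed as an HTML tag: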

 .addContents("<etwas auf deutsch>")

Answer 2 (score: -1)

Solution #1: Replace IGNORED_TEXT with <span class="notranslate">IGNORED_TEXT</span>.

Solution #2: Replace IGNORED_TEXT with its MD5 hash, translate everything, and then replace the hash back. (Works for %s, %1$s, abc.)
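Solution #2 can be sketched in Java as follows. It assumes the translator (a hypothetical UnaryOperator<String> standing in for the real API call) leaves 32-character lower-case hex strings untouched; nothing in the answer guarantees that, so checking the restored output is still worthwhile. Md5Placeholders, md5Hex and translateIgnoring are illustrative names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: mask ignored terms with their MD5 hashes, translate, unmask
final class Md5Placeholders {
    // Lower-case, zero-padded 32-digit hex MD5 of the UTF-8 bytes of s
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    static String translateIgnoring(String text, List<String> ignored, UnaryOperator<String> translator) {
        List<String> terms = ignored.stream()
                .filter(t -> !t.isEmpty())
                // Longest first, so a term that contains another one is masked as a whole
                .sorted((a, b) -> Integer.compare(b.length(), a.length()))
                .collect(Collectors.toList());
        Map<String, String> hashToTerm = new HashMap<>();
        String masked = text;
        if (!terms.isEmpty()) {
            // One pass over the text, so a later term (e.g. "abc") can never match inside an inserted hash
            String alternation = terms.stream().map(Pattern::quote).collect(Collectors.joining("|"));
            Matcher m = Pattern.compile(alternation).matcher(text);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                String hash = md5Hex(m.group());
                hashToTerm.put(hash, m.group());
                m.appendReplacement(sb, hash); // hex digits contain no '$' or '\'
            }
            m.appendTail(sb);
            masked = sb.toString();
        }
        String translated = translator.apply(masked);
        for (Map.Entry<String, String> e : hashToTerm.entrySet()) {
            translated = translated.replace(e.getKey(), e.getValue());
        }
        return translated;
    }
}
```

MD5 serves here only as a convenient way to get a stable, fixed-length token that is unlikely to look like a translatable word; it plays no security role, and any other collision-resistant token scheme would work the same way.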