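I'm using the Java version of the Translate API on App Engine. Is there a way to exclude specific words from translation? For example, in "Translate IGNORED_TEXT this", IGNORED_TEXT comes out incorrectly formatted for some languages, and there is no guarantee that the Translate API won't change it.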
Answer 0 (score: 0)
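After many attempts, I finally got something that works reliably: it replaces the text I want to ignore with special characters. In my case, that text is string format parameters (%d, %s, etc.). Maybe this will help someone: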
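import java.util.regex.Matcher;
import java.util.regex.Pattern;

// StringUtils.countMatches comes from Apache Commons Lang (the lang3 package is assumed here).
import org.apache.commons.lang3.StringUtils;

/**
 * Swaps printf-style %s / %d specifiers for "magic" tokens before translation
 * and swaps them back afterwards.
 */
public class Parser {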
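    // Tokens tried in order: index 0 first, then the next index on each retry.
    // Caution: "*****" is a substring of "*******", so revert() miscounts and corrupts
    // the text if index 1 of both arrays is in use at the same time.
    public static final String[] MAGIC_PARAMETER_STRING = {"975313579", "*****", "˨", "இ", "⏲"};
    public static final String[] MAGIC_PARAMETER_NUMBER = {"975323579", "*******", "Ω", "˧", "\u23FA"};

    // Matches %[index$][flags][width][.precision][t|T]conversion.
    private static final String formatSpecifier
            = "%(\\d+\\$)?([-#+ 0,(\\<]*)?(\\d+)?(\\.\\d+)?([tT])?([a-zA-Z%])";
    private static final Pattern formatToken = Pattern.compile(formatSpecifier);

    private final int maxStringParameterCount = Parser.MAGIC_PARAMETER_STRING.length;
    private final int maxNumberParameterCount = Parser.MAGIC_PARAMETER_NUMBER.length;
    // Index of the token currently in use for %s and %d respectively.
    private int stringPos = 0;
    private int numberPos = 0;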

    // Returns the token for one specifier and counts it; only %s and %d are supported.
    private String convertToken(ConvertedString result, String index, String flags, String width, String precision, String temporal, String conversion, String numberReplacement, String stringReplacement) {
        if (conversion.equals("s")) {
            result.stringArgCount++;
            return stringReplacement;
        } else if (conversion.equals("d")) {
            result.numberArgCount++;
            return numberReplacement;
        }
        throw new IllegalArgumentException("%" + index + flags + width + precision + temporal + conversion);
    }

    // Returns the current %d token, first advancing to the next one if bumpUp is true.
    private String getReplacementNumber(boolean bumpUp) throws RetryExceededException {
        if (bumpUp) {
            ++numberPos;
        }
        if (numberPos >= maxNumberParameterCount) {
            throw new RetryExceededException();
        }
        return MAGIC_PARAMETER_NUMBER[numberPos];
    }

    // Returns the current %s token, first advancing to the next one if bumpUp is true.
    private String getReplacementString(boolean bumpUp) throws RetryExceededException {
        if (bumpUp) {
            ++stringPos;
        }
        if (stringPos >= maxStringParameterCount) {
            throw new RetryExceededException();
        }
        return MAGIC_PARAMETER_STRING[stringPos];
    }

    // Turns translated text back into a format string and counts the tokens that survived,
    // so the caller can compare the counts with those returned by convert().
    public ConvertedString revert(String text) throws RetryExceededException {
        ConvertedString convertedString = new ConvertedString();
        String replacementString = getReplacementString(false);
        String replacementNumber = getReplacementNumber(false);
        convertedString.stringArgCount = StringUtils.countMatches(text, replacementString);
        convertedString.numberArgCount = StringUtils.countMatches(text, replacementNumber);
        String result = text.replace(replacementString, "%s");
        result = result.replace(replacementNumber, "%d");
        convertedString.result = result;
        return convertedString;
    }

    public ConvertedString convert(final String format) {
        return convert(format, MAGIC_PARAMETER_NUMBER[0], MAGIC_PARAMETER_STRING[0]);
    }

    // Replaces every %s / %d in the format with the given tokens and counts them.
    public ConvertedString convert(final String format, String numberReplacement, String stringReplacement) {
        ConvertedString result = new ConvertedString();
        final StringBuilder converted = new StringBuilder();
        final Matcher matcher = formatToken.matcher(format);
        int lastIndex = 0;
        while (matcher.find()) {
            converted.append(format, lastIndex, matcher.start());
            converted.append(convertToken(result, matcher.group(1), matcher.group(2), matcher.group(3),
                    matcher.group(4), matcher.group(5), matcher.group(6), numberReplacement, stringReplacement));
            lastIndex = matcher.end();
        }
        converted.append(format.substring(lastIndex));
        result.result = converted.toString();
        return result;
    }

    // Re-runs convert() with the next token for %s and/or %d after a failed round trip.
    public ConvertedString retryConvert(String originalText, boolean bumpUpString, boolean bumpUpNumber) throws RetryExceededException {
        String replacementNumber = getReplacementNumber(bumpUpNumber);
        String replacementString = getReplacementString(bumpUpString);
        return convert(originalText, replacementNumber, replacementString);
    }

    public static class ConvertedString {
        public int stringArgCount;
        public int numberArgCount;
        public String result;
    }

    // Thrown once every token of one kind has been tried.
    public static class RetryExceededException extends Exception {
    }
}
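The idea behind `Parser` (mask the specifiers with tokens, translate, check that every token survived, otherwise retry with the next token) can be sketched in a self-contained way. This is a simplified illustration, not the answerer's code: the class name `PlaceholderRoundTrip`, the token pairs, and the `UnaryOperator<String>` standing in for the real Translate API call are all hypothetical, and only plain `%s` and `%d` are handled.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderRoundTrip {
    // Only plain %s and %d are masked in this sketch.
    private static final Pattern SPECIFIER = Pattern.compile("%[sd]");

    // Hypothetical {token for %s, token for %d} pairs, tried in order.
    private static final List<String[]> CANDIDATES = List.of(
            new String[] {"975313579", "975323579"},
            new String[] {"\u02E8", "\u03A9"},
            new String[] {"\u0B87", "\u02E7"});

    // Counts non-overlapping occurrences of token in text.
    static int count(String text, String token) {
        int n = 0;
        for (int i = text.indexOf(token); i >= 0; i = text.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    // Masks %s/%d, calls translate, and restores the specifiers; if a token did not
    // survive translation intact, tries again with the next token pair.
    static String translatePreservingSpecifiers(String format, UnaryOperator<String> translate) {
        for (String[] pair : CANDIDATES) {
            String stringToken = pair[0];
            String numberToken = pair[1];
            StringBuilder masked = new StringBuilder();
            int strings = 0;
            int numbers = 0;
            int last = 0;
            Matcher m = SPECIFIER.matcher(format);
            while (m.find()) {
                boolean isString = m.group().equals("%s");
                if (isString) {
                    strings++;
                } else {
                    numbers++;
                }
                masked.append(format, last, m.start()).append(isString ? stringToken : numberToken);
                last = m.end();
            }
            masked.append(format.substring(last));

            String translated = translate.apply(masked.toString());
            // Accept the result only if each token appears exactly as often as before.
            if (count(translated, stringToken) == strings && count(translated, numberToken) == numbers) {
                return translated.replace(stringToken, "%s").replace(numberToken, "%d");
            }
        }
        throw new IllegalStateException("No token pair survived translation");
    }
}
```

Passing a stub such as `s -> s.toUpperCase()`, or one that rewrites `975313579` as `975,313,579`, is a cheap way to exercise the retry path without calling the API.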
Answer 1 (score: 0)
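Thanks @ViliusL, your answer led me to the solution to my problem. I had been struggling to exclude parts of a text from translation and hadn't found any hints so far (Stack Overflow, Google), so I'm leaving my answer in this thread.
In my case, the problem was the wrong MIME type. If you use the Google Cloud Translation API (v2 or v3, it doesn't matter which), you must set the MIME type to "text/html" instead of "text/plain". With "text/plain", Google does not honor HTML tags such as <span class="notranslate">. Example: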
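import com.google.cloud.translate.v3.TranslateTextRequest;
import com.google.cloud.translate.v3.TranslateTextResponse;
import com.google.cloud.translate.v3.TranslationServiceClient;

// googleTranslationServiceProvider is the answerer's own helper that creates the client.
TranslateTextResponse requestForTranslation() {
    try (TranslationServiceClient client = googleTranslationServiceProvider.getClient()) {
        return client.translateText(buildRequest());
    }
}

TranslateTextRequest buildRequest() {
    return TranslateTextRequest.newBuilder()
            .setParent("YOUR_PARENT") // e.g. "projects/PROJECT_ID/locations/global"
            .setMimeType("text/html") // must be text/html, otherwise class="notranslate" is not honored
            .setSourceLanguageCode("DE")
            .setTargetLanguageCode("EN")
            .addContents("<span class=\"notranslate\">etwas auf deutsch</span>")
            .build();
}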
Reference: https://cloud.google.com/translate/docs/supported-formats
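Because `text/html` mode treats the input as HTML, any `<`, `>`, `&` or `"` in the original text should be escaped before the protected words are wrapped, and the translated result comes back as HTML (it may contain entities such as `&#39;`). Below is a minimal sketch of that pre- and post-processing. It assumes the API returns the `<span class="notranslate">` markup exactly as sent; the class and method names are hypothetical, and the escaping covers only the basic entities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoTranslateHtml {
    private static final String OPEN = "<span class=\"notranslate\">";
    private static final String CLOSE = "</span>";

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Undoes escapeHtml plus the numeric apostrophe entity; "&amp;" goes last so "&amp;lt;" becomes "&lt;".
    static String unescapeHtml(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"")
                .replace("&#39;", "'").replace("&amp;", "&");
    }

    // Builds the HTML payload: plain text escaped, each protected term wrapped in a notranslate span.
    static String wrapProtected(String text, List<String> protectedTerms) {
        List<String> terms = new ArrayList<>(protectedTerms);
        terms.removeIf(String::isEmpty);
        if (terms.isEmpty()) {
            return escapeHtml(text);
        }
        terms.sort((a, b) -> Integer.compare(b.length(), a.length())); // longest match wins
        StringJoiner alternation = new StringJoiner("|");
        for (String term : terms) {
            alternation.add(Pattern.quote(term));
        }
        StringBuilder html = new StringBuilder();
        int last = 0;
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        while (m.find()) {
            html.append(escapeHtml(text.substring(last, m.start())))
                .append(OPEN).append(escapeHtml(m.group())).append(CLOSE);
            last = m.end();
        }
        html.append(escapeHtml(text.substring(last)));
        return html.toString();
    }

    // Turns translated HTML back into plain text, assuming the spans come back exactly as sent.
    static String unwrap(String translatedHtml) {
        return unescapeHtml(translatedHtml.replace(OPEN, "").replace(CLOSE, ""));
    }
}
```

The terms are matched against the raw text in a single pass, so a protected word can never be matched inside markup that was already inserted for another one.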
P.S. I noticed that you can also exclude text from translation by wrapping it in angle brackets ("<>"), like this:
.addContents("<etwas auf deutsch>")
Answer 2 (score: -1)
Solution #1: Replace IGNORED_TEXT with <span class="notranslate">IGNORED_TEXT</span>.
Solution #2: Replace IGNORED_TEXT with its MD5 hash, translate everything, then replace the hash back. (Works for %s, %1$s, abc.)
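Solution #2 can be sketched as follows. The terms to protect are replaced by the hex MD5 of their text in a single regex pass (so a later term cannot match inside a hash that was already inserted), the masked text is translated, and each hash is swapped back. `Md5Placeholders`, `translateIgnoring`, and the `UnaryOperator<String>` standing in for the Translate API call are hypothetical names; the sketch also assumes the translator leaves 32-character lowercase hex strings untouched, which is not guaranteed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Md5Placeholders {
    // Lowercase hex MD5 of the UTF-8 bytes of s.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java SE platform must provide MD5
        }
    }

    // Replaces each ignored term with its MD5 hash, translates, then puts the terms back.
    static String translateIgnoring(String text, List<String> ignored, UnaryOperator<String> translate) {
        List<String> terms = new ArrayList<>(ignored);
        terms.removeIf(String::isEmpty);
        if (terms.isEmpty()) {
            return translate.apply(text);
        }
        // Longest terms first, so a term containing a shorter one is matched as a whole.
        terms.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringJoiner alternation = new StringJoiner("|");
        for (String term : terms) {
            alternation.add(Pattern.quote(term));
        }

        Map<String, String> hashToTerm = new HashMap<>();
        StringBuilder masked = new StringBuilder();
        int last = 0;
        Matcher m = Pattern.compile(alternation.toString()).matcher(text);
        while (m.find()) {
            String hash = md5Hex(m.group());
            hashToTerm.put(hash, m.group());
            masked.append(text, last, m.start()).append(hash);
            last = m.end();
        }
        masked.append(text.substring(last));

        String translated = translate.apply(masked.toString());
        for (Map.Entry<String, String> entry : hashToTerm.entrySet()) {
            translated = translated.replace(entry.getKey(), entry.getValue());
        }
        return translated;
    }
}
```

Hashing makes the placeholder deterministic per term, so the same term always maps to the same placeholder and no lookup state is needed beyond this single call.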