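I want to transliterate Hindi text into English in the following way:
Hindi - “आपकास्वागतहै”
to
English - “aapka swagat hain”
I don't want to use Google's Translate API or any other translation API. If I use them, I end up with a translation of the Hindi text ("Welcome") instead of a transliteration.
Is there a transliteration library available that I can use in my Android code?
I've heard of ICU, but I haven't had any luck finding an example of how to use it in my code.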
Answer 0 (score: 4)
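One possible way to solve the problem is to break it into two problems that can each be solved, and then combine the two.
There are Hindi readers that can read Hindi Devanagari script aloud.
There are also dictation engines that transcribe speech phonetically into English.
E.g., when someone leaves a message in Gujarati on my Vonage phone line, it records the audio, generates English text, and emails me the wav file and the text. Reading the text can occasionally be very funny: Vonage assumes the message is in English, so I expect to read English, but after reading it I realize the message was in Gujarati.
Google "Hindi reader for Android" and "phonetic transcription" for more information. If a Hindi reader can output a wav file that can be used as input to the transcriber, that may be a solution to your problem.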
Answer 1 (score: 1)
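One possible, and probably simple, solution I found is character mapping. I needed to transliterate Hindi words into English characters, exactly as in your case, but you can also use this approach for other transliterations; you only need to change the Unicode values for the specific language.
Theory: To convert the characters of Hindi (or any other language) into English characters, or vice versa, you first need to know the Unicode range of the language whose characters you want to convert.
For example, to convert Hindi Unicode characters into English ASCII letters: the Hindi (Devanagari) Unicode range is 0900-097F, and the range changes from language to language. Using these code points, you can map the "sound" of a specific code point (Hindi character) to English letters; for example, ह maps to the English letter h. That's the theory.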
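A minimal sketch of the two steps in the theory above, assuming plain Java: a range check for the Devanagari block (0900-097F) and a per-character lookup that maps a code point to the English letters for its sound. The class name `DevanagariRange`, its methods, and the three sample entries are hypothetical; a real table would cover the whole block, as the `ConversionTable` class later in this answer does.

```java
import java.util.HashMap;
import java.util.Map;

class DevanagariRange {
    static final char BLOCK_START = '\u0900';
    static final char BLOCK_END = '\u097F';

    // Hypothetical sample entries: code point -> English letters for its sound
    static final Map<Character, String> SOUNDS = new HashMap<>();
    static {
        SOUNDS.put('\u0939', "h");   // DEVANAGARI LETTER HA
        SOUNDS.put('\u0915', "k");   // DEVANAGARI LETTER KA
        SOUNDS.put('\u093E', "aa");  // DEVANAGARI VOWEL SIGN AA
    }

    // True if the character lies in the Devanagari block U+0900..U+097F
    static boolean isDevanagari(char c) {
        return c >= BLOCK_START && c <= BLOCK_END;
    }

    // Map one character; characters outside the block or missing from the table pass through
    static String mapChar(char c) {
        if (!isDevanagari(c)) {
            return String.valueOf(c);
        }
        return SOUNDS.getOrDefault(c, String.valueOf(c));
    }
}
```

For example, `DevanagariRange.mapChar('\u0939')` returns `"h"`, while a Latin letter such as `'x'` is returned unchanged.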
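Practical approach: I needed to create an app that takes the user's Hindi voice input and converts it into English letters. So I used an STT (speech-to-text) library to get the Hindi Unicode text, and then mapped that Hindi Unicode text to English characters.
Code: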
MainActivity.java
package android.example.com.conversion;
import android.content.ActivityNotFoundException;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.support.v7.app.AppCompatActivity;
import android.support.v7.widget.AppCompatButton;
import android.view.View;
import android.widget.TextView;
import android.widget.Toast;
import java.util.ArrayList;
public class MainActivity extends AppCompatActivity implements View.OnClickListener {
// Record Button
AppCompatButton RecordBtn;
// TextView to show Original and recognized Text
TextView Original,result;
// Request Code for STT
private final int SST_REQUEST_CODE = 101;
// Conversion Table Object...
ConversionTable conversionTable;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
Original = findViewById(R.id.Original_Text);
RecordBtn = findViewById(R.id.RecordBtn);
result = findViewById(R.id.Recognized_Text);
RecordBtn.setOnClickListener(this);
}
@Override
public void onClick(View v) {
switch (v.getId()) {
case R.id.RecordBtn:
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
// Do not restrict recognition to the offline engine (set to true to prefer offline recognition)
intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, false);
// Use Hindi Speech Recognition Model...
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "hi-IN");
try {
startActivityForResult(intent, SST_REQUEST_CODE);
} catch (ActivityNotFoundException a) {
Toast.makeText(getApplicationContext(),
getString(R.string.error),
Toast.LENGTH_SHORT).show();
}
break;
}
}
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
switch (requestCode) {
case SST_REQUEST_CODE:
if (resultCode == RESULT_OK && null != data) {
ArrayList<String> getResult = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
Original.setText(getResult.get(0));
conversionTable = new ConversionTable();
String Transformed_String = conversionTable.transform(getResult.get(0));
result.setText(Transformed_String);
}
break;
}
}
}
My ConversionTable.java class: this maps the values to English letters.
package android.example.com.conversion;
import android.util.Log;
import java.util.ArrayList;
import java.util.Hashtable;
public class ConversionTable
{
private String TAG = "Conversation Table";
private Hashtable<String,String> unicode;
private void populateHashTable()
{
unicode = new Hashtable<>();
// unicode
unicode.put("\u0901","n"); // anunasika (chandrabindu): nasalisation, approximated as "n" // *
unicode.put("\u0902","n"); // anusvara
unicode.put("\u0903","ah"); // visarga
unicode.put("\u0940","ee");
unicode.put("\u0941","u");
unicode.put("\u0942","oo");
unicode.put("\u0943","rhi");
unicode.put("\u0944","rhee"); // * = Doubtful Case
unicode.put("\u0945","e");
unicode.put("\u0946","e");
unicode.put("\u0947","e");
unicode.put("\u0948","ai");
unicode.put("\u0949","o");
unicode.put("\u094a","o");
unicode.put("\u094b","o");
unicode.put("\u094c","au");
unicode.put("\u094d","");
unicode.put("\u0950","om");
unicode.put("\u0958","k");
unicode.put("\u0959","kh");
unicode.put("\u095a","gh");
unicode.put("\u095b","z");
unicode.put("\u095c","dh"); // *
unicode.put("\u095d","rh");
unicode.put("\u095e","f");
unicode.put("\u095f","y");
unicode.put("\u0960","ri");
unicode.put("\u0961","lri");
unicode.put("\u0962","lr"); // *
unicode.put("\u0963","lree"); // *
unicode.put("\u093E","aa");
unicode.put("\u093F","i");
// Vowels and Consonants...
unicode.put("\u0905","a");
unicode.put("\u0906","aa");
unicode.put("\u0907","i");
unicode.put("\u0908","ee");
unicode.put("\u0909","u");
unicode.put("\u090a","oo");
unicode.put("\u090b","ri");
unicode.put("\u090c","lri"); // *
unicode.put("\u090d","e"); // *
unicode.put("\u090e","e"); // *
unicode.put("\u090f","e");
unicode.put("\u0910","ai");
unicode.put("\u0911","o");
unicode.put("\u0912","o");
unicode.put("\u0913","o");
unicode.put("\u0914","au");
unicode.put("\u0915","k");
unicode.put("\u0916","kh");
unicode.put("\u0917","g");
unicode.put("\u0918","gh");
unicode.put("\u0919","ng");
unicode.put("\u091a","ch");
unicode.put("\u091b","chh");
unicode.put("\u091c","j");
unicode.put("\u091d","jh");
unicode.put("\u091e","ny");
unicode.put("\u091f","t"); // Ta as in Tom
unicode.put("\u0920","th");
unicode.put("\u0921","d"); // Da as in David
unicode.put("\u0922","dh");
unicode.put("\u0923","n");
unicode.put("\u0924","t"); // ta as in tamasha
unicode.put("\u0925","th"); // tha as in thanks
unicode.put("\u0926","d"); // da as in darvaaza
unicode.put("\u0927","dh"); // dha as in dhanusha
unicode.put("\u0928","n");
unicode.put("\u0929","nn");
unicode.put("\u092a","p");
unicode.put("\u092b","ph");
unicode.put("\u092c","b");
unicode.put("\u092d","bh");
unicode.put("\u092e","m");
unicode.put("\u092f","y");
unicode.put("\u0930","r");
unicode.put("\u0931","rr");
unicode.put("\u0932","l");
unicode.put("\u0933","ll"); // the Marathi and Vedic 'L'
unicode.put("\u0934","lll"); // the Marathi and Vedic 'L'
unicode.put("\u0935","v");
unicode.put("\u0936","sh");
unicode.put("\u0937","ss");
unicode.put("\u0938","s");
unicode.put("\u0939","h");
// unicode.put("\u093c","'"); // nukta (modifies the preceding consonant)
// unicode.put("\u093d","'"); // avagraha using "'"
unicode.put("\u0969","3"); // 3 equals to pluta
unicode.put("\u014F","Z");// Z equals to upadhamaniya
unicode.put("\u0CF1","V");// V equals to jihvamuliya....but what character have u settled for jihvamuliya
/* unicode.put("\u0950","Ω"); // aum
unicode.put("\u0958","κ"); // Urdu qaif
unicode.put("\u0959","Κ"); //Urdu qhe
unicode.put("\u095A","γ"); // Urdu gain
unicode.put("\u095B","ζ"); //Urdu zal, ze, zoe
unicode.put("\u095E","φ"); // Urdu f
unicode.put("\u095C","δ"); // Hindi 'dh' as in padh
unicode.put("\u095D","Δ"); // hindi dhh*/
unicode.put("\u0926\u093C","τ"); // Urdu dwad
unicode.put("\u0924\u093C","θ"); // Urdu toe
unicode.put("\u0938\u093C","σ"); // Urdu swad, se
}
ConversionTable()
{
populateHashTable();
}
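public String transform(String s1)
{
StringBuilder transformed = new StringBuilder();
// true while the last consonant's inherent 'a' has been neither written nor cancelled
boolean pendingA = false;
for (int i = 0; i < s1.length(); i++)
{
char c = s1.charAt(i);
String varna = String.valueOf(c);
// Look up the Latin value of the character itself (null if it is not in the table)
String mapped = unicode.get(varna);
Log.d(TAG, "transform: " + varna + " -> " + mapped);
if (c == '\u093C')
{
// Nukta: keep the preceding consonant's mapping and its pending inherent 'a'
continue;
}
if (isVowelSign(c) || c == '\u094D')
{
// A vowel sign (matra) replaces the inherent 'a'; the virama (halant) maps to "" and cancels it
pendingA = false;
transformed.append(mapped != null ? mapped : "");
continue;
}
if (pendingA)
{
// Keep the inherent 'a' before another Devanagari character, but drop it at the
// end of a word (a simple form of Hindi word-final schwa deletion)
if (mapped != null)
{
transformed.append('a');
}
pendingA = false;
}
if (isConsonant(c))
{
transformed.append(mapped != null ? mapped : varna);
pendingA = true;
}
else
{
// Independent vowels, anusvara/visarga, or characters outside the table (spaces, punctuation)
transformed.append(mapped != null ? mapped : varna);
}
} // end of for
// An inherent 'a' still pending at the end of the string is dropped (word-final)
return transformed.toString();
}
// Devanagari consonants U+0915..U+0939 and the nukta forms U+0958..U+095F
private static boolean isConsonant(char c)
{
return (c >= '\u0915' && c <= '\u0939') || (c >= '\u0958' && c <= '\u095F');
}
// Dependent vowel signs (matras) U+093E..U+094C, plus U+0962 and U+0963
private static boolean isVowelSign(char c)
{
return (c >= '\u093E' && c <= '\u094C') || c == '\u0962' || c == '\u0963';
}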
}
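Because `transform()` looks up one character at a time, the two-character keys near the end of `populateHashTable()` (a consonant followed by the nukta U+093C) can never match. A minimal sketch of a greedy longest-match lookup, which tries a two-character key before a single character, is below. The class `LongestMatchLookup` and its three-entry table are hypothetical and only illustrate the lookup order; they do not handle the inherent vowel.

```java
import java.util.HashMap;
import java.util.Map;

class LongestMatchLookup {
    // Hypothetical sample table: single characters plus one two-character key
    static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("\u0926", "d");             // U+0926 DEVANAGARI LETTER DA
        TABLE.put("\u093E", "aa");            // U+093E DEVANAGARI VOWEL SIGN AA
        TABLE.put("\u0926\u093C", "\u03C4");  // DA + nukta -> Greek tau, as in the answer's table
    }

    // Greedy longest match: try a two-character key first, then a single character;
    // characters missing from the table are copied through unchanged
    static String transliterate(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (i + 2 <= s.length()) {
                String two = s.substring(i, i + 2);
                if (TABLE.containsKey(two)) {
                    out.append(TABLE.get(two));
                    i += 2;
                    continue;
                }
            }
            String one = s.substring(i, i + 1);
            out.append(TABLE.getOrDefault(one, one));
            i += 1;
        }
        return out.toString();
    }
}
```

With this order, DA followed by the nukta and the vowel sign AA yields the tau mapping plus "aa", whereas plain DA plus AA yields "daa".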
My VowelUtil.java class: this checks for vowels and consonants, although the check isn't strictly needed.
package android.example.com.conversion;
public class VowelUtil {
protected static boolean isVowel(String strVowel) {
// Log.logInfo("came in is_Vowel: Checking whether string is a Vowel");
return strVowel.equals("a") || strVowel.equals("aa") || strVowel.equals("i") || strVowel.equals("ee") ||
strVowel.equals("u") || strVowel.equals("oo") || strVowel.equals("ri") || strVowel.equals("lri") || strVowel.equals("e")
|| strVowel.equals("ai") || strVowel.equals("o") || strVowel.equals("au") || strVowel.equals("om");
}
protected static boolean isConsonant(String strConsonant) {
// Log.logInfo("came in is_consonant: Checking whether string is a
// consonant");
return strConsonant.equals("k") || strConsonant.equals("kh") || strConsonant.equals("g")
|| strConsonant.equals("gh") || strConsonant.equals("ng") || strConsonant.equals("ch") || strConsonant.equals("chh") || strConsonant.equals("j")
|| strConsonant.equals("jh") || strConsonant.equals("ny") || strConsonant.equals("t") || strConsonant.equals("th") ||
strConsonant.equals("d") || strConsonant.equals("dh") || strConsonant.equals("n") || strConsonant.equals("nn") || strConsonant.equals("p") ||
strConsonant.equals("ph") || strConsonant.equals("b") || strConsonant.equals("bh") || strConsonant.equals("m") || strConsonant.equals("y") ||
strConsonant.equals("r") || strConsonant.equals("rr") || strConsonant.equals("l") || strConsonant.equals("ll") || strConsonant.equals("lll") ||
strConsonant.equals("v") || strConsonant.equals("sh") || strConsonant.equals("ss") || strConsonant.equals("s") || strConsonant.equals("h") ||
strConsonant.equals("3") || strConsonant.equals("z") || strConsonant.equals("v") || strConsonant.equals("Ω") ||
strConsonant.equals("κ") || strConsonant.equals("K") || strConsonant.equals("γ") || strConsonant.equals("ζ") || strConsonant.equals("φ") ||
strConsonant.equals("δ") || strConsonant.equals("Δ") || strConsonant.equals("τ") || strConsonant.equals("θ") || strConsonant.equals("σ");
}
}
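This is the simplest way I found to transliterate. It's a general approach that you can apply to transliteration between other languages as well: you only need to supply the correct Unicode mapping from one script to the other.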
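As one illustration of reusing a table for another script: the Unicode blocks of several Indic scripts follow a common ISCII-derived layout, so many Gujarati characters (block U+0A80 to U+0AFF) sit exactly 0x180 code points after their Devanagari counterparts, e.g. Gujarati KA U+0A95 versus Devanagari KA U+0915. The sketch below assumes that alignment and shifts a Gujarati character onto a Devanagari table before looking it up. The class `ScriptShift` and its sample entries are hypothetical, and not every position is assigned in both blocks, so a real table should still be checked letter by letter.

```java
import java.util.HashMap;
import java.util.Map;

class ScriptShift {
    static final int DEVANAGARI_START = 0x0900;
    static final int GUJARATI_START = 0x0A80;
    static final int GUJARATI_END = 0x0AFF;

    // Hypothetical sample of a Devanagari table keyed by code point
    static final Map<Character, String> DEVANAGARI = new HashMap<>();
    static {
        DEVANAGARI.put('\u0915', "k");   // DEVANAGARI LETTER KA
        DEVANAGARI.put('\u0939', "h");   // DEVANAGARI LETTER HA
        DEVANAGARI.put('\u093E', "aa");  // DEVANAGARI VOWEL SIGN AA
    }

    // Shift a Gujarati code point onto the parallel Devanagari one, then look it up;
    // anything outside the Gujarati block (or missing from the table) is returned unchanged
    static String mapGujarati(char c) {
        if (c < GUJARATI_START || c > GUJARATI_END) {
            return String.valueOf(c);
        }
        char dev = (char) (c - GUJARATI_START + DEVANAGARI_START);
        return DEVANAGARI.getOrDefault(dev, String.valueOf(c));
    }

    // Character-by-character mapping of a whole string
    static String transliterateGujarati(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append(mapGujarati(s.charAt(i)));
        }
        return out.toString();
    }
}
```

The cast to `char` matters: the map is keyed by `Character`, so looking up an `int` would box to `Integer` and never match.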