I have something like
a = "बिक्रम मेरो नाम हो"
and I want to achieve something like this in Java:
a[0] = बि
a[1] = क्र
a[2] = म
Answer 0 (score: 1)
My code isn't optimized at all, sorry, but it works!
Just change the path to the file containing your Devanagari sentences and it should work fine.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DevanagariSplitter {

    // Independent vowels and consonants. Each one starts a new letter,
    // unless it directly follows a virama (U+094D), in which case it is
    // part of a conjunct such as क्र.
    private static final Set<String> DEV_FULL = new HashSet<>(Arrays.asList(
        "अ", "आ", "इ", "ई", "उ", "ऊ", "ऋ",
        "ऌ", "ऍ", "ए", "ऐ", "ऑ", "ओ", "औ",
        "क", "ख", "ग", "घ", "ङ",
        "च", "छ", "ज", "झ", "ञ",
        "ट", "ठ", "ड", "ढ", "ण",
        "त", "थ", "द", "ध", "न",
        "प", "फ", "ब", "भ", "म",
        "य", "र", "ल", "ळ",
        "व", "श", "ष", "स", "ह"
    ));

    private static final char VIRAMA = '\u094D';

    // Groups the chars of a line into letters: a new letter begins at every
    // full vowel/consonant that is not preceded by a virama.
    static List<String> split(String line) {
        List<String> letters = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            boolean startsLetter = DEV_FULL.contains(String.valueOf(c))
                    && (i == 0 || line.charAt(i - 1) != VIRAMA);
            if (startsLetter && current.length() > 0) {
                letters.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            letters.add(current.toString());
        }
        return letters;
    }

    public static void main(String[] args) throws IOException {
        // PLEASE ENTER PATH HERE
        try (BufferedReader br = Files.newBufferedReader(
                Paths.get("/home/ubuntu/Documents/trainforjava.txt"), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.replaceAll(" ", ""); // remove white spaces if any
                System.out.println(String.join(" ", split(line)));
            }
        }
    }
}
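Answer 1 (score: 0)
Java stores strings internally as UTF-16, and every Devanagari character lies in the Basic Multilingual Plane, so each one occupies exactly one 2-byte char. You can therefore safely access the characters individually.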
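To see what accessing the characters individually actually yields, the sketch below (the class name `CharDemo` and helper `codeUnits` are illustrative only) prints the chars of बि with their code points. Each Devanagari character is indeed one `char`, but the written letter बि is two of them (the consonant ब plus the vowel sign ि), so `charAt` by itself does not produce the `a[0] = बि` split the question asks for.

```java
public class CharDemo {
    // Returns a "U+XXXX" label for each char in the string.
    static String[] codeUnits(String s) {
        String[] out = new String[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = String.format("U+%04X", (int) s.charAt(i));
        }
        return out;
    }

    public static void main(String[] args) {
        String letter = "बि";
        // Devanagari is in the BMP, so the char count equals the code point count.
        System.out.println(letter.length());                           // 2
        System.out.println(letter.codePointCount(0, letter.length())); // 2
        System.out.println(String.join(" ", codeUnits(letter)));      // U+092C U+093F
    }
}
```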
Answer 2 (score: 0)
Try this:
public class CharSplit {
    public static void main(String[] args) {
        String a = "बिक्रम मेरो नाम हो";
        int strLen = a.length();
        char[] array = new char[strLen];
        String[] strArray1 = new String[strLen];
        for (int i = 0; i < strLen; i++) {
            array[i] = a.charAt(i);
            strArray1[i] = Character.toString(a.charAt(i));
            System.out.println("Index = " + i + "* Char = " + array[i] + "** String =" + strArray1[i]);
        }
    }
}
Output:
Index = 0* Char = ब** String =ब
Index = 1* Char = ि** String =ि
Index = 2* Char = क** String =क
Index = 3* Char = ्** String =्
Index = 4* Char = र** String =र
Index = 5* Char = म** String =म
Index = 6* Char = ** String =
Index = 7* Char = म** String =म
Index = 8* Char = े** String =े
Index = 9* Char = र** String =र
Index = 10* Char = ो** String =ो
Index = 11* Char = ** String =
Index = 12* Char = न** String =न
Index = 13* Char = ा** String =ा
Index = 14* Char = म** String =म
Index = 15* Char = ** String =
Index = 16* Char = ह** String =ह
Index = 17* Char = ो** String =ो
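Note:
To let Eclipse save your Java program with non-Latin characters (the Hindi alphabet), do the following:
Go to:
Window > Preferences > General > Content Types > Text > {select the file type}
{selected file type} > Default encoding > UTF-8, then click Update.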
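If changing the IDE encoding is not an option, one alternative (a different technique from the Eclipse setting above) is to write the Devanagari text with Java Unicode escapes, which the compiler translates before parsing, so the source file itself can stay plain ASCII. The class name `EscapeDemo` is hypothetical; the escapes below spell बिक्रम.

```java
public class EscapeDemo {
    // "bikram": BA, VOWEL SIGN I, KA, VIRAMA, RA, MA.
    // Using escapes keeps this source file pure ASCII.
    static final String NAME = "\u092C\u093F\u0915\u094D\u0930\u092E";

    public static void main(String[] args) {
        System.out.println(NAME.length()); // 6
        System.out.println(NAME);          // rendering depends on the console encoding
    }
}
```

Outside Eclipse, the command-line equivalent of the setting above is compiling with `javac -encoding UTF-8`.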
Answer 3 (score: 0)
Have you tried ICU4J?
A character instance of BreakIterator can split a string into user-perceived characters rather than individual chars.
Answer 4 (score: 0)
Try this for Hindi:
import java.io.*;
import java.text.BreakIterator;
import java.util.Locale;

public class Test {
    public static void main(String[] args) throws IOException
    {
        String text = "बिक्रम मेरो नाम हो";
        Locale hindi = new Locale("hi", "IN");
        BreakIterator breaker = BreakIterator.getCharacterInstance(hindi);
        breaker.setText(text);
        int start = breaker.first();
        for (int end = breaker.next();
             end != BreakIterator.DONE;
             start = end, end = breaker.next()) {
            System.out.println(text.substring(start, end));
        }
    }
}
Output:
बि
क्र
म
मे
रो
ना
म
हो
BreakIterator Java documentation: https://docs.oracle.com/javase/tutorial/i18n/text/about.html
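Since the question asks for array-style access (`a[0]`, `a[1]`, ...), the same loop can be wrapped in a helper that collects the pieces into a `String[]` instead of printing them. This is a minimal sketch: `LetterSplit` and `splitLetters` are names chosen for illustration, and how conjuncts such as क्र are grouped comes from the JDK's own character-break rules, which may differ between JDK versions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LetterSplit {
    // Splits text at the character boundaries reported by BreakIterator.
    static String[] splitLetters(String text, Locale locale) {
        BreakIterator breaker = BreakIterator.getCharacterInstance(locale);
        breaker.setText(text);
        List<String> parts = new ArrayList<>();
        int start = breaker.first();
        for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
            parts.add(text.substring(start, end));
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] a = splitLetters("बिक्रम मेरो नाम हो", Locale.forLanguageTag("hi-IN"));
        for (int i = 0; i < a.length; i++) {
            System.out.println("a[" + i + "] = " + a[i]);
        }
    }
}
```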
Answer 5 (score: -1)
To split the string by letters rather than by chars, following dvasanth's suggestion, you can try the following:
public class SplitByPairs {
    public static void main(String[] args) {
        String x = "बिक्रम मेरो नाम हो";
        x = x.replaceAll(" ", ""); // Remove all spaces
        int strLength = x.length();
        String[] strArray1 = new String[strLength];
        // One slot per pair of chars; the +1 covers an odd trailing char
        String[] letterArray = new String[(strLength + 1) / 2];
        String combined;
        for (int i = 0, j = 0; i < strLength; i = i + 2, j++) {
            strArray1[i] = Character.toString(x.charAt(i));
            if (i + 1 < strLength) {
                strArray1[i + 1] = Character.toString(x.charAt(i + 1));
                combined = strArray1[i] + strArray1[i + 1]; // This line provides the letters.
                // Assumption is that each letter is 2 chars long.
            } else {
                combined = strArray1[i];
            }
            letterArray[j] = combined;
            System.out.println("Split string by letters is : " + letterArray[j]);
        }
    }
}
The output is:
Split string by letters is : बि
Split string by letters is : क्
Split string by letters is : रम
Split string by letters is : मे
Split string by letters is : रो
Split string by letters is : ना
Split string by letters is : मह
Split string by letters is : ो
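The output shows where the fixed two-char assumption breaks down (क्, रम, मह, ो are not letters): a written Devanagari letter can be one, two, or more chars long. A small check using the letters the question expects (the class and method names are illustrative only):

```java
public class LetterLengths {
    // Returns the number of chars in each letter.
    static int[] lengths(String[] letters) {
        int[] out = new int[letters.length];
        for (int i = 0; i < letters.length; i++) {
            out[i] = letters[i].length();
        }
        return out;
    }

    public static void main(String[] args) {
        // Letters as the question expects them: बि, क्र, म
        String[] letters = {"बि", "क्र", "म"};
        int[] lens = lengths(letters);
        for (int i = 0; i < letters.length; i++) {
            System.out.println(letters[i] + " -> " + lens[i] + " chars");
        }
    }
}
```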
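Note:
To let Eclipse save your Java program with non-Latin characters (the Hindi alphabet), do the following:
Go to:
Window > Preferences > General > Content Types > Text > {select the file type}
{selected file type} > Default encoding > UTF-8, then click Update.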