Problem with Hebrew encoding

Time: 2013-04-29 11:02:09

Tags: c# character-encoding hebrew unicode-string

I have a Hebrew text that looks like "×گض¸×¨ض´×™×،ض°×کוض¹×ں", and I want to convert it into readable Unicode Hebrew characters.

I tried this code:

const string Str = "×گض¸×¨ض´×™×،ض°×کוض¹×ں";

Encoding enc1 = Encoding.Default;
Encoding enc2 = Encoding.Unicode;

byte[] bytes = enc1.GetBytes(Str);

string hebrewString = enc2.GetString(bytes);

label1.Text = hebrewString;

But it did not work. Please help.

Update: the text comes from this HTML source:

Version:1.0
StartHTML:000000210
EndHTML:000006218
StartFragment:000001595
EndFragment:000006126
StartSelection:000001595
EndSelection:000006126
SourceURL:file:///C:/ProgramData/Babylon/LocalUI/wnd.html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-    html401-19991224/loose.dtd">

<HTML 
xmlns="http://www.w3.org/1999/xhtml"><HEAD><TITLE>CLient build #1.2</TITLE><LINK 
rel=stylesheet type=text/css href="img/frame.css?ver=41"><LINK rel=stylesheet 
type=text/css href="img/baby.css?ver=41"><LINK rel=stylesheet type=text/css 
href="img/word.css?ver=41"><LINK rel=stylesheet type=text/css 
href="img/text.css?ver=41">
<SCRIPT type=text/javascript src="js/moudles.js?ver=100"></SCRIPT>

<SCRIPT type=text/javascript src="js/extrnl.js?ver=100"></SCRIPT>

<SCRIPT type=text/javascript src="js/frame.js?ver=100"></SCRIPT>

<SCRIPT type=text/javascript src="js/word.js?ver=100"></SCRIPT>

<SCRIPT type=text/javascript src="js/fTxt.js?ver=100"></SCRIPT>

<SCRIPT type=text/javascript src="js/baby.js?ver=100"></SCRIPT>

<SCRIPT type=text/javascript src="js/plcy.js?ver=100"></SCRIPT>
</HEAD>

<BODY style="FONT-FAMILY: Verdana" onload=bodyLoad() 
class="scrollBar ie7 fontSize5" bgColor=#000100 name="Rslt">

<DIV class=m2>

<DIV class=mrg>

<DIV style="BOTTOM: -67px" id=baseBody class=client>

<DIV id=wordContainer>

<DIV style="OVERFLOW-Y: scroll; DISPLAY: block; FONT-FAMILY: Tahoma" 
id=resultContainer class=wordBody>

<DIV id=rsltCntnr>

<DIV style="CURSOR: auto" id=BABID_Results><!--StartFragment--><DIV id=BABIDPtr_!!Z8UVKYSMBJ class=result entryType="3" entryPrio="1100099960">
<TABLE style="TABLE-LAYOUT: fixed" class=res-head cellSpacing=0 cellPadding=3 
width="100%">
<TBODY>
<TR>
<TD vAlign=top width=20><IMG id=BABID_CPIconImg class=BAB_ImgInTitle 
src="C:\Users\Mahmoud\AppData\Roaming\Babylon\Content\icons/Z8UVKYSMBJ_glossary_icon.ico"> 
</TD>
<TD id=BABID_CPTitle vAlign=top>
<DIV style="DISPLAY: inline" id=BABID_CPName class=BAB_NormalTitle>×‍ض´×œض¼×•ض¹×ں 
×گض¶×‘ض¶×ںض¾×©×پוض¹×©×پض¸×ں ×”ض·×‍ض¼ض¸×œضµ×گ</DIV><SPAN style="PADDING-LEFT: 2px"     id=BABID_CPBandBtns 
valign="top"><IMG class=BAB_ImageBtn title="Dictionary menu" tabIndex=0 
src="c:\programdata\babylon\localui\img\res\btn_titlemenu.png" 
behavetype="3ImageState" bab_name="BTN_TitleMenu"> 
</SPAN></TD></TR></TBODY></TABLE>
<DIV id=BABID_CPResult class=BAB_CPBodyStyleLocal>
<DIV xmlns:babex="urn:schemas-babylon-com:babex" 
xmlns:bab="urn:schemas-babylon-com:bab" 
xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<DIV dir=rtl class=term align=right>
<DIV style="FLOAT: right" dir=ltr class=btnArr><IMG class=BAB_ImageBtn 
title="Previous term" tabIndex=0 
src="c:\programdata\babylon\localui\img\res\btn_browseprevious.png" 
bab_name="BTN_BrowsePrevious" behaveType="3ImageState" baburi=""><IMG 
class=BAB_ImageBtn title="Next term" tabIndex=0 
src="c:\programdata\babylon\localui\img\res\btn_browsenext.png" 
bab_name="BTN_BrowseNext" behaveType="3ImageState" baburi=""></DIV>×گض¸×¨ض´×™×،ض°×کוض¹×ں
<DIV class=rsltSpkrCntnr><IMG class=BAB_ImageBtn 
title="To listen to a text, select it, and click the speaker button" tabIndex=0 
src="c:\programdata\babylon\localui\img\res\btn_sayit_rtl.png" valign="bottom" 
bab_name="BTN_SayIt_rtl" behaveType="3ImageState" baburi="" 
term="×گض¸×¨ض´×™×،ض°×کוض¹×ں"></DIV></DIV>
<DIV class=definition align=right><SPAN dir=rtl>
<STYLE>a{cursor:pointer;text-decoration:none;color:blue</STYLE>

<DIV 
style="LINE-HEIGHT: 160%; FONT-FAMILY: David,Times New Roman; FONT-SIZE: 130%" 
dir=rtl><FONT style="COLOR: black; FONT-WEIGHT: normal"><SUP>×ھ</SUP></FONT> 
<FONT color=blue>(×–')</FONT> [יווני×ھ: ariston] ×،ض°×¢×•ض¼×“ض¸×”, ×گض²×¨×•ض¼×—    ض¸×”: "×گض¸×¨ض´×™×،ض°×کוض¹×ں 
×¢ض¸×ھض´×™×“ ×”ض·×§ض¼ض¸×“וض¹×©×پ-בض¼ض¸×¨×•ض¼×ڑض°-הוض¼×گ לض·×¢ض²×©×‚וض¹×ھ לض·×¢ض²    בض¸×“ض¸×™×• ×”ض·×¦ض¼ض·×“ض¼ض´×™×§ض´×™×‌ לض¶×¢ض¸×ھض´×™×“ 
לض¸×‘וض¹×گ" (ויקר×گ רבה ×™×’). "×گض²× ض´×™ עוض¹×¨ضµ×ڑض° ×”ض¸×گض¸×¨ض´×™×،ض°×کוض¹×ں     לض·×—ض²×‘ض´×™×‘ض·×™, ×›ض¼ض·× ض°×¤ضµ×™ ×”ض¸×¨ض¹×ں" 
(ש×‍עוני, שירי×‌ ×’ פה).
<P>[×گض²×¨ض´×™×،ض°×کض´×™×ں] </P></DIV></SPAN></DIV><BR 
style="CLEAR: both; FONT-SIZE: 1px"></DIV>
<DIV class=BAB_CPCopyrightStyle xmlns:babex="urn:schemas-babylon-com:babex" 
xmlns:bab="urn:schemas-babylon-com:bab" 
xmlns:msxsl="urn:schemas-microsoft-com:xslt"><BR><BR><BR>
<DIV dir=rtl>
<P><BR><BR>آ© כל הזכויו×ھ ש×‍ורו×ھ ליורשי ×”×‍חבר<BR>Copyright 2003, The     author's 
heirs آ©</P><BR><BR><BR><BR>
<LI><B>להקד×‍×”, לה×،ברי×‌, לרשי×‍×ھ ×”×‍קורו×ھ ועוד - ר×گו <A 
style="TEXT-DECORATION: none" 
href="bword://×‍ض´×œض¼×•ض¹×ں ×گض¶×‘ض¶×ںض¾×©×پוض¹×©×پض¸×ں ×”ض·×‍ض¼ض¸×œضµ×گ/">×‍ض´×œض¼×•ض¹×ں     ×گض¶×‘ض¶×ںض¾×©×پוض¹×©×پض¸×ں ×”ض·×‍ض¼ض¸×œضµ×گ 
- ×¢ض·×‍ض¼×•ض¼×“ضµ×™ ×”ض·×¤ض¼ض°×ھض´×™×—ض¸×”</A>.</B></LI></DIV></DIV>
<DIV style="DISPLAY: none" dir=rtl id=BABID_BottomLinks 
xmlns:babex="urn:schemas-babylon-com:babex" 
xmlns:bab="urn:schemas-babylon-com:bab" 
xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<DIV style="FLOAT: left" id=BABID_BottomActions></DIV> <BR 
style="CLEAR: both; FONT-SIZE: 1px"></DIV>
<DIV class=prcTrial xmlns:babex="urn:schemas-babylon-com:babex" 
xmlns:bab="urn:schemas-babylon-com:bab" 
xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<DIV class=left-corner><IMG class=BAB_ImageStat 
src="c:\programdata\babylon\localui\img\res\trialcornerleft.png" width=4 
height=55 bab_name="TrialCornerLeft"></DIV>
<DIV style="BACKGROUND: none transparent scroll repeat 0% 0%" 
class=right-corner><IMG class=BAB_ImageStat 
src="c:\programdata\babylon\localui\img\res\trialcornerright.png" width=4 
height=55 bab_name="TrialCornerRight"></DIV>
<DIV class=prcTrial-body>
<DIV class=days-left>Dictionary trial version (4 days)</DIV><IMG 
class=BAB_ImageStat src="c:\programdata\babylon\localui\img\res\prctrial.png" 
bab_name="PRCTrial"><SPAN class=buy-link><A id=CP_LINK 
href="buyprc://!!Z8UVKYSMBJ,745,0/">Buy This 
Dictionary</A></SPAN></DIV></DIV></DIV><BR 
style="CLEAR: both; FONT-SIZE: 1px"></DIV><!--EndFragment--></DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</DIV>
</BODY>
</HTML>

This HTML renders fine, but I cannot get the Hebrew text into a string. Thanks.
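The header at the top of that dump (Version, StartHTML, StartFragment, ...) is the layout of the Windows HTML clipboard format, whose payload is UTF-8 and whose offsets count bytes rather than characters. A minimal sketch of pulling the fragment out of the raw bytes follows. The class name `CfHtml` and the method `ExtractFragment` are hypothetical, and how the raw `byte[]` is obtained from the clipboard is left outside the sketch.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class CfHtml
{
    // Reads a numeric header field such as "StartFragment:000001595".
    static int ReadOffset(string header, string name)
    {
        Match m = Regex.Match(header, name + @":(\d+)");
        if (!m.Success)
            throw new FormatException("Missing header field: " + name);
        return int.Parse(m.Groups[1].Value);
    }

    // raw: the clipboard payload exactly as stored, before any text decoding.
    public static string ExtractFragment(byte[] raw)
    {
        // The header lines are plain ASCII, so decoding a prefix as ASCII
        // is enough to find the offsets (assumes the header fits in 512 bytes).
        string header = Encoding.ASCII.GetString(raw, 0, Math.Min(raw.Length, 512));
        int start = ReadOffset(header, "StartFragment");
        int end = ReadOffset(header, "EndFragment");
        if (start < 0 || end < start || end > raw.Length)
            throw new FormatException("Fragment offsets are out of range.");

        // Offsets are byte positions, so slice the bytes first and decode after.
        return Encoding.UTF8.GetString(raw, start, end - start);
    }
}
```

Slicing before decoding matters here: Hebrew letters and vowel points take two bytes each in UTF-8, so character positions in an already decoded string would not line up with the header offsets.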

2 answers:

Answer 0 (score: 3)

  

It looks like your Str is not a valid Hebrew string.

After checking a few candidate encodings, it appears to be Hebrew (Windows).

 const string Str = "×گض¸×¨ض´×™×،ض°×کוض¹×ں";

 Encoding originEncoding = Encoding.GetEncoding(1256); // assume the string was encoded as Arabic (Windows-1256)

 byte[] bytes = originEncoding.GetBytes(Str);

 Encoding desEncoding = Encoding.GetEncoding(1255);       //Hebrew (Windows) 

 string hebrewString = desEncoding.GetString(bytes);

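Edit: the Hebrew string appears to have been encoded with an Arabic encoding, so to reverse that (if it is possible at all) we should try the plausible origin/destination pairs of encodings.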
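One way to try such pairs systematically is sketched here: re-encode the garbled string with each candidate "wrong" code page, decode the bytes with each candidate "real" encoding, and rank the results by how many characters land in the Unicode Hebrew block. The class `MojibakeProbe`, the candidate lists, and the scoring rule are all assumptions made for illustration. The `RegisterProvider` call is needed on .NET Core and .NET 5+, where code pages such as 1255 and 1256 are not available by default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of any library.
static class MojibakeProbe
{
    // Fraction of characters in the Unicode Hebrew block (U+0590..U+05FF).
    static double HebrewRatio(string s)
    {
        if (s.Length == 0) return 0;
        return s.Count(c => c >= '\u0590' && c <= '\u05FF') / (double)s.Length;
    }

    // Returns every candidate pair with its decoded text, best score first.
    public static List<(string Pair, string Text, double Score)> Probe(string garbled)
    {
        // Makes Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        int[] wrongDecoders = { 1252, 1255, 1256 }; // how the text may have been misread
        int[] realEncodings = { 65001, 1255 };      // what the bytes may really be (65001 = UTF-8)

        var results = new List<(string Pair, string Text, double Score)>();
        foreach (int wrong in wrongDecoders)
        {
            foreach (int real in realEncodings)
            {
                if (wrong == real) continue;
                byte[] bytes = Encoding.GetEncoding(wrong).GetBytes(garbled);
                string text = Encoding.GetEncoding(real).GetString(bytes);
                results.Add(($"{wrong} -> {real}", text, HebrewRatio(text)));
            }
        }
        return results.OrderByDescending(r => r.Score).ToList();
    }
}
```

A high score is only a hint. Characters that the wrong code page cannot represent are replaced during `GetBytes`, so the top candidate should still be checked by eye.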

Answer 1 (score: 1)

I solved it. The text is UTF-8.

        const string Str = "×گض¸×¨ض´×™×،ض°×کוض¹×ں";
        Encoding defaultEncoding = Encoding.Default;
        byte[] bytes = defaultEncoding.GetBytes(Str);
        Encoding encoding2 = Encoding.UTF8;
        string hebrewString2 = encoding2.GetString(bytes);
        label1.Text = hebrewString2;
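
The snippet above depends on `Encoding.Default`, which on .NET Framework is the machine's ANSI code page and on .NET Core and .NET 5+ is always UTF-8, so it only works where the default happens to match the code page that garbled the text. A hedged variant that names the code page explicitly is sketched here. The class `HebrewRepair` and the method `FixUtf8ReadAs` are hypothetical names, and 1256 (Arabic, Windows) is an assumption based on the Arabic letters visible in the garbled string.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
static class HebrewRepair
{
    // Reverses "UTF-8 bytes decoded with the wrong single-byte code page".
    public static string FixUtf8ReadAs(string garbled, int wrongCodePage)
    {
        // Makes Windows code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Exception fallbacks make data loss visible instead of silently
        // substituting '?' for characters the code page cannot represent.
        Encoding wrong = Encoding.GetEncoding(
            wrongCodePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // Step 1: recover the original bytes.
        byte[] original = wrong.GetBytes(garbled);

        // Step 2: decode them as strict UTF-8 (throws on invalid sequences).
        return new UTF8Encoding(false, true).GetString(original);
    }
}
```

Calling `HebrewRepair.FixUtf8ReadAs(Str, 1256)` would then replace the two-encoding dance above. If the string was damaged further in transit, for example by being pasted through a web page, the strict fallbacks throw rather than return partly repaired text.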

Thanks, everyone.