Problems with strlen and strtok after iconv

Time: 2013-02-27 14:12:48

Tags: php strtok iconv ucs2

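I wrote some code to convert a UCS-2LE input file into plain 8-bit ISO-8859-1 text. After the conversion, I use strtok to split the whole text into words. I then apply strlen to each word, but the lengths I get are strange and I cannot make sense of them.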

<?php
$fileData = file('input.txt');

foreach( $fileData as $txt ){

    $txt = iconv( 'ISO-8859-1', 'UCS-2LE', $txt );
    $tok = strtok($txt, " \n\t");
    while ($tok !== false) {
        echo 'Word = '.$tok.', Length = '.strlen($tok).'<br />';
        $tok = strtok(" \n\t");
    }
}
?>

The input file, input.txt (encoded in UCS-2LE), is:

 Slot#  NumJobs ActiveJobID ActiveBatchJob  ActiveProcStartTime
 0  0   1   input3.dat  7:20 PM
 1  0   2   input3.dat  7:20 PM

Output:

Word = ÿþSlot#, Length = 24
Word = NumJobs, Length = 31
Word = ActiveJobID, Length = 47
Word = ActiveBatchJob, Length = 59
Word = ActiveProcStartTime , Length = 83
Word = , Length = 1
Word = 0, Length = 6
Word = 0, Length = 7
Word = 1, Length = 7
Word = input3.dat, Length = 43
Word = 7:20, Length = 19
Word = PM , Length = 15
Word = , Length = 1
Word = 1, Length = 6
Word = 0, Length = 7
Word = 2, Length = 7
Word = input3.dat, Length = 43
Word = 7:20, Length = 19
Word = PM , Length = 15
Word = , Length = 1
Word = , Length = 2
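The lengths above are consistent with the conversion running in the wrong direction: `iconv()` takes the source encoding first and the target encoding second, so `iconv('ISO-8859-1', 'UCS-2LE', $txt)` treats every byte of the UCS-2LE file as a separate Latin-1 character and widens it to two bytes. The following is a minimal sketch that reproduces the effect on a small hand-built sample (the sample string and variable names are assumptions, not taken from the original file):

```php
<?php
// Hypothetical sample: BOM (FF FE) + "Hi" + a tab, written by hand as UCS-2LE.
$ucs2 = "\xFF\xFE" . "H\x00i\x00\t\x00";

// Same argument order as in the question: the input is read as ISO-8859-1,
// so every input byte becomes a two-byte UCS-2LE code unit.
$wrong = iconv('ISO-8859-1', 'UCS-2LE', $ucs2);

echo strlen($ucs2), "\n";    // 8
echo strlen($wrong), "\n";   // 16
echo bin2hex($wrong), "\n";  // ff00fe00480000006900000009000000

// strtok() works on bytes: it stops at the first 0x09 byte, so the token
// holds the widened BOM plus "Hi" with embedded NUL bytes (3 units * 4 bytes).
$first = strtok($wrong, " \n\t");
echo strlen($first), "\n";   // 12
```

By the same arithmetic, `ÿþSlot#` is 6 code units of 4 bytes each, which gives 24, and `NumJobs` is 7 units of 4 bytes plus 3 leftover NUL bytes from the preceding tab, which gives 31.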

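1) How can I get the lengths to display correctly?

2) Line 6 of the output is a newline character that strtok did not tokenize correctly. Why is that?

3) I have read a little about the BOM, and I understand that the first two characters of the file identify the character format in use. Is there a way to skip these characters? In the first line of the output, for example, they show up as two characters.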
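One possible approach to all three points, sketched under assumptions: read the file as one byte string rather than with `file()`, strip the two BOM bytes, convert from UCS-2LE to ISO-8859-1, and only then tokenize. `file()` splits on the single byte 0x0A, but a UCS-2LE newline is the byte pair 0A 00, so the trailing 00 ends up at the start of the next line, which would explain the stray one-byte token. The helper name `ucs2leToWords` and the in-memory sample are hypothetical; in the question's setting the bytes would come from `file_get_contents('input.txt')`.

```php
<?php
// Hypothetical helper (not from the original post): decode a UCS-2LE byte
// string to ISO-8859-1 and split it into whitespace-separated words.
function ucs2leToWords(string $raw): array
{
    // Drop the little-endian BOM (bytes FF FE) if present, so that it is
    // neither converted nor printed.
    if (substr($raw, 0, 2) === "\xFF\xFE") {
        $raw = substr($raw, 2);
    }

    // Source encoding first, target encoding second.
    $text = iconv('UCS-2LE', 'ISO-8859-1', $raw);
    if ($text === false) {
        throw new RuntimeException('iconv could not convert the input');
    }

    // "\r" is added to the delimiters so that Windows line endings do not
    // stay attached to the last word of each line.
    $delims = " \r\n\t";
    $words = [];
    for ($tok = strtok($text, $delims); $tok !== false; $tok = strtok($delims)) {
        $words[] = $tok;
    }
    return $words;
}

// Small in-memory stand-in for the file contents (assumed sample data).
$sample = "\xFF\xFE" . iconv('ISO-8859-1', 'UCS-2LE', "Slot#\tNumJobs\r\n0\t0\r\n");

foreach (ucs2leToWords($sample) as $word) {
    echo 'Word = ' . $word . ', Length = ' . strlen($word) . "\n";
}
// Word = Slot#, Length = 5
// Word = NumJobs, Length = 7
// Word = 0, Length = 1
// Word = 0, Length = 1
```

Converting the whole buffer before splitting keeps every two-byte character intact. If the text could contain characters outside ISO-8859-1, `iconv()` would fail on them, so a real script would need to decide how to handle that case.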

Thanks in advance for your help.

0 answers:

No answers