Question

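I am trying to read a PDF file, and I want to use this call:

databuffer = [file readDataOfLength:7];

but I want to read all the bytes in the line.

In other words, I use seekToFileOffset to locate the byte, but from there I want to read the whole line.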

NSMutableArray *nameArray     = [[NSMutableArray alloc] init];
NSMutableArray *nameArrayDict = [[NSMutableArray alloc] init];
NSString       *path          = [[NSBundle mainBundle] pathForResource:@"testpdf" ofType:@"pdf"];
NSString       *contents      = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:nil];

// nameArray is presumably filled elsewhere (not shown); its second-to-last
// entry is used as the starting byte offset.
int var = [[nameArray objectAtIndex:[nameArray count] - 2] intValue];

// appFile is defined elsewhere (not shown).
NSFileHandle *file = [NSFileHandle fileHandleForReadingAtPath:appFile];
NSData *databuffer;

int i = 0;
while (file != nil) {
    [file seekToFileOffset:var + i];

    // Reads a fixed 7 bytes; the goal is to read up to the end of the line instead.
    databuffer = [file readDataOfLength:7];
    if ([databuffer length] == 0) {
        break;  // end of file reached, otherwise the loop never terminates
    }
    NSString *aStr = [[NSString alloc] initWithData:databuffer encoding:NSASCIIStringEncoding];
    NSLog(@"%@", aStr);

    i = i + [databuffer length];
}

Now I have tried your solution, but nothing is displayed!

CGPDFPageRef page = CGPDFDocumentGetPage(myDocument, 1);

CGPDFDictionaryRef d;
d = CGPDFPageGetDictionary(page);

CGPDFScannerRef myScanner;
CGPDFOperatorTableRef myTable;
myTable = CGPDFOperatorTableCreate();

CGPDFContentStreamRef myContentStream = CGPDFContentStreamCreateWithPage(page);

myScanner = CGPDFScannerCreate(myContentStream, myTable, NULL);

CGPDFScannerScan(myScanner);

CGPDFOperatorTableSetCallback(myTable, "BT", &op_BT);   // begin text object
CGPDFOperatorTableSetCallback(myTable, "ET", &op_ET);   // end text object
CGPDFOperatorTableSetCallback(myTable, "MP", &op_MP);
CGPDFOperatorTableSetCallback(myTable, "DP", &op_DP);
CGPDFOperatorTableSetCallback(myTable, "BMC", &op_BMC);
CGPDFOperatorTableSetCallback(myTable, "BDC", &op_BDC);
CGPDFOperatorTableSetCallback(myTable, "EMC", &op_EMC);

Answer 1

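I would advise against parsing the PDF format by hand the way you are trying to.

Try using CGPDFScanner (docs here).

As JeremyP said, a lot of the content in a PDF file is zlib-compressed, so searching for line endings will not get you far. Use CGPDFScanner to extract font mappings, images, and so on.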
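
In the code in the question, CGPDFScannerScan is called before any CGPDFOperatorTableSetCallback call, so the operator table is still empty while the page is scanned and no callback can fire. Below is a minimal sketch of the usual order (register callbacks, create the scanner, then scan), written against the Core Graphics C API. The names op_Tj and scan_page are hypothetical, chosen only for this example; "Tj" is the PDF show-text operator. It assumes `page` was obtained from CGPDFDocumentGetPage and has not been tested against your document.

```c
#include <CoreGraphics/CoreGraphics.h>
#include <stdio.h>

/* Hypothetical callback: invoked for every "Tj" (show text) operator. */
static void op_Tj(CGPDFScannerRef scanner, void *info)
{
    (void)info;  /* unused in this sketch */
    CGPDFStringRef string;
    if (CGPDFScannerPopString(scanner, &string)) {
        /* These are raw bytes in the font's encoding; they are not
         * guaranteed to be readable text without the font mapping. */
        fwrite(CGPDFStringGetBytePtr(string), 1,
               CGPDFStringGetLength(string), stdout);
        fputc('\n', stdout);
    }
}

/* Hypothetical helper: scans the content stream of a single page. */
void scan_page(CGPDFPageRef page)
{
    /* 1. Create the operator table and register the callbacks first. */
    CGPDFOperatorTableRef table = CGPDFOperatorTableCreate();
    CGPDFOperatorTableSetCallback(table, "Tj", &op_Tj);

    /* 2. Create the content stream and the scanner that uses the table. */
    CGPDFContentStreamRef stream = CGPDFContentStreamCreateWithPage(page);
    CGPDFScannerRef scanner = CGPDFScannerCreate(stream, table, NULL);

    /* 3. Scan last; the callbacks fire during this call. */
    CGPDFScannerScan(scanner);

    /* Release everything created above. */
    CGPDFScannerRelease(scanner);
    CGPDFContentStreamRelease(stream);
    CGPDFOperatorTableRelease(table);
}
```

The same ordering applies to the BT, ET, MP, DP, BMC, BDC and EMC callbacks from the question: move those CGPDFOperatorTableSetCallback calls above CGPDFScannerCreate.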

But it is not easy, I can tell you. :)
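
If you still need a raw line from a known byte offset, for instance in an uncompressed, text-like part of the file such as a classic cross-reference table, the following is a minimal sketch in plain C (which also compiles as Objective-C). The function name read_line_at and its parameters are illustrative and not part of any Apple API. It treats LF, CR and CRLF as line endings, since all three are valid end-of-line markers in PDF. It will not give meaningful results inside compressed streams.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line starting at byte `offset` of the file at `path`.
 * Returns a malloc'd, NUL-terminated string without the line ending,
 * or NULL on error or when `offset` is at or past the end of the file.
 * If `next_offset` is not NULL, it receives the offset of the next line.
 * The caller must free() the returned string. */
char *read_line_at(const char *path, long offset, long *next_offset)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;
    if (fseek(fp, offset, SEEK_SET) != 0) { fclose(fp); return NULL; }

    int c = fgetc(fp);
    if (c == EOF) { fclose(fp); return NULL; }  /* nothing left to read */

    size_t cap = 64, len = 0;
    char *line = malloc(cap);
    if (line == NULL) { fclose(fp); return NULL; }

    /* Collect bytes until LF, CR, or end of file. */
    while (c != EOF && c != '\n' && c != '\r') {
        if (len + 1 >= cap) {  /* keep room for the terminating NUL */
            char *grown = realloc(line, cap * 2);
            if (grown == NULL) { free(line); fclose(fp); return NULL; }
            line = grown;
            cap *= 2;
        }
        line[len++] = (char)c;
        c = fgetc(fp);
    }

    /* Treat CRLF as a single line ending. */
    if (c == '\r') {
        int peek = fgetc(fp);
        if (peek != '\n' && peek != EOF) ungetc(peek, fp);
    }

    line[len] = '\0';
    if (next_offset != NULL) *next_offset = ftell(fp);
    fclose(fp);
    return line;
}
```

Calling it repeatedly with the returned next_offset walks the file line by line from the starting offset, which replaces the fixed readDataOfLength:7 step in the question. Lines containing NUL bytes will appear truncated when used as C strings.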

Locate a byte in a PDF file and read the line
