How can I read text from a docx file, the way antiword does for doc?

Asked: 2015-06-16 15:15:30

Tags: php text

Does anyone know how to read a file.docx in PHP the same way as a file.doc? For file.doc I use antiword and store the extracted text in the DB:

    $em = $this->getDoctrine()->getManager();
    $request = $this->get('request');
    $developer = $em->getRepository('ProfileBundle:Developer')->findOneById($id);

    if (! $developer) {
        throw $this->createNotFoundException('Unable to find a profile.');
    }

    $cv = $developer->getCvDirUri();

    if($cv && file_exists($cv)) {
        unlink($cv);
    }

    $form = $this->createForm(new DeveloperDirCvType(), array());

    if ($request->isMethod('POST')) {

        $form->bind($request);
        if ($form->isValid()) {

            $data = $form->getData();

            $uploader = $this->get('artel.profile.file_uploader');
            $path = $uploader->uploadFile($data['photo']);
            $developer->setCvDirUri($path['url']);
            $content = shell_exec('/usr/bin/antiword '.'chmod o+r /var/www/aog-profile/web/'.$path['url']);
            if ($data['photo']->getClientMimeType() == 'application/vnd.openxmlformats-officedocument.wordprocessingml.document') {
                $content_txt = exec('/usr/bin/abiword --to=html '.'/var/www/aog-profile/web/'.$path['url']);

            }
            elseif ($data['photo']->getClientMimeType() == 'application/pdf') {
                $parser = new \Smalot\PdfParser\Parser();
                $pdf    = $parser->parseFile('/var/www/aog-profile/web/'.$path['url']);

                $content = $pdf->getText();

            } 
            else{
                $content = shell_exec('/usr/bin/antiword -m UTF-8.txt '.'chmod o+r /var/www/aog-profile/web/'.$path['url']);
            }


            $url = sprintf(
                '%s%s',
                $this->container->getParameter('acme_storage.amazon_s3.base_url'),
                $this->getPhotoUploader()->uploadFromUrl($path['url'])
            );

            $developer->setTextCv($content);
            $developer->setCvUri($url);


            $em->flush();

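If the file is a .doc, I run antiword and call setTextCv($content), so the text ends up in the DB and the file is uploaded to Amazon. But if the file is a .docx, it is uploaded to /upload/Cv/file.docx and abiword creates file.html next to it. I then need to call setTextCv() with the text from that HTML file, and I don't know how to do that. Or do you know another way? Any ideas?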
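
For illustration, here is a minimal sketch of both routes mentioned in the question. It is not a tested drop-in for the controller. The function names `htmlToText` and `docxToText` are hypothetical. The first assumes the generated file.html is UTF-8. The second relies on the fact that a .docx file is a zip archive whose main body is stored in `word/document.xml`, and it needs PHP's zip extension. Both strip markup with `strip_tags`, which is crude: tables, headers, and footnotes are not handled specially.

```php
<?php

/**
 * Convert an HTML string (for example the contents of the generated
 * file.html) to plain text. Assumes UTF-8 input.
 */
function htmlToText(string $html): string
{
    // Drop <style> and <script> blocks; their contents are not document text.
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', '', $html);
    // Turn <br> and common block-level closing tags into newlines.
    $html = preg_replace('#<br\s*/?>|</(p|div|h[1-6]|li|tr)>#i', "\n", $html);
    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of spaces, tabs and non-breaking spaces.
    $text = preg_replace('/[ \t\x{00A0}]+/u', ' ', $text) ?? $text;
    // Collapse whitespace around line breaks into a single newline.
    $text = preg_replace('/\s*\n\s*/u', "\n", $text) ?? $text;

    return trim($text);
}

/**
 * Read the body text straight from a .docx file, without abiword.
 */
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No word/document.xml found in $path");
    }
    // End of paragraph and explicit line breaks become newlines.
    $xml = preg_replace('#</w:p>#', "\n", $xml);
    $xml = preg_replace('#<w:br\b[^>]*>#', "\n", $xml);
    // Strip all remaining XML tags and decode the XML entities.
    return trim(html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8'));
}
```

In the docx branch of the code above, the result would have to be assigned to `$content` (for example `$content = docxToText('/var/www/aog-profile/web/'.$path['url']);`), because the branch currently fills `$content_txt`, which never reaches `setTextCv()`. Note also that `exec()` returns only the last line of the command's output, so it cannot be used to capture a whole converted document.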

0 answers:

No answers yet.