Question

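I want to find certain words in a PDF after converting it to text.

1) I have two PDFs, 1.pdf and 2.pdf, in the folder C:\TRM\PDF.

1.pdf contains the word "ICG00058".
2.pdf contains the word "ICG00065".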

2) I have already converted the PDF to text, stored in ${detail_1}.

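3) Assuming I don't know which word is in 1.pdf, I want to check whether 1.pdf contains ICG00058 or ICG00065.

Sorry if the question is unclear. Please look into this for me, as it is critical for my work.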

*** Settings ***
Library         Selenium2Library
Library         String
Library         Pdf2TextLibrary


*** Test Cases ***
Read PDF

    ${detail_1}     Convert Pdf To Txt              C:\\TRM\\PDF\\1.pdf
    LOG     ${detail_1} 
    ${ID_1}     Get Regexp Matches        ${detail_1}          ICG00058
    ${ID_2}     Get Regexp Matches        ${detail_1}          ICG00065
    Run Keyword And Ignore Error          $ID_1[0] in $detail_1      LOG   ${ID_1}
    Run Keyword If                        $ID_2[0] in $detail_1      LOG   ${ID_2}

Error: Evaluating expression 'RF_VAR_ID_2[0] in RF_VAR_detail_2' failed: IndexError: list index out of range

Answer 1

If you want to run Python in a Robot script, you need to call the Evaluate keyword; that is why you are seeing the error you mentioned.

That said, you can use the Get Index From List and List Should Contain Value keywords instead:

${matched_id_1}=    Get Index From List     ${ID_1}     0
Run Keyword And Ignore Error    List Should Contain Value    ${detail_1}    ${matched_id_1}

${matched_id_2}=    Get Index From List     ${ID_2}     0
Run Keyword And Ignore Error    List Should Contain Value    ${detail_1}    ${matched_id_2}

If you want List Should Contain Value to act as an assertion, simply remove the Run Keyword And Ignore Error keyword from the snippets above.
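
As a rough illustration of where the IndexError in the question comes from, here is a minimal Python sketch. It assumes that Get Regexp Matches behaves much like Python's `re.findall` and returns an empty list when nothing matches. The sample text and the helper name `first_match` are made up for illustration and are not part of any library.

```python
import re

def first_match(text, pattern):
    """Return the first regex match in text, or None if there is none."""
    matches = re.findall(pattern, text)  # empty list when nothing matches
    # Guard before indexing: matches[0] on an empty list raises IndexError.
    return matches[0] if matches else None

# Hypothetical stand-in for the text extracted from 1.pdf.
detail_1 = "Order reference ICG00058, page 1"

print(first_match(detail_1, "ICG00058"))
print(first_match(detail_1, "ICG00065"))
```

In Robot Framework terms, the same guard would amount to checking that the list of matches is non-empty before indexing it with [0].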

Answer 2

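I use pdfgrep. It works just like grep: you can search a PDF with a regular expression without any intermediate conversion step.

I use it to find ISBN numbers in PDFs and then automatically rename the files to include the ISBN that was found, or write the file name and ISBN to a MySQL database.

If you don't know how to write a regex, you can try your pattern in one of the online regex testers until you find one that works.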
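
As an alternative to an online tester, a pattern can also be tried out locally before it is handed to pdfgrep. The sketch below is in Python and uses made-up sample strings. The pattern `ICG[0-9]{5}` is only an assumption based on the two IDs in the question, and Python's regex dialect may differ in details from the one pdfgrep uses.

```python
import re

# Assumed pattern: "ICG" followed by five digits, as in ICG00058.
ID_PATTERN = re.compile(r"ICG[0-9]{5}")

def find_ids(text):
    """Return every ID-like token found in text, in order of appearance."""
    return ID_PATTERN.findall(text)

# Hypothetical sample lines standing in for text found in a PDF.
samples = [
    "Invoice ICG00058 issued",
    "See ICG00065 and ICG00058",
    "No identifier here",
]
for sample in samples:
    print(sample, "->", find_ids(sample))
```

Once the pattern behaves as expected on the samples, the same expression could be passed to pdfgrep in the usual `pdfgrep PATTERN FILE` form.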

Find a word in a PDF
