Question

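I'm a novice developer learning Python. I am trying to recursively walk a folder and its subfolders, which contain multiple PDFs, and merge the PDFs in each subfolder into a single PDF named after that subfolder. I have the following folder and subfolder structure.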

Folder structure before merging:

dummy
    ball
        ball_baseball.pdf
        ball_basketball.pdf
        ball_volleyball.pdf
    ice
        ice_skating.pdf
        ice_curling.pdf
        ice_hockey.pdf

The result I would ideally like to see is:

dummy
    ball
        ball.pdf (containing 3 pages)
    ice
        ice.pdf (containing 3 pages)

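A similar question was answered before for CSV files using pandas, but I am using PyPDF2 to merge PDFs. This is the code I have tried so far. It seems to work, but I may have messed up the for loop that recursively appends and merges the PDFs in the subfolders.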

import sys, os, PyPDF2
from PyPDF2 import PdfFileMerger, PdfFileReader
dirs = r"path to the folder directory"
for root, dirs, files in os.walk(dirs):
    merger = PdfFileMerger()
    for filename in files:
        if filename.endswith(".pdf"):
            filepath = os.path.join(root, filename)
            merger.append(PdfFileReader(open(filepath, 'rb')))
            merger.write(str(filename))

Any suggestions would be greatly appreciated. Thanks in advance.

Answer 1

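If what you want is to write the merged files to the folder containing your Python script rather than to the subfolders, you need to make a few adjustments: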

import os
from PyPDF2 import PdfFileMerger

hdir = os.getcwd()  # path to the top-level folder; os.getcwd() assumes the script runs from there
# renamed from dirs so the start path is not overwritten by the dirs list that os.walk yields
for root, dirs, files in os.walk(hdir):
    for subdir in dirs:
        subpath = os.path.join(root, subdir)
        pdfs = sorted(f for f in os.listdir(subpath) if f.endswith(".pdf"))
        if not pdfs:
            continue  # nothing to merge in this subfolder
        merger = PdfFileMerger()  # fresh merger per subfolder
        for filename in pdfs:
            merger.append(os.path.join(subpath, filename))
        # writes to the main directory, names the merged file after the subdirectory
        merger.write(os.path.join(hdir, subdir + '.pdf'))
        merger.close()
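
For readers unsure what the loop variables hold, here is a small sketch of what os.walk yields for a tree shaped like the one in the question. The helper names build_dummy_tree and describe_walk and the temporary directory are assumptions made for illustration, and the files are empty placeholders rather than real PDFs.

```python
import os
import tempfile


def build_dummy_tree(top):
    """Create the question's folder layout under top with empty placeholder files."""
    layout = {
        "ball": ["ball_baseball.pdf", "ball_basketball.pdf", "ball_volleyball.pdf"],
        "ice": ["ice_skating.pdf", "ice_curling.pdf", "ice_hockey.pdf"],
    }
    for folder, names in layout.items():
        os.makedirs(os.path.join(top, folder))
        for name in names:
            open(os.path.join(top, folder, name), "wb").close()


def describe_walk(top):
    """Return (relative root, sorted dirs, sorted files) for every folder os.walk visits."""
    rows = []
    for root, dirs, files in os.walk(top):
        rows.append((os.path.relpath(root, top), sorted(dirs), sorted(files)))
    return sorted(rows)


with tempfile.TemporaryDirectory() as top:
    build_dummy_tree(top)
    walk_rows = describe_walk(top)

for rel_root, dirs, files in walk_rows:
    print(rel_root, dirs, files)
```

The top-level row has an empty files list, and each subfolder's PDFs appear only in the row where that subfolder is root. That is why a filename from files should be joined with root and not with a name taken from dirs.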

Answer 2

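I know this is a very old question, but I ran into the same problem myself. I tried C. Taylor's solution but ended up with some errors. In any case, the following code worked for me.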

import os
from PyPDF2 import PdfFileMerger, PdfFileReader
print("testing ")

hdir = os.getcwd()  # top-level folder; merged files are written here
for root, dirs, files in os.walk(hdir):
    merger = PdfFileMerger()  # fresh merger for every folder visited
    for filename in files:
        if filename.endswith(".pdf"):
            print(filename)
            filepath = os.path.join(root, filename)
            merger.append(PdfFileReader(open(filepath, 'rb')))
    # name the output after the current folder
    merger.write(os.path.join(hdir, os.path.basename(os.path.normpath(root)) + '.pdf'))

The merged PDF is named after its folder and is written to the main directory.
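
The question asked for each merged file to end up inside its own subfolder (ball/ball.pdf, ice/ice.pdf). A minimal sketch of that variant follows. The function names plan_merges and merge_all are hypothetical, the rule of skipping a file already named after its folder is an added assumption so that reruns do not merge an old output into itself, and PdfFileMerger is the class name used in this thread (PyPDF2 3.x calls it PdfMerger).

```python
import os


def plan_merges(top):
    """Map each output path (<folder>/<folder>.pdf) to the sorted PDFs found in that folder."""
    plan = {}
    for root, _dirs, files in os.walk(top):
        folder = os.path.basename(os.path.normpath(root))
        output_name = folder + ".pdf"
        # Skip a previously merged output so reruns do not merge it into itself.
        pdfs = sorted(
            f for f in files
            if f.lower().endswith(".pdf") and f != output_name
        )
        if pdfs:
            plan[os.path.join(root, output_name)] = [os.path.join(root, f) for f in pdfs]
    return plan


def merge_all(top):
    """Write one merged PDF per folder, using the PyPDF2 class shown in the answers."""
    # Imported here so plan_merges can be tried without PyPDF2 installed.
    from PyPDF2 import PdfFileMerger  # named PdfMerger in PyPDF2 3.x

    for output_path, inputs in plan_merges(top).items():
        merger = PdfFileMerger()  # fresh merger per folder
        for path in inputs:
            merger.append(path)
        merger.write(output_path)
        merger.close()
```

Calling merge_all(r"path to the folder directory") would then write the merged files next to their sources. Separating the planning step from the writing step makes it possible to print the plan and check it before any file is created.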

Answer 3

I figured out how to run them in a loop.

Creating merger = PdfMerger() inside the loop did the trick!
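
To illustrate why the placement of the merger matters, here is a small sketch that uses a stand-in class in place of a real PDF merger. PageCollector is hypothetical and only records the names it is given; it is not part of PyPDF2. The point is only that an object created once outside the loop keeps everything appended in earlier iterations.

```python
class PageCollector:
    """Hypothetical stand-in for a PDF merger: it only records what was appended."""

    def __init__(self):
        self.pages = []

    def append(self, name):
        self.pages.append(name)


folders = {
    "ball": ["ball_baseball.pdf", "ball_basketball.pdf"],
    "ice": ["ice_skating.pdf", "ice_curling.pdf"],
}

# Merger created once, outside the loop: pages from "ball" leak into "ice".
shared = PageCollector()
shared_result = {}
for folder, names in folders.items():
    for name in names:
        shared.append(name)
    shared_result[folder] = list(shared.pages)

# Merger created inside the loop: each folder starts from an empty merger.
fresh_result = {}
for folder, names in folders.items():
    merger = PageCollector()
    for name in names:
        merger.append(name)
    fresh_result[folder] = list(merger.pages)

print(len(shared_result["ice"]))  # 4
print(len(fresh_result["ice"]))   # 2
```

With the shared object, the result for ice also contains the two ball files. With a fresh object per folder, each result holds only that folder's files.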

Merge PDFs in subfolders recursively using the PyPDF2 module in Python

3 answers: