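I'm a beginner developer learning Python. I'm trying to recursively walk a folder and its subfolders, which contain multiple PDFs, and merge the PDFs in each subfolder into a single PDF named after that subfolder. I have the following folder and subfolder structure: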
Folders before merging:
dummy
    ball
        ball_baseball.pdf
        ball_basketball.pdf
        ball_volleyball.pdf
    ice
        ice_skating.pdf
        ice_curling.pdf
        ice_hockey.pdf
Desired result:
dummy
    ball
        ball.pdf (containing 3 pages)
    ice
        ice.pdf (containing 3 pages)
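The desired result boils down to grouping the PDFs by the folder that directly contains them, then merging each group. A minimal, standard-library-only sketch of that grouping step (the function name group_pdfs_by_folder is hypothetical, and sorting by file name is an assumption about the page order you want):

```python
import os
from collections import defaultdict


def group_pdfs_by_folder(top):
    """Map every folder under `top` (including `top`) that directly contains
    PDFs to the sorted list of those PDF paths."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for filename in sorted(files):
            # Case-insensitive match (an assumption) so "SCAN.PDF" also counts.
            if filename.lower().endswith(".pdf"):
                groups[root].append(os.path.join(root, filename))
    # Folders without PDFs (such as "dummy" itself) never get a key.
    return dict(groups)
```

For the dummy tree above this gives two keys, .../dummy/ball and .../dummy/ice, each mapped to three paths; the PyPDF2 part then only has to loop over the dictionary and create one merger per key.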
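A similar question about CSV files has already been answered using pandas, but I'm using PyPDF2 to merge the PDFs. Here is the code I've tried so far. It seems to work, but I may have messed up the for loop that is supposed to recursively append and merge the PDFs in each subfolder.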
import os
from PyPDF2 import PdfFileMerger, PdfFileReader
dirs = r"path to the folder directory"
for root, dirs, files in os.walk(dirs):
    merger = PdfFileMerger()
    for filename in files:
        if filename.endswith(".pdf"):
            filepath = os.path.join(root, filename)
            merger.append(PdfFileReader(open(filepath, 'rb')))
    # the output is named after `filename`, i.e. the last file listed in this folder
    merger.write(str(filename))
Any suggestions would be appreciated. Thanks in advance.
Answer 0 (score: 0)
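If what you want is to write the merged files to the folder containing your Python script rather than to the subfolders, you need to make a few adjustments: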
import os
from PyPDF2 import PdfFileMerger, PdfFileReader
hdir = r"path to the folder directory"  # would suggest using os.getcwd()
for root, dirs, files in os.walk(hdir):
    # renamed the start folder to hdir so it is not overwritten by the dirs that os.walk yields
    for dir in dirs:
        merger = PdfFileMerger()  # one merger per subdirectory
        subdir = os.path.join(root, dir)
        for filename in sorted(os.listdir(subdir)):
            if filename.endswith(".pdf"):
                filepath = os.path.join(subdir, filename)
                merger.append(PdfFileReader(open(filepath, 'rb')))
        #merger.write(str(filename))
        merger.write(os.path.join(hdir, dir + '.pdf'))
        # writes to the main directory, names the merged file after the subdirectory
Answer 1 (score: 0)
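I know this is a pretty old question, but I ran into the same problem myself. I tried C. Taylor's solution but ended up with some errors. Anyway, the following code worked for me.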
import os
from PyPDF2 import PdfFileMerger, PdfFileReader
print("testing ")
hdir = os.getcwd()
for root, dirs, files in os.walk(hdir):
    merger = PdfFileMerger()  # a fresh merger for every folder os.walk visits
    for filename in files:
        if filename.endswith(".pdf"):
            print(filename)
            filepath = os.path.join(root, filename)
            merger.append(PdfFileReader(open(filepath, 'rb')))
    # one merged file per folder, named after the folder and written to hdir
    merger.write(os.path.join(hdir, os.path.basename(os.path.normpath(root)) + '.pdf'))
Each merged PDF takes the name of its folder and is written to the main directory.
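The os.path.normpath call in that naming expression matters when a folder path ends with a separator: os.path.basename alone would then return an empty string. A small sketch of just the naming step (merged_pdf_name and merged_pdf_path are hypothetical helper names, not part of PyPDF2):

```python
import os


def merged_pdf_name(folder):
    """File name for the merged PDF of `folder`, e.g. ".../dummy/ball" -> "ball.pdf"."""
    # normpath strips a trailing separator first; without it,
    # basename("dummy/ball/") would be "" and the output would be ".pdf".
    return os.path.basename(os.path.normpath(folder)) + ".pdf"


def merged_pdf_path(folder, out_dir):
    """Full output path: the merged file goes directly inside `out_dir`."""
    return os.path.join(out_dir, merged_pdf_name(folder))
```

Keeping the name logic separate makes it easy to point out_dir somewhere other than the folder being walked.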
Answer 2 (score: -1)
I've figured out how to run them in a loop.
Creating merger = PdfMerger() inside the loop did the trick!
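Putting the pieces together, a minimal sketch of the "fresh merger inside the loop" idea. It assumes PyPDF2 2.x/3.x, whose PdfMerger has append(), write() and close(); the function name merge_each_folder and the make_merger parameter are hypothetical, the latter only there so the merger can be swapped out (for example in a test):

```python
import os


def merge_each_folder(top, out_dir, make_merger=None):
    """Merge the PDFs found directly in each folder under `top` into
    `<out_dir>/<folder name>.pdf`; return the paths that were written."""
    if make_merger is None:
        # Assumption: PyPDF2 2.x/3.x, where PdfMerger accepts file paths in append().
        from PyPDF2 import PdfMerger
        make_merger = PdfMerger
    written = []
    for root, _dirs, files in os.walk(top):
        pdfs = sorted(f for f in files if f.lower().endswith(".pdf"))
        if not pdfs:
            continue  # e.g. "dummy" itself: no PDFs, so no empty output file
        merger = make_merger()  # created inside the loop: one fresh merger per folder
        for filename in pdfs:
            merger.append(os.path.join(root, filename))
        out_path = os.path.join(out_dir, os.path.basename(os.path.normpath(root)) + ".pdf")
        merger.write(out_path)
        merger.close()
        written.append(out_path)
    return written
```

Usage would be something like merge_each_folder(r"path to the folder directory", os.getcwd()). If out_dir is the top folder itself, a second run would find ball.pdf and ice.pdf there and merge them into dummy.pdf, so writing to a separate folder is the safer choice.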