Any command-line tool for merging PDF files?


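Topic: Any command-line tool for merging PDF files?
OP|lobachevsky|2023-12-28 16:12:42
As the title says.

I have some PDFs that need to be merged into a single PDF, and I need to control the merge order.

Is there a command-line tool that can handle this?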
--
FROM 1.202.141.*
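#1|pixYY|2023-12-30 10:09:21
pdftk does exactly this.

[lobachevsky wrote:]
: As the title says.
: I have some PDFs that need to be merged into a single PDF, and I need to control the merge order.
: Is there a command-line tool that can handle this?
: ...................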
--
FROM 111.36.200.*
#2|xuxinl|2023-12-30 22:19:51
pdftk A=test1.pdf B=test2.pdf C=test3.pdf cat A B C output out.pdf
--
FROM 117.133.66.*
#3|users|2024-01-01 09:28:15
qpdf
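
For anyone scripting this, here is a minimal sketch of how an ordered merge could be driven from Python with either tool. It assumes qpdf's `--empty --pages ... --` form and pdftk's `cat output` form; the helper names (`build_qpdf_merge_cmd`, `build_pdftk_merge_cmd`, `merge`) and the file names are made up for illustration and do not come from either tool.

```python
import subprocess

def build_qpdf_merge_cmd(inputs, output):
    """Return the qpdf argument list that concatenates `inputs` in the given order."""
    # --empty starts from a blank document; --pages ... -- lists the sources in order.
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]

def build_pdftk_merge_cmd(inputs, output):
    """Return the pdftk argument list for the same merge."""
    return ["pdftk", *inputs, "cat", "output", output]

def merge(inputs, output, builder=build_qpdf_merge_cmd):
    """Run the chosen tool; raises CalledProcessError if it exits with an error."""
    subprocess.run(builder(inputs, output), check=True)

# The list order is the merge order (file names here are placeholders).
print(build_qpdf_merge_cmd(["test2.pdf", "test1.pdf"], "out.pdf"))
```

Passing the arguments as a list rather than a single shell string means file names containing spaces need no extra quoting.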
--
FROM 101.21.232.*
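#4|juking|2024-01-12 10:08:14
If you can use Python, you can try my script directly. (It was generated with ChatGPT; testing and tweaking took about a quarter of an hour. The market for small tools like this is going to get squeezed even further.)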

"""
# Install the required packages
pip install PyPDF2
# Usage:
python pdf.py --input <input_file_or_folder> --output <output_file_or_folder> --exclude <exclude_file_or_folder>
1. When `input` is a folder, merge all PDF files in the folder into one PDF file specified as `output`.
2. When `input` is a file, split the PDF file into multiple PDF files in the folder specified as `output`.
3. When `exclude` is specified, the files in the list will be excluded from the merging process.
"""

import os
import PyPDF2
import argparse
from shutil import rmtree

def merge_pdf_files(folder, output_file, exclude=None):
    """Merge all PDF files in the folder into one PDF file."""
    pdf_files = [file for file in os.listdir(folder) if file.endswith('.pdf')]
    if exclude:
        pdf_files = [file for file in pdf_files if file not in exclude]
    pdf_files.sort()

    pdf_merger = PyPDF2.PdfMerger()

    for pdf_file in pdf_files:
        pdf_path = os.path.join(folder, pdf_file)
        pdf_merger.append(pdf_path)

    with open(output_file, 'wb') as output_pdf:
        pdf_merger.write(output_pdf)

    pdf_merger.close()
    print(f'Merged {len(pdf_files)} PDF files into {output_file}')

def split_pdf_file(input_file, output_folder):
    """Split the PDF file into one single-page PDF file per page."""
    os.makedirs(output_folder, exist_ok=True)

    # PdfReader/PdfWriter replace PdfFileReader/PdfFileWriter, which PyPDF2 3.x removed.
    with open(input_file, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        num_pages = len(pdf_reader.pages)

        for page_num in range(num_pages):
            pdf_writer = PyPDF2.PdfWriter()
            pdf_writer.add_page(pdf_reader.pages[page_num])

            output_filename = os.path.join(output_folder, f'page_{page_num + 1}.pdf')
            with open(output_filename, 'wb') as output_pdf:
                pdf_writer.write(output_pdf)

    print(f'Split {input_file} into {num_pages} pages in {output_folder}')

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Merge or split PDF files.")
    parser.add_argument("--input", '-i', help="Input file or folder", required=True)
    parser.add_argument("--output", '-o', help="Output file or folder", type=str, default=None)
    parser.add_argument("--exclude", '-e', help="Exclude files", nargs='+', default=None)
    args = parser.parse_args()

    if os.path.isdir(args.input):
        # Folder input: merge its PDFs; the result is written inside the input folder.
        if not args.output:
            args.output = 'merged.pdf'
        # Remove a stale output first so it is not merged into itself.
        if args.output in os.listdir(args.input):
            os.remove(os.path.join(args.input, args.output))

        merge_pdf_files(args.input, os.path.join(args.input, args.output), args.exclude)
    elif os.path.isfile(args.input):
        # File input: split it. Note that an existing output folder is deleted first.
        if not args.output:
            args.output = 'split'
        if os.path.exists(args.output):
            rmtree(args.output)
        split_pdf_file(args.input, args.output)
    else:
        parser.error(f'Input path does not exist: {args.input}')
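
One caveat on merge order: the script orders files with a plain `pdf_files.sort()`, which is lexicographic, so `10.pdf` lands before `2.pdf`. A possible workaround, sketched under the assumption that the files are numbered, is a natural-order sort key. `natural_key` is a hypothetical helper, not part of the script above.

```python
import re

def natural_key(name):
    """Split a file name into text and digit runs so that numbers compare by value."""
    # With a capturing group, re.split puts text at even indices and digit runs at odd ones.
    parts = re.split(r'(\d+)', name)
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]

files = ['10.pdf', '2.pdf', '1.pdf']
print(sorted(files))                   # lexicographic order
print(sorted(files, key=natural_key))  # numeric-aware order
```

In the script this would amount to replacing `pdf_files.sort()` with `pdf_files.sort(key=natural_key)`. For a fully arbitrary order, listing the files explicitly is still the simplest route.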
--
FROM 167.220.255.*