Splitting a PDF into Pages in Five Minutes

pdf查询网

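Background

Apologies for the clickbait title. A few days ago I needed to split a PDF of my personal documents into individual pages so that I could upload them to the relevant websites. Taking screenshots, pasting them into Word, and converting back to PDF is tedious, so I looked for a tool that could do the conversion directly. Every PDF reader I tried put this operation behind a membership or a fee. As a programmer, paying for something this simple would be embarrassing (and of course I will not admit that the real reason is being broke). Why pay for a problem that a few minutes of code can solve? Enough talk, let's get started.

Tool Preparation

In an earlier article, "Apache POI 详解及 Word 文档读取示例" (Apache POI explained, with a Word document reading example), we used Apache POI to work with Word documents. For PDF files there are likewise Apache PDFBox (official site: https://pdfbox.apache.org/) and itextpdf (official site: https://itextpdf.com/).

PDFBox: PDFBox is a Java library for working with PDF documents. It supports creating and manipulating PDF documents and extracting their content, and it also ships some command-line utilities. Its main features are: extracting text from PDFs; merging PDF documents; encrypting and decrypting PDF documents; integration with the Lucene search engine; filling in PDF/XFDF form data; creating PDF documents from text files; creating images from PDF pages; and printing PDF documents.

itextpdf: iText is a project hosted on SourceForge, the well-known open-source site, and is a Java library for generating PDF documents. With iText you can generate PDF or RTF documents and also convert XML and HTML files to PDF. Installing iText is straightforward: download iText.jar and add its path to the system CLASSPATH, and the iText classes can then be used in your program.

Adding the Dependencies

Create a new Java Maven project and add the dependencies (this article uses itextpdf 5.5.1 and pdfbox 2.0.15):

    <project>
        <modelVersion>4.0.0</modelVersion>
        <groupId>org.example</groupId>
        <artifactId>pdf-test</artifactId>
        <version>1.0-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>com.itextpdf</groupId>
                <artifactId>itextpdf</artifactId>
                <version>5.5.1</version>
                <type>jar</type>
            </dependency>
            <dependency>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>pdfbox</artifactId>
                <version>2.0.15</version>
            </dependency>
            <dependency>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>fontbox</artifactId>
                <version>2.0.15</version>
            </dependency>
            <dependency>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>jempbox</artifactId>
                <version>1.8.16</version>
            </dependency>
        </dependencies>
    </project>

Splitting and Exporting a PDF File

The goal: given the path of a PDF file and a start and end page number, extract those pages and write them to a new PDF file. For example, with start page 1 and end page 3, a new file is generated that contains pages 1 to 3 of the original PDF. This part uses itextpdf; the code is as follows:

    /**
     * Exports a range of pages from a PDF document to a new PDF file.
     * @param filePath path of the source file
     * @param newFile  path of the target file to write
     * @param from     first page (page numbers start at 1)
     * @param end      last page, inclusive; 0 means the last page of the document
     */
    public static void pdfToSub(String filePath, String newFile, int from, int end) {
        Document document = null;
        PdfCopy copy = null;
        try {
            PdfReader reader = new PdfReader(filePath);
            // Look up the number of pages in the PDF document
            int n = reader.getNumberOfPages();
            if (end == 0) {
                end = n;
            }
            document = new Document(reader.getPageSize(1));
            copy = new PdfCopy(document, new FileOutputStream(newFile));
            document.open();
            for (int j = from; j <= end; j++) {
                document.newPage();
                PdfImportedPage page = copy.getImportedPage(reader, j);
                copy.addPage(page);
            }
            // Closing the document flushes the copied pages to newFile
            document.close();
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The main function:

    String filePath = "/Users/xxxx/Downloads/数据中台- 77ebooks.com.pdf";
    String newFile = "/Users/xxxx/Downloads/1-3.pdf";
    pdfToSub(filePath, newFile, 1, 3);

After running it, the result file 1-3.pdf appears in the Downloads directory.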
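The original goal was to pull the pages out one at a time, and pdfToSub itself does not check that from and end actually lie inside the document. What follows is a minimal sketch, not part of the original code, of two plain-Java helpers that could sit in front of it: one validates a page range against the page count, and one builds a target file name per page. The class name PageRanges, the methods normalize and perPageTargets, the sample path docs/resume.pdf, and the "-N.pdf" naming scheme are all assumptions made for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PageRanges {

    /**
     * Validates a 1-based, inclusive page range against the page count.
     * As in pdfToSub, end == 0 is taken to mean "up to the last page".
     * Returns {from, last}; throws IllegalArgumentException for a bad range.
     */
    public static int[] normalize(int from, int end, int totalPages) {
        if (totalPages < 1) {
            throw new IllegalArgumentException("document has no pages");
        }
        int last = (end == 0) ? totalPages : end;
        if (from < 1 || last > totalPages || from > last) {
            throw new IllegalArgumentException(
                "invalid range " + from + "-" + last + " for " + totalPages + " pages");
        }
        return new int[] {from, last};
    }

    /**
     * Builds one target path per page next to the source file,
     * e.g. resume.pdf -> resume-1.pdf, resume-2.pdf, ...
     * (hypothetical naming scheme, chosen only for this sketch).
     */
    public static List<Path> perPageTargets(String sourcePath, int from, int last) {
        Path source = Paths.get(sourcePath);
        String name = source.getFileName().toString();
        // Strip a trailing ".pdf" (case-insensitive) to get the base name
        String base = name.toLowerCase(Locale.ROOT).endsWith(".pdf")
            ? name.substring(0, name.length() - 4)
            : name;
        List<Path> targets = new ArrayList<>();
        for (int page = from; page <= last; page++) {
            targets.add(source.resolveSibling(base + "-" + page + ".pdf"));
        }
        return targets;
    }

    public static void main(String[] args) {
        // Pretend the reader reported 3 pages; end == 0 selects all of them
        int[] range = normalize(1, 0, 3);
        List<Path> targets = perPageTargets("docs/resume.pdf", range[0], range[1]);
        for (int i = 0; i < targets.size(); i++) {
            int page = range[0] + i;
            // In the real program this is where a call such as
            // pdfToSub(source, targets.get(i).toString(), page, page) would go.
            System.out.println("page " + page + " -> " + targets.get(i));
        }
    }
}
```

With helpers like these, splitting a document page by page comes down to calling pdfToSub once per target, passing the same page number as both from and end. The total page count would come from reader.getNumberOfPages(), as in the code above.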

Reading the Content of a PDF File

This part uses PDFBox's PDFParser; the code is as follows:

    /**
     * Reads the text content of the given pages of a PDF document.
     * @param fileName path and name of the file
     * @param from     first page
     * @param end      last page
     * @return the extracted text, or an empty string if reading fails
     */
    public static String readPdfByPage(String fileName, int from, int end) {
        String result = "";
        File file = new File(fileName);
        PDDocument pdfDocument = null;
        try {
            // Create the PDF parser; read-only access is enough for text extraction
            PDFParser parser = new PDFParser(new RandomAccessFile(file, "r"));
            // Parse the file
            parser.parse();
            // Get the PDF document object produced by the parser
            pdfDocument = parser.getPDDocument();
            // Create the PDF text stripper
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setSortByPosition(false); // true reads the text line by line; the default is false
            // Set the first page
            stripper.setStartPage(from);
            // Set the last page
            stripper.setEndPage(end);
            // Read the text from the PDF document
            result = stripper.getText(pdfDocument);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the document to release the underlying file
            if (pdfDocument != null) {
                try {
                    pdfDocument.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
        return result;
    }

Running it outputs the text of the specified pages.

The examples need the following imports:

    import com.itextpdf.text.Document;
    import com.itextpdf.text.pdf.PdfCopy;
    import com.itextpdf.text.pdf.PdfImportedPage;
    import com.itextpdf.text.pdf.PdfReader;
    import org.apache.pdfbox.io.RandomAccessFile;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

Summary

This article is a first attempt at working with PDF files. Later articles will look in detail at the PDF file format and at the core source code of pdfbox and itextpdf.

Category: PDF encyclopedia
Tags: PDF reader, PDF page splitting
Published: 2023-04-12 18:46:24
Article link: http://chaxun188.com/archives/pdfbaike/256.html
