Tuesday, November 25, 2014

Get PDF fonts information line by line using PDFBox API

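The following Java snippet uses the PDFBox API to extract the text of a PDF line by line. Whenever the font changes, the font's name is inserted in square brackets in front of the text that uses it, so each returned line shows which fonts it contains.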

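 import java.io.IOException;
 import java.util.List;
 import org.apache.pdfbox.pdmodel.PDDocument;
 import org.apache.pdfbox.util.PDFTextStripper;
 import org.apache.pdfbox.util.TextPosition;

 // Written against the PDFBox 1.8.x API. In PDFBox 2.x the stripper classes live in
 // org.apache.pdfbox.text, getCharacter() is getUnicode(), getBaseFont() is getName(),
 // and PDDocument.load takes a File rather than a path string.
 public String[] getFontLineByLineFromPdf(String fileName) throws IOException
   {
     PDDocument doc = PDDocument.load(fileName);
     try
     {
       PDFTextStripper stripper = new PDFTextStripper() {
         // Font of the last character written; it carries over across lines and pages.
         String prevBaseFont = "";
         @Override
         protected void writeString(String text, List<TextPosition> textPositions) throws IOException
         {
           StringBuilder builder = new StringBuilder();
           for (TextPosition position : textPositions)
           {
             String baseFont = position.getFont().getBaseFont();
             // Emit a [FontName] marker only when the font differs from the previous character's.
             if (baseFont != null && !baseFont.equals(prevBaseFont))
             {
               builder.append('[').append(baseFont).append(']');
               prevBaseFont = baseFont;
             }
             builder.append(position.getCharacter());
           }
           writeString(builder.toString());
         }
       };
       // Split the tagged text into lines, accepting both \n and \r\n line endings.
       return stripper.getText(doc).split("\\r?\\n");
     }
     finally
     {
       doc.close(); // release the document even if extraction fails
     }
   }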
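
Each returned line is plain text with `[FontName]` markers embedded in it. If you need the fonts and the text separately, the markers can be split out again. The helper below is only a sketch and is not part of PDFBox: the class `FontRunParser`, its `Run` type, and the `carriedFont` parameter are hypothetical names introduced for illustration. It assumes the PDF text itself contains no literal square brackets, since those would be mistaken for markers. Because a marker is written only when the font changes, a line may start without one, so the caller passes in the font carried over from the previous line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one "[Font]text[Font]text" line into font/text runs.
class FontRunParser {

    /** A stretch of text that uses a single font. */
    static final class Run {
        final String font; // may be null if no font is known yet
        final String text;

        Run(String font, String text) {
            this.font = font;
            this.text = text;
        }

        @Override
        public String toString() {
            return font + " -> \"" + text + "\"";
        }
    }

    // Matches a single "[FontName]" marker.
    private static final Pattern MARKER = Pattern.compile("\\[([^\\]]+)\\]");

    /**
     * Parses one line. Text before the first marker is assigned to carriedFont,
     * the font that was active at the end of the previous line.
     */
    static List<Run> parse(String line, String carriedFont) {
        List<Run> runs = new ArrayList<>();
        Matcher m = MARKER.matcher(line);
        String font = carriedFont;
        int pos = 0;
        while (m.find()) {
            // Text between the previous marker and this one belongs to the current font.
            if (m.start() > pos) {
                runs.add(new Run(font, line.substring(pos, m.start())));
            }
            font = m.group(1);
            pos = m.end();
        }
        // Remaining text after the last marker.
        if (pos < line.length()) {
            runs.add(new Run(font, line.substring(pos)));
        }
        return runs;
    }

    public static void main(String[] args) {
        String carried = null;
        String[] lines = { "[Arial]Hello [Times-Bold]World", "more text" };
        for (String line : lines) {
            List<Run> runs = parse(line, carried);
            for (Run run : runs) {
                System.out.println(run);
            }
            // Remember the last font so the next line can inherit it.
            if (!runs.isEmpty()) {
                carried = runs.get(runs.size() - 1).font;
            }
        }
    }
}
```

In practice you would loop over the array returned by `getFontLineByLineFromPdf` in the same way as `main` does over its sample lines, passing the last font of each line on to the next.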

