This tutorial shows you how to work with the JavaScript features in Acrobat 9.
Scope: Acrobat 9.0 and later
Category: Automation
Skill Level: Intermediate and Advanced
Prerequisites: Acrobat JavaScript Programming Experience
Redaction, or selectively removing content from a PDF, is an important part of distributing PDFs that contain sensitive information. Redaction was added to Acrobat as a standard tool, and with the release of Acrobat 9 it is also available for automation in JavaScript. This means you can create a one-click or batch-redaction solution. However, some setup work is needed first.
Redaction is a three-step process: identify the content to redact, mark it with Redact Annotations, and apply the redactions.
These steps do not need to be done all at once. For example, the document could be marked for redaction and then sent to other users for review. The actual redaction could be done after the review is complete. The document could even be comment-enabled so the reviewers can use Adobe Reader. While redaction cannot be applied to a Reader-enabled PDF, the PDF can be saved to a copy, and the redaction applied to the copy.
The Acrobat user interface provides both a manual process and a search process for marking and redacting content. Automating redaction with Acrobat JavaScript extends these features by adding the ability to create custom redaction scripts and to mix redaction in with scripts that do other tasks. Automation saves time and provides greater flexibility in how redaction can be applied to a document.
Because redaction is a major document manipulation, it cannot be done in Reader, and the JavaScript function requires a privileged (or trusted) context. So this is very much an application-level automation operation. It cannot be done from a document script.
The script can be run from a simple copy-and-paste into the JavaScript Console. It can be placed in a Batch Process to operate on multiple documents. Or it can be run from an Acrobat toolbar button created in a Folder Level Script for a one-click solution. For demonstration purposes, all the examples in this article will be run from the Acrobat JavaScript Console, which is a privileged context and also very handy for running quick copy-and-paste automation scripts.
I’ve made up an example file for testing. Download this file and save it to a local folder on your system:
Example File: MyWidgets_NewEmployee_w4s.pdf
This file is a list of W4 forms for new employees. It’s created by the HR department and will be distributed to various other departments. Because the file contains sensitive information, parts of the W4 need to be redacted before being distributed. For this purpose, two redaction schemes are needed. In scheme #1, the employees’ names and addresses are redacted, and in scheme #2, the employees’ social security numbers and exemptions are redacted.
Part 1: Identifying things to redact
In general, there are two different approaches to identifying a particular piece of page content: by its absolute position on the page, or by searching the text. It's also common to mix the two.
Custom search
Here’s the standard loop used in a custom text search. The input to the search is a regular expression.
// Define Regular Expression for search
var rgExp = /\d{3}-\d{2}-\d{4}/; // Matches Soc. Sec. Num.
// Loop over document pages
for (var pg = 0; pg < this.numPages; pg++) {
    var len = this.getPageNumWords(pg);
    // Loop over words on page
    for (var i = 0; i < len; i++) {
        // false: keep punctuation, such as the hyphens in an SSN
        var wd = this.getPageNthWord(pg, i, false);
        if (rgExp.test(wd)) {
            // ... Apply operation to selected word ...
        }
    }
}
This script tests single words pulled from the page content, but could be modified to build up more complex strings for testing. However, words are not necessarily pulled from the page in the order they appear, so you have to be careful.
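The loop above tests one word at a time. Here is a plain-JavaScript sketch of the "build up more complex strings" idea, under some assumptions: the `words` array stands in for the results of `this.getPageNthWord(pg, i, false)` on one page, `findMatchingRuns` is a hypothetical helper (not part of the Acrobat API), and whitespace inside each word is dropped before joining, since words fetched without stripping may keep it. The pattern is anchored with `^` and `$`, because the unanchored `/\d{3}-\d{2}-\d{4}/` also matches inside a longer string such as `1234-56-78901`.

```javascript
// Sketch: find runs of consecutive words whose concatenation matches a pattern.
// `words` stands in for this.getPageNthWord(pg, i, false) results on one page;
// findMatchingRuns is a hypothetical helper, not part of the Acrobat API.
function findMatchingRuns(words, rgExp, maxRun) {
  var hits = [];
  for (var i = 0; i < words.length; i++) {
    var joined = "";
    // Grow the candidate string one word at a time, up to maxRun words
    for (var n = 1; n <= maxRun && i + n <= words.length; n++) {
      // Drop any whitespace the word carries before appending it
      joined += words[i + n - 1].replace(/\s+/g, "");
      if (rgExp.test(joined)) {
        // Record the first (shortest) matching run that starts at word i
        hits.push({ start: i, count: n, text: joined });
        break;
      }
    }
  }
  return hits;
}

// Anchored pattern: the whole candidate string must be an SSN
var ssnExp = /^\d{3}-\d{2}-\d{4}$/;
```

Each hit's `start` and `count` identify word indices on the page; in Acrobat, the quads for those words could then be looked up with `this.getPageNthWordQuads(pg, index)` and passed to a Redact Annotation.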
Absolute position
For the example defined above, we don’t need to write a search. The W4 form has a very consistent geometry, so we can assume a specific piece of content is always at the same location. All we have to do is find the locations of the content of interest.
To add a Redact Annotation, we need what’s called a quad. A quad is an array of eight numbers. Each pair of numbers is a point, so a quad is four points. These are the coordinates in “Default User Space” of the corners of a rectangle drawn around the content. Using a quad instead of a rectangle structure handles the situation where text is rotated.
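Because a quad is just four (x, y) points, its axis-aligned bounding rectangle can be computed even when the text is rotated. The plain-JavaScript sketch below shows this; `quadToRect` and `rectContains` are hypothetical helpers. One possible use is mixing the two approaches: keep a search hit only if its quad falls inside a fixed region of the page.

```javascript
// Hypothetical helper: bounding rectangle [left, bottom, right, top] of one quad.
// A quad is eight numbers: four (x, y) points in Default User Space.
function quadToRect(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  return [Math.min.apply(null, xs), Math.min.apply(null, ys),
          Math.max.apply(null, xs), Math.max.apply(null, ys)];
}

// Hypothetical helper: true if rectangle `inner` lies entirely inside `outer`.
// Both rectangles use the [left, bottom, right, top] layout.
function rectContains(outer, inner) {
  return inner[0] >= outer[0] && inner[1] >= outer[1] &&
         inner[2] <= outer[2] && inner[3] <= outer[3];
}
```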
So our task is to find the coordinates of the area we want to redact and then convert them to a quad. One way to do this is to manually place another annotation over the area; a Rectangle annotation is the best choice (Figure 1). Then run the following code in the JavaScript Console to print the quad for that location. The code reads the first annotation on the current page, so the rectangle should be the only annotation on that page:
var rct = this.getAnnots(this.pageNum)[0].rect;
var left = rct[0];
var right = rct[2];
var top = rct[3];
var bot = rct[1];
var qd = [ [left, top, right, top, left, bot, right, bot] ];
qd.toSource();
Now copy the quad into the code that will become the redaction script. Move and resize the annotation, then re-run the code and copy the new quad into your code. Repeat this operation until all redaction areas have been covered.
Figure 1: JavaScript for finding a Quad using a Rectangle Annotation
Notice that the quad numbers in Figure 1 are all rounded. I use the Math.round() function (not shown in the code) to make the code look a lot neater, but it’s not necessary.
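The console snippet, together with the rounding step, can be packaged as a plain JavaScript function. This is a sketch: `rectToQuads` is a hypothetical name, and it assumes the input is an annotation rect in `[left, bottom, right, top]` order, which is how the snippet reads `rct[0]` through `rct[3]`.

```javascript
// Hypothetical helper: turn an annotation rect [left, bottom, right, top]
// into a quads array holding one quad, in the same point order as the
// console snippet: upper-left, upper-right, lower-left, lower-right.
function rectToQuads(rct, round) {
  // Optionally round every coordinate to keep the pasted numbers tidy
  var r = round ? rct.map(function (v) { return Math.round(v); }) : rct.slice();
  var left = r[0], bot = r[1], right = r[2], top = r[3];
  return [[left, top, right, top, left, bot, right, bot]];
}
```

In the Acrobat console this could be called as `rectToQuads(this.getAnnots(this.pageNum)[0].rect, true).toSource();` to print a rounded quad ready to paste into the redaction script.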
Part 2: Marking content for redaction
Marking an area for redaction is done with the Redact Annotation. This is a new annotation type added in Acrobat 9. Its only purpose is to mark page areas for redaction. From JavaScript, this annotation is added with the following line of code:
this.addAnnot({
    type: "Redact",
    page: pgNum,
    quads: qd,
    overlayText: ":It's gone:",
    alignment: 1, // Center alignment
    repeat: true
});
At a minimum, only the annotation type, the page number, and the quads array need to be specified. The optional parameters are the overlay text, the alignment of the overlay text, and a true/false value that controls whether the overlay text is repeated across the annotation. The overlay text is what the redaction function puts in place of the content it removes from the page.
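One way to keep the required and optional properties apart is to build the property object in a small function before passing it to `this.addAnnot()`. This plain-JavaScript sketch uses a hypothetical helper, `makeRedactProps`, that only assembles the object; the property names are the ones used in the `addAnnot()` call above.

```javascript
// Hypothetical helper: build the object passed to this.addAnnot().
// type, page and quads are always set; the overlay properties are
// added only when overlay text is supplied.
function makeRedactProps(pg, quads, overlayText) {
  var props = { type: "Redact", page: pg, quads: quads };
  if (overlayText) {
    props.overlayText = overlayText;
    props.alignment = 1; // center alignment, as in the example above
    props.repeat = true; // repeat the overlay text across the annotation
  }
  return props;
}
```

In Acrobat the result would be used as `this.addAnnot(makeRedactProps(1, qd, ":It's gone:"));`.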
In our example, there are two different types of redaction. The best way to approach this situation is to build a function that works for both. The only difference between the two redactions is the areas they affect, so our input to the function should be a list of the areas to redact. Here’s the code:
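function AddMyRedacts(quadList) {
    // Start at page 1: page 0 is the title page
    for (var pg = 1; pg < this.numPages; pg++) {
        // One Redact Annotation per quad in the list
        for (var index = 0; index < quadList.length; index++) {
            this.addAnnot({
                type: "Redact",
                page: pg,
                quads: quadList[index]
            });
        }
    }
}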
This code walks through the document pages and adds Redact Annotations. One annotation is added for each quad in the list. The first page of the file is a title page, so it’s not included in the redaction.
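Before marking a long document, it can help to preview what `AddMyRedacts()` will do. The sketch below is a hypothetical plain-JavaScript dry run, `planRedacts`: it follows the same loops but returns a list of `{page, quads}` entries instead of calling `addAnnot()`. With the title page skipped, the number of entries is (numPages - 1) × quadList.length.

```javascript
// Hypothetical dry run of AddMyRedacts(): same loops, no document changes.
// firstPage is 1 to skip the title page, as in the original function.
function planRedacts(numPages, quadList, firstPage) {
  var plan = [];
  for (var pg = firstPage; pg < numPages; pg++) {
    for (var index = 0; index < quadList.length; index++) {
      plan.push({ page: pg, quads: quadList[index] });
    }
  }
  return plan;
}
```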
Part 3: Applying redaction
Redactions are applied to a PDF with the doc.applyRedactions() function.
this.applyRedactions({
    aMarks: myMarks,
    bKeepMarks: false,
    bShowConfirmation: false,
    cProgText: "It's going away"
});
If no input parameters are specified, all the Redact Annotations in the PDF are applied. The first parameter, aMarks, is an array of Redact Annotations to apply to the PDF. The second parameter, bKeepMarks, keeps the Redact Annotations in the document when set to true; otherwise they are deleted. If the third parameter, bShowConfirmation, is set to true, a popup dialog is displayed when redaction is complete. The fourth parameter, cProgText, is the text displayed on the progress bar while redaction is taking place, which matters because redaction can take a very long time.
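The aMarks parameter makes it possible to apply only some of the marks. As a sketch, assuming `annots` is the array returned by `this.getAnnots()` and that each Annotation object exposes its `type` and `page`, the hypothetical helper below keeps only the Redact Annotations on the listed pages.

```javascript
// Hypothetical helper: pick the Redact Annotations on the given pages.
// `annots` is assumed to be the array from this.getAnnots(); that call can
// return nothing when a document has no annotations, so guard against it.
function selectRedactMarks(annots, pages) {
  var list = annots || [];
  var marks = [];
  for (var i = 0; i < list.length; i++) {
    if (list[i].type === "Redact" && pages.indexOf(list[i].page) !== -1) {
      marks.push(list[i]);
    }
  }
  return marks;
}
```

The result would be passed as `aMarks: selectRedactMarks(this.getAnnots(), [1, 2])`. It is worth checking that the list is not empty before calling `applyRedactions()`, since what the function does with an empty array is not covered here.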
Now we have all the parts needed to make this work: the quads for all the areas that need redacting, a function for adding the Redact Annotations, and the function for applying redactions. Here's the complete script:
// Part 1: Quads for the different Redaction areas
var qdFirstName = [[0, 195, 179, 195, 0, 181, 179, 181]];
var qdLastName = [[182, 195, 397, 195, 182, 181, 397, 181]];
var qdAddress = [[-1, 172, 273, 172, -1, 158, 273, 158]];
// Same values as qdAddress; re-measure the City/State line if it is not covered
var qdCityState = [[-1, 172, 273, 172, -1, 158, 273, 158]];
var qdSocSec = [[400, 193, 527, 193, 400, 179, 527, 179]];
var qdAlow = [[471, 132, 529, 132, 471, 109, 529, 109]];
var qdExempt = [[413, 72, 529, 72, 413, 60, 529, 60]];
// Part 2: Function for adding Redact Annotations to a PDF
function AddMyRedacts(quadList) {
    // Start at page 1: page 0 is the title page
    for (var pg = 1; pg < this.numPages; pg++) {
        for (var index = 0; index < quadList.length; index++) {
            this.addAnnot({
                type: "Redact",
                page: pg,
                quads: quadList[index],
                overlayText: ":It's gone:",
                alignment: 1, // center alignment
                repeat: true
            });
        }
    }
}
// Part 3: Applying the redactions
// Code for two different redaction schemes
// Use only one per document
// #1 Run this code to apply redactions to W4 employee name,
// address, and social security number
AddMyRedacts([qdFirstName, qdLastName, qdAddress, qdCityState, qdSocSec]);
this.applyRedactions({
    bKeepMarks: false,
    bShowConfirmation: true,
    cProgText: "It's going away"
});
// #2 Run this code to apply redactions to W4 exemption
// information and social security number
AddMyRedacts([qdAlow, qdExempt, qdSocSec]);
this.applyRedactions({
    bKeepMarks: false,
    bShowConfirmation: true,
    cProgText: "It's going away"
});
We've already covered the first two parts of the code, which are both setup operations. The third part is where the redaction is actually done. It contains code for applying both redaction schemes outlined earlier. The two sections are identical except for the list of quads passed into the AddMyRedacts() function, so you can change what gets redacted by changing this list.
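Since the two schemes differ only in their quad lists, the lists could also be kept in one lookup table so that each run names exactly one scheme. This is a sketch: `REDACT_SCHEMES` and `getScheme` are hypothetical names, the quads are copied from Part 1 of the script, and the City/State quad is left out because in Part 1 it has the same values as the address quad.

```javascript
// Hypothetical lookup table of redaction schemes.
// Quads copied from Part 1 of the script above.
var REDACT_SCHEMES = {
  nameAddress: [
    [[0, 195, 179, 195, 0, 181, 179, 181]],     // first name
    [[182, 195, 397, 195, 182, 181, 397, 181]], // last name
    [[-1, 172, 273, 172, -1, 158, 273, 158]],   // address
    [[400, 193, 527, 193, 400, 179, 527, 179]]  // social security number
  ],
  exemptions: [
    [[471, 132, 529, 132, 471, 109, 529, 109]], // allowances
    [[413, 72, 529, 72, 413, 60, 529, 60]],     // exempt
    [[400, 193, 527, 193, 400, 179, 527, 179]]  // social security number
  ]
};

// Return the quad list for one scheme; fail loudly on a misspelled name
function getScheme(name) {
  if (!Object.prototype.hasOwnProperty.call(REDACT_SCHEMES, name)) {
    throw new Error("Unknown redaction scheme: " + name);
  }
  return REDACT_SCHEMES[name];
}
```

In Acrobat, scheme #1 would then become `AddMyRedacts(getScheme("nameAddress"));` followed by the same `applyRedactions()` call.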
To use this code, copy it to the Acrobat JavaScript Console. Select and run Parts #1 and #2. Then select the code for the scheme that will be used on the current document and run it. Run only one scheme on one document.
The same code could be used verbatim in a Batch Process, as long as only one of the two redaction schemes is included.
A more interesting and useful way to run an automation script is with an Acrobat toolbar button or menu item. However, using one of these options requires that the code be enclosed in a trusted function. Code for creating toolbar buttons and trusted functions can be found in this article, Applying PDF security with Acrobat JavaScript.
For more information on functions used in this article, see the Acrobat JavaScript Reference and the Acrobat JavaScript Guide.
http://www.adobe.com/devnet/acrobat/
Click on the Documentation tab and scroll down to the JavaScript section.
Comments for this tutorial are now closed.
Lori Kassuba
2014-03-05
Hi John Bullas and Rahoul,
Can you post your questions to our Experts at:
http://answers.acrobatusers.com/AskQuestion.aspx
Be sure to select the JavaScript category.
Thanks,
Lori
John Bullas
2014-03-02
Good Morning
I have a simple request to identify the javascript action to delete any pages in a given pdf that have a particular text string in them (not redaction just deletion). I would say it needs to
open file
save file as filename+(_stripped).pdf
get string input
for page number 0 to numpages/
if find(string) in page then delete page
next page
end
save out file filename+(_stripped).pdf
but sadly as I am a road safety engineer not a programmer… thus the problem
happy to edit the code each time to put a new string in as it is processing (infrequently) 1500+ page PDFs of scanned flat files
Kind regards
Dr B
Southampton UK
Rahoul
2013-09-23
this.addAnnot({
type: "Redact",
page: 1,
quads: [[-2.1176605224609375, 169.62046813964844, 208.88040161132812, 169.62046813964844, -2.1176605224609375, 157.42945861816406, 208.88040161132812, 157.42945861816406]],
overlayText: ":It's gone:",
alignment: 1, // Center alignment
repeat: true
});
Lori Kassuba
2013-08-28
Hi Ulysses,
In Acrobat XI, the Find command now has the ability to Replace text. Simply type in the word you’re looking for and then fill in the “Replace with” box to change the text.
Thanks,
Lori
Ulysses Brazil
2013-08-23
Hi, I have used the XI version and I'd like to find/replace or find/delete text in my PDF with Acrobat Assistant. Can you help me?
T M
2013-07-20
Hi, can this be used for XI?
Thom Parker
2013-03-18
The last line is actually two lines. It should be:
qd = [ [left, top, right, top, left, bot, right, bot] ];
qd.toSource();
Alex
2013-03-15
Thanks for this article; it helped me a lot!
I can't get the quad extraction code to work, though. (...getAnnots(this.pageNum)[0].rect;...)
I get the error
SyntaxError: missing ; before statement
1:Console:Exec
undefined
I was able to get the coordinates from exporting the comment to fdf, but that’s clunky… is there a simple fix to your code?