This tutorial shows you how to work with the JavaScript features in Acrobat 9.

Automating redaction with Acrobat JavaScript and Acrobat 9

Learn how to automate redaction to remove sensitive information from a PDF.

By Thom Parker – July 17, 2008

 

Scope: Acrobat 9.0 and later
Category: Automation
Skill Level: Intermediate and Advanced
Prerequisites: Acrobat JavaScript Programming Experience

Redaction, or selectively removing content from a PDF, is an important part of distributing PDFs that contain sensitive information. Acrobat provides redaction as a standard tool, and with the release of Acrobat 9, redaction is also available for automation in JavaScript. This means you can create a one-click or batch-redaction solution. However, some setup work is needed first.

About redaction

Redaction is a three-step process.

  1. Identify words or graphics to be redacted (removed).
  2. Mark the words or areas to be redacted with the Redact Annotation.
  3. Initiate the redaction process.

These steps do not need to be done all at once. For example, the document could be marked for redaction and then sent to other users for review, and the actual redaction could be done after the review is complete. The document could even be comment-enabled so the reviewers can use Adobe Reader. Redaction cannot be applied to a Reader-enabled PDF, but the PDF can be saved as a copy and the redaction applied to the copy.

The Acrobat user interface provides both a manual process and a search process for marking and redacting content. Automating redaction with Acrobat JavaScript extends these features by adding the ability to create custom redaction scripts and to mix redaction in with scripts that do other tasks. Automation saves time and provides greater flexibility in how redaction can be applied to a document.

Because redaction is a major document manipulation, it cannot be done in Reader, and the JavaScript function requires a privileged (or trusted) context. This makes redaction an application-level automation operation; it cannot be done from a document script.

The script can be run from a simple copy-and-paste into the JavaScript Console. It can be placed in a Batch Process to operate on multiple documents. Or it can be run from an Acrobat toolbar button created in a Folder Level Script for a one-click solution. For demonstration purposes, all the examples in this article will be run from the Acrobat JavaScript Console, which is a privileged context and also very handy for running quick copy-and-paste automation scripts.

Usage example

I’ve made up an example file for testing. Download this file and save it to a local folder on your system:

Example File: MyWidgets_NewEmployee_w4s.pdf

This file is a list of W4 forms for new employees. It’s created by the HR department and will be distributed to various other departments. Because the file contains sensitive information, parts of the W4 need to be redacted before being distributed. For this purpose, two redaction schemes are needed. In scheme #1, the employees’ names and addresses are redacted, and in scheme #2, the employees’ social security numbers and exemptions are redacted.

Part 1: Identifying things to redact

In general, there are two different approaches to identifying a particular piece of page content: absolute page position and text search. It’s also common to mix the two.

Custom search

Here’s the standard loop used in a custom text search. The input to the search is a regular expression.

// Define Regular Expression for search
var rgExp = /\d{3}-\d{2}-\d{4}/; // Matches Soc. Sec. Num.
// Loop over document pages
for(var pg=0; pg<this.numPages; pg++) {
	var len = this.getPageNumWords(pg);
	// Loop over words on page
	for(var i=0; i<len; i++) {
		// false = do not strip punctuation from the returned word
		var wd = this.getPageNthWord(pg, i, false);
		if(rgExp.test(wd)) {
			// ... Apply operation to selected word ...
		}
	}
}

This script tests single words extracted from the page content, but it could be modified to build up longer strings for testing. However, words are not necessarily extracted in the order in which they appear on the page, so you have to be careful when combining them.
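The loop above can be packaged as a reusable function that records where each match was found. This is a sketch with a hypothetical name, `findWordMatches`; it assumes only that the object passed in has `numPages`, `getPageNumWords()` and `getPageNthWord()`, as the Acrobat Doc object does, so in the Console you would pass `this`. The page and word index of each match could then be handed to `this.getPageNthWordQuads()` to get quads for a Redact Annotation.

```javascript
// Collect every word in a document that matches a regular expression.
// "doc" is any object with numPages, getPageNumWords(page) and
// getPageNthWord(page, index, strip); in Acrobat this is the Doc object.
// Use a regex without the g flag: test() on a global regex keeps
// lastIndex between calls and can skip matches.
function findWordMatches(doc, rgExp) {
    var matches = [];
    for (var pg = 0; pg < doc.numPages; pg++) {
        var len = doc.getPageNumWords(pg);
        for (var i = 0; i < len; i++) {
            // false = do not strip punctuation from the returned word
            var wd = doc.getPageNthWord(pg, i, false);
            if (rgExp.test(wd)) {
                matches.push({ page: pg, index: i, word: wd });
            }
        }
    }
    return matches;
}
```

In the Console, a call such as `findWordMatches(this, /\d{3}-\d{2}-\d{4}/)` would list the candidate words before anything is marked.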

Absolute position

For the example defined above, we don’t need to write a search. The W4 form has a very consistent geometry, so we can assume a specific piece of content is always at the same location. All we have to do is find the locations of the content of interest.

To add a Redact Annotation, we need what’s called a quad. A quad is an array of eight numbers. Each pair of numbers is a point, so a quad is four points. These are the coordinates in “Default User Space” of the corners of a rectangle drawn around the content. Using a quad instead of a rectangle structure handles the situation where text is rotated.
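To make the point order concrete, the hypothetical helper below splits a quad into its four corner points. It assumes the order used by the quad-finding code in this article: upper-left, upper-right, lower-left, lower-right.

```javascript
// Split an 8-number quad into four [x, y] corner points.
// Order assumed (as in this article's code): upper-left, upper-right,
// lower-left, lower-right.
function quadToPoints(quad) {
    if (quad.length !== 8) {
        throw new Error("A quad must contain exactly 8 numbers");
    }
    var points = [];
    for (var i = 0; i < 8; i += 2) {
        points.push([quad[i], quad[i + 1]]);
    }
    return points;
}
```

For example, `quadToPoints([0, 195, 179, 195, 0, 181, 179, 181])` gives `[[0, 195], [179, 195], [0, 181], [179, 181]]`. For rotated text, the four points would no longer line up with the page axes, which a plain rectangle cannot express.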

So, our task is to find the coordinates of the area we want to redact and then convert them to a quad. We can do this by manually placing another annotation type over the area; the Rectangle annotation is the best choice (Figure 1). Then run the following code in the Console window to print out the quad for that location:

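// Get the rect of the first annotation on the current page
// (assumes the Rectangle annotation is the only one on the page)
var rct = this.getAnnots({nPage: this.pageNum})[0].rect;
// rect order is [x_ll, y_ll, x_ur, y_ur]
var left = rct[0];
var right = rct[2];
var top = rct[3];
var bot = rct[1];
// Quad order: upper-left, upper-right, lower-left, lower-right
var qd = [ [left, top, right, top, left, bot, right, bot] ];
qd.toSource();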

Now copy the quad into the code that will become the redaction script. Move and resize the annotation, then re-run the code and copy the new quad into your code. Repeat this operation until all redaction areas have been covered.


Figure 1: JavaScript for finding a Quad using a Rectangle Annotation

Notice that the quad numbers in Figure 1 are all rounded. I use the Math.round() function (not shown in the code) to make the code look a lot neater, but it’s not necessary.
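Here is a sketch of the same conversion as a reusable function, with the optional rounding applied. The name `rectToQuad` is hypothetical; it assumes the Acrobat rect order `[x_ll, y_ll, x_ur, y_ur]` used by the code above, and it uses `Math.min`/`Math.max` so a rectangle with swapped corners still produces a valid quad.

```javascript
// Convert an annotation rect [x_ll, y_ll, x_ur, y_ur] into a one-quad list
// whose quad is ordered upper-left, upper-right, lower-left, lower-right.
function rectToQuad(rct, roundIt) {
    var left = Math.min(rct[0], rct[2]);
    var right = Math.max(rct[0], rct[2]);
    var top = Math.max(rct[1], rct[3]);
    var bot = Math.min(rct[1], rct[3]);
    var quad = [left, top, right, top, left, bot, right, bot];
    if (roundIt) {
        // Rounding only tidies the numbers; it is not required
        for (var i = 0; i < quad.length; i++) {
            quad[i] = Math.round(quad[i]);
        }
    }
    return [quad];
}
```

In the Console this could be used as `rectToQuad(this.getAnnots({nPage: this.pageNum})[0].rect, true).toSource();`.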

Part 2: Marking content for redaction

Marking an area for redaction is done with the Redact Annotation. This is a new annotation type added in Acrobat 9, and its only purpose is to mark page areas for redaction. From JavaScript, this annotation is added with the following code:

this.addAnnot({
	type:"Redact",
	page: pgNum,
	quads: qd,
	overlayText: ":It's gone:",
	alignment: 1, // Center alignment
	repeat:true
});

Only the annotation type, the page number, and the quads array are required. The optional parameters are the overlay text, the alignment of the overlay text, and repeat, a true/false value that, when true, repeats the overlay text to fill the annotation area. The redaction function places the overlay text where the removed content used to be.
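Because only type, page, and quads are required, it can be convenient to build the property object in one place and add the overlay settings only when overlay text is wanted. The helper below, `makeRedactProps`, is a hypothetical sketch; its result is meant to be passed to `this.addAnnot()`.

```javascript
// Build the property object for this.addAnnot() for a Redact Annotation.
// type, page and quads are always set; the overlay properties are
// added only when overlay text is supplied.
function makeRedactProps(page, quads, overlayText) {
    var props = { type: "Redact", page: page, quads: quads };
    if (overlayText) {
        props.overlayText = overlayText;
        props.alignment = 1; // center alignment, as in the example above
        props.repeat = true; // repeat the overlay text across the area
    }
    return props;
}
```

For example: `this.addAnnot(makeRedactProps(1, qd, ":It's gone:"));`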

In our example, there are two different types of redaction. The best way to approach this situation is to build a function that works for both. The only difference between the two redactions is the areas they affect, so our input to the function should be a list of the areas to redact. Here’s the code:

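function AddMyRedacts(quadList) {
	// Start at page 1 (the second page) to skip the title page
	for(var pg=1; pg<this.numPages; pg++)
	{
		// Add one Redact Annotation for each quad in the list
		for(var index=0; index<quadList.length; index++) {
			this.addAnnot({
				type:"Redact",
				page:pg, quads:quadList[index]
			});
		}
	}
}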

This code walks through the document pages and adds Redact Annotations. One annotation is added for each quad in the list. The first page of the file is a title page, so it’s not included in the redaction.
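A variation that takes the document as a parameter may be handier when the function lives in a folder-level script, where `this` does not necessarily refer to the target document. This sketch, `addRedactsToDoc`, is hypothetical; it assumes `doc` is an Acrobat Doc object and returns the number of annotations it added so the result can be checked.

```javascript
// Add one Redact Annotation per quad on every page from firstPage onward.
// "doc" is assumed to be an Acrobat Doc object (or anything with numPages
// and addAnnot); firstPage = 1 skips the title page, as in the article.
function addRedactsToDoc(doc, quadList, firstPage) {
    var count = 0;
    for (var pg = firstPage; pg < doc.numPages; pg++) {
        for (var index = 0; index < quadList.length; index++) {
            doc.addAnnot({ type: "Redact", page: pg, quads: quadList[index] });
            count++;
        }
    }
    return count;
}
```

From the Console, `addRedactsToDoc(this, [qdFirstName, qdLastName], 1)` would mark the same areas as `AddMyRedacts([qdFirstName, qdLastName])`, using the quad variables defined in the complete script.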

Part 3: Applying redaction

Redactions are applied to a PDF with the doc.applyRedactions() function.

this.applyRedactions ({
	aMarks: myMarks, // array of Redact Annotations; omit to apply all of them
	bKeepMarks: false,
	bShowConfirmation: false,
	cProgText: "It’s going away"
});

If no input parameters are specified, all the Redact Annotations in the PDF are applied. The first parameter, aMarks, is an array of Redact Annotations to apply to the PDF. The second parameter, bKeepMarks, keeps the Redact Annotations after redaction when it is true; when it is false, they are deleted. If the third parameter, bShowConfirmation, is true, Acrobat displays a confirmation dialog. The fourth parameter, cProgText, is the text displayed on the progress bar while the redaction is taking place, which is useful because redaction can take a very long time.
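To apply only some of the Redact Annotations, the aMarks array has to be built first. One approach, sketched here with a hypothetical helper `filterRedactMarks`, is to fetch the annotations with `this.getAnnots()` and keep those whose type is "Redact". Since `getAnnots()` may return nothing when a document has no annotations, the helper tolerates a missing list.

```javascript
// Keep only the Redact Annotations from a list of annotations.
// "annots" is what this.getAnnots() returns; it may be empty or missing.
// If pageNum is given, only marks on that page are kept.
function filterRedactMarks(annots, pageNum) {
    var marks = [];
    if (!annots) {
        return marks;
    }
    for (var i = 0; i < annots.length; i++) {
        var a = annots[i];
        if (a.type === "Redact" && (pageNum === undefined || a.page === pageNum)) {
            marks.push(a);
        }
    }
    return marks;
}

// Example use in the Acrobat Console (page 1 only):
// var myMarks = filterRedactMarks(this.getAnnots(), 1);
// this.applyRedactions({ aMarks: myMarks, bKeepMarks: false,
//     bShowConfirmation: false, cProgText: "It's going away" });
```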

Putting it all together

Now we have all the parts needed to make this work: the quads for all the areas that need redacting, a function for adding the Redact Annotations, and a function for applying the redactions. Here’s the complete script:

// Part 1: Quads for the different Redaction areas
var qdFirstName = [[0, 195, 179, 195, 0, 181, 179, 181]];
var qdLastName = [[182, 195, 397, 195, 182, 181, 397, 181]];
var qdAddress = [[-1, 172, 273, 172, -1, 158, 273, 158]];
// NOTE: same values as qdAddress; check this location on your own form
var qdCityState = [[-1, 172, 273, 172, -1, 158, 273, 158]];
var qdSocSec = [[400, 193, 527, 193, 400, 179, 527, 179]];
var qdAlow = [[471, 132, 529, 132, 471, 109, 529, 109]];
var qdExempt = [[413, 72, 529, 72, 413, 60, 529, 60]];

// Part 2: Function for adding Redact Annotations to a PDF
function AddMyRedacts(quadList) {
	// Start at page 1 to skip the title page
	for(var pg=1; pg<this.numPages; pg++) {
		// One Redact Annotation per quad in the list
		for(var index=0; index<quadList.length; index++) {
			this.addAnnot({
				type:"Redact", page:pg,
				quads:quadList[index],
				overlayText: ":It's gone:",
				alignment: 1, // center alignment
				repeat:true });
		}
	}
}

// Part 3: Applying the redactions
// Code for two different redaction schemes
// Use only one per document
// #1 Run this code to apply redactions to W4 employee name
// and address
AddMyRedacts([qdFirstName, qdLastName, qdAddress, qdCityState, qdSocSec]);
this.applyRedactions({bKeepMarks: false, bShowConfirmation: true, cProgText: "It's going away" });
// #2 Run this code to apply redactions to W4 exemption
// information
AddMyRedacts([qdAlow, qdExempt, qdSocSec]);
this.applyRedactions({ bKeepMarks: false, bShowConfirmation: true, cProgText: "It's going away" });

We’ve already covered the first two parts of the code, which are both setup operations. The third part is where the redaction is actually done. It includes sections of code for applying both redaction schemes outlined earlier. Each section is identical except for the list of quads passed into the AddMyRedacts() function, so you can change what a scheme redacts by editing its list.

To use this code, copy it to the Acrobat JavaScript Console. Select and run Parts #1 and #2. Then select the code for the scheme that will be used on the current document and run it. Run only one scheme on one document.
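Since the two schemes differ only in their quad lists, one way to make it harder to run both on the same document by accident is to keep the lists in a named table and select exactly one. This is a hypothetical sketch: `redactionSchemes` and `runRedactionScheme` are not part of the original script, and `doc` stands for the Acrobat Doc object (`this` in the Console). The quad values are copied from Part 1.

```javascript
// Named redaction schemes; quad values copied from Part 1 of the script.
var redactionSchemes = {
    nameAddress: [
        [[0, 195, 179, 195, 0, 181, 179, 181]],     // first name
        [[182, 195, 397, 195, 182, 181, 397, 181]], // last name
        [[-1, 172, 273, 172, -1, 158, 273, 158]],   // address
        [[-1, 172, 273, 172, -1, 158, 273, 158]],   // city/state (same values as address in Part 1)
        [[400, 193, 527, 193, 400, 179, 527, 179]]  // social security number
    ],
    exemptions: [
        [[471, 132, 529, 132, 471, 109, 529, 109]], // allowances
        [[413, 72, 529, 72, 413, 60, 529, 60]],     // exemptions
        [[400, 193, 527, 193, 400, 179, 527, 179]]  // social security number
    ]
};

// Apply exactly one named scheme: mark pages 1..numPages-1, then redact.
function runRedactionScheme(doc, schemes, name) {
    if (!Object.prototype.hasOwnProperty.call(schemes, name)) {
        throw new Error("Unknown redaction scheme: " + name);
    }
    var quadList = schemes[name];
    for (var pg = 1; pg < doc.numPages; pg++) { // page 0 is the title page
        for (var i = 0; i < quadList.length; i++) {
            doc.addAnnot({ type: "Redact", page: pg, quads: quadList[i],
                overlayText: ":It's gone:", alignment: 1, repeat: true });
        }
    }
    doc.applyRedactions({ bKeepMarks: false, bShowConfirmation: true,
        cProgText: "It's going away" });
}
```

In the Console, `runRedactionScheme(this, redactionSchemes, "exemptions");` would then run scheme #2 only, and a mistyped scheme name stops the script before anything is marked.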

Automation alternatives

The same code presented here could be used verbatim in a Batch Process.

A more interesting and useful way to run an automation script is with an Acrobat toolbar button or menu item. However, using one of these options requires that the code be enclosed in a trusted function. Code for creating toolbar buttons and trusted functions can be found in the article Applying PDF security with Acrobat JavaScript.

For more information on functions used in this article, see the Acrobat JavaScript Reference and the Acrobat JavaScript Guide.

http://www.adobe.com/devnet/acrobat/

Click on the Documentation tab and scroll down to the JavaScript section.



Products covered:

Acrobat 9

Related topics:

Protect PDFs, JavaScript


7 comments

Comments for this tutorial are now closed.

Lori Kassuba

March 5, 2014

Hi John Bullas and Rahoul,

Can you post your questions to our Experts at:
http://answers.acrobatusers.com/AskQuestion.aspx
Be sure to select the JavaScript category.

Thanks,
Lori

John Bullas

March 2, 2014

Good Morning

I have a simple request to identify the javascript action to delete any pages in a given pdf that have a particular text string in them (not redaction just deletion). I would say it needs to

open file
save file as filename+(_stripped).pdf
get string input
for page number 0 to numpages/
if find(string) in page then delete page
next page
end
save out file filename+(_stripped).pdf

but sadly as I am a road safety engineer not a programmer… thus the problem

happy to edit the code each time to put a new string in as it is processing (infrequently) 1500+ page PDFs of scanned flat files

Kind regards

Dr B
Southampton UK

Rahoul

September 23, 2013

this.addAnnot({
type:"Redact",
page: 1,
quads: [[-2.1176605224609375, 169.62046813964844, 208.88040161132812, 169.62046813964844, -2.1176605224609375, 157.42945861816406, 208.88040161132812, 157.42945861816406]],
overlayText: ":It’s gone:",
alignment: 1, // Center alignment
repeat:true
});

Lori Kassuba

August 28, 2013

Hi Ulysses,

In Acrobat XI, the Find command now has the ability to Replace text. Simply type in the word you’re looking for and then fill in the “Replace with” box to change the text.

Thanks,
Lori

Ulysses Brazil

August 23, 2013

Hi, I have used a XI version and I'd like to find/replace or find/delete a text of my PDF with Acrobat Assistant. Can you help me?

T M

July 20, 2013

Hi, Can this be used for XI?

Thom Parker

March 18, 2013

The last line is actually two lines. It should be:

qd = [ [left, top, right, top, left, bot, right, bot] ];
qd.toSource();

Alex

March 15, 2013

Thanks for this article; it helped me a lot!

I can’t get the quad extraction code to work though. (...getAnnots(this.pageNum)[0].rect;...)

I get the error
SyntaxError: missing ; before statement
1:Console:Exec
undefined

I was able to get the coordinates from exporting the comment to fdf, but that’s clunky… is there a simple fix to your code?
