Pdf To Text

    An AppleScript that extracts text from the frontmost PDF document open in Apple's Preview and opens it in Bare Bones Software's TextWrangler. The main advantage of this method is that it uses PdfToText's -layout option, which lets you extract tables, for example, without losing their formatting. They can then easily be processed in a text editor and imported into a spreadsheet.

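Note: PdfToText is part of XPDF and needs to be installed separately. The easiest way is to use a package manager such as MacPorts, Fink or Homebrew. There is also an easy-to-use installer package by Carsten Blüm. The script expects the binary at /opt/local/bin/pdftotext (its 'pdftotext' property); change that path if your installation puts it elsewhere. For more information on PdfToText, take a look at the man page.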
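
Because the script hard-codes the path to pdftotext, it can help to check where your installation actually put the binary. The following is a minimal sketch, not part of the original script: the helper name first_executable is hypothetical, and the candidate paths are only the usual default prefixes of MacPorts, Homebrew and Fink, which may differ on your system.

```shell
# Print the first existing executable among the paths given as arguments.
# Returns 1 (and prints nothing) if none of them exists.
first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Assumed default locations: MacPorts, Homebrew (Intel and Apple silicon), Fink.
pdftotext_path=$(first_executable \
    /opt/local/bin/pdftotext \
    /usr/local/bin/pdftotext \
    /opt/homebrew/bin/pdftotext \
    /sw/bin/pdftotext) || echo "pdftotext not found in the usual locations" >&2
```

If a path is found, copy it into the script's 'pdftotext' property.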

The extracted text is opened in Bare Bones Software's TextWrangler, a free application that needs to be installed separately. 'edit' is a command-line tool that ships with TextWrangler. Install it via the TextWrangler menu → Install Command Line Tools...
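
To try the extraction outside of Preview, the pipeline the script runs can be reproduced in a terminal. This sketch uses the same options as the script (-f, -l, -enc UTF-8, -layout for pdftotext; --clean, --view-top, -t for edit). The function name and the PDFTOTEXT and EDIT_CMD override variables are hypothetical and only there to make the commands easy to swap.

```shell
# Extract pages START..END of a PDF with the layout preserved and pipe the
# text to TextWrangler's `edit` tool, using a descriptive window title.
# PDFTOTEXT and EDIT_CMD are optional overrides (assumed names); by default
# both tools are expected to be on PATH.
pdf_pages_to_editor() {
    start=$1
    end=$2
    pdf=$3
    "${PDFTOTEXT:-pdftotext}" -f "$start" -l "$end" -enc UTF-8 -layout "$pdf" - |
        "${EDIT_CMD:-edit}" --clean --view-top -t "$pdf · Pages $start to $end"
}

# Example (hypothetical file name):
# pdf_pages_to_editor 3 8 "$HOME/Documents/report.pdf"
```

The trailing "-" tells pdftotext to write to standard output instead of a text file, which is what allows the result to be piped straight into the editor.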

UI scripting needs to be enabled for this script to work, because it reads the front document's location from Preview's window via System Events.

Save the script to '~/Library/Scripts/Applications/Preview/' and run it via the AppleScript Menu or use a third-party application like FastScripts to easily assign a keyboard shortcut to the script.
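
The folder may not exist yet on a fresh system. A small sketch of the installation step, assuming you have already saved the script to a file; the function name and the example file name are hypothetical:

```shell
# Copy a saved script into Preview's application-specific Scripts folder,
# creating the folder first if necessary. The optional second argument
# overrides the target folder.
install_preview_script() {
    script_file=$1
    target_dir="${2:-$HOME/Library/Scripts/Applications/Preview}"
    mkdir -p "$target_dir" && cp "$script_file" "$target_dir/"
}

# Example (hypothetical file name):
# install_preview_script "Pdf To Text.scpt"
```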


--   Creation date:    Sunday, 1 November 2009, 16:01:43
--   Created by:        ljr_nbg (http://applescript.bratis-lover.net)
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
--c-                                                                                                     SETTINGS
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

property myName : "Pdf To Text"
property myVersion : "1.0"

--c-- path to pdftotext
property pdftotext : "/opt/local/bin/pdftotext"


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
--c-                                                                                               DESCRIPTION
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

(*
An AppleScript that extracts text from the frontmost PDF document open in 
Apple's Preview and opens it in Bare Bones Software's 
[TextWrangler](http://www.barebones.com/products/textwrangler/). The main 
advantage of this method is that PdfToText's -layout option is used, which
allows you to extract e.g. tables without losing their formatting. That way they 
can easily be processed with a text editor and imported into a spreadsheet.

**Note:** PdfToText is part of [XPDF](http://www.foolabs.com/xpdf/) and needs 
to be installed separately. The easiest way is to use a package manager like 
[MacPorts](http://www.macports.org), [Fink](http://www.finkproject.org) or 
[Homebrew](https://github.com/mxcl/homebrew). For more information on 
PdfToText take a look at the [man page](http://linux.die.net/man/1/pdftotext).

The extracted text is opened in Bare Bones Software's 
[TextWrangler](http://www.barebones.com/products/textwrangler/), which is a 
free application that needs to be installed separately. 'edit' is a command line tool 
that ships with TextWrangler. Install it via the TextWrangler menu --> Install 
Command Line Tools...

UI scripting needs to be enabled in order for this script to work.

WWW: http://applescript.bratis-lover.net/applescripts/preview/pdf-to-text/
*)

-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
--c-                                                                                                            MAIN
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

--c-- get local file url from Preview's front document
try
        tell application "System Events"
                tell process "Preview"
                        set docPath to value of attribute ¬
                                "AXDocument" of window 1 as string
                end tell
        end tell
on error
        my errorHandler("Script cannot be executed!", ¬
                "No open Preview document.", 1)
end try

--c-- decode url
set docPath to (do shell script "echo " & quoted form of docPath & ¬
        " | perl -MURI::Escape -lne 'print uri_unescape($_)'")
set docPath to POSIX file docPath as string
set docPath to POSIX path of docPath

--c-- prompt for page range
tell application "Preview"
        set pageRange to text returned of (display dialog "Pdf-File:\r" & docPath & ¬
                "\r\rEnter page range or page number:\r(e.g. 3-8 or 5)" with title myName ¬
                default answer "" with icon 1)
end tell

--c-- process page range
try
        if pageRange contains "-" then
                set pageRange to _string's explode("-", pageRange)
                set startPage to (_string's trimBoth(pageRange's item 1)) as integer
                set endPage to (_string's trimBoth(pageRange's item 2)) as integer
        else
                set startPage to (_string's trimBoth(pageRange)) as integer
                set endPage to (_string's trimBoth(pageRange)) as integer
        end if
on error eMsg number eNum
        my errorHandler("Problems with your input!", eMsg, eNum)
end try

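--c-- run pdftotext and pipe output to TextWrangler
--c-- (the window title is passed through 'quoted form of' so that paths
--c-- containing quotes, "$" or backticks cannot break the shell command)
do shell script (quoted form of pdftotext & " -f " & startPage & " -l " & endPage & ¬
        " -enc UTF-8 -layout " & quoted form of docPath & " - | edit --clean --view-top -t " & ¬
        quoted form of (docPath & " · Pages " & startPage & " to " & endPage))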


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
--c-                                                                                                  FUNCTIONS
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

on errorHandler(customMsg, eMsg, eNum)
        tell application "Preview" to display dialog customMsg & "\r\rError: " & ¬
                eNum default answer eMsg buttons {"Cancel"} default button 1 ¬
                cancel button 1 with icon 0 ¬
                with title myName giving up after 7
        error number -128
end errorHandler

script _string
        
        --c--   explode(delimiter, input)
        --d--   Split a string on a specific delimiter.
        --a--   delimiter : string -- the delimiter used to split the string
        --a--   input : string -- the string to split
        --r--   list
        --x--   explode("-", "a-b-c") --> {"a", "b", "c"}
        --u--   ljr (http://applescript.bratis-lover.net/library/string/) 
        --u--   (modified from 'Applescript.net' (http://bbs.applescript.net/viewtopic.php?id=18377))
        on explode(delimiter, input)
                local delimiter, input, ASTID
                set ASTID to AppleScript's text item delimiters
                try
                        set AppleScript's text item delimiters to delimiter
                        set input to text items of input
                        set AppleScript's text item delimiters to ASTID
                        return input --> list
                on error eMsg number eNum
                        set AppleScript's text item delimiters to ASTID
                        error "Can't explode: " & eMsg number eNum
                end try
        end explode
        
        
        --c--   trimBoth(str)
        --d--   Trim any white space (space/tab/return/linefeed) from both ends of a string.
        --a--   str : string
        --r--   string
        --x--   trimBoth("  \t \n \r  abc  \t \n \r ") --> "abc"
        --q--   trimStart, trimEnd
        --u--   HAS (http://applemods.sourceforge.net/mods/Data/String.php)
        on trimBoth(str)
                local str
                try
                        return my trimStart(my trimEnd(str))
                on error eMsg number eNum
                        error "Can't trimBoth: " & eMsg number eNum
                end try
        end trimBoth
        
        
        --c--   trimStart(str)
        --d--   Trim any white space (space/tab/return/linefeed) from the start of a string.
        --a--   str : string
        --r--   string
        --x--   trimStart("  \t \n \r abc") --> "abc"
        --u--   HAS (http://applemods.sourceforge.net/mods/Data/String.php)
        on trimStart(str)
                local str, whiteSpace
                try
                        set str to str as string
                        set whiteSpace to {character id 10, return, space, tab}
                        try
                                repeat while str's first character is in whiteSpace
                                        set str to str's text 2 thru -1
                                end repeat
                                return str
                        on error number -1728
                                return ""
                        end try
                on error eMsg number eNum
                        error "Can't trimStart: " & eMsg number eNum
                end try
        end trimStart
        
        
        --c--   trimEnd(str)
        --d--   Trim any white space (space/tab/return/linefeed) from the end of a string.
        --a--   str : string
        --r--   string
        --x--   trimEnd("abc  \t \n \r ") --> "abc"
        --u--   HAS (http://applemods.sourceforge.net/mods/Data/String.php)
        on trimEnd(str)
                local str, whiteSpace
                try
                        set str to str as string
                        set whiteSpace to {character id 10, return, space, tab}
                        try
                                repeat while str's last character is in whiteSpace
                                        set str to str's text 1 thru -2
                                end repeat
                                return str
                        on error number -1728
                                return ""
                        end try
                on error eMsg number eNum
                        error "Can't trimEnd: " & eMsg number eNum
                end try
        end trimEnd
        
end script


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
--c-                                                                            TERMS OF USE & CREDITS
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

(*
This script was written by ljr_nbg (http://applescript.bratis-lover.net).

It uses handlers from the appleMods project:
(c) 2003 HAS (http://applemods.sourceforge.net) 

It is released under the same terms as appleMods:

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies
or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*)