1 % Copyright (C) 1991, 1995, 1996, 1998, 1999 Aladdin Enterprises. All rights reserved.
3 % This software is provided AS-IS with no warranty, either express or
6 % This software is distributed under license and may not be copied,
7 % modified or distributed except as expressly authorized under the terms
8 % of the license contained in the file LICENSE in this distribution.
10 % For more information about licensing, please refer to
11 % http://www.ghostscript.com/licensing/. For information on
12 % commercial licensing, go to http://www.artifex.com/licensing/ or
13 % contact Artifex Software, Inc., 101 Lucas Valley Road #110,
14 % San Rafael, CA 94903, U.S.A., +1(415)492-9861.
16 % $Id: ps2ascii.ps,v 1.10 2004/06/23 09:04:17 igor Exp $
17 % Extract the ASCII text from a PostScript file. Nothing is displayed.
18 % Instead, ASCII information is written to stdout. The idea is similar to
19 % Glenn Reid's `distillery', only a lot more simple-minded, and less robust.
21 % If SIMPLE is defined, just the text is written, with a guess at line
22 % breaks and word spacing. If SIMPLE is not defined, lines are written
23 % to stdout as follows:
25 % F <height> <width> (<fontname>)
26 % Indicate the font height and the width of a space.
29 % Indicate the end of the page.
31 % S <x> <y> (<string>) <width>
34 % <width> and <height> are integer dimensions in units of 1/720".
35 % <x> and <y> are integer coordinates, in units of 1/720", with the origin
37 % <string> and <fontname> are strings represented with the standard
38 % PostScript escape conventions.
40 % If COMPLEX is defined, the following additional types of lines are
44 % Indicate the current color.
46 % I <x> <y> <width> <height>
47 % Note the presence of an image.
49 % R <x> <y> <width> <height>
52 % <r>, <g>, and <b> are RGB values expressed as integers between 0 and 1000.
54 % Note that future versions of this program (in COMPLEX mode) may add
55 % other output elements, so programs parsing the output should be
56 % prepared to ignore elements that they do not recognize.
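% As an illustration only (the values below are invented, not captured from
% a real run), the output for one page holding a single string, when SIMPLE
% is not defined, might look like this:
%	F 120 30 (Times-Roman)
%	S 720 7200 (Hello, world) 610
%	P
% Since all units are 1/720", x = 720 is one inch to the right of the origin
% and y = 7200 is ten inches above it.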
58 % Note that this code will only work in all cases if systemdict is writable
59 % and if `binding' the definitions of operators defined as procedures
60 % is deferred. For this reason, it is normally invoked with
61 % gs -q -dNODISPLAY -dDELAYBIND -dWRITESYSTEMDICT ps2ascii.ps
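% For example, assuming plain text is all that is wanted, SIMPLE can be
% defined on the command line (input.ps is a placeholder for the file to
% convert, and -c quit makes gs exit afterwards):
%	gs -q -dNODISPLAY -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE ps2ascii.ps input.ps -c quit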
64 % J Greely <jgreely@cis.ohio-state.edu> for improvements to this code;
65 % Jerry Whelan <jerryw@abode.ccd.bnl.gov> for motivating other improvements;
66 % David M. Jones <dmjones@theory.lcs.mit.edu> for improvements noted below.
68 %% Additional modifications by David M. Jones
69 %% (dmjones@theory.lcs.mit.edu), December 23, 1997
71 %% (a) Rewrote forall loop at the end of .show.write. This fixes a
72 %% stack leakage problem, but the changes are more significant
75 %% .char.map includes the names of all characters in the
76 %% StandardEncoding, ISOLatin1Encoding, OT1Encoding and
77 %% T1Encoding vectors. Thus, if the Encoding vector for the
78 %% current font contains a name that is not in .char.map, it's
79 %% redundant to check if the Encoding vector is equal to one of
80 %% the known vectors. Previous versions of ps2ascii would give
81 %% up at this point, and substitute an asterisk (*) for the
82 %% character. I've taken the liberty of instead using the
83 %% OT1Encoding vector to translate the character, on the grounds
84 %% that in the cases I'm most interested in, a font without a
85 %% useful Encoding vector was most likely created by a DVI to PS
86 %% converter such as dvips or DVILASER (and OT1Encoding is
87 %% largely compatible with StandardEncoding anyway). [Note that
88 %% this does not make my earlier changes to support dvips (see
89 %% fix (a) under my 1996 changes) completely obsolete, since
90 %% there's additional useful information I can extract in that
93 %% Overall, this should provide better support for some documents
94 %% (e.g., DVILASER documents will no longer be translated into a
95 %% series of *'s) without breaking any other documents any worse
96 %% than they already were broken.
98 %% (b) Fixed two bugs in dvips.df-tail: (1) changed "dup 127" to "dup
99 %% 128" to fix fencepost error, and (2) gave each font its own
100 %% FontName rather than having all fonts share the same name.
102 %% (c) Added one further refinement to the heuristic for detecting
103 %% paragraph breaks: do not ever start a new paragraph after a
104 %% line ending in a hyphen.
106 %% (d) Added a bunch of missing letters from the T1Encoding,
107 %% OT1Encoding and ISOLatin1Encoding vectors to .letter.chars to
108 %% improve the hyphen-elimination algorithm. This still won't help
109 %% if there's no useful Encoding vector.
111 %% NOTE: A better solution to the problem of missing Encoding vectors
112 %% might be to redefine definefont to check whether the Encoding
113 %% vector is sensible and, if not, replace it by a default. This
114 %% would alleviate the need for constant tests in the .show.write
115 %% loop, as well as automatically solving the problem noted in fix
116 %% (d) above, and the similar problem with .break.chars. This should
117 %% be investigated. Also, the hyphen-elimination algorithm really
118 %% needs to be looked at carefully and rethought.
120 %%* Modifications to ps2ascii.ps by David M. Jones
121 %%* (dmjones@theory.lcs.mit.edu), June 25-July 8, 1996
125 %%* (a) Added code to give better support for dvips files by providing
126 %%* FontBBox's, FontName's and Encoding vectors for downloaded
127 %%* bitmap fonts. This is done by using dvips's start-hook to
128 %%* overwrite the df-tail and D procedures that dvips uses to
129 %%* define its Type 3 bitmap fonts. Thus, this change should
130 %%* provide better support for dvips-generated PS files without
131 %%* affecting the handling of other documents.
133 %%* (b) Fixed two bugs that could potentially affect any PS file, not
134 %%* just those created by dvips: (1) added missing "get" operator
135 %%* in .show.write and (2) fixed bug that caused a hyphen at the
136 %%* end of a line to be replaced by a space rather than being
137 %%* deleted. Note that the first bug was a source of stack
138 %%* leakage, causing ps2ascii to run out of operand stack space
141 %%* Search for "%%* BF" to find these modifications.
143 %%* (c) Improved the heuristic for determining whether a line break
144 %%* has occurred and whether a line break represents a paragraph
145 %%* break. Previously, any change in the vertical position caused
146 %%* a line break; now a line break is only registered if the
147 %%* change is larger than the height of the current font. This
148 %%* means that superscripts, subscripts, and such things as
149 %%* shifted accents generated by TeX won't cause line breaks.
150 %%* Paragraph-recognition is now done by comparing the indentation
151 %%* of the new line to the indentation of the previous line and by
152 %%* comparing the vertical distance between the new line and the
153 %%* previous line to the vertical distance between the previous
154 %%* line and its predecessor.
156 %%* (d) Added a hook for renaming the files where stdout and stderr
159 %%* In general, my additions or changes to the code are described in
160 %%* comments beginning with "%%*". However, there are numerous other
161 %%* places where I have either re-formatted code or added comments to
162 %%* the code while I was trying to understand it. These are usually
163 %%* not specially marked.
167 systemdict wcheck { systemdict } { userdict } ifelse begin
168 /.max where { pop } { /.max { 2 copy lt { exch } if pop } bind def } ifelse
169 /COMPLEX dup where { pop true } { false } ifelse def
170 /SIMPLE dup where { pop true } { false } ifelse def
172 { pop currentglobal /setglobal load true setglobal }
176 % Define a way to store and retrieve integers that survives save/restore.
178 /.i.string .i.string0 length string def
179 /.iget { cvi } bind def
180 /.iput { exch //.i.string exch copy cvs pop } bind def
181 /.inew { //.i.string0 dup length string copy } bind def
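% Usage sketch (.example.n is a hypothetical name used only for illustration,
% not something defined in this file):
%	/.example.n .inew def		% allocate a private string for the value
%	.example.n 42 .iput		% store 42 as digits in that string
%	.example.n .iget		% convert the stored digits back to an integer
% The value is kept in the characters of a string rather than as an integer
% object, which is presumably what lets it survive save/restore.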
183 % We only want to redefine operators if they are defined already.
185 /codef { 1 index where { pop def } { pop pop } ifelse } def
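% Usage sketch (.no.such.operator is a hypothetical, undefined name):
%	/showpage { } codef		% showpage is found by where, so def is run
%	/.no.such.operator { } codef	% not found, so both operands are discarded
% Note that def stores into the current dictionary, which need not be the
% dictionary in which where found the existing definition.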
187 % Redefine the end-of-page operators.
190 /copypage { SIMPLE { (\014) } { (P\n) } ifelse //print } codef
191 /showpage { copypage erasepage initgraphics } codef
193 % Redefine the fill operators to detect rectangles.
195 /.orderrect % <llx> <lly> <urx> <ury> .orderrect <llx> <lly> <w> <h>
196 { % Ensure llx <= urx, lly <= ury.
197 1 index 4 index lt { 4 2 roll } if
198 dup 3 index lt { 3 1 roll exch } if
199 exch 3 index sub exch 2 index sub
202 { % Do a first pass to see if the path is all rectangles in
203 % the output coordinate system. We don't worry about overlapping
204 % rectangles that might be partially not filled.
205 % Stack: mark llx0 lly0 urx0 ury0 ... true mark x0 y0 ...
207 % Add a final moveto so we pick up any trailing unclosed subpath.
208 0 0 itransform moveto
209 { .coord counttomark 2 gt
210 { counttomark 4 gt { .fillcheckrect } { 4 2 roll pop pop } ifelse }
214 { cleartomark not mark exit }
215 { counttomark -2 roll 2 copy counttomark 2 roll .fillcheckrect }
216 pathforall cleartomark
217 { .showcolor counttomark 4 idiv
218 { counttomark -4 roll .orderrect
219 (R ) //print .show==4
228 { % Check whether the current subpath is a rectangle.
229 % If it is, add it to the list of rectangles being accumulated;
230 % if not, exit the .fillcomplex loop.
231 % The subpath has not been closed.
232 % Stack: as in .fillcomplex, + newx newy
233 counttomark 10 eq { 9 index 9 index 4 2 roll } if
234 counttomark 12 ne { cleartomark not mark exit } if
236 % Check for the two possible forms of rectangles:
237 % x0 y0 x0 y1 x1 y1 x1 y0 x0 y0
238 % x0 y0 x1 y0 x1 y1 x0 y1 x0 y0
239 9 index 2 index eq 9 index 2 index eq and
241 { % Check for first form.
242 7 index 6 index eq and 6 index 5 index eq and 3 index 2 index eq and
244 { % Check for second form.
245 9 index 8 index eq and
246 8 index 7 index eq and 5 index 4 index eq and 4 index 3 index eq and
248 ifelse not { cleartomark not mark exit } if
249 % We have a rectangle.
250 pop pop pop pop 4 2 roll pop pop 8 4 roll
252 /eofill { COMPLEX { .fillcomplex } if newpath } codef
253 /fill { COMPLEX { .fillcomplex } if newpath } codef
254 /rectfill { gsave newpath .rectappend fill grestore } codef
255 /ueofill { gsave newpath uappend eofill grestore } codef
256 /ufill { gsave newpath uappend fill grestore } codef
258 % Redefine the stroke operators to detect rectangles.
262 dup type dup /arraytype eq exch /packedarraytype eq or
263 { dup length 6 eq { exch .rectappend concat } { .rectappend } ifelse }
265 ifelse stroke grestore
267 /.strokeline % <fromx> <fromy> <tox> <toy> .strokeline <tox> <toy>
268 % Note: fromx and fromy are in output coordinates;
269 % tox and toy are in user coordinates.
270 { .coord 2 copy 6 2 roll .orderrect
271 % Add in the line width. Assume square or round caps.
272 currentlinewidth 2 div dup .dcoord add abs 1 .max 5 1 roll
273 4 index add 4 1 roll 4 index add 4 1 roll
274 4 index sub 4 1 roll 5 -1 roll sub 4 1 roll
275 (R ) //print .show==4
278 { % Do a first pass to see if the path is all horizontal and vertical
279 % lines in the output coordinate system.
280 % Stack: true mark origx origy curx cury
281 true mark null null null null
282 { .coord 6 2 roll pop pop pop pop 2 copy }
283 { .coord 1 index 4 index eq 1 index 4 index eq or
285 { cleartomark not mark exit }
288 { cleartomark not mark exit }
289 { counttomark -2 roll 2 copy counttomark 2 roll
290 1 index 4 index eq 1 index 4 index eq or
292 { cleartomark not mark exit }
295 pathforall cleartomark
296 0 currentlinewidth .dcoord 0 eq exch 0 eq or and
297 % Do the second pass to write out the rectangles.
298 % Stack: origx origy curx cury
299 { .showcolor null null null null
300 { 6 2 roll pop pop pop pop 2 copy .coord }
303 { 3 index 3 index .strokeline }
304 pathforall pop pop pop pop
308 /stroke { COMPLEX { .strokecomplex } if newpath } codef
311 dup length 6 eq { exch uappend concat } { uappend } ifelse
315 % The image operators must read the input and note the dimensions.
316 % Eventually we should redefine these to detect 1-bit-high all-black images,
317 % since this is how dvips does underlining (!).
319 /.noteimagerect % <width> <height> <matrix> .noteimagerect -
321 { gsave setmatrix itransform 0 0 itransform
322 grestore .coord 4 2 roll .coord .orderrect
323 (I ) //print .show==4
332 { dup 6 add index 1 index 6 add index 2 index 5 add index }
333 { 6 index 6 index 5 index }
334 ifelse .noteimagerect gsave nulldevice //colorimage grestore
337 /.noteimage % Arguments as for image[mask]
338 { dup type /dicttype eq
339 { dup /Width get 1 index /Height get 2 index /ImageMatrix get }
340 { 4 index 4 index 3 index }
341 ifelse .noteimagerect
343 /image { .noteimage gsave nulldevice //image grestore } codef
344 /imagemask { .noteimage gsave nulldevice //imagemask grestore } codef
346 % Output the current color if necessary.
348 .color.r -1 .iput % make sure we write the color at the beginning
355 3 1 roll 1000 mul round cvi
356 exch 1000 mul round cvi
358 dup //.color.r .iget eq
359 2 index //.color.g .iget eq and
360 3 index //.color.b .iget eq and
364 dup //.color.r exch .iput .show==only
365 ( ) //print dup //.color.g exch .iput .show==only
366 ( ) //print dup //.color.b exch .iput .show==only
376 % Set things up so our output will be in tenths of a point, with origin at
377 % lower left. This isolates us from the peculiarities of individual devices.
379 /.show.ident.matrix matrix def
380 /.show.ident { % - .show.ident <scale> <matrix>
381 % //.show.ident.matrix defaultmatrix
382 % % Assume the original transformation is well-behaved.
383 % 0.1 0 2 index dtransform abs exch abs .max /.show.scale exch def
384 % 0.1 dup 3 -1 roll scale
386 % Assume the original transformation is well-behaved...
387 0.1 0 dtransform abs exch abs .max
388 0.1 dup scale .show.ident.matrix currentmatrix
389 % ... but undo any rotation into landscape orientation.
391 1 get dup abs div 90 mul rotate
392 .show.ident.matrix currentmatrix
397 /.coord { % <x> <y> .coord <x'> <y'>
398 transform .show.ident exch pop itransform
399 exch round cvi exch round cvi
402 /.dcoord { % <dx> <dy> .dcoord <dx'> <dy'>
403 % Transforming distances is trickier, because
404 % the coordinate system might be rotated.
405 .show.ident pop 3 1 roll
407 dup mul exch dup mul add sqrt
408 2 index div round cvi
409 exch 0 exch dtransform
410 dup mul exch dup mul add sqrt
411 3 -1 roll div round cvi
414 % Remember the current X, Y, and height.
417 /.show.height .inew def
419 % Remember the last character of the previous string; if it was a
420 % hyphen preceded by a letter, we didn't output the hyphen.
422 /.show.last (\000) def
424 % Remember the current font.
425 /.font.name 130 string def
426 /.font.name.length .inew def
427 /.font.height .inew def
428 /.font.width .inew def
430 %%* Also remember indentation of current line and previous vertical
433 /.show.indent .inew def
436 % We have to redirect stdout somehow....
438 /.show.stdout { (%stdout) (w) file } bind def
440 % Make sure writing will work even if a program uses =string.
441 /.show.string =string length string def
442 /.show.=string =string length string def
444 { //=string //.show.=string copy pop
445 dup type /stringtype eq
446 { dup length //.show.string length le
447 { dup rcheck { //.show.string copy } if
450 .show.stdout exch write==only
451 //.show.=string //=string copy pop
454 { 4 -1 roll .show==only ( ) //print
455 3 -1 roll .show==only ( ) //print
456 exch .show==only ( ) //print
457 .show==only (\n) //print
460 /.showwidth % Same as stringwidth, but disable COMPLEX so that
461 % we don't try to detect rectangles during BuildChar.
463 { /COMPLEX false def stringwidth /COMPLEX true def }
468 /.showfont % <string> .showfont <string>
470 % Try getting the height and width of the font from the FontBBox.
471 currentfont /FontBBox .knownget not { {0 0 0 0} } if
472 aload pop % llx lly urx ury
473 exch 4 -1 roll % lly ury urx llx
475 3 1 roll exch % dx ury lly
478 { currentfont /FontMatrix get dtransform
481 % Fonts produced by dvips, among other applications, have
482 % BuildChar procedures that bomb out when given unexpected
483 % characters, and there is no way to determine whether a given
484 % character will do this. So for Type 1 fonts, we measure a
485 % typical character ('X'); for others, we punt.
486 currentfont /FontType get 1 eq
487 { (X) .showwidth pop dup 1.3 mul
489 { % No safe way to get the character size. Punt.
495 currentfont /FontName .knownget not { () } if
496 dup type /stringtype ne { //.show.string cvs } if
498 % Stack: height width fontname
500 { pop pop //.show.height exch .iput }
501 { 2 index //.font.height .iget eq
502 2 index //.font.width .iget eq and
503 1 index //.font.name 0 //.font.name.length .iget getinterval eq and
507 3 -1 roll dup //.font.height exch .iput .show==only ( ) //print
508 exch dup //.font.width exch .iput .show==only ( ) //print
509 dup length //.font.name.length exch .iput
510 //.font.name cvs .show==only (\n) //print
517 % Define the letters -- characters which, if they occur followed by a hyphen
518 % at the end of a line, cause the hyphen and line break to be ignored.
519 /.letter.chars 100 dict def
521 65 1 90 { dup 32 add } for
523 { StandardEncoding exch get .letter.chars exch dup put }
527 %%* Add the rest of the letters from the [O]T1Encoding and
528 %%* ISOLatin1Encoding vectors
667 { .letter.chars exch dup put }
671 % Define a set of characters which, if they occur at the start of a line,
672 % are taken as indicating a paragraph break.
673 /.break.chars 50 dict def
675 /bullet /dagger /daggerdbl /periodcentered /section
677 { .break.chars exch dup put }
681 % Define character translation to ASCII.
682 % We have to do this for the entire character set.
684 /.char.map 500 dict def
686 /.chars.def { counttomark 2 idiv { .char.map 3 1 roll put } repeat pop } def
688 % Encode the printable ASCII characters.
691 { 1 string dup 0 4 -1 roll put
692 dup 0 get StandardEncoding exch get exch
708 % Encode the ISO accented characters.
710 { ISOLatin1Encoding exch get =string cvs
711 dup 0 1 getinterval 1 index dup length 1 sub 1 exch getinterval
712 .char.map 2 index known .char.map 2 index known and
713 { .char.map 3 -1 roll get .char.map 3 -1 roll get concatstrings
714 .char.map 3 1 roll put
722 % Encode the remaining standard and ISO alphabetic characters.
725 /AE (AE) /Eth (DH) /OE (OE) /Thorn (Th)
727 /ffi (ffi) /ffl (ffl) /fi (fi) /fl (fl)
728 /germandbls (ss) /oe (oe) /thorn (th)
731 % Encode the other standard and ISO characters.
734 /brokenbar (|) /bullet (*) /copyright ((C)) /currency (#)
735 /dagger (#) /daggerdbl (##) /degree (o) /divide (/) /dotaccent (.)
737 /ellipsis (...) /emdash (--) /endash (-) /exclamdown (!)
738 /florin (f) /fraction (/)
739 /guillemotleft (<<) /guillemotright (>>)
740 /guilsinglleft (<) /guilsinglright (>) /hungarumlaut ("") /logicalnot (~)
741 /macron (_) /minus (-) /mu (u) /multiply (*)
742 /ogonek (,) /onehalf (1/2) /onequarter (1/4) /onesuperior (1)
743 /ordfeminine (-a) /ordmasculine (-o)
744 /paragraph (||) /periodcentered (*) /perthousand (o/oo) /plusminus (+-)
745 /questiondown (?) /quotedblbase (") /quotedblleft (") /quotedblright (")
746 /quotesinglbase (,) /quotesingle (') /registered ((R))
747 /section ($) /sterling (#)
748 /threequarters (3/4) /threesuperior (3) /trademark ((TM)) /twosuperior (2)
752 % Encode a few common Symbol characters.
755 /asteriskmath (*) /copyrightsans ((C)) /copyrightserif ((C))
756 /greaterequal (>=) /lessequal (<=) /registersans ((R)) /registerserif ((R))
757 /trademarksans ((TM)) /trademarkserif ((TM))
760 %%* Add a few characters from StandardEncoding and ISOLatin1Encoding
761 %%* that were missing.
772 %%* Define the OT1Encoding and T1Encoding vectors for use with dvips
773 %%* files. Unfortunately, there's no way of telling what font is
774 %%* really being used within a dvips document, so we can't provide an
775 %%* appropriate encoding for each individual font. Instead, we'll
776 %%* just provide support for the two most popular text encodings, the
777 %%* OT1 and T1 encodings, and just accept the fact that any font not
778 %%* using one of those encodings will be rendered as gibberish.
780 %%* OT1 is Knuth's 7-bit encoding for the CMR text fonts, while T1
781 %%* (aka the Cork encoding) is the 8-bit encoding used by the DC
782 %%* fonts, a preliminary version of the proposed Extended Computer
783 %%* Modern fonts. Unfortunately, T1 is not a strict extension of OT1;
784 %%* they differ in positions 8#000 through 8#040, 8#074, 8#076, 8#134,
785 %%* 8#137, 8#173, 8#174, 8#175 and 8#177, so we can't use the same
788 %%* Of course, we also can't reliably tell the difference between an
789 %%* OT1-encoded font and a T1-encoded font based on the information in
790 %%* a dvips-created PostScript file. As a best-guess solution, we'll
791 %%* use the T1 encoding if the font contains any characters in
792 %%* positions above 8#177 and the OT1 encoding if it doesn't.
794 /T1Encoding 256 array def
796 /OT1Encoding 256 array def
798 %%* T1Encoding shares a lot with StandardEncoding, so let's start
801 StandardEncoding T1Encoding copy pop
806 { OT1Encoding 3 1 roll put }
814 { T1Encoding 3 1 roll put }
834 8#015 /quotesinglbase
836 8#017 /guilsinglright
842 8#024 /guillemotright
847 8#030 /perthousandzero
856 %% 8#040 through 8#176 follow StandardEncoding
995 %%* Now copy OT1Encoding into T1Encoding and make a few changes.
997 T1Encoding OT1Encoding copy pop
1038 8#042 /quotedblright
1052 %%* And add a few characters from the OT1Encoding
1062 /Upsilon (\\Upsilon )
1073 /perthousandzero (0)
1090 /Ohungarumlaut (O"")
1098 /Uhungarumlaut (U"")
1120 /ohungarumlaut (o"")
1128 /uhungarumlaut (u"")
1137 %%* We extend the df-tail command to stick in an Encoding vector (see
1138 %%* above for a discussion of the T1 and OT1 encodings), put in a
1139 %%* FontName (which will just be dvips's name for the font, i.e., Fa,
1140 %%* Fb, etc.) and give each font a separate FontBBox instead of
1141 %%* letting them all share a single one.
1143 /dvips.df-tail % id numcc maxcc df-tail
1148 %% Choose an encoding based on the highest position occupied.
1150 dup 128 gt { T1Encoding } { OT1Encoding } ifelse
1154 %% It's ok for all the fonts to share a FontMatrix, but they
1155 %% need to have separate FontBBoxes
1158 /FontBBox [0 0 0 0] N
1162 %% And let's throw in a FontName for good measure
1166 %% Make sure each font gets its own private FontName. -- dmj,
1169 dup length string copy
1171 /BuildChar {CharBuilder} N
1173 dup { /foo setfont }
1181 %%* This is functionally equivalent to dvips's /D procedure, but it
1182 %%* also calculates the Font Bounding Box while defining the
1185 /dvips.D % char-data ch D - : define character bitmap in current font
1188 dup type /stringtype ne {]} if % char-data
1193 /ch-width { Cw } def
1194 /ch-height { Ch } def
1200 nn /base get cc ctr put % (adds ctr to cc'th position of BASE)
1203 ch-data % BitMaps ctr char-data
1205 dup dup length 1 sub dup 2 index S get sf div put
1207 put % puts char-data into BitMaps at index ctr
1210 %% Make sure the Font Bounding Box encloses the Bounding Box of the
1211 %% current character
1213 nn /FontBBox get % BB
1215 dup % calculate new llx
1221 dup % calculate new lly
1223 ch-yoff ch-height sub
1227 dup % calculate new urx
1233 dup 3 get % calculate new ury
1240 %%* Define start-hook to replace df-tail and D by our versions.
1241 %%* Unfortunately, the user can redefine start-hook and thus bypass
1242 %%* these changes, but I don't see an obvious way around that.
1244 userdict /start-hook {
1245 TeXDict /df-tail /dvips.df-tail load bind put
1246 TeXDict /D /dvips.D load bind put
1249 %%* Introduce a symbolic constant for hyphens. (Need to make
1250 %%* allowance for hyphen being in different place?)
1254 % Write out a string. If it ends in a letter and a hyphen,
1255 % don't write the hyphen, and set .show.last to a hyphen;
1256 % otherwise, set .show.last to the character (or \000 if it was a hyphen).
1257 /.show.write % <string>
1260 { dup dup length 1 sub get % string last_char
1261 dup .hyphen eq % string last_char hyphen?
1262 { % string last_char
1264 { 1 index dup length 2 sub get }
1265 { //.show.last 0 get }
1266 ifelse % string last_char prev-char
1267 currentfont /Encoding get exch get % look up prev-char
1268 //.letter.chars exch known % is it a letter?
1269 { % Remove the hyphen % string last_char
1270 exch % last_char string
1271 dup length 1 sub % last_char string len-1
1272 0 exch getinterval % last_char string-1
1273 exch % string-1 last_char
1275 { pop 0 } % string 0
1279 //.show.last 0 3 -1 roll put % store last_char
1288 currentfont /FontType get 0 ne
1290 { % begin forall % c
1292 currentfont /Encoding get % c c vec
1294 dup //.char.map exch known % c name bool
1296 { pop OT1Encoding exch get }
1298 //.char.map exch get % translation
1299 .show.stdout exch writestring
1303 { (\0) dup 0 get 0 eq
1305 (%stderr) (w) file dup
1306 (*** Warning: composite font characters dumped without decoding.\n) writestring
1312 .show.stdout exch writestring
1317 /.showstring1 { % string
1318 currentpoint .coord % string x y
1319 3 -1 roll dup .showwidth % x y string dx dy
1320 1 index % x y string dx dy dx
1321 0 rmoveto % x y string dx dy
1322 .dcoord pop % x y string width
1324 { % x y string width
1325 2 index % x y string width y
1326 //.show.y .iget % x y string width y old.y
1328 %%* Replaced test "has y changed" by "has y changed by more
1329 %%* than the current font height" so that subscripts and
1330 %%* superscripts won't cause line/paragraph breaks
1332 sub abs dup % x y string width dy dy
1333 //.show.height .iget
1335 { % x y string width dy
1337 %%* Vertical position has changed by more than the font
1338 %%* height, so we now try to figure out whether we've
1339 %%* started a new paragraph or merely a new line, using a
1340 %%* variety of heuristics.
1342 %%* If any of the following is true, we start a new
1345 %%* (a) the current vertical shift is more than 1.1 times
1346 %%* the previous vertical shift, where 1.1 is an
1347 %%* arbitrarily chosen factor that could probably be
1350 dup % x y string width dy dy
1351 //.show.dy .iget 1.1 mul
1355 %%* Save the new vertical shift
1357 //.show.dy exch .iput
1359 %%* (b) The vertical shift is more than 1.3 times the
1360 %%* "size" of the current font. I've removed this
1361 %%* test since it's not really very useful.
1363 %%* //.show.dy .iget
1364 %%* //.show.height .iget 1.4 mul
1365 %%* gt % x y string width bool
1366 %%* .show.height .iget 0 gt and % only perform test if font
1367 %%* % height is nonzero
1370 %%* (c) the first character of the new line is one of the
1373 2 index length % x y string width newpar? len
1374 0 gt % x y string width newpar? len>0?
1376 2 index 0 get % x y string width newpar? s
1377 currentfont /Encoding get
1378 exch get % x y string width newpar? s_enc
1379 //.break.chars exch known { pop true } if
1381 if % x y string width newpar?
1383 %%* (d) The indentation of the new line is greater than
1384 %%* the indentation of the previous line.
1387 //.show.indent .iget
1391 %%* HOWEVER, if the line ends in a hyphen, we do NOT begin
1392 %%* a new paragraph (cf. comment at end of BF2). --dmj,
1395 //.show.last 0 get .hyphen ne
1399 { (\n\n) } % Paragraph
1402 %%* BF2: If last character on a line is
1403 %%* a hyphen, we omit the hyphen and
1404 %%* run the lines together. Of
1405 %%* course, this will fail if a word
1406 %%* with an explicit hyphen (e.g.,
1407 %%* X-ray) is split across two lines.
1408 %%* Oh, well. (What should we do
1409 %%* about a hyphen that ends a
1410 %%* "paragraph"? Perhaps that should
1411 %%* inhibit a paragraph break.)
1413 //.show.last 0 get .hyphen eq
1416 ifelse % x y string width char
1421 //.show.y 3 index .iput % x y string width
1422 //.show.x 4 index .iput % x y string width
1423 //.show.indent 4 index .iput
1425 { % x y string width dy
1426 % If the word processor split a hyphenated word within
1427 % the same line, put out the hyphen now.
1429 //.show.last 0 get .hyphen eq { (-) //print } if
1433 %%* If we have moved more than 1 point to
1434 %%* the right, interpret it as a
1435 %%* space? This needs to be looked at
1438 3 index % x y string width x
1439 //.show.x .iget 10 add gt % x y string width bool
1443 4 1 roll % width x y string
1444 .show.write pop % width x
1445 add //.show.x exch .iput % <empty>
1447 { (S ) //print .show==4 }
1452 { dup () eq { pop } { .showstring1 } ifelse
1455 % Redefine all the string display operators.
1463 % We define all the other operators in terms of .show1.
1465 /.show1.string ( ) def
1466 /.show1 { //.show1.string exch 0 exch put //.show1.string .showstring } odef
1468 { .showfont .showcolor
1469 { .show1 2 copy rmoveto } forall
1473 { .showfont .showcolor
1474 { dup .show1 4 index eq { 4 index 4 index rmoveto } if
1481 { .showfont .showcolor
1482 //.show1.string 0 4 -1 roll put
1483 { //.show1.string search not { exit } if
1484 .showstring .showstring
1485 2 index 2 index rmoveto
1490 { .showfont .showcolor
1491 %**************** Should construct a closure, in case the procedure
1492 %**************** affects the o-stack.
1493 { .show1 dup exec } forall pop
1496 % We don't really do the right thing with the Level 2 show operators,
1497 % but we do something semi-reasonable.
1498 /xshow { pop show } codef
1499 /yshow { pop show } codef
1500 /xyshow { pop show } codef
1502 { currentfont /Encoding .knownget not { {} } if
1503 0 1 2 index length 1 sub
1504 { % Stack: glyph encoding index
1505 2 copy get 3 index eq { exch pop exch pop null exit } if
1508 for null eq { (X) dup 0 4 -1 roll put show } { pop } ifelse
1513 % Bind the operators we just defined, and all the others if we didn't
1516 DELAYBIND { .bindnow } if
1518 % Make systemdict read-only if it wasn't already.
1520 systemdict wcheck { systemdict readonly pop } if
1522 % Restore the current local/global VM mode.