
A comparison of the relative byte lengths of the images from NBC's two Xerox 7535 PDFs raises questions regarding the identical characters that the three Amigos have attributed to JBIG2 compression of the non-background layers.

The Obot claim is that the JBIG2 compression filter creates multiple identical characters and places them within the image, and that these identical characters survive, without change, the decompression and subsequent re-compression applied by Preview and the Quartz compression filters. However, the Obots have provided no evidence that proves their claim.

The underlying question is: How do Mac OS X Preview and Mac OS X 10.6.7 Quartz PDFContext handle JBIG2 compressed images?

We don't really know the version number of Mac OS X that was loaded onto the Macintosh computer which (purportedly) opened the WH LFCOLB PDF images in Preview. We can probably assume that the Mac OS X version was 10.6 or an earlier version. Mac OS X version 10.6 is "Snow Leopard". Mac OS X 10.7 was the first completely 64 bit version of Mac OS X. However, JBIG2 requires PDF 1.4, which apparently is still not available even as late as Mac OS X 10.8. So Apple continues to do strange stuff to PDF files.

So what is Preview doing to these Xerox scan to PDF, JBIG2 compressed image files? Only the three Amigos know.

We also don't know the version of Mac OS X that was used to process NBC's PDF files.

Nevertheless, with a few Google searches, we discovered that the question of Preview's handling of JBIG2 compressed images is still being debated as late as Mac OS X version 10.8. Mac OS X version 10.8 is "Mountain Lion". A sample of the ongoing debate is found here:

See: "Does OS X Preview support saving PDF with JBIG2 compression?"
http://superuser.com/questions/583630/does-os-x-preview-support-saving-pdf-with-jbig2-compression

After the back and forth on this post we are left with two possible answers. Preview re-compresses the JBIG2 image either with DCTDecode or with FlateDecode. Acrobat Preflight identifies the filter as FlateDecode.
The DCTDecode filter is lossy and the FlateDecode filter is lossless. Regardless of which filter is used, the data provided by the person who posted the question shows that the file size of the re-compressed Preview print to PDF file was consistently greater than that of the original JBIG2 compressed file.

The file sizes of NBC's two Xerox 7535 PDF files are:

wh-lfbc-scanned-xerox-7535-wc.pdf          253 KB
wh-lfbc-scanned-xerox-7535-wcpreview.pdf   296 KB

Thus the file size of NBC's Preview print to PDF file is 43 KB greater than the file size of his Xerox scan to PDF file.

The file size of each compressed image for each layer can be estimated by the byte length of each image extracted from the two PDF files.
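One way such byte lengths could be collected is to read the /Length entry of every image XObject in the raw file. The following Python is a rough sketch only, not the procedure actually used for the figures given here. It assumes that object headers are stored uncompressed, that each /Length is a direct number rather than an indirect reference, and that no stream data happens to contain the literal text "endobj". The function name and the regular expressions are illustrative.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# Marks an image XObject dictionary.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# A direct /Length value; "/Length 12 0 R" (indirect) is deliberately rejected.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)")


def image_stream_lengths(pdf_bytes):
    """Return [(object number, declared stream length)] for image XObjects."""
    results = []
    for match in OBJ_RE.finditer(pdf_bytes):
        obj_num = int(match.group(1))
        body = match.group(3)
        # Everything before the first "stream" keyword is the dictionary header.
        header, sep, _ = body.partition(b"stream")
        if not sep:
            continue  # not a stream object
        if not IMAGE_RE.search(header):
            continue  # a stream, but not an image
        length = LENGTH_RE.search(header)
        if length:
            results.append((obj_num, int(length.group(1))))
    return results
```

For example, reading a file with open(path, "rb").read() and passing the bytes to image_stream_lengths would list one declared length per image object, which could then be summed per file.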

The sums of the image byte lengths for the 17 layer images in each file are:

wh-lfbc-scanned-xerox-7535-wc.pdf          249 KB
wh-lfbc-scanned-xerox-7535-wcpreview.pdf   291 KB
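The part of each file that is not image data can be worked out by simple subtraction. A minimal sketch in Python, using only the rounded KB figures quoted above (the dictionary and function names are illustrative, not taken from any tool):

```python
# (file size in KB, sum of the 17 image byte lengths in KB), as quoted in the text.
SIZES_KB = {
    "wh-lfbc-scanned-xerox-7535-wc.pdf": (253, 249),
    "wh-lfbc-scanned-xerox-7535-wcpreview.pdf": (296, 291),
}


def overhead_kb(name):
    """KB in the file that the image streams do not account for."""
    file_kb, images_kb = SIZES_KB[name]
    return file_kb - images_kb


def file_size_difference_kb():
    """How much larger the Preview file is than the Xerox file, in KB."""
    xerox_kb = SIZES_KB["wh-lfbc-scanned-xerox-7535-wc.pdf"][0]
    preview_kb = SIZES_KB["wh-lfbc-scanned-xerox-7535-wcpreview.pdf"][0]
    return preview_kb - xerox_kb


for name in SIZES_KB:
    print(name, overhead_kb(name), "KB outside the image streams")
```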

Hence the file size of the Xerox scan to PDF file is 4 KB greater than the sum of the image byte lengths for the 17 PDF image layers. Likewise, the file size of the Preview print to PDF file is 5 KB greater than the sum of the image byte lengths of the same 17 layers.

Surprisingly, the image byte lengths of the 17 Preview print to PDF images are not consistently greater than the byte lengths of the same layers in the Xerox 7535 E-mail to PDF file. This suggests that the relative file size depends on the image content. The 17 layers of the Xerox/Preview produced PDF files are a mixture of text and non-text images.

The byte lengths for each image from each file are compared in the following table:

Layer   Xerox scan to PDF   Preview print to PDF
  1          231258               235646
  2           12345                45815
  3            1663                 3930
  4             393                  490
  5             506                 1288
  6             313                  502
  7             306                  373
  8             288                  448
  9             286                  421
 10             226                  192
 11             243                  385
 12             254                  325
 13             244                  184
 14             248                  184
 15             181                  154
 16             170                  139
 17             174                  116
        -----------          -----------
Total        249098               290592

As previously stated, the total sums of the byte lengths are slightly less than their respective PDF file sizes.

Only 6 of the 16 non-background layers are text layers. The other 10 non-background layers contain no text. These 10 objects are the ones that the three Amigos don't want you to know about. These are the new object types that are found in the Xerox produced files but are not found in the WH LFCOLB PDF image file.

A significant finding is that the 45815 byte length of image 2 from the Preview print to PDF file is almost four times (about 3.7 times) the 12345 byte length of the same image from the Xerox scan to PDF file. Layer 2 is the largest text layer.

Likewise, the 3930 byte length of image 3 from the Preview print to PDF file is more than twice the 1663 byte length of image 3 from the Xerox scan to PDF file. Image 3 is the top text in Onaka's signature stamp impression. This is the second largest text layer.

Layers 13 through 17 are all non-text layers. These layers are all image masks containing bits of solid colors. These are either monochrome White or Green in color. The byte lengths of these images for the Preview print to PDF file are each smaller than the byte lengths of the corresponding Xerox scan to PDF images.
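The sums, ratios and "smaller in Preview" observations above can be re-derived from the table with a few lines of Python. This is a minimal sketch: the byte lengths are copied from the table, while the variable and function names are illustrative.

```python
# layer -> (Xerox scan to PDF bytes, Preview print to PDF bytes), from the table.
BYTE_LENGTHS = {
    1: (231258, 235646), 2: (12345, 45815), 3: (1663, 3930),
    4: (393, 490), 5: (506, 1288), 6: (313, 502),
    7: (306, 373), 8: (288, 448), 9: (286, 421),
    10: (226, 192), 11: (243, 385), 12: (254, 325),
    13: (244, 184), 14: (248, 184), 15: (181, 154),
    16: (170, 139), 17: (174, 116),
}


def totals(table):
    """Sum each column: (Xerox total, Preview total)."""
    xerox = sum(x for x, _ in table.values())
    preview = sum(p for _, p in table.values())
    return xerox, preview


def preview_to_xerox_ratio(table, layer):
    """How many times larger the Preview image is than the Xerox image."""
    xerox, preview = table[layer]
    return preview / xerox


def layers_smaller_in_preview(table):
    """Layers whose Preview byte length is below the Xerox byte length."""
    return [layer for layer, (x, p) in table.items() if p < x]


print("totals:", totals(BYTE_LENGTHS))
print("layer 2 ratio:", round(preview_to_xerox_ratio(BYTE_LENGTHS, 2), 2))
print("layer 3 ratio:", round(preview_to_xerox_ratio(BYTE_LENGTHS, 3), 2))
print("smaller in Preview:", layers_smaller_in_preview(BYTE_LENGTHS))
```

Besides layers 13 through 17, this check also flags layer 10 (226 versus 192 bytes) as smaller in the Preview file.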
