Bug: Images aren't always embedded into PDFs
Description
Image objects (images and image fields) have a property which you can set to specify whether the image should be embedded or linked (the "Embed Image Data" check box on the Object palette’s Field tab).
When the property is checked, the result is that the image file loaded into the image object is embedded into the XFA Data that’s stored in the form. When the property is unchecked, only the URI (file path or URL) is stored in the XFA Data.
I put emphasis on XFA Data above because it’s important to understand the difference between the XFA layer and the PDF layer of a PDF form in order to understand what this bug is all about: Essentially, an XFA form saved as a PDF file results in a PDF container with the XFA inside (as opposed to saving the form as an XDP where the XFA form is at the top layer). When the PDF form is subsequently filled and saved within Acrobat (Standard or Pro), the image loaded into an image field object is supposed to be saved on one layer or the other depending on the setting of the "Embed Image Data" property on the image field.
When you save a PDF form in Acrobat, what gets saved in the XFA Data (in the XFA layer) depends on the value of the "Embed Image Data" property (which maps to the //field/ui/imageEdit@data attribute). If it’s set to "link" (unchecked), then only the original path to the image file is saved. If it’s set to "embed", then the image file is text-encoded and saved in the data instead of the original file path. In either case, however, the image should always be embedded into the PDF layer (don’t worry, it doesn’t actually get embedded twice, once on each layer, I’m just simplifying things a little here) such that the PDF is self-contained and can be re-distributed without having to ensure that the linked image file remains accessible from any location.
Unfortunately, there are cases when the image file doesn’t get embedded into the PDF layer when it should be. For example, if you have an image field object which is set to link to its image file and you load the image by importing data that contains the file path then save the PDF, the image won’t be embedded into the PDF as expected.
Workaround
At this time, I’m not aware of any workarounds to the specific issue I stated above (when importing the image’s file path).
The only way to consistently get image files to be embedded into the PDF as they should be — regardless of the image field object’s setting to link or embed its data — is to manually click on the image field object in Acrobat and pick the image file.
Fix
Please refer to the Bug List for updated information on the version(s) affected by this bug as well as if and when it was/will be fixed.
Update (May 22, 2008)
This bug was addressed in Acrobat/Reader 8.1 (XFA 2.6, PDF 1.7) in a way that addresses both security and PDF self-containment issues.
For XFA 2.6+ forms (forms authored in Designer 8.1+), images that are linked (referenced) rather than being embedded are now stored (the bits are embedded) in an array in the PDF layer when the PDF form is saved. At runtime, if an image is linked (referenced) rather than embedded, Acrobat/Reader will look for the associated image bits in the PDF layer and, if a match is found, will load the image. This means that Acrobat/Reader no longer goes outside of the PDF to fetch images unless explicitly directed by the user (e.g. the user clicks on an image field and chooses an image on their own) which addresses security issues. The fact that linked images are now stored in the PDF layer also addresses self-containment issues where the nature of a PDF document is that it is a self-contained entity (e.g. when you email a PDF, the recipient can always open it and see its content even if they don’t have access to referenced content — the exception being if the form makes an external data connection to a database or web service or needs an XML data file to be imported).
Image fields with content embedded at design time are still stored in the XFA data as they were before. You may still load an image dynamically from data provided that the data is the actual image bits or, if the data is a link, that there are associated image bits already in the PDF layer (which implies that the image was linked from the form at design-time — for example, you could have 5 hidden image objects each linking to different images and then use a link in the data to control which of those images show up in an image field on your form). In either case (whether the image is loaded via embedded bits or a link), the image bits always end-up in the data submitted from the form. The original link is not included in the submitted data nor is it available at runtime via scripting (unless imported in data via a data connection or XML data file) because it could contain sensitive information (e.g. your username).
Posted by Stefan Cameron on January 23rd, 2007
Filed under Acrobat,Bugs
