JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format
Johan van der Knijff
Despite the increasing popularity of JPEG 2000 in the archival community, the suitability of the JP2 format for long-term preservation has been poorly addressed by existing literature. This paper demonstrates how some parts of the JP2 file specification (related to ICC profiles and grid resolution) contain ambiguous information, leading to a situation where different software vendors are interpreting the standard in slightly different ways. This results in a number of risks for preservation. These risks could be reduced by applying some minor changes to the format specification, in combination with the adherence to the updated standard by software vendors.
The last few years have seen a marked rise in the use of JPEG 2000 in the cultural heritage sector. Several institutions are now using JPEG 2000 Part 1 (the JP2 format) as a preferred archival and access format for digital imagery. Examples include (but are not limited to) the National Library of the Netherlands (Gillesse et al., 2008), the British Library (McLeod & Wheatley, 2007), the Wellcome Library (Henshaw, 2010a), Library of Congress (Buckley & Sam, 2006), the National Library of Norway (National Library of Norway, 2007), and the National Library of the Czech Republic (Vychodil, 2010). A number of other institutions are currently investigating the feasibility of using JP2 as a replacement of uncompressed TIFF, which is still the most widely used still image format for long-term archiving and preservation. In spite of the wide interest in JPEG 2000 from the archival community, the existing literature is surprisingly sparse on the actual suitability of the standard for long-term preservation. If preservation is addressed at all, what's often lacking is a specification of what information inside an image is worth preserving in the first place. Moreover, such discussions are often limited to largely theoretical considerations (e.g. features of the JP2 format), without going into the more practical aspects (e.g. to what extent do existing software tools actually follow the features that are defined by the format specification).
However, without taking such factors into account, can we say anything meaningful about how an image that is created using today's software will be rendered in, say, 30 years' time? Also, at some point in the future it may be necessary to migrate today's images to a new format. How confident can we be about not losing any important information in this process? Alternatively, if we opt for emulation as a preservation strategy, how will the images behave in an emulated environment?
The above questions are central to this paper. There are many aspects to assessing the suitability of a file format for a particular preservation aim (see e.g. LoC, 2007 and Brown, 2008). In this paper I limit myself to addressing two areas where the JP2 format specification can be interpreted in more than one way: support of ICC profiles and the definition of grid resolution. I demonstrate how these ambiguities have led to divergent interpretations of the format by different software vendors, and how this introduces risks for long-term preservation. I also present some possible solutions. Finally, I provide a number of practical recommendations that may help institutions to mitigate the risks for their existing collections.
Unless stated otherwise, the observations in this paper only apply to the JP2 file format, which is defined by JPEG 2000 Part 1 (ISO/IEC, 2004a).
Colour management in JP2: restricted ICC profiles
Section I.3 of the JP2 format specification (ISO/IEC, 2004a) describes the methods that can be used to define the colour space of an image. The most flexible method uses ICC profiles, and is based on version ICC.1:1998-09 of the ICC specification (ICC, 1998). JP2 supports the use of ICC profiles for monochrome and three-component colour spaces (such as greyscale and RGB). However, JP2 does not support all features of the ICC standard. Instead, it uses the concept of a "Restricted ICC profile", which is defined as follows:
"This profile shall specify the transformation needed to convert the decompressed image data into the PCSXYZ, and shall conform to either the Monochrome Input or Three-Component Matrix-Based Input profile class, and contain all the required tags specified therein, as defined in ICC.1:1998-09." (ISO/IEC 2004a, Table I.9).
To appreciate what this actually means, it is helpful to give some additional information on the ICC standard. First of all, the ICC specification distinguishes 7 separate ICC profile classes. The most commonly used ones are the "Input Device" (or simply "Input"), "Display Device" ("Display") and "Output Device" ("Output") classes. Another one that is relevant in this context is the "ColorSpace Conversion" class. Second, it is important to know how colour transformations can be defined within the ICC standard. For monochrome images, the colour transformation is always described using a gray tone reproduction curve (TRC), which is simply a one-dimensional table. For RGB spaces, two methods are available. The first one is based on a three-component matrix multiplication. The second (N-component LUT-based method) uses an algorithm that includes a set of tone reproduction curves, a multidimensional lookup table and a set of linearisation curves (ICC, 1998).
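As a rough illustration of the profile classes mentioned above: the class of a profile is recorded as a four-byte signature at bytes 12 to 15 of the 128-byte ICC profile header. The Python sketch below reads that field. The function names and the synthetic header are invented for this example; a real profile would be read from a file or extracted from an image.

```python
import struct

# Class signatures stored at bytes 12-15 of the ICC profile header.
ICC_CLASSES = {
    b"scnr": "Input Device",
    b"mntr": "Display Device",
    b"prtr": "Output Device",
    b"link": "DeviceLink",
    b"spac": "ColorSpace Conversion",
    b"abst": "Abstract",
    b"nmcl": "Named Color",
}


def icc_profile_class(profile: bytes) -> str:
    """Return the human-readable class of an ICC profile."""
    if len(profile) < 128:
        raise ValueError("an ICC profile header is 128 bytes long")
    signature = profile[12:16]
    return ICC_CLASSES.get(signature, f"unknown ({signature!r})")


def make_header(class_signature: bytes) -> bytes:
    """Build a synthetic 128-byte header (demonstration data only)."""
    header = bytearray(128)
    struct.pack_into(">I", header, 0, 128)  # profile size in bytes
    header[12:16] = class_signature         # profile class
    header[16:20] = b"RGB "                 # colour space of the image data
    header[20:24] = b"XYZ "                 # profile connection space
    header[36:40] = b"acsp"                 # profile file signature
    return bytes(header)


print(icc_profile_class(make_header(b"mntr")))  # prints "Display Device"
```

Working colour space profiles such as Adobe RGB 1998 and eciRGB v2 would report "Display Device" here, which is the crux of the problem discussed in the remainder of this section.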
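Going back to the JP2 specification, the "Restricted ICC profile" class imposes two restrictions: first, N-component LUT-based profiles are not allowed; second, only profiles of the "input" class are allowed.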
The first restriction makes sense, since N-component LUT-based profiles are more complex than three-component matrix-based ones, and thus more difficult to implement. The logic behind the restriction of allowing only input profiles is more difficult to understand, since it prohibits the use of all other ICC profile classes. According to the ICC specification, the "input" class represents input devices such as cameras and scanners. However, widely used working colour spaces such as Adobe RGB 1998 (Adobe, 2005) and eciRGB v2 (ECI, 2007) are defined using profiles that belong to the display profile class. As a result, they are not allowed in JP2, even though both the Adobe RGB 1998 and eciRGB v2 profiles use the three-component matrix-based transformation method. Since there is no obvious reason for prohibiting such profiles, it would appear that the restriction to "input" profiles may be nothing more than an unintended error in the file specification. This impression is reinforced by the fact that the file specification of the JPX format (which is defined by JPEG 2000 Part 2) also consistently uses the phrase "input ICC profiles" in the definition of its "Any ICC profile" method (which doesn't have any restrictions on the use of N-component LUT-based profiles) (ISO/IEC, 2004b).
A major consequence of the "input" restriction is that a literal interpretation of the format specification limits the use of ICC profiles to such a degree that any serious colour management becomes impossible in JP2. For colour imagery, the only colour space that can be handled without using ICC profiles is sRGB. Full-colour printed materials often contain colours that cannot be represented in the sRGB colour space. If such materials need to be digitised with minimal loss of colour fidelity, a colour space with a wider gamut (such as Adobe RGB or eciRGB) is needed, and this requires the use of ICC profiles. Since the format specification prohibits this, this means that in its current form the JP2 format is unsuitable for applications that require colour support beyond sRGB.
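The two restrictions can be expressed as a simple check on a profile's header and tag table. The sketch below is a simplification under stated assumptions: it only tests for the colorant and TRC tags that characterise the matrix-based and monochrome models (the full list of required tags in ICC.1:1998-09 is longer), and the "relaxed" reading that also accepts display profiles is a hypothetical interpretation, not something the current specification permits. All function names are invented for illustration.

```python
import struct

# Tags that characterise the two transformation models allowed in JP2.
MATRIX_TAGS = {b"rXYZ", b"gXYZ", b"bXYZ", b"rTRC", b"gTRC", b"bTRC"}
MONOCHROME_TAGS = {b"kTRC"}


def tag_signatures(profile: bytes) -> set:
    """Read the tag signatures from the tag table that follows the header."""
    (count,) = struct.unpack_from(">I", profile, 128)
    signatures = set()
    for i in range(count):
        # Each entry: 4-byte signature, 4-byte offset, 4-byte size.
        sig, _offset, _size = struct.unpack_from(">4sII", profile, 132 + 12 * i)
        signatures.add(sig)
    return signatures


def check_restricted(profile: bytes) -> dict:
    """Compare a profile against a literal and a relaxed reading of JP2."""
    tags = tag_signatures(profile)
    profile_class = profile[12:16]
    simple_model = MATRIX_TAGS <= tags or MONOCHROME_TAGS <= tags
    return {
        "class": profile_class.decode("ascii", "replace"),
        "matrix_or_monochrome": simple_model,
        # Literal reading: only "input" (scnr) profiles are allowed.
        "literal_reading": simple_model and profile_class == b"scnr",
        # Hypothetical relaxed reading: display (mntr) profiles also accepted.
        "relaxed_reading": simple_model and profile_class in (b"scnr", b"mntr"),
    }


def make_profile(profile_class: bytes, tags) -> bytes:
    """Build a synthetic profile: empty header plus a tag table (demo only)."""
    header = bytearray(128)
    header[12:16] = profile_class
    table = struct.pack(">I", len(tags))
    table += b"".join(struct.pack(">4sII", t, 0, 0) for t in tags)
    return bytes(header) + table


working_space = make_profile(b"mntr", sorted(MATRIX_TAGS))
print(check_restricted(working_space))
```

For a matrix-based display profile the literal reading fails while the relaxed reading passes, which mirrors the situation of Adobe RGB 1998 and eciRGB v2 described above.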
Handling of ICC profiles by different encoders
To test how widely used JPEG 2000 encoders handle ICC profiles in practice, I took a number of TIFF images that contain embedded ICC profiles and tried to convert them to JP2 with each encoder. The ICC profiles in all experiments were display device profiles for the Adobe RGB 1998 and eciRGB v2 working colour spaces, which both use the three-component matrix-based transformation method. I subsequently analysed all generated images using ExifTool 8.12 (Harvey) and JHOVE 1.4 (JHOVE). Table 1 summarises the results.
Table 1: Preservation of ICC profiles in TIFF to JPEG 2000 migration using different encoders.
We can make a couple of interesting observations from these results. First, 3 out of the 7 experiments resulted in a JPX file. JPX is an extension of the JP2 format that is defined by JPEG 2000 Part 2 (ISO/IEC, 2004b). Most JPX files can be read by JP2 decoders, which will simply ignore any features that are not permitted within JP2. JPX also contains a separate "Any ICC" method that, unlike JP2, supports the use of N-component LUT-based ICC profiles. Decoders that do not include JPX support will simply ignore ICC profiles that are defined using this method. At present very few decoders include support for JPX, and the adoption of the format is negligible. Because of this, the format is not well suited for preservation.
With this in mind, the behaviour of version 2.1.20 of the Luratech software (which was reported also by Henshaw, 2010b) is somewhat odd. Depending on the characteristics of the input image, the encoder may decide to use the JPX format without any explicit instruction from the user to do so. Even worse, users may be completely unaware of this. Since the ICC profiles in all test images use the three-component matrix-based transformation, the only reason for not allowing them in JP2 would be the fact that they are not "input" profiles. However, since the "Any ICC" method in the format specification of JPX contains the very same "input" restriction, switching to JPX doesn't solve this problem. This behaviour has been corrected in more recent versions of Luratech's software. If version 2.1.22 of the encoder encounters a "display" profile in the input image, it writes a JP2 file, but it changes the "display" profile class value of the original profile to "input" in the resulting image [n1].
Adobe's JPEG 2000 plugin for Photoshop only encodes to JPX format [n2]. However, it has an option to create JPX files that are "JP2 compatible". When this option is activated, in addition to the original profile, it adds a modified version to the image, where the "display" class is simply changed to "input". So, these images contain two different versions of the same profile.
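Whether a file declares itself as JP2 or JPX, and which formats it claims compatibility with, is recorded in the File Type box that directly follows the 12-byte signature box. The Python sketch below reads the brand and the compatibility list from the start of a file. It is a minimal example that assumes a plain 32-bit box length (no extended length), and the sample bytes are synthetic rather than taken from any of the encoders discussed here.

```python
import struct

# The 12-byte JPEG 2000 signature box that opens every JP2 or JPX file.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def read_file_type(data: bytes) -> dict:
    """Return the brand, minor version and compatibility list of a file."""
    if data[:12] != JP2_SIGNATURE:
        raise ValueError("not a JPEG 2000 family file")
    length, box_type = struct.unpack_from(">I4s", data, 12)
    if box_type != b"ftyp":
        raise ValueError("the File Type box must follow the signature box")
    if length < 16:
        raise ValueError("extended or open-ended box lengths are not handled")
    payload = data[20:12 + length]
    brand = payload[0:4]
    (minor_version,) = struct.unpack_from(">I", payload, 4)
    # The rest of the payload is a list of 4-byte compatibility codes.
    compatibility = [payload[i:i + 4] for i in range(8, len(payload), 4)]
    return {
        "brand": brand,
        "minor_version": minor_version,
        "compatibility": compatibility,
    }


# Synthetic example: brand 'jpx ' that also claims compatibility with JP2.
sample = JP2_SIGNATURE + struct.pack(
    ">I4s4sI4s4s", 24, b"ftyp", b"jpx ", 0, b"jpx ", b"jp2 "
)
info = read_file_type(sample)
print(info["brand"], info["compatibility"])
```

A file whose brand is `jp2 ` but which contains JPX-only features (or the reverse) would be a candidate for closer inspection.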
ICC profiles are lost altogether in the Kakadu and ImageMagick migrations. This is consistent with earlier results by Kulovits et al. (2009). I should add here that Kakadu does actually support the use of ICC profiles, but in an indirect way that requires the user to specify a profile's parameters on the command line.
Only the Aware encoder managed to create JP2 images that include embedded "display" ICC profiles without altering them in any way during the migration. So, only Aware and recent versions of the Luratech encoder currently permit basic colour management in the JP2 format. Aware achieves this by deviating from the JP2 format specification, whereas Luratech simply changes the profile class fields.
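To verify that an encoder has changed nothing but the class label, the embedded profile can be compared byte by byte against the original, skipping the four-byte class field at bytes 12 to 15 of the header. The following is a sketch with an invented function name. It assumes that both profiles have already been extracted as byte strings, and it would report a mismatch if the encoder also rewrote other header fields (for instance the profile ID that version 4 profiles store at bytes 84 to 99).

```python
def differs_only_in_class(original: bytes, migrated: bytes) -> bool:
    """True if the two profiles are identical apart from the class field."""
    if len(original) != len(migrated) or len(original) < 128:
        return False
    same_elsewhere = (
        original[:12] == migrated[:12] and original[16:] == migrated[16:]
    )
    class_changed = original[12:16] != migrated[12:16]
    return same_elsewhere and class_changed


# Synthetic demonstration: a 'display' profile relabelled as 'input'.
source = bytearray(160)
source[12:16] = b"mntr"
relabelled = bytearray(source)
relabelled[12:16] = b"scnr"

print(differs_only_in_class(bytes(source), bytes(relabelled)))  # prints True
print(differs_only_in_class(bytes(source), bytes(source)))      # prints False
```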
Most still image formats use straightforward, fixed header fields for describing the grid resolution of the image data. For JP2 (and the other JPEG 2000 formats) the situation is somewhat more complex, because it distinguishes two separate resolution types. They are both optional, and an image may contain either, both or neither.
First, there is a "capture resolution", which is defined as "the grid resolution at which the source was digitized to create the image samples specified by the codestream". Two examples are given: the resolution of the flatbed scanner that captured a page from a book, or the resolution of an aerial digital camera or satellite camera (ISO/IEC 2004a, Section I.5.3.7.1).
Second, there is a "default display resolution", which is defined as "a desired display grid resolution". The specification states that "this may be used to determine the size of the image on a page when the image is placed in a page-layout program". It then continues by warning that "this value is only a default", and that "each application must determine an appropriate display size for that application" (ISO/IEC 2004a, Section I.5.3.7.2).
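Both resolution types share the same binary layout: a vertical and a horizontal value, each stored as a 16-bit numerator, a 16-bit denominator and a signed 8-bit decimal exponent, expressed in grid points per metre. The Python sketch below decodes such a 10-byte payload and converts it to pixels per inch. The function names are invented for this example, and the sample payload is synthetic.

```python
import struct
from fractions import Fraction

METRES_PER_INCH = Fraction(254, 10000)


def decode_resolution(payload: bytes):
    """Decode a resolution box payload to (vertical, horizontal) per metre."""
    v_num, v_den, h_num, h_den, v_exp, h_exp = struct.unpack(">HHHHbb", payload)
    if v_den == 0 or h_den == 0:
        raise ValueError("resolution denominator must not be zero")
    # Value = numerator / denominator * 10 ** exponent.
    vertical = Fraction(v_num, v_den) * Fraction(10) ** v_exp
    horizontal = Fraction(h_num, h_den) * Fraction(10) ** h_exp
    return vertical, horizontal


def to_ppi(points_per_metre: Fraction) -> float:
    """Convert grid points per metre to pixels per inch."""
    return float(points_per_metre * METRES_PER_INCH)


# Synthetic payload: 30000 / 254 * 10**2 points per metre in both directions.
payload = struct.pack(">HHHHbb", 30000, 254, 30000, 254, 2, 2)
vertical, horizontal = decode_resolution(payload)
print(to_ppi(vertical), to_ppi(horizontal))  # prints 300.0 300.0
```

Note that the layout itself says nothing about whether a value describes the original capture or a desired display size; that distinction comes only from which of the two boxes holds it.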
The definition of these resolution types is problematic for a number of reasons. First of all, the use of the word "digitized" in the definition of "capture resolution" implies that it only covers analog-to-digital capture processes, such as the scanning of a printed photograph. However, in the case of born-digital materials there is no such analog-to-digital capture process, so the definition does not apply. A similar situation arises if we scan a photograph at, say, 300 ppi, and subsequently resample the resulting image to 150 ppi. Obviously the original image has a capture resolution of 300 ppi, but it is less clear where we should store the grid resolution of the resampled image. One possibility would be to use the "default display" fields. However, the definition of "default display resolution" is rather vague, and it is difficult to understand what it means at all (e.g. what is "desired", and if this value is "only a default", what is this "default" based on?). My interpretation is that it is basically intended to allow reader applications to establish some sensible (but arbitrary) default zoom level upon opening the image. If this is correct, its value may be quite different from the grid resolution of the (either resampled or born-digital) image.
Semantic issues aside, the use of two separate sets of resolution fields also creates practical problems. First of all, it complicates the process of establishing the grid resolution of an image, since the location of this information ("capture" or "default display" fields) would become dependent on its creation history. Second, in the case of format migrations that may be part of imaging workflows as well as (future) preservation actions, there is no obvious mapping between the resolution fields of JP2 and other formats. Figure 1 illustrates this. Just as an example, most digitisation workflows still use TIFF for capture and intermediate processing, and the conversion to JP2 is only done as a final step. Since a TIFF image only has one set of resolution fields, to which JP2 fields should we map these values (taking into account that the TIFF may or may not have been resampled after capture)? Finally, there is the observation that, to the best of my knowledge, there is not a single example of a JPEG 2000 encoder that uses JP2's resolution fields in a manner that is consistent with the format specification. I will illustrate this in the next section.
Figure 1: Mapping of resolution fields in migrations to and from JPEG 2000. Migration 1 is a typical TIFF to JP2 migration in a digitisation workflow; migration 2 represents a preservation action that involves a migration from JP2 to some future image format. In both cases, the mapping of the resolution fields before and after the migration is not clearly defined.
Handling of resolution headers by different encoders
In order to find out how current encoders are handling the resolution fields in practice, I analysed how grid resolution is stored in the output images of the aforementioned TIFF to JPEG 2000 migration experiment. Table 2 shows the results.
Table 2: Header fields used for storing grid resolution after TIFF to JPEG 2000 migration using different encoders.
Luratech, Adobe and Aware always map the TIFF resolution fields to "capture resolution" in JPEG 2000. The ImageMagick files do not contain any resolution information at all. Only Kakadu always uses the "default display" fields. On a side note, Accusoft ImageGear, which uses the Kakadu libraries for writing JP2, also uses the "display" fields. This may apply to other Kakadu-based products as well. Crucially, none of these encoders use "capture resolution" in the way it is described in the format specification.
What these results show is that establishing the grid resolution of a JP2 image is not straightforward, because the location of this information is not well defined. It also shows that most encoders ignore the literal meaning of "capture resolution" in the JP2 format specification, and simply use these fields in a manner that is analogous to the TIFF resolution fields.
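Which of the two sets of fields a given file uses can be established by walking its box structure: the resolution boxes (`resc` for capture, `resd` for default display) live inside a `res ` superbox, which in turn sits inside the JP2 Header box (`jp2h`). The sketch below is one possible way of doing this in Python. The helper names are invented, and the sample file is a synthetic skeleton without image data.

```python
import struct


def iter_boxes(data: bytes, start: int = 0, end: int = None):
    """Yield (type, payload_start, payload_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        length, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if length == 1:    # extended 64-bit length follows the type field
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif length == 0:  # box extends to the end of the enclosing range
            length = end - pos
        if length < header or pos + length > end:
            raise ValueError("malformed box structure")
        yield box_type, pos + header, pos + length
        pos += length


def find_resolution_boxes(data: bytes) -> list:
    """Return the resolution box types present (b'resc' and/or b'resd')."""
    found = []
    for box_type, start, end in iter_boxes(data):
        if box_type != b"jp2h":
            continue
        for sub_type, sub_start, sub_end in iter_boxes(data, start, end):
            if sub_type != b"res ":
                continue
            for res_type, _, _ in iter_boxes(data, sub_start, sub_end):
                if res_type in (b"resc", b"resd"):
                    found.append(res_type)
    return found


def box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload in a box with a plain 32-bit length (demo helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# Synthetic skeleton with only a default display resolution box.
sample = box(b"jP  ", bytes.fromhex("0d0a870a")) + box(
    b"jp2h", box(b"res ", box(b"resd", bytes(10)))
)
print(find_resolution_boxes(sample))  # prints [b'resd']
```

Run over a collection, a check of this kind would show directly which files store their grid resolution in the "capture" fields, which use the "default display" fields, and which contain no resolution information at all.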
Implications for preservation
In the previous sections I explained how the JP2 file specification appears to be unnecessarily restrictive with respect to embedded ICC profiles, and I demonstrated that different software vendors handle these restrictions in a variety of ways. From a preservation point of view, the central question (as already stated in the introduction to this paper) is how this affects the rendering of existing images in the future, and the preservation of information in any future migration to a new format. There are several problems here.
First of all, a strict adherence to the format specification would simply rule out the use of ICC profiles in most cases. This would make the format unsuitable for any application that requires a colour gamut beyond the sRGB space. The Aware encoder permits the use of JP2 for such applications by ignoring the "input" profile restriction. However, as a result, such files no longer adhere to the format specification. Recent versions of Luratech's encoder do stick to the format specification, but enable the use of "display" ICC profiles by changing the profile class fields.
The impact on future migrations, or on the use of such files in an emulated environment, will most likely be minor in both cases. An "input" profile defines a transformation from a device-dependent colour space to a universal profile connection space (PCS), whereas a "display" profile simply describes the reverse pathway (from the PCS to a device-dependent space). Technically, both are identical, and the colour transformation will be performed correctly even if the profile class label doesn't match the actual use. However, as for Aware's solution, one cannot completely rule out that future decoders may ignore embedded "display" profiles, which is a potential risk for future migrations. Luratech's current solution is also somewhat unsatisfactory, as it achieves adherence to the format specification by modifying (if only slightly) the original data.
Earlier versions of the Luratech encoder produce a JPX file if they encounter an ICC profile that doesn't adhere to the "restricted ICC" definition. As software support for JPX is so poor, there is a real risk that the ICC profiles will get lost in a future migration (even though the image data will most likely be preserved). Moreover, since the JPX file specification also limits the use of ICC profiles to the "input" class, such files do not adhere to the JPX file specification either. The same applies to Adobe's implementation, although the risks are even greater for these files because of the use of an erroneous file type header field, which makes the handling of these files by current and future decoders largely unpredictable.
Resolution header fields
Grid resolution does not directly affect the rendering of an image (unlike ICC profiles). Nevertheless, it is an important image property: for digitised imagery, resolution enables us to establish the dimensions of the digitised object. From a preservation point of view, the main risk that results from the current situation with JP2's resolution header fields is that resolution information may be lost in future migrations (see also Figure 1). For instance, a (future) decoder that expects grid resolution to be stored in the "capture" fields and ignores the "default display" fields will not be able to establish any meaningful resolution information from images that were created using current versions of Kakadu. Some tools will internally substitute the missing resolution fields with default values. For instance, if Adobe Photoshop cannot find the "capture resolution" fields, it assumes a default value of 72 ppi. If such files are subsequently re-saved, it will actually write this (entirely fictional) value to the resolution fields of the created file. Other tools may behave in a similar way, which introduces the risk that resolution information may change after a migration. Also, none of the existing encoders appear to follow the (strict) definitions of these fields in the file specification. The file specification allows the use of both sets of fields in one file. Although I am not aware of any existing applications that actually do this, the correct interpretation of the resolution information would get very confusing in that case.
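A short worked example shows why a substituted default value matters. The physical size of a digitised object follows from its pixel dimensions divided by the grid resolution; the Python sketch below applies this to a hypothetical scan of 2480 by 3508 pixels, first with a resolution of 300 ppi and then with a fallback value of 72 ppi. The pixel dimensions and the function name are illustrative assumptions.

```python
MM_PER_INCH = 25.4


def physical_size_mm(width_px: int, height_px: int, ppi: float):
    """Return the (width, height) in millimetres implied by a resolution."""
    if ppi <= 0:
        raise ValueError("resolution must be positive")
    return (width_px / ppi * MM_PER_INCH, height_px / ppi * MM_PER_INCH)


# Hypothetical scan; 300 ppi is the assumed true resolution.
true_size = physical_size_mm(2480, 3508, 300)
fallback_size = physical_size_mm(2480, 3508, 72)

print([round(v, 1) for v in true_size])      # prints [210.0, 297.0]
print([round(v, 1) for v in fallback_size])  # prints [874.9, 1237.5]
```

With the correct value the scan corresponds to roughly an A4 page; with the fictional 72 ppi the same pixels imply an object more than four times as wide and tall.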
Way forward for ICC profile and resolution issues
Although the issues I reported here are relatively minor, they can have major consequences within a preservation context. However, both the ICC and the resolution issues could be largely fixed by making some small changes to the JP2 file specification. Regarding the ICC issue, the JPEG committee is already working on a proposal for extending the support of ICC profiles in JP2, and bringing it in line with the latest ICC specification. This would involve removing the "input" restriction in the "Restricted ICC" method, which would allow the use of "Display Device" profiles (Robert Buckley, personal communication). (The "Output Device" class would still be prohibited in that case, since it always uses N-component LUT-based profiles.)
As for the resolution issue, the solution may be as simple as slightly expanding the definition of "capture resolution". As explained before, the current definition only covers analog-to-digital capture processes. However, both the rasterisation of a vector drawing (born-digital material) and the resampling of an existing image can be seen as digital-to-digital capture processes. Hence, a possible solution would be to include such cases in the definition of "capture resolution", which could then be generalised as "the grid resolution at which the source was captured to create the image samples specified by the codestream". This updated definition should then be illustrated using examples of both analog-to-digital and digital-to-digital capture processes. This would make these fields consistent with their de facto use by most existing encoders (as shown by Table 2). It would also ensure backward compatibility for existing files as they are produced by most encoders (except Kakadu, and some products that are based on the Kakadu libraries). The definition of "default display resolution" could either be made more specific, or, alternatively, these fields could be deprecated altogether.
In addition to these changes in the file specification, software vendors should be encouraged to produce encoders that are compliant with the (corrected) standard. The cultural heritage community could play an important role here by insisting on using software that is standards-compliant.
Interim recommendations for existing collections
In the previous section I suggested a way forward, which requires actions from the standards body and the software industry. In the meantime, institutions that are currently using JP2 as a preservation format may take a number of steps to mitigate any future risks. For existing collections, it is essential that any features that may imply a risk are both known and documented. This documentation should at least answer the following questions:
Apart from the last one, all the above questions can be answered using freely available software tools. Particularly useful in this respect are ExifTool (Harvey) and JHOVE (JHOVE). Both tools are capable of giving the required information on file format and resolution fields. JHOVE does not give any direct information about embedded ICC profiles; however, it will tell whether an image makes use of the "Restricted" or "Any ICC" methods. On the other hand, ExifTool provides detailed information about embedded ICC profiles, but it doesn't tell what method was used. So both tools complement each other here.
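The two pieces of information that these tools report separately can also be read together from the Colour Specification box (`colr`): its first byte gives the method (1 for an enumerated colour space, 2 for a Restricted ICC profile, with 3 and 4 reserved for JPX), and for the ICC methods the profile follows after the three header bytes. The sketch below is a minimal Python illustration with invented names. It assumes that the payload of the box has already been located, and it does not reproduce the output of either tool.

```python
import struct

METHODS = {
    1: "Enumerated colourspace",
    2: "Restricted ICC profile",
    3: "Any ICC profile (JPX only)",
    4: "Vendor colour (JPX only)",
}
ENUMERATED_SPACES = {16: "sRGB", 17: "greyscale", 18: "sYCC"}


def describe_colr(payload: bytes) -> dict:
    """Summarise the payload of a Colour Specification ('colr') box."""
    # METH (method), PREC (precedence) and APPROX (approximation) fields.
    meth, _prec, _approx = struct.unpack_from(">BbB", payload, 0)
    info = {"method": METHODS.get(meth, f"unknown ({meth})")}
    if meth == 1:
        (enum_cs,) = struct.unpack_from(">I", payload, 3)
        info["colour_space"] = ENUMERATED_SPACES.get(enum_cs, f"other ({enum_cs})")
    elif meth in (2, 3):
        profile = payload[3:]
        # The profile class sits at bytes 12-15 of the ICC header.
        info["profile_class"] = profile[12:16].decode("ascii", "replace")
    return info


# Synthetic payloads for demonstration.
fake_profile = bytearray(128)
fake_profile[12:16] = b"mntr"
print(describe_colr(bytes([2, 0, 0]) + bytes(fake_profile)))
print(describe_colr(struct.pack(">BbBI", 1, 0, 0, 16)))
```

A file that combines the "Restricted ICC profile" method with a profile class of `mntr` is exactly the case that deviates from a literal reading of the specification.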
The resulting documentation will be helpful for making a realistic assessment of long-term risks. It may also be a starting point for planning a medium-term preservation action, such as the normalisation to standards-compliant (i.e. compliant to an updated version of the standard) JP2 images. However, in the latter case one should be aware that such a normalisation procedure by itself introduces further risks of information loss. If done thoughtlessly, the long-term outcome may be worse than doing nothing at all.
For new and ongoing digitisation projects, the most sensible interim recommendations would be to stick to the JP2 format whenever possible, avoid JPX, embed ICC profiles using the "Restricted" method, and avoid multiple ICC profile versions. In addition, the aforementioned recommendations for existing collections all apply here as well.
Conclusions
In this paper I showed that the current JP2 format specification leaves room for multiple interpretations when it comes to the support of ICC profiles, and the handling of grid resolution information. This has led to a situation where different software vendors are implementing these features in different ways. In the case of ICC profiles, a strict interpretation of the standard even completely prohibits the use of ICC profiles for defining working colour spaces, which would make the format unsuitable for any applications that require colour support beyond the sRGB colour space. For preservation, this results in a number of risks, because images may not be rendered properly by future viewers, and colour space and resolution information may be lost in future migrations.
These issues could be remedied by some small adjustments of JP2's format specification, which would create minimal backward compatibility problems, if any at all. For the ICC profile issue, a proposal for such an adjustment is already under way from the JPEG committee, and I have suggested a possible solution for the resolution issue here. In addition, it would be necessary that software vendors adhere to the modified standard. Small as they may be, such changes could significantly improve the suitability and acceptance of JP2 as a preservation format.
Acknowledgements
I would like to thank Hans van Dormolen (KB) for sharing his observations on various problems related to the handling of ICC profiles and grid resolution. This ultimately served as the impetus for much of the research presented here. Thanks are also due to Christy Henshaw, Laurie Auchterlonie and Ben Gilbert (Wellcome Library) for providing the Luratech 2.1.22 test images. Wouter Kool (KB) is thanked for providing the Luratech 2.1.20 test images. Jack Holm (International Color Consortium) and Axel Rehse (LuraTech Imaging GmbH) are thanked for their helpful comments and suggestions on the "input"-"display" profile issue. Thomas Richter (Accusoft Pegasus) and Scott Houchin (Aerospace Corporation) are thanked for sharing their thoughts on the capture resolution issue, which guided me towards the current proposed solution. Robert Buckley (Rob Buckley Consulting), Richard Clark (Elysium Ltd) and Barbara Sierman (KB) are all thanked for their feedback on an earlier draft of this paper.
Notes
[n1] The Luratech software also does this for JPX files, which means it is standards-compliant for both formats.
[n2] Although the Adobe plugin produces files that contain features which are only allowed in JPX, it assigns an erroneous value to the "Brand" header field that uniquely identifies a JPEG 2000 file as either JP2 or JPX. As a result, these files are neither valid JP2 nor JPX. Moreover, any file identification tools that are based on byte signatures ("magic numbers") will identify these files as JP2, even though the real format is JPX.
References
 Adobe. Adobe RGB (1998) Color Image Encoding Version 2005-05. San Jose: Adobe Systems Inc., 2005. 29 Dec 2010 http://www.adobe.com/digitalimag/pdfs/AdobeRGB1998.pdf.
 Brown, A. Digital Preservation Guidance Note 1: Selecting file formats for long-term preservation. London: The National Archives, 2008. 5 Jan 2011 http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf.
 Buckley, R. & Sam, R. JPEG 2000 Profile for the National Digital Newspaper Program. Washington: Library of Congress Office of Strategic Initiatives, 2006. 27 Dec 2010 http://www.loc.gov/ndnp/guidelines/docs/NDNP_JP2HistNewsProfile.pdf.
 ECI. eciRGB_v2 - the update of eciRGB 1.0 - Background information. European Color Initiative, 2007. 29 Dec 2010 http://www.eci.org/doku.php?id=en:colourstandards:workingcolorspaces.
 Gillesse, R., Rog, J. & Verheusen, A. Alternative File Formats for Storing Master Images of Digitisation Projects. Den Haag: Koninklijke Bibliotheek, 2008. 27 Dec 2010 http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/Alternative_File_Formats_for_Storing_Masters_2_1.pdf.
 Henshaw, C. We need how much storage? London: Wellcome Library, 2010a. 27 Dec 2010 http://jpeg2000wellcomelibrary.blogspot.com/2010/06/we-need-how-much-storage.html.
 Henshaw, C. Finding a JPEG 2000 conversion tool. London: Wellcome Library, 2010b. 30 Dec 2010 http://jpeg2000wellcomelibrary.blogspot.com/2010/07/finding-jpeg-2000-conversion-tool.html.
 ISO/IEC. "Information technology JPEG 2000 image coding system: Core coding system". ISO/IEC 15444-1, Second edition. Geneva: ISO/IEC, 2004a. 28 Dec 2010 http://www.jpeg.org/public/15444-1annexi.pdf ("Annex I: JP2 file format syntax" only).
 ISO/IEC. "Information technology JPEG 2000 image coding system: Extensions". ISO/IEC 15444-2, First edition. Geneva: ISO/IEC, 2004b. 28 Dec 2010 http://www.jpeg.org/public/15444-2annexm.pdf ("Annex M: JPX extended file format syntax" only).
 Kulovits, H., Rauber, A., Kugler, A., Brantl, M., Beinert, T. & Schoger, A. "From TIFF to JPEG 2000? Preservation Planning at the Bavarian State Library Using a Collection of Digitized 16th Century Printings". D-Lib Magazine 15.11/12 (2009). 27 Dec 2010 doi:10.1045/november2009-kulovits.
 LoC. "Sustainability Factors". Sustainability of Digital Formats - Planning for Library of Congress Collections. Washington: Library of Congress, 2007. 5 Jan 2011 http://www.digitalpreservation.gov/formats/sustain/sustain.shtml.
 McLeod, R. & Wheatley, P. Preservation Plan for Microsoft Update Digital Preservation Team. London: British Library, 2007. 27 Dec 2010 http://www.bl.uk/aboutus/stratpolprog/ccare/introduction/digital/digpresmicro.pdf.
 National Library of Norway. Digitization of books in the National Library methodology and lessons learned. Oslo: National Library of Norway, 2007. 27 Dec 2010 http://www.nb.no/content/download/2326/18198/version/1/file/digitizing-books_sep07.pdf.
 Vychodil, B. "JPEG2000 - Specifications for The National Library of the Czech Republic". Seminar JPEG 2000 for the Practitioner. London: Wellcome Trust, 16 Nov 2010. 27 Dec 2010 http://www.dpconline.org/component/docman/doc_download/520-jp2knov2010bedrich.