ISF


From SynCE-Wiki

Contents

About

ISF, or Ink Serialized Format, is a vector image format invented by Microsoft and used for their Tablet PC and Windows Mobile systems, and in MSN Messenger for exchanging hand-drawings.

Encodings used

Multi-byte integers

Multi-byte integers are used a lot, and will therefore be discussed here. They encode either a 32 or a 64 bit value as one or more bytes by using the left-most (8th) bit for determining whether more bytes are following, and the 7 right-most bits as part of the value. This is the same encoding as used by the WBXML format, for example. MBUINT32 will be used to denote a 32-bit unsigned multi-byte integer, and MBUINT64 for the corresponding 64-bit integer. The basic algorithm for decoding such a value goes like this:

1. Set a temporary value to 0. This will be used to build the decoded value.
2. Set a bit-counter to 0. This will be incremented by 7 for each byte read.
3. Read one byte from the stream.
4. AND the byte read by 0x7F to get the 7 right-most bits, and left-shift it by the number
   of bits read so far. Finally, OR the temporary value with this.
5. Increment the bit-counter by 7.
6. If the left-most (8th) bit is set and we've read less than 29 (for MBUINT32) or
   57 bits (for MBUINT64), go to step 3, elsewise, we're done.

Signed multi-byte integers

Signed multi-byte integers can also appear in the format. Signed multi-byte integers are 32 bits integers plus an additional bit for the sign. This makes them into 33 bits integers. For that purpose we need to use the function that reads MBUINT64 and not MBUINT32. The sign flag is the least significant bit in the value, so the algorithm goes like this:

1. Read a MBUINT64 from the data.
2. Set the sign flag to the least significant bit by doing a binary AND between the value read and '1'.
3. Shift right from 1 bit the value.
6. If the sign flag is set to 1, then multiply the value with -1.

Adaptive Huffman-based compression

Only the decompression will be discussed here, but as this process is loss-less this will do for now. The decompression process consists of two stages. The first stage is decompressing the huffman-compressed data, and the second stage is performing an inverse transformation on the decompressed data.

Codec and transformation

The codec and transformation is specified by the encoding scheme.

Stage 1: Huffman decompression

There are 8 default codecs used, and each of them defines an array that we'll call BitAmounts. The arrays are:

BitAmounts[0]:  0,  1,  2,  4,  6,  8, 12, 16, 24, 32
BitAmounts[1]:  0,  1,  1,  2,  4,  8, 12, 16, 24, 32
BitAmounts[2]:  0,  1,  1,  1,  2,  4,  8, 14, 22, 32
BitAmounts[3]:  0,  2,  2,  3,  5,  8, 12, 16, 24, 32
BitAmounts[4]:  0,  3,  4,  5,  8, 12, 16, 24, 32
BitAmounts[5]:  0,  4,  6,  8, 12, 16, 24, 32
BitAmounts[6]:  0,  6,  8, 12, 16, 24, 32
BitAmounts[7]:  0,  7,  8, 12, 16, 24, 32

For each you want a lookup-table of base values, hereby called HuffBases, of the same length as the BitAmounts table. This is generated like this:

1. Set the first entry to 0.
2. We'll need a temporary variable called base, initialized to the value 1.
3. For each of the entries from 1 to n, i as the current index and n being the highest index of BitAmounts, do:
 3.1. Set HuffBases[i] to base.
 3.2. Calculate 2 to the power of [BitAmounts[i] minus 1], and add it to base.

The decompression goes like this:

1. Extract bit by bit from the compressed data until you read a 0. For each 1, increment the counter n.
2. If n equals zero, you've decoded the value 0, and you're done.
3. If n is less than the length of the codec's BitAmounts array, do:
 3.1. Read BitAmounts[n] number of bits from the stream into the value offset.
      This is a signed value having the sign-bit as the LSB, just like signed multi-byte integers.
 3.2. Set value to HuffBases[n] plus offset shifted to the right by one bit (thus not copying the sign-bit).
 3.3. If offset's sign-bit (bit 1) is set, negate value.
 3.4. The decoded value is now in value, and we're done.
4. If n is equal to the length of the codec's BitAmounts array, it means we're decoding a 64-bit value,
   and in this case we repeat the decompression twice, starting from step 1. The first result becomes the
   upper 32 bits, whilst the second result becomes the lower 32 bits. The first result has the sign-bit in
   bit 1, so this is used to negate the combined value if it's set.

Stage 2: DeltaDelta inverse transform

For each decoded block of data, the transform state needs to be reset. We will need two variables to store the current state, and we will call these CurDelta and PrevDelta. Both will be set to zero in a vanilla state.

For each value decoded, hereby called Value, the inverse transform goes like this:

1. Calculate NewDelta as CurDelta multiplied by 2, subtract PrevDelta and add Value.
2. Set PrevDelta to CurDelta.
3. Set CurDelta to NewDelta.
4. The resulting value is now in NewDelta (and CurDelta).

Gorilla compression

This is a simple compression that packs the values at a fixed width. The fixed width is in each case the narrowest of the values to be compressed, and this width is specified by the encoding scheme.

Basically the decoding goes like this:

1. Construct a sign-mask by taking the value 0xFFFFFFFF and left-shift it by width - 1.
2. Read width bits from the stream into value.
3. If value ANDed with the mask is non-zero, OR the value with the mask.
   What this means is that if the mask matched, the sign bit is set, and by ORing the value
   with the mask we effectively fill all the bits to the left of the sign bit with 1s, turning
   it into a true 32-bit signed integer.

Packet data

This is the scheme used to pack a list of signed values.

The packet data has the following structure:

1. One byte containing flags, described below.
2. Data, encoding depending on the flags.

If any of bit 7 or 8 of flags are set this means that the data is compressed using Adaptive Huffman-based compression, elsewise Gorilla compression is used.

Huffman

When this encoding is used, the two left-most bits of the flags byte MUST be set to 10, binary. The lower 5 bits contains a value that we'll call Index. When the 6th bit is set, Index is the index of the custom transformation to use. At this point there are no custom transformations, so this is clearly something reserved for the future. However, if this bit were to be set anytime in the future, the transformation also dictates which codec to use. When the 6th bit isn't set, this means that the default transformation is used. The default transformation is called DeltaDelta.

The codec is specified by Index, and usually specifies one of the 8 default ones. If Index is greater or equal to 8 this means that a custom codec is used. At this point there are no custom codecs, so this is also something reserved for the future.

Gorilla

If the flags byte' 6th bit is set, this means that a forward and inverse transformation is done before encoding and after decoding, respectively. It also means that two MBINT32s (signed) having transformed values are in front of the compressed block. More on this later. The 5 lower bits contains the fixed width, and is basically a number between 1 and 31, with the value 0 meaning 32 (naturally because the highest number that can be represented by 5 bits is 31, and a fixed width of 0 bits isn't very meaningful). All values are signed, and the left-most bit of the fixed width is used for this. All the decoding end needs to do is fill the bits to the left of the sign-bit with 1s, when the sign bit is set.

Property data

This is the scheme used to pack a list of values of some size.

The property data has the following structure:

1. One byte containing flags, described below.
2. Data, encoding depending on the flags.

If the 8th bit of flags is set this means that the data is compressed using LZ compression, elsewise Gorilla compression is used.

LZ

TODO.

Gorilla

If the flags byte' 7th bit is set, this means that the element size is 4 bytes. If it isn't, the element size is 2 bytes if the 6th bit is set, 1 byte if it isn't. When the element size is 4, the lower 6 bits specify an index, when it isn't (meaning it's 1 or 2), the lower 5 bits specify an index. The index is a number between 0 and 23 specifying the index into the table below, needed to know the number of bits per symbol and the number of trailing elements. If the index is greater than 23, BitCount should be calculated as index minus 16, and TrailCount set to 0.

Index BitCount TrailCount
0 8 0
1 1 0
2 1 1
3 1 2
4 1 3
5 1 4
6 1 5
7 1 6
8 1 7
9 2 0
10 2 1
11 2 2
12 2 3
13 3 0
14 3 1
15 3 2
16 4 0
17 4 1
18 5 0
19 5 1
20 6 0
21 6 1
22 7 0
23 7 1

Basic structure

Name Type Description
ISFTag MBUINT32 The ISF tag, which is 0.
ISFSize MBUINT64 The number of bytes of content following.
Content ... One or more tags with content of variable or fixed size, depending on the tag.

Known tags

Each tag is identified by an MBUINT32, followed by a fixed or variably sized payload, depending on the tag. Those with variably sized payload have a MBUINT64 following that specifies the number of bytes of payload data following.

ID Name Payload size Small Description
0 INK_SPACE_RECT  ? Root tag
1 GUID_TABLE V Root tag
2 DRAW_ATTRS_TABLE V Root tag
3 DRAW_ATTRS_BLOCK V Root tag
4 STROKE_DESC_TABLE V Root tag
5 STROKE_DESC_BLOCK V Root tag
6 BUTTONS  ?
7 NO_X  ?
8 NO_Y  ?
9 DIDX 1 * MBUINT32 Root tag
10 STROKE V Root tag
11 STROKE_PROPERTY_LIST  ?
12 POINT_PROPERTY  ?
13 SIDX 1 * MBUINT32 Root tag
14 COMPRESSION_HEADER  ? Root tag
15 TRANSFORM_TABLE V Root tag
16 TRANSFORM 6 * UINT32 Root tag
17 TRANSFORM_ISOTROPIC_SCALE 1 * UINT32 Root tag
18 TRANSFORM_ANISOTROPIC_SCALE 2 * UINT32 Root tag
19 TRANSFORM_ROTATE 1 * MBUINT32 Root tag
20 TRANSFORM_TRANSLATE 2 * UINT32 Root tag
21 TRANSFORM_SCALE_AND_TRANSLATE 4 * UINT32 Root tag
22 TRANSFORM_QUAD 6 * UINT32 Root tag
23 TIDX 1 * MBUINT32 Root tag
24 METRIC_TABLE V Root tag
25 METRIC_BLOCK V Root tag
26 MIDX 1 * MBUINT32 Root tag
27 MANTISSA  ?
28 PERSISTENT_FORMAT V Root tag
29 HIMETRIC_SIZE V Root tag
30 STROKE_IDS V

Application-defined tags

The application might add arbitrary data to an ISF document by having a GUID_TABLE tag containing a list of GUIDs, each 16 bytes long, and tags with ID >= 100, each mapping to the GUIDs in the table. So the first tag, 100, is the content for the 1st GUID, the second tag, 101, is for the second GUID, etc. The payload for each is data compressed according to the property data scheme, where the data can be anything and is only meaningful to the application.

Tag descriptions

DRAW_ATTRS_TABLE

The DRAW_ATTRS_TABLE tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or more DRAW_ATTRS_BLOCK payloads following each other, each prefixed by an MBUINT32 specifying the size of that block, in bytes.

DRAW_ATTRS_BLOCK

The DRAW_ATTRS_BLOCK tag starts with an MBUINT64 which represents the payload size. Its payload consists of one or more drawing attributes following each other.

Each drawing attribute's layout is like this:

1. an MBUINT32 which represents the drawing attribute's GUID
2. attribute-specific data

The default attributes, found in the table below, have the following data layout:

1. an MBUINT32 containing the attribute's value
2. if the attribute is PenWidth or PenHeight, and there's more data available in the block,
   and the next MBUINT32 is MANTISSA, then some more data needs to be unpacked.
   this is similar to how non-default attributes are stored, more on this later.
Index Offset from KNOWN_GUID_BASE_INDEX (50) Name Description
0 17 PenStyle
1 18 Color The color in XXBBGGRR format, where XX is unused.
2 19 PenWidth Pen width in HIMETRIC units.
3 20 PenHeight Pen height in HIMETRIC units.
4 21 PenTip
5 22 Flags
6 30 Transparency
7 31  ?

HIMETRIC_SIZE

This tag consists of:

1. One MBUINT32 that contains the tag ID (29)
2. One MBUINT64 that contains the payload size
3. The payload which consists of 
 3.1. One Signed multibyte integer
 3.2. Another Signed multibyte integer

The two signed multibyte integers represent the HIMETRIC size. These will be used as the width and height of the image. As defined by Wikipedia, HIMETRIC means:

 Himetric is a scaling unit similar to twips used in computing. It is one thousandth of a centimeter and is independent of the screen resolution.

I don't know yet how the "one thousandth of a centimeter" measure became a "pixel" though..

STROKE

This tag consist of:

1. One MBUINT64 that contains the number of packets following.
2. Packet data for GUID_X.
3. Packet data for GUID_Y.
4. Possibly more in some cases, investigate this further.

GUID_X and GUID_Y

These are both a list of absolute x and y coordinates, where each element in each list maps directly to the element at the same index in the other list. These form an ordered list of points, which should be plotted by drawing lines between them (preferably bezier).

PERSISTENT_FORMAT

This tag's data consists of an an MBUINT64 that represents the payload size (which isn't even used at all...) followed by a single MBUINT32 with the persistant_format value. The meaning of this value is not yet known.

MIDX

This tag's data consists of a single MBUINT32 with the MIDX value. This value is the index of the METRIC_BLOCK attribute list to use for subsequent strokes.

SIDX

This tag's data consists of a single MBUINT32 with the SIDX value. This value is the index of the STROKE_DESC_TABLE list to use for subsequent strokes.

DIDX

This tag's data consists of a single MBUINT32 with the DIDX value. This value is the index of the drawing attribute list to use for subsequent strokes.

TIDX

This tag's data consists of a single MBUINT32 with the TIDX value. This value is the index of the TRANSFORM_TABLE list to use for subsequent strokes.

METRIC_BLOCK

The METRIC_BLOCK tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or many Metric Entries following each other. This tag's payload consists of the following:

1. an MBUINT32 which represent the METRIC Entry GUID
2. an MBUINT32 which represents the payload size of the METRIC Entry.
2. The payload of the METRIC Entry.

The Entries' data consists of the following :

1. A signed MBINT32
2. A signed MBINT32
3. An MBUINT32
4. A 4 byte integer.

The entry's GUID 50 seems to represent the screen's resolution width while GUID 51 the screen's resolution height. This value (the screen width or height) is located in the second MBINT32 of the entry's data. The other values' signification are not known yet.

METRIC_TABLE

The METRIC_TABLE tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or many Metric Blocks following each other. The payload consists of :

1. an MBUINT32 which represents the payload size of the current Metric Block
2. a payload with the exact same layout and data as the METRIC_BLOCK payload.

The METRIC_TABLE tag is actually a list of Metric blocks... each one preceded by its size in bytes.