About
Ink Serialized Format (or ISF), is a vector image format invented by Microsoft and used for their Tablet PC and Windows Mobile systems, and in MSN Messenger for exchanging hand-drawings.
Microsoft released the specifications of the format. You can find it at : http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/InkSerializedFormat(ISF)Specification.pdf
Encodings used
Multi-byte integers
Multi-byte integers are used a lot, and will therefore be discussed here. They encode either a 32 or a 64 bit value as one or more bytes by using the left-most (8th) bit for determining whether more bytes are following, and the 7 right-most bits as part of the value. This is the same encoding as used by the WBXML format, for example. MBUINT32 will be used to denote a 32-bit unsigned multi-byte integer, and MBUINT64 for the corresponding 64-bit integer. The basic algorithm for decoding such a value goes like this:
- Set a temporary value to 0. This will be used to build the decoded value.
- Set a bit-counter to 0. This will be incremented by 7 for each byte read.
- Read one byte from the stream.
- AND the byte read by 0x7F to get the 7 right-most bits, and left-shift it by the number
- of bits read so far. Finally, OR the temporary value with this.
- Increment the bit-counter by 7.
- If the left-most (8th) bit is set and we've read less than 29 (for MBUINT32) or
- 57 bits (for MBUINT64), go to step 3, elsewise, we're done.
Signed multi-byte integers
Signed multi-byte integers can also appear in the format. Signed multi-byte integers are 32 bits integers plus an additional bit for the sign. This makes them into 33 bits integers. For that purpose we need to use the function that reads MBUINT64 and not MBUINT32. The sign flag is the least significant bit in the value, so the algorithm goes like this:
- Read a MBUINT64 from the data.
- Set the sign flag to the least significant bit by doing a binary AND between the value read and '1'.
- Shift right from 1 bit the value.
- If the sign flag is set to 1, then multiply the value with -1.
Adaptive Huffman-based compression
Only the decompression will be discussed here, but as this process is loss-less this will do for now. The decompression process consists of two stages. The first stage is decompressing the huffman-compressed data, and the second stage is performing an inverse transformation on the decompressed data.
Codec and transformation
The codec and transformation is specified by the encoding scheme.
Stage 1: Huffman decompression
There are 8 default codecs used, and each of them defines an array that we'll call BitAmounts. The arrays are:
BitAmounts[0]: 0, 1, 2, 4, 6, 8, 12, 16, 24, 32 BitAmounts[1]: 0, 1, 1, 2, 4, 8, 12, 16, 24, 32 BitAmounts[2]: 0, 1, 1, 1, 2, 4, 8, 14, 22, 32 BitAmounts[3]: 0, 2, 2, 3, 5, 8, 12, 16, 24, 32 BitAmounts[4]: 0, 3, 4, 5, 8, 12, 16, 24, 32 BitAmounts[5]: 0, 4, 6, 8, 12, 16, 24, 32 BitAmounts[6]: 0, 6, 8, 12, 16, 24, 32 BitAmounts[7]: 0, 7, 8, 12, 16, 24, 32
For each you want a lookup-table of base values, hereby called HuffBases, of the same length as the BitAmounts table. This is generated like this:
- Set the first entry to 0.
- We'll need a temporary variable called base, initialized to the value 1.
For each of the entries from 1 to n, i as the current index and n being the highest index of BitAmounts, do:
Set HuffBases[i] to base.
Calculate 2 to the power of [BitAmounts[i] minus 1], and add it to base.
The decompression goes like this:
- Extract bit by bit from the compressed data until you read a 0. For each 1, increment the counter n.
- If n equals zero, you've decoded the value 0, and you're done.
If n is less than the length of the codec's BitAmounts array, do:
Read BitAmounts[n] number of bits from the stream into the value offset.
- This is a signed value having the sign-bit as the LSB, just like signed multi-byte integers.
Set value to HuffBases[n] plus offset shifted to the right by one bit (thus not copying the sign-bit).
- If offset's sign-bit (bit 1) is set, negate value.
- The decoded value is now in value, and we're done.
If n is equal to the length of the codec's BitAmounts array, it means we're decoding a 64-bit value,
- and in this case we repeat the decompression twice, starting from step 1. The first result becomes the upper 32 bits, while the second result becomes the lower 32 bits. The first result has the sign-bit in bit 1, so this is used to negate the combined value if it's set.
Stage 2: DeltaDelta inverse transform
For each decoded block of data, the transform state needs to be reset. We will need two variables to store the current state, and we will call these CurDelta and PrevDelta. Both will be set to zero in a vanilla state.
For each value decoded, hereby called Value, the inverse transform goes like this:
Calculate NewDelta as CurDelta multiplied by 2, subtract PrevDelta and add Value.
Set PrevDelta to CurDelta.
Set CurDelta to NewDelta.
The resulting value is now in NewDelta (and CurDelta).
Gorilla compression
This is a simple compression that packs the values at a fixed width. The fixed width is in each case the narrowest of the values to be compressed, and this width is specified by the encoding scheme.
Basically the decoding goes like this:
- Construct a sign-mask by taking the value 0xFFFFFFFF and left-shift it by width - 1.
- Read width bits from the stream into value.
- If value ANDed with the mask is non-zero, OR the value with the mask.
- What this means is that if the mask matched, the sign bit is set, and by ORing the value with the mask we effectively fill all the bits to the left of the sign bit with 1s, turning it into a true 32-bit signed integer.
Packet data
This is the scheme used to pack a list of signed values.
The packet data has the following structure:
- One byte containing flags, described below.
- Data, encoding depending on the flags.
If any of bit 7 or 8 of flags are set this means that the data is compressed using Adaptive Huffman-based compression, elsewise Gorilla compression is used.
Huffman
When this encoding is used, the two left-most bits of the flags byte MUST be set to 10, binary. The lower 5 bits contains a value that we'll call Index. When the 6th bit is set, Index is the index of the custom transformation to use. At this point there are no custom transformations, so this is clearly something reserved for the future. However, if this bit were to be set anytime in the future, the transformation also dictates which codec to use. When the 6th bit isn't set, this means that the default transformation is used. The default transformation is called DeltaDelta.
The codec is specified by Index, and usually specifies one of the 8 default ones. If Index is greater or equal to 8 this means that a custom codec is used. At this point there are no custom codecs, so this is also something reserved for the future.
Gorilla
If the flags byte' 6th bit is set, this means that a forward and inverse transformation is done before encoding and after decoding, respectively. It also means that two MBINT32s (signed) having transformed values are in front of the compressed block. More on this later. The 5 lower bits contains the fixed width, and is basically a number between 1 and 31, with the value 0 meaning 32 (naturally because the highest number that can be represented by 5 bits is 31, and a fixed width of 0 bits isn't very meaningful). All values are signed, and the left-most bit of the fixed width is used for this. All the decoding end needs to do is fill the bits to the left of the sign-bit with 1s, when the sign bit is set.
Property data
This is the scheme used to pack a list of values of some size.
The property data has the following structure:
- One byte containing flags, described below.
- Data, encoding depending on the flags.
If the 8th bit of flags is set this means that the data is compressed using LZ compression, elsewise Gorilla compression is used.
LZ
TODO.
Gorilla
If the flags byte' 7th bit is set, this means that the element size is 4 bytes. If it isn't, the element size is 2 bytes if the 6th bit is set, 1 byte if it isn't. When the element size is 4, the lower 6 bits specify an index, when it isn't (meaning it's 1 or 2), the lower 5 bits specify an index. The index is a number between 0 and 23 specifying the index into the table below, needed to know the number of bits per symbol and the number of trailing elements. If the index is greater than 23, BitCount should be calculated as index minus 16, and TrailCount set to 0.
Index |
BitCount |
TrailCount |
0 |
8 |
0 |
1 |
1 |
0 |
2 |
1 |
1 |
3 |
1 |
2 |
4 |
1 |
3 |
5 |
1 |
4 |
6 |
1 |
5 |
7 |
1 |
6 |
8 |
1 |
7 |
9 |
2 |
0 |
10 |
2 |
1 |
11 |
2 |
2 |
12 |
2 |
3 |
13 |
3 |
0 |
14 |
3 |
1 |
15 |
3 |
2 |
16 |
4 |
0 |
17 |
4 |
1 |
18 |
5 |
0 |
19 |
5 |
1 |
20 |
6 |
0 |
21 |
6 |
1 |
22 |
7 |
0 |
23 |
7 |
1 |
Basic structure
Name |
Type |
Description |
ISFTag |
MBUINT32 |
The ISF tag, which is 0. |
ISFSize |
MBUINT64 |
The number of bytes of content following. |
Content |
... |
One or more tags with content of variable or fixed size, depending on the tag. |
Known tags
Each tag is identified by an MBUINT32, followed by a fixed or variably sized payload, depending on the tag. Those with variably sized payload have a MBUINT64 following that specifies the number of bytes of payload data following.
ID |
Name |
Payload size |
Small description |
0 |
INK_SPACE_RECT |
? |
Root tag |
1 |
GUID_TABLE |
V |
Root tag |
2 |
DRAW_ATTRS_TABLE |
V |
Root tag |
3 |
DRAW_ATTRS_BLOCK |
V |
Root tag |
4 |
STROKE_DESC_TABLE |
V |
Root tag |
5 |
STROKE_DESC_BLOCK |
V |
Root tag |
6 |
BUTTONS |
? |
|
7 |
NO_X |
? |
|
8 |
NO_Y |
? |
|
9 |
DIDX |
1 * MBUINT32 |
Root tag |
10 |
STROKE |
V |
Root tag |
11 |
STROKE_PROPERTY_LIST |
? |
|
12 |
POINT_PROPERTY |
? |
|
13 |
SIDX |
1 * MBUINT32 |
Root tag |
14 |
COMPRESSION_HEADER |
? |
Root tag |
15 |
TRANSFORM_TABLE |
V |
Root tag |
16 |
TRANSFORM |
6 * 32bits IEEE754 floats |
Root tag |
17 |
TRANSFORM_ISOTROPIC_SCALE |
1 * 32bits IEEE754 float |
Root tag |
18 |
TRANSFORM_ANISOTROPIC_SCALE |
2 * 32bits IEEE754 floats |
Root tag |
19 |
TRANSFORM_ROTATE |
1 * MBUINT32 |
Root tag |
20 |
TRANSFORM_TRANSLATE |
2 * 32bits IEEE754 floats |
Root tag |
21 |
TRANSFORM_SCALE_AND_TRANSLATE |
4 * 32bits IEEE754 floats |
Root tag |
22 |
TRANSFORM_QUAD |
6 * UINT32 |
Root tag |
23 |
TIDX |
1 * MBUINT32 |
Root tag |
24 |
METRIC_TABLE |
V |
Root tag |
25 |
METRIC_BLOCK |
V |
Root tag |
26 |
MIDX |
1 * MBUINT32 |
Root tag |
27 |
MANTISSA |
? |
|
28 |
PERSISTENT_FORMAT |
V |
Root tag |
29 |
HIMETRIC_SIZE |
V |
Root tag |
30 |
STROKE_IDS |
V |
|
Application-defined tags
The application might add arbitrary data to an ISF document by having a GUID_TABLE tag containing a list of GUIDs, each 16 bytes long, and tags with ID >= 100, each mapping to the GUIDs in the table. So the first tag, 100, is the content for the 1st GUID, the second tag, 101, is for the second GUID, etc. The payload for each is data compressed according to the property data scheme, where the data can be anything and is only meaningful to the application.
Tag descriptions
GUID_TABLE
The GUID_TABLE tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or more GUIDs, each 16 bytes long.
Known GUIDs are:
Index |
GUID as in the stream,BR as string |
Name |
Description |
0 |
A5 CD 29 53 5B FA D2 4E BB 32 83 46 01 72 44 28BR 5329cda5-fa5b-4ed2-bb32-834601724428 |
Color |
|
1 |
0A 73 0B 5C 94 F3 61 49 A9 33 37 C4 34 F4 B7 EBBR 5c0b730a-f394-4961-a933-37c434f4b7eb |
DrawingFlags |
|
2 |
1A 5E 30 CE 08 0E E3 45 8C DC E4 0B B4 50 6F 21BR ce305e1a-0e08-45e3-8cdc-e40bb4506f21 |
IsHighlighter |
|
3 |
AF F9 2D 00 8C DD 49 49 BA 46 D6 5E 10 7D 1A 8ABR 002df9af-dd8c-4949-ba46-d65e107d1a8a |
StylusWidth |
|
4 |
CA B7 32 9D 13 12 54 4F B7 E4 C9 05 0E E1 7A 38BR 9d32b7ca-1213-4f54-b7e4-c9050ee17a38 |
StylusHeight |
|
5 |
31 C7 26 35 79 EE 88 49 B9 3E 70 D9 2F 89 07 EDBR 3526c731-ee79-4988-b93e-70d92f8907ed |
StylusTip |
|
6 |
16 BC 63 4B C4 7B D2 4F 95 DA AC FF 47 75 73 2DBR 4b63bc16-7bc4-4fd2-95da-acff4775732d |
StylusTipTransform |
|
Here is an example on how to convert thoses GUIDs into string ones:BR ce305e1a-0e08-45e3-8cdc-e40bb4506f21 = 1A 5E 30 CE 08 0E E3 45 8C DC E4 0B B4 50 6F 21BR 33221100-5544-7766-8899-AABBCCDDEEFF = 00 11 22 33 44 55 66 77 88 99 AA BB CC DD EE FF
DRAW_ATTRS_TABLE
The DRAW_ATTRS_TABLE tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or more DRAW_ATTRS_BLOCK payloads following each other, each prefixed by an MBUINT32 specifying the size of that block, in bytes.
DRAW_ATTRS_BLOCK
The DRAW_ATTRS_BLOCK tag starts with an MBUINT64 which represents the payload size. Its payload consists of one or more drawing attributes following each other.
Each drawing attribute's layout is like this:
- an MBUINT32 which represents the drawing attribute's GUID
- attribute-specific data
The default attributes, found in the table below, have the following data layout:
- an MBUINT32 containing the attribute's value
if the attribute is PenWidth or PenHeight, and there's more data available in the block,
- and the next MBUINT32 is MANTISSA, then some more data needs to be unpacked. this is similar to how non-default attributes are stored, more on this later.
Index |
Offset from KNOWN_GUID_BASE_INDEX (50) |
Name |
Description |
Default value |
0 |
17 |
PenStyle |
|
|
1 |
18 |
Color |
The color in BBGGRR format |
0 (black) |
2 |
19 |
PenWidth |
Pen width in HIMETRIC units. |
53 |
3 |
20 |
PenHeight |
Pen height in HIMETRIC units. |
53 |
4 |
21 |
PenTip |
2 values are known:BR0 : the pen tip is an ellipseBR1 : the pen tip is a rectangle |
0 |
5 |
22 |
Flags |
0x10 |
|
6 |
30 |
Transparency |
Value of the alpha channel. (Transparent is 0xFF) |
0x00 (opaque) |
7 |
37 |
IsHighlighter |
Payload is 4 bytes long. It means that the strokes drawn with that attributes should be considered as highlighters. |
|
HIMETRIC_SIZE
This tag consists of:
- One MBUINT32 that contains the tag ID (29)
- One MBUINT64 that contains the payload size
- The payload which consists of
- One Signed multibyte integer
- Another Signed multibyte integer
The two signed multibyte integers represent the HIMETRIC size. These will be used as the width and height of the image. As defined by [http://en.wikipedia.org/wiki/Himetric Wikipedia], HIMETRIC means:
- Himetric is a scaling unit similar to twips used in computing. It is one thousandth of a centimeter and is independent of the screen resolution.
I don't know yet how the "one thousandth of a centimeter" measure became a "pixel" though..
STROKE
This tag consist of:
- One MBUINT64 that contains the number of packets following.
- Packet data for GUID_X.
- Packet data for GUID_Y.
- Possibly more in some cases, investigate this further.
When there's a STROKE_DESC_BLOCK, there's often an other packet data describing pressure on the stylus. When maximum (resp. minimum) pressure is applied, the width/height of the stylus is 150% (resp. 50%) of the value of the width/height property.
GUID_X and GUID_Y
These are both a list of absolute x and y coordinates, where each element in each list maps directly to the element at the same index in the other list. These form an ordered list of points, which should be plotted by drawing lines between them (preferably bezier).
PERSISTENT_FORMAT
This tag's data consists of an an MBUINT64 that represents the payload size (which isn't even used at all...) followed by a single MBUINT32 with the persistent_format value. The meaning of this value is not yet known.
TRANSFORMATION
A Transformation is a 3x3 matrix. It is used to apply an affine transformation on points.
The transformation matrix is :
( m11 m12 m13 )BR T = ( m21 m22 m23 )BR ( 0 0 1 )
This way, if you have a point A (X0,Y0), you'll get the new point B (X1,Y1) like this :
(X1) ( m11 m12 m13 ) (X0)BR (Y1) = ( m21 m22 m23 ) * (Y0)BR (Φ ) ( 0 0 1 ) ( 1)BR
TRANSFORM
This tag consists of the six values describing the transformation matrix.
Those six values are coded in the stream in little endian as 32 bits floats (IEEE 754).
We have in order:
- m11
- m21
- m12
- m22
- m13
- m23
See also #TRANSFORMATION TRANSFORMATION to know how to apply that transformation.
TRANSFORM_ISOTROPIC_SCALE
This tag consists of the one value describing an isotropic scale transformation matrix.
This value (let's call it "A") is coded in the stream in little endian as a 32 bits float (IEEE 754).
The transformation matrix is :
( A 0 0 )BR T = ( 0 A 0 )BR ( 0 0 1 )
See also #TRANSFORMATION TRANSFORMATION to know how to apply that transformation.
TRANSFORM_ANISOTROPIC_SCALE
This tag consists of 2 values describing an anisotropic scale transformation matrix.
Those two values are coded in the stream in little endian as 32 bits floats (IEEE 754).
We have in order:
- A
- B
The transformation matrix is :
( A 0 0 )BR T = ( 0 B 0 )BR ( 0 0 1 )BR
See also #TRANSFORMATION TRANSFORMATION to know how to apply that transformation.
TRANSFORM_ROTATE
This tag consists of one MBUINT describing a rotate transformation matrix.
Let's call this value "A".
If A = 0, then there's no transformation to apply, elsewise A is one hundredth of a degree.
The transformation matrix is (with cos and sin working with radians) :
( cos(A*2*π/36000) -cos(A*2*π/36000) 0 )BR T = ( sin(A*2*π/36000) cos(A*2*π/36000) 0 )BR ( 0 0 1 )
See also #TRANSFORMATION TRANSFORMATION to know how to apply that transformation.
TRANSFORM_TRANSLATE
This tag consists of 2 values describing a translate transformation matrix.
Those two values are coded in the stream in little endian as 32 bits floats (IEEE 754).
We have in order:
- dx
- dy
The transformation matrix is :
( 0 0 dx )BR T = ( 0 0 dy )BR ( 0 0 1 )
In fact the transformation matrix shouldn't be this way since that transformation matrix don't translate !
Anyway, it how it's seems to be applied with the .Net 3.0 framework ...
The transformation matrix should be :
( 1 0 dx )BR T = ( 0 1 dy )BR ( 0 0 1 )
See also #TRANSFORMATION TRANSFORMATION to know how to apply that transformation.
TRANSFORM_SCALE_AND_TRANSLATE
This tag consists of four values describing a scale and translate transformation matrix.
Those four values are coded in the stream in little endian as 32 bits floats (IEEE 754).
We have in order:
- m11
- m22
- m13
- m23
The transformation matrix is :
( m11 0 m13 )BR T = ( 0 m22 m23 )BR ( 0 0 1 )
See also #TRANSFORMATION TRANSFORMATION to know how to apply that transformation.
TRANSFORM_TABLE
The TRANSFORM_TABLE tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or many TRANSFORMs following each other.
MIDX
This tag's data consists of a single MBUINT32 with the MIDX value. This value is the index of the METRIC_BLOCK attribute list to use for subsequent strokes.
SIDX
This tag's data consists of a single MBUINT32 with the SIDX value. This value is the index of the STROKE_DESC_TABLE list to use for subsequent strokes.
DIDX
This tag's data consists of a single MBUINT32 with the DIDX value. This value is the index of the drawing attribute list to use for subsequent strokes.
TIDX
This tag's data consists of a single MBUINT32 with the TIDX value. This value is the index of the TRANSFORM_TABLE list to use for subsequent strokes.
METRIC_BLOCK
The METRIC_BLOCK tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or many Metric Entries following each other. This tag's payload consists of the following:
- an MBUINT32 which represent the METRIC Entry GUID
- an MBUINT32 which represents the payload size of the METRIC Entry.
- The payload of the METRIC Entry.
The Entries' data consists of the following :
- A signed MBINT32
- A signed MBINT32
- An MBUINT32
- A 4 byte integer.
The entry's GUID 50 seems to represent the screen's resolution width while GUID 51 the screen's resolution height.
The entry's GUID 56 seems to represent the number of colors the screen can display.
This value (the screen width or height or number of colors) is located in the second MBINT32 of the entry's data.
The other values' signification are not known yet.
METRIC_TABLE
The METRIC_TABLE tag consists of an MBUINT64 which represents the payload size. Its payload consists of one or many Metric Blocks following each other. The payload consists of :
- an MBUINT32 which represents the payload size of the current Metric Block
- a payload with the exact same layout and data as the METRIC_BLOCK payload.
The METRIC_TABLE tag is actually a list of Metric blocks... each one preceded by its size in bytes.
Credits
Credits go to :
Ole André Vadla Ravnås (a.k.a oleavr) for his initial reverse engineering
Youness Alaoui (a.k.a KaKaRoTo) for his help in the reverse engineering effort
Boris Faure (a.k.a billiob) for his reverse engineering of the transformation algorithms.
