=====================================
The PDB TPI and IPI Streams
=====================================

.. contents::
   :local:

.. _tpi_intro:

Introduction
============

The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
all types used in the program.  Each is organized as a :ref:`header <tpi_header>`
followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`.  Types are
referenced from various streams and records throughout the PDB by their
:ref:`type index <type_indices>`.  In general, the sequence of type records
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
(directed acyclic graph), which means that a type record B can only refer to
the type A if ``A.TypeIndex < B.TypeIndex``.  While there are rare cases where
this property will not hold (particularly when dealing with object files
compiled with MASM), an implementation should try very hard to make this
property hold, as it means the entire type graph can be constructed in a single
pass.
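
The ordering property can be checked in one pass.  The following is a minimal
sketch, not LLVM's implementation: ``RecordRefs`` and
``isTopologicallySorted`` are hypothetical names, the records are assumed to
be already parsed down to the type indices they reference, and every reference
is assumed to point into the same stream as the record itself.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // Hypothetical, already-parsed view of one type record: just the type
  // indices that the record refers to.
  struct RecordRefs {
    std::vector<uint32_t> ReferencedIndices;
  };

  // Records[I] is assumed to have type index TypeIndexBegin + I.
  // Returns true if every record refers only to lower-numbered types.
  inline bool isTopologicallySorted(const std::vector<RecordRefs> &Records,
                                    uint32_t TypeIndexBegin) {
    uint32_t Self = TypeIndexBegin;
    for (const RecordRefs &R : Records) {
      for (uint32_t Ref : R.ReferencedIndices)
        if (Ref >= Self)
          return false; // forward reference or self reference
      ++Self;
    }
    return true;
  }

References to reserved type indices below ``TypeIndexBegin`` always pass the
check, since they are smaller than the index of any record in the stream.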

.. important::
   Type records form a topologically sorted DAG (directed acyclic graph).

.. _tpi_ipi:

TPI vs IPI Stream
=================

Recent versions of the PDB format (that is, all versions covered by this
document) have two streams with identical layout, henceforth referred to as the
TPI stream
and IPI stream.  Subsequent contents of this document describing the on-disk
format apply equally whether it is for the TPI Stream or the IPI Stream.  The
only difference between the two is in *which* CodeView records are allowed to
appear in each one, summarized by the following table:

+----------------------+---------------------+
|    TPI Stream        |    IPI Stream       |
+======================+=====================+
|  LF_POINTER          | LF_FUNC_ID          |
+----------------------+---------------------+
|  LF_MODIFIER         | LF_MFUNC_ID         |
+----------------------+---------------------+
|  LF_PROCEDURE        | LF_BUILDINFO        |
+----------------------+---------------------+
|  LF_MFUNCTION        | LF_SUBSTR_LIST      |
+----------------------+---------------------+
|  LF_LABEL            | LF_STRING_ID        |
+----------------------+---------------------+
|  LF_ARGLIST          | LF_UDT_SRC_LINE     |
+----------------------+---------------------+
|  LF_FIELDLIST        | LF_UDT_MOD_SRC_LINE |
+----------------------+---------------------+
|  LF_ARRAY            |                     |
+----------------------+---------------------+
|  LF_CLASS            |                     |
+----------------------+---------------------+
|  LF_STRUCTURE        |                     |
+----------------------+---------------------+
|  LF_INTERFACE        |                     |
+----------------------+---------------------+
|  LF_UNION            |                     |
+----------------------+---------------------+
|  LF_ENUM             |                     |
+----------------------+---------------------+
|  LF_TYPESERVER2      |                     |
+----------------------+---------------------+
|  LF_VFTABLE          |                     |
+----------------------+---------------------+
|  LF_VTSHAPE          |                     |
+----------------------+---------------------+
|  LF_BITFIELD         |                     |
+----------------------+---------------------+
|  LF_METHODLIST       |                     |
+----------------------+---------------------+
|  LF_PRECOMP          |                     |
+----------------------+---------------------+
|  LF_ENDPRECOMP       |                     |
+----------------------+---------------------+

The usage of these records is described in more detail in
:doc:`CodeView Type Records <CodeViewTypes>`.

.. _type_indices:

Type Indices
============

A type index is a 32-bit integer that uniquely identifies a type inside of an
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream.  The
value of the type index for the first type record from the TPI stream is given
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`
although in practice this value is always equal to 0x1000 (4096).

Any type index with a high bit set is considered to come from the IPI stream,
although this appears to be more of a hack, and LLVM does not generate type
indices of this nature.  They can, however, be observed in Microsoft PDBs
occasionally, so one should be prepared to handle them.  Note that having the
high bit set is not a necessary condition for a type index to come from the
IPI stream; it is only a sufficient one.

Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
to come from the appropriate stream, and any type index less than this is a
bitmask which can be decomposed as follows:

.. code-block:: none

  .---------------------------.------.----------.
  |           Unused          | Mode |   Kind   |
  '---------------------------'------'----------'
  |+32                        |+12   |+8        |+0


- **Kind** - A value from the following enum:

.. code-block:: c++

  enum class SimpleTypeKind : uint32_t {
    None = 0x0000,          // uncharacterized type (no type)
    Void = 0x0003,          // void
    NotTranslated = 0x0007, // type not translated by cvpack
    HResult = 0x0008,       // OLE/COM HRESULT

    SignedCharacter = 0x0010,   // 8 bit signed
    UnsignedCharacter = 0x0020, // 8 bit unsigned
    NarrowCharacter = 0x0070,   // really a char
    WideCharacter = 0x0071,     // wide char
    Character16 = 0x007a,       // char16_t
    Character32 = 0x007b,       // char32_t

    SByte = 0x0068,       // 8 bit signed int
    Byte = 0x0069,        // 8 bit unsigned int
    Int16Short = 0x0011,  // 16 bit signed
    UInt16Short = 0x0021, // 16 bit unsigned
    Int16 = 0x0072,       // 16 bit signed int
    UInt16 = 0x0073,      // 16 bit unsigned int
    Int32Long = 0x0012,   // 32 bit signed
    UInt32Long = 0x0022,  // 32 bit unsigned
    Int32 = 0x0074,       // 32 bit signed int
    UInt32 = 0x0075,      // 32 bit unsigned int
    Int64Quad = 0x0013,   // 64 bit signed
    UInt64Quad = 0x0023,  // 64 bit unsigned
    Int64 = 0x0076,       // 64 bit signed int
    UInt64 = 0x0077,      // 64 bit unsigned int
    Int128Oct = 0x0014,   // 128 bit signed int
    UInt128Oct = 0x0024,  // 128 bit unsigned int
    Int128 = 0x0078,      // 128 bit signed int
    UInt128 = 0x0079,     // 128 bit unsigned int

    Float16 = 0x0046,                 // 16 bit real
    Float32 = 0x0040,                 // 32 bit real
    Float32PartialPrecision = 0x0045, // 32 bit PP real
    Float48 = 0x0044,                 // 48 bit real
    Float64 = 0x0041,                 // 64 bit real
    Float80 = 0x0042,                 // 80 bit real
    Float128 = 0x0043,                // 128 bit real

    Complex16 = 0x0056,                 // 16 bit complex
    Complex32 = 0x0050,                 // 32 bit complex
    Complex32PartialPrecision = 0x0055, // 32 bit PP complex
    Complex48 = 0x0054,                 // 48 bit complex
    Complex64 = 0x0051,                 // 64 bit complex
    Complex80 = 0x0052,                 // 80 bit complex
    Complex128 = 0x0053,                // 128 bit complex

    Boolean8 = 0x0030,   // 8 bit boolean
    Boolean16 = 0x0031,  // 16 bit boolean
    Boolean32 = 0x0032,  // 32 bit boolean
    Boolean64 = 0x0033,  // 64 bit boolean
    Boolean128 = 0x0034, // 128 bit boolean
  };

- **Mode** - A value from the following enum:

.. code-block:: c++

  enum class SimpleTypeMode : uint32_t {
    Direct = 0,        // Not a pointer
    NearPointer = 1,   // Near pointer
    FarPointer = 2,    // Far pointer
    HugePointer = 3,   // Huge pointer
    NearPointer32 = 4, // 32 bit near pointer
    FarPointer32 = 5,  // 32 bit far pointer
    NearPointer64 = 6, // 64 bit near pointer
    NearPointer128 = 7 // 128 bit near pointer
  };

Note that for pointers, the bitness is represented in the mode.  So a ``void*``
would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for
32 bits but a type index with ``Mode=NearPointer64, Kind=Void`` if built for
64 bits.

By convention, the type index for ``std::nullptr_t`` is constructed the same
way as the type index for ``void*``, but using the bitless enumeration value
``NearPointer``.
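
The decomposition above can be sketched as follows.  This is an illustrative
sketch only: the helper names are hypothetical, and ``TypeIndexBegin`` is
assumed to be the usual 0x1000.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical helpers; only the bit layout comes from the text above.
  struct SimpleTypeParts {
    uint32_t Kind; // bits 0-7, a SimpleTypeKind value
    uint32_t Mode; // bits 8-11, a SimpleTypeMode value
  };

  // Assumed value of TypeIndexBegin (the value observed in practice).
  constexpr uint32_t AssumedTypeIndexBegin = 0x1000;

  // True if, once the high bit is cleared, the index falls in the reserved
  // range and should be decoded as a Mode/Kind bitmask.
  constexpr bool isSimpleTypeIndex(uint32_t Index) {
    return (Index & 0x7FFFFFFFu) < AssumedTypeIndexBegin;
  }

  // Splits a reserved type index into its Kind and Mode fields.
  constexpr SimpleTypeParts decodeSimpleTypeIndex(uint32_t Index) {
    return {Index & 0xFFu, (Index >> 8) & 0xFu};
  }

  // Builds a reserved type index from a Kind and a Mode.
  constexpr uint32_t makeSimpleTypeIndex(uint32_t Kind, uint32_t Mode) {
    return (Mode << 8) | Kind;
  }

With these helpers, ``makeSimpleTypeIndex(0x03, 4)`` yields ``0x0403`` for a
32-bit ``void*``, ``makeSimpleTypeIndex(0x03, 6)`` yields ``0x0603`` for a
64-bit ``void*``, and ``makeSimpleTypeIndex(0x03, 1)`` yields ``0x0103`` for
``std::nullptr_t``.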

.. _tpi_header:

Stream Header
=============
At offset 0 of the TPI Stream is a header with the following layout:

.. code-block:: c++

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

- **Version** - A value from the following enum.

.. code-block:: c++

  enum class TpiStreamVersion : uint32_t {
    V40 = 19950410,
    V41 = 19951122,
    V50 = 19961031,
    V70 = 19990903,
    V80 = 20040203,
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V80``, and no other values have been observed.  It is assumed that should
another value be observed, the layout described by this document may not be
accurate.

- **HeaderSize** - ``sizeof(TpiStreamHeader)``

- **TypeIndexBegin** - The numeric value of the type index representing the
  first type record in the TPI stream.  This is usually the value 0x1000 as
  type indices lower than this are reserved (see :ref:`Type Indices
  <type_indices>` for
  a discussion of reserved type indices).

- **TypeIndexEnd** - One greater than the numeric value of the type index
  representing the last type record in the TPI stream.  The total number of
  type records in the TPI stream can be computed as ``TypeIndexEnd -
  TypeIndexBegin``.

- **TypeRecordBytes** - The number of bytes of type record data following the
  header.

- **HashStreamIndex** - The index of a stream which contains a list of hashes
  for every type record.  This value may be -1 (``0xFFFF`` when stored in this
  ``uint16_t`` field), indicating that hash information is not present.  In
  practice a valid stream index is always observed, so any producer
  implementation should be prepared to emit this stream to ensure compatibility
  with tools which may expect it to be present.

- **HashAuxStreamIndex** - Presumably the index of a stream which contains a
  separate hash table, although this has not been observed in practice and it's
  unclear what it might be used for.

- **HashKeySize** - The size of a hash value (usually 4 bytes).

- **NumHashBuckets** - The number of buckets used to generate the hash values
  in the aforementioned hash streams.

- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
  the TPI Hash Stream of the list of hash values.  It should be assumed that
  there are either 0 hash values, or a number equal to the number of type
  records in the TPI stream (``TypeIndexEnd - TypeIndexBegin``).  Thus, if
  ``HashValueBufferLength`` is neither 0 nor equal to ``(TypeIndexEnd -
  TypeIndexBegin) * HashKeySize``, we can consider the PDB malformed.

- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
  within the TPI Hash Stream of the Type Index Offsets Buffer.  This is a list
  of pairs of ``uint32_t`` values where the first value is a :ref:`Type Index
  <type_indices>` and the second value is the offset in the type record data of
  the type with this index.  A consumer can binary search this list for the
  nearest preceding entry and then scan linearly through the records from that
  offset, giving roughly O(log n) lookup by type index.

- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
  the TPI Hash Stream of a serialized hash table whose keys are the hash values
  in the hash value buffer and whose values are type indices.  This appears to
  be useful in incremental linking scenarios: if a type is modified, an entry
  can be created mapping the old hash value to the new type index.  A PDB file
  consumer can then always find the most up-to-date version of the type,
  without forcing the incremental linker to garbage collect and rewrite
  references that point to the old version.
  The layout of this hash table is described in :doc:`HashTable`.
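
The consistency rules stated in the field descriptions can be collected into a
small check.  This is a sketch under stated assumptions rather than a complete
validator: ``numTypeRecords``, ``headerLooksWellFormed`` and
``kInvalidStreamIndex`` are hypothetical names, and the header is assumed to
have already been read into memory with the correct byte order.

.. code-block:: c++

  #include <cstdint>

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

  static_assert(sizeof(TpiStreamHeader) == 56, "unexpected header padding");

  // -1 stored in a uint16_t stream index field (hypothetical constant name).
  constexpr uint16_t kInvalidStreamIndex = 0xFFFF;

  // Number of type records that follow the header.
  inline uint32_t numTypeRecords(const TpiStreamHeader &H) {
    return H.TypeIndexEnd - H.TypeIndexBegin;
  }

  // Applies only the checks stated in the field descriptions above.
  inline bool headerLooksWellFormed(const TpiStreamHeader &H) {
    if (H.HeaderSize != sizeof(TpiStreamHeader))
      return false;
    if (H.TypeIndexEnd < H.TypeIndexBegin)
      return false;
    if (H.HashStreamIndex == kInvalidStreamIndex)
      return true; // no hash stream, so nothing further to check
    if (H.IndexOffsetBufferLength % 8 != 0)
      return false; // buffer holds pairs of uint32_t values
    if (H.HashValueBufferLength == 0)
      return true; // zero hash values is allowed
    uint64_t Expected = uint64_t(numTypeRecords(H)) * H.HashKeySize;
    return H.HashValueBufferLength == Expected;
  }

A real consumer would also need to bounds-check each offset and length against
the actual size of the hash stream, which this sketch does not have access to.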

.. _tpi_records:

CodeView Type Record List
=========================
Following the header, there are ``TypeRecordBytes`` bytes of data that
represent a variable length array of :doc:`CodeView type records
<CodeViewTypes>`.  The number of such records (i.e. the length of the array)
can be determined by computing the value ``Header.TypeIndexEnd -
Header.TypeIndexBegin``.

O(log(n)) access is provided by way of the Type Index Offsets array (if
present) described previously.
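
A lookup through the Type Index Offsets array might be sketched as below.
This is an illustration rather than LLVM's implementation:
``TypeIndexOffset``, ``findRecordOffset`` and ``NotFound`` are hypothetical
names.  It assumes the offsets array is sorted by type index, and that each
record begins with a 2-byte little-endian length that does not count the
length field itself, as described in :doc:`CodeView Type Records
<CodeViewTypes>`.

.. code-block:: c++

  #include <algorithm>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // One entry of the Type Index Offsets Buffer.
  struct TypeIndexOffset {
    uint32_t Index;  // type index
    uint32_t Offset; // byte offset of that record in the type record data
  };

  constexpr size_t NotFound = static_cast<size_t>(-1);

  // Returns the byte offset of the record whose type index is Target, or
  // NotFound if no such record exists.
  inline size_t findRecordOffset(const std::vector<TypeIndexOffset> &Offsets,
                                 const std::vector<uint8_t> &Records,
                                 uint32_t Target) {
    // Binary search for the first entry whose Index is greater than Target.
    auto It = std::upper_bound(
        Offsets.begin(), Offsets.end(), Target,
        [](uint32_t T, const TypeIndexOffset &E) { return T < E.Index; });
    if (It == Offsets.begin())
      return NotFound; // Target precedes every listed index
    --It;              // nearest entry at or before Target

    // Linear scan forward from that entry, one record at a time.
    uint32_t Index = It->Index;
    size_t Pos = It->Offset;
    while (Pos + 2 <= Records.size()) {
      if (Index == Target)
        return Pos;
      size_t Len = Records[Pos] | (size_t(Records[Pos + 1]) << 8);
      Pos += 2 + Len; // skip the length field plus the record body
      ++Index;
    }
    return NotFound;
  }

The cost of the linear scan depends on how sparse the offsets array is, so the
overall lookup is only logarithmic when entries appear at bounded intervals.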