KernelDatabaseFormat

Kent Knox edited this page Aug 13, 2013 · 1 revision

Kernel database file format

Common structure

  1. Header (23 bytes)
  2. Memory pattern information
  3. Binary kernels

Header

| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | File ID ('CBS') | 3 |
| 3 | Version | 4 |
| 7 | Number of OpenCL functions | 4 |
| 11 | Binary data start | 8 |
| 19 | CRC32 | 4 |

The 'Version' field is currently equal to 1. The 'Binary data start' field holds the file offset at which the OpenCL binary kernels begin.
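The header layout can be read with a single packed struct. The following is a minimal sketch, not the library's own reader: it assumes little-endian byte order and no padding between fields (neither is stated on this page), and the names `Header` and `parse_header` are hypothetical.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian, fields packed back to back (3+4+4+8+4 = 23 bytes).
HEADER = struct.Struct("<3sIIQI")


class Header(NamedTuple):
    file_id: bytes            # expected to be b"CBS"
    version: int              # currently 1
    num_functions: int        # number of OpenCL functions
    binary_data_start: int    # file offset where the binary kernels begin
    crc32: int                # stored as-is; not verified here


def parse_header(buf: bytes) -> Header:
    """Decode the 23-byte header from the start of buf."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer is shorter than the 23-byte header")
    header = Header._make(HEADER.unpack_from(buf, 0))
    if header.file_id != b"CBS":
        raise ValueError(f"unexpected file ID {header.file_id!r}")
    return header
```

The CRC32 value is only returned, because this page does not say which bytes the checksum covers.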

Memory pattern information

| Field name | Size (bytes) |
|---|---|
| Name Length | 4 |
| Name | Variable Length |
| Number of settings | 4 |
| CRC32 | 4 |
| Settings array | Variable Length |
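Because the name is variable-length, this record has to be read field by field rather than with one fixed struct. A minimal sketch follows, assuming little-endian integers, the field order listed above, and an ASCII name that may carry a trailing NUL byte; `PatternInfo` and `parse_pattern_info` are hypothetical names.

```python
import struct
from typing import NamedTuple, Tuple


class PatternInfo(NamedTuple):
    name: str
    num_settings: int
    crc32: int


def parse_pattern_info(buf: bytes, pos: int = 0) -> Tuple[PatternInfo, int]:
    """Decode the fields before the settings array.

    Returns the decoded fields and the offset at which the settings
    array starts.
    """
    (name_len,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    raw_name = buf[pos:pos + name_len]
    if len(raw_name) != name_len:
        raise ValueError("truncated name field")
    pos += name_len
    num_settings, crc32 = struct.unpack_from("<II", buf, pos)
    pos += 8
    # Assumption: ASCII text, possibly NUL-terminated.
    name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
    return PatternInfo(name, num_settings, crc32), pos
```

The returned offset lets a caller continue straight into the settings entries.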

Settings entry

| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | Data type | 4 |
| 4 | Kernel flags | 4 |
| 8 | Number of granulations | 4 |
| 12 | CRC32 | 4 |
| 16 | Decompositions array | Variable Length |

Supported data type identifiers

  • Float - 0x1
  • Double - 0x2
  • Float complex - 0x3
  • Double complex - 0x4
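The fixed 16-byte part of a settings entry and the data type identifiers can be decoded together. This is a sketch under the assumption of little-endian, unpadded fields; the `DataType` enum and `parse_settings_prefix` are hypothetical names that only mirror the values listed above.

```python
import struct
from enum import IntEnum


class DataType(IntEnum):
    FLOAT = 0x1
    DOUBLE = 0x2
    FLOAT_COMPLEX = 0x3
    DOUBLE_COMPLEX = 0x4


# Data type, kernel flags, number of granulations, CRC32 (4 bytes each).
SETTINGS_PREFIX = struct.Struct("<IIII")


def parse_settings_prefix(buf: bytes, pos: int = 0):
    """Decode the four fixed fields that precede the decompositions array."""
    data_type, flags, num_granulations, crc32 = SETTINGS_PREFIX.unpack_from(buf, pos)
    # DataType(...) raises ValueError for an identifier outside 0x1..0x4.
    return DataType(data_type), flags, num_granulations, crc32
```

The decompositions array begins at `pos + SETTINGS_PREFIX.size`, that is, 16 bytes after the start of the entry.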

Kernel flags

These flags match the values of the KernelExtraFlags enumeration in the code.

| Name | Value | Description |
|---|---|---|
| KEXTRA_TRANS_A | 0x01 | Matrix A is transposed |
| KEXTRA_CONJUGATE_A | 0x02 | Matrix A conjugated form |
| KEXTRA_TRANS_B | 0x04 | Matrix B is transposed |
| KEXTRA_CONJUGATE_B | 0x08 | Matrix B conjugated form |
| KEXTRA_COLUMN_MAJOR | 0x10 | Matrices are stored in column major format |
| KEXTRA_UPPER_TRIANG | 0x20 | Matrix A is upper triangular |
| KEXTRA_SIDE_RIGHT | 0x40 | Matrix A on the right |
| KEXTRA_SEPARATE_TAILS | 0x80 | Problem tails are processed separately or no tails |
| KEXTRA_BETA_ZERO | 0x800 | Beta multiplier is zero |
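Since each flag occupies its own bit, a 'Kernel flags' value can be broken down with bitwise tests. The sketch below copies the names and values from the table above into a Python `IntFlag`; the class is an illustration, not the C enumeration itself, and `flag_names` is a hypothetical helper.

```python
from enum import IntFlag


class KernelExtraFlags(IntFlag):
    KEXTRA_TRANS_A = 0x01
    KEXTRA_CONJUGATE_A = 0x02
    KEXTRA_TRANS_B = 0x04
    KEXTRA_CONJUGATE_B = 0x08
    KEXTRA_COLUMN_MAJOR = 0x10
    KEXTRA_UPPER_TRIANG = 0x20
    KEXTRA_SIDE_RIGHT = 0x40
    KEXTRA_SEPARATE_TAILS = 0x80
    KEXTRA_BETA_ZERO = 0x800


def flag_names(value: int) -> list:
    """List the names of the flags set in value, in table order.

    Bits that are not listed in the table are ignored.
    """
    return [flag.name for flag in KernelExtraFlags if value & flag]
```

For example, `flag_names(0x05)` yields the names for a transposed A and a transposed B.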

Decomposition entry

The 'Sizes' field is an array of three 40-byte structures that correspond to the SubproblemDim structure in the source code, except that every field is stored as 8 bytes. The 'Parallelism granularity' field describes the OpenCL work group and matches the PGranularity structure in the code. Every OpenCL solver can provide up to 3 kernels. The 'Kernel start offsets' and 'Kernel binary sizes' fields contain the start offset in the file and the size of each such kernel. The 'Execution time' field holds, as a double-precision value, the best time in which the compute kernel ran.

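| Offset | Field name | Size (bytes) |
|---|---|---|
| 0 | Sizes | 120 |
| 120 | Parallelism granularity | 16 |
| 136 | Kernel start offsets | 24 |
| 148 | Kernel binary sizes | 12 |
| 160 | Execution time | 8 |
| 168 | CRC32 | 4 |

Note that the offsets leave only 12 bytes for 'Kernel start offsets' (136 to 148), which does not agree with its listed size of 24. Check the source code before relying on either value.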
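The unambiguous parts of a decomposition entry can be decoded by offset. This is a cautious sketch with several assumptions: little-endian byte order, unsigned 8-byte size fields, and five fields per 'Sizes' structure (inferred from 40 bytes at 8 bytes per field). The table lists 24 bytes for 'Kernel start offsets' yet places the next field at offset 148, so the sketch leaves bytes 136 to 159 undecoded, and it returns 'Parallelism granularity' as raw bytes because the PGranularity field layout is not given here. All names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple

ENTRY_SIZE = 172                 # CRC32 at offset 168 plus its 4 bytes
SIZES = struct.Struct("<15Q")    # 3 structures x 5 fields x 8 bytes = 120
TAIL = struct.Struct("<dI")      # execution time at 160, CRC32 at 168


class Decomposition(NamedTuple):
    sizes: Tuple[Tuple[int, ...], ...]   # three 5-field tuples
    granularity_raw: bytes               # bytes 120..135, undecoded
    kernel_tables_raw: bytes             # bytes 136..159, undecoded
    execution_time: float
    crc32: int


def parse_decomposition(buf: bytes, pos: int = 0) -> Decomposition:
    """Decode one decomposition entry starting at pos."""
    if len(buf) - pos < ENTRY_SIZE:
        raise ValueError("truncated decomposition entry")
    flat = SIZES.unpack_from(buf, pos)
    sizes = tuple(flat[i:i + 5] for i in range(0, 15, 5))
    granularity_raw = bytes(buf[pos + 120:pos + 136])
    kernel_tables_raw = bytes(buf[pos + 136:pos + 160])
    execution_time, crc32 = TAIL.unpack_from(buf, pos + 160)
    return Decomposition(sizes, granularity_raw, kernel_tables_raw,
                         execution_time, crc32)
```

Keeping the two kernel tables as one raw slice avoids committing to either reading of the table until the field widths are confirmed against the source.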
