Version v1.8.0 Jan 12, 2024

1. Format Overview

1.1. Namespace – HDMF Common

  • Description: Common data structures provided by HDMF

  • Name: hdmf-common

  • Full Name: HDMF Common

  • Version: 1.8.0

  • Authors:
    • Andrew Tritt

    • Oliver Ruebel

    • Ryan Ly

    • Ben Dichter

  • Contacts:
    • ajtritt@lbl.gov

    • oruebel@lbl.gov

    • rly@lbl.gov

    • bdichter@lbl.gov

  • Schema:
    • doc: base data types

    • source: base.yaml

    • title: Base data types

    • doc: data types for a column-based table

    • source: table.yaml

    • title: Table data types

    • doc: data types for different types of sparse matrices

    • source: sparse.yaml

    • title: Sparse data types

1.2. Type Hierarchy

2. Type Specifications

2.1. Base data types

base data types

2.1.1. Data

Overview: An abstract data type for a dataset.

2.1.2. Container

Overview: An abstract data type for a group storing collections of data and metadata. Base type for all data and metadata containers.

2.1.3. SimpleMultiContainer

Overview: A simple Container for holding onto multiple containers.

SimpleMultiContainer extends Container and includes all elements of Container with the following additions or changes.

Table 2.1 Datasets, Links, and Attributes contained in <SimpleMultiContainer>

| Id | Type | Description |
|---|---|---|
| <SimpleMultiContainer> | Group | Top level Group for <SimpleMultiContainer> |
| .<Data> | Dataset | Data objects held within this SimpleMultiContainer. • Extends: Data • Quantity: 0 or more |

Table 2.2 Groups contained in <SimpleMultiContainer>

| Id | Type | Description |
|---|---|---|
| <SimpleMultiContainer> | Group | Top level Group for <SimpleMultiContainer> |
| .<Container> | Group | Container objects held within this SimpleMultiContainer. • Extends: Container • Quantity: 0 or more |

2.1.3.1. Groups: <Container>

Container objects held within this SimpleMultiContainer.

2.2. Table data types

data types for a column-based table

2.2.1. VectorData

Overview: An n-dimensional dataset representing a column of a DynamicTable. If used without an accompanying VectorIndex, the first dimension is along the rows of the DynamicTable and each step along the first dimension is a cell of the larger table. VectorData can also be used to represent a ragged array if paired with a VectorIndex. This allows for storing arrays of varying length in a single cell of the DynamicTable by indexing into this VectorData. The first vector is at VectorData[0:VectorIndex[0]]. The second vector is at VectorData[VectorIndex[0]:VectorIndex[1]], and so on.
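The slicing rule above can be sketched in plain Python. This is a minimal illustration, not the HDMF API: the helper name ragged_cell and the sample values are hypothetical, and plain lists stand in for the VectorData and VectorIndex datasets.

```python
from typing import List, Sequence


def ragged_cell(data: Sequence, index: Sequence[int], row: int) -> List:
    """Return the values that belong to one table row of a ragged column.

    `index[row]` is the end offset (exclusive) of that row in `data`;
    the start offset is the previous row's end offset, or 0 for row 0.
    """
    start = 0 if row == 0 else index[row - 1]
    stop = index[row]
    return list(data[start:stop])


# Hypothetical column with three rows holding 2, 0 and 3 values.
vector_data = [10, 11, 20, 21, 22]
vector_index = [2, 2, 5]  # cumulative end offsets, one entry per row

rows = [ragged_cell(vector_data, vector_index, r) for r in range(len(vector_index))]
print(rows)
```

An empty cell is represented by repeating the previous end offset, as in the second row here.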

VectorData extends Data and includes all elements of Data with the following additions or changes.

  • Extends: Data

  • Primitive Type: Dataset

  • Dimensions: [['dim0'], ['dim0', 'dim1'], ['dim0', 'dim1', 'dim2'], ['dim0', 'dim1', 'dim2', 'dim3']]

  • Shape: [[None], [None, None], [None, None, None], [None, None, None, None]]

  • Inherits from: Data

  • Subtypes: VectorIndex, DynamicTableRegion

  • Source filename: table.yaml

  • Source Specification: see Section 3.3.1

Table 2.3 Datasets, Links, and Attributes contained in <VectorData>

| Id | Type | Description |
|---|---|---|
| <VectorData> | Dataset | Top level Dataset for <VectorData> • Neurodata Type: VectorData • Extends: Data • Dimensions: [['dim0'], ['dim0', 'dim1'], ['dim0', 'dim1', 'dim2'], ['dim0', 'dim1', 'dim2', 'dim3']] • Shape: [[None], [None, None], [None, None, None], [None, None, None, None]] |
| .description | Attribute | Description of what these vectors represent. • Data Type: text • Name: description |

2.2.2. VectorIndex

Overview: Used with VectorData to encode a ragged array. A VectorIndex is an array of indices into the first dimension of the target VectorData and forms a map between the rows of a DynamicTable and the indices of the VectorData. The name of the VectorIndex is expected to be the name of the target VectorData object followed by “_index”.
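As a rough sketch of the encoding direction, the snippet below flattens a list of variable-length cells into a data array plus cumulative end offsets, and derives the index name from the column name. The function encode_ragged and the column name spike_times are hypothetical and only illustrate the convention described above; they are not part of the schema or of any HDMF API.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def encode_ragged(cells: Sequence[Sequence]) -> Tuple[List, List[int]]:
    """Flatten per-row cells into (data, index).

    `index[i]` is the exclusive end offset of row i in `data`,
    which is the role a VectorIndex plays for its target VectorData.
    """
    flat = [value for cell in cells for value in cell]
    end_offsets = list(accumulate(len(cell) for cell in cells))
    return flat, end_offsets


column_name = "spike_times"          # hypothetical VectorData name
index_name = column_name + "_index"  # expected name of its VectorIndex

flat, end_offsets = encode_ragged([[0.1, 0.5], [], [1.2, 1.3, 1.9]])
print(index_name, flat, end_offsets)
```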

VectorIndex extends VectorData and includes all elements of VectorData with the following additions or changes.

  • Extends: VectorData

  • Primitive Type: Dataset

  • Data Type: uint8

  • Dimensions: ['num_rows']

  • Shape: [None]

  • Inherits from: VectorData, Data

  • Source filename: table.yaml

  • Source Specification: see Section 3.3.2

Table 2.4 Datasets, Links, and Attributes contained in <VectorIndex>

| Id | Type | Description |
|---|---|---|
| <VectorIndex> | Dataset | Top level Dataset for <VectorIndex> • Neurodata Type: VectorIndex • Extends: VectorData • Data Type: uint8 • Dimensions: ['num_rows'] • Shape: [None] |
| .target | Attribute | Reference to the target dataset that this index applies to. • Data Type: object reference to VectorData • Name: target |

2.2.3. ElementIdentifiers

Overview: A list of unique identifiers for values within a dataset, e.g. rows of a DynamicTable.

ElementIdentifiers extends Data and includes all elements of Data with the following additions or changes.

  • Extends: Data

  • Primitive Type: Dataset

  • Data Type: int

  • Dimensions: ['num_elements']

  • Shape: [None]

  • Default Name: element_id

  • Inherits from: Data

  • Source filename: table.yaml

  • Source Specification: see Section 3.3.3

2.2.4. DynamicTableRegion

Overview: DynamicTableRegion provides a link from one table to an index or region of another. The table attribute is a link to another DynamicTable, indicating which table is referenced, and the data is int(s) indicating the row(s) (0-indexed) of the target array. DynamicTableRegion objects can be used to associate rows with repeated metadata without data duplication. They can also be used to create hierarchical relationships between multiple DynamicTable objects. DynamicTableRegion objects may be paired with a VectorIndex object to create ragged references, so a single cell of a DynamicTable can reference many rows of another DynamicTable.
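The following sketch shows how region values could be resolved into rows of the referenced table, both directly and through a ragged index. It uses plain dictionaries of lists as a stand-in for a DynamicTable; the table contents and the helper names (get_row, resolve_region, resolve_ragged_region) are hypothetical and do not correspond to HDMF functions.

```python
from typing import Dict, List, Sequence

Table = Dict[str, list]  # column name -> column values (stand-in for a DynamicTable)


def get_row(table: Table, row: int) -> dict:
    """Gather one row by 0-indexed position from every column."""
    return {name: column[row] for name, column in table.items()}


def resolve_region(table: Table, region_data: Sequence[int]) -> List[dict]:
    """Resolve each region value (a row position) into a row of `table`."""
    return [get_row(table, row) for row in region_data]


def resolve_ragged_region(table: Table, region_data: Sequence[int],
                          region_index: Sequence[int], cell: int) -> List[dict]:
    """Resolve all rows referenced by one cell of a ragged region column."""
    start = 0 if cell == 0 else region_index[cell - 1]
    stop = region_index[cell]
    return resolve_region(table, region_data[start:stop])


# Hypothetical referenced table. Region values are row positions, not id values.
electrodes: Table = {"id": [0, 1, 2, 3], "location": ["CA1", "CA1", "CA3", "DG"]}
region_data = [2, 0, 2]   # row 2 is referenced twice without duplicating its data
region_index = [1, 3]     # cell 0 -> region_data[0:1], cell 1 -> region_data[1:3]

flat_rows = resolve_region(electrodes, region_data)
second_cell = resolve_ragged_region(electrodes, region_data, region_index, 1)
print(flat_rows)
print(second_cell)
```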

DynamicTableRegion extends VectorData and includes all elements of VectorData with the following additions or changes.

  • Extends: VectorData

  • Primitive Type: Dataset

  • Data Type: int

  • Dimensions: ['num_rows']

  • Shape: [None]

  • Inherits from: VectorData, Data

  • Source filename: table.yaml

  • Source Specification: see Section 3.3.4

Table 2.5 Datasets, Links, and Attributes contained in <DynamicTableRegion>

| Id | Type | Description |
|---|---|---|
| <DynamicTableRegion> | Dataset | Top level Dataset for <DynamicTableRegion> • Neurodata Type: DynamicTableRegion • Extends: VectorData • Data Type: int • Dimensions: ['num_rows'] • Shape: [None] |
| .table | Attribute | Reference to the DynamicTable object that this region applies to. • Data Type: object reference to DynamicTable • Name: table |
| .description | Attribute | Description of what this table region points to. • Data Type: text • Name: description |

2.2.5. DynamicTable

Overview: A group containing multiple datasets that are aligned on the first dimension (currently, this requirement is left up to APIs to check and enforce). These datasets represent different columns in the table. Apart from a column that contains unique identifiers for each row, there are no other required datasets. Users are free to add any number of custom VectorData objects (columns) here. DynamicTable also supports ragged array columns, where each element can be of a different size. To add a ragged array column, use a VectorIndex type to index the corresponding VectorData type. See documentation for VectorData and VectorIndex for more details. Unlike a compound data type, which is analogous to storing an array-of-structs, a DynamicTable can be thought of as a struct-of-arrays. This provides an alternative structure to choose from when optimizing storage for anticipated access patterns. Additionally, this type provides a way of creating a table without having to define a compound type up front. Although this convenience may be attractive, users should think carefully about how data will be accessed. DynamicTable is more appropriate for column-centric access, whereas a dataset with a compound type would be more appropriate for row-centric access. Finally, data size should also be taken into account. For small tables, performance loss may be an acceptable trade-off for the flexibility of a DynamicTable.
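To make the array-of-structs versus struct-of-arrays distinction concrete, here is a small sketch in plain Python. The sample columns and the helpers is_aligned and get_row are hypothetical; the alignment check only illustrates the kind of validation the schema leaves to APIs.

```python
from typing import Dict, List

# Array-of-structs: one record per row, analogous to a compound data type.
array_of_structs = [
    {"id": 0, "label": "a", "score": 1.5},
    {"id": 1, "label": "b", "score": 2.5},
    {"id": 2, "label": "c", "score": 3.5},
]

# Struct-of-arrays: one array per column, analogous to a DynamicTable.
struct_of_arrays: Dict[str, List] = {
    "id": [0, 1, 2],
    "label": ["a", "b", "c"],
    "score": [1.5, 2.5, 3.5],
}


def is_aligned(table: Dict[str, List]) -> bool:
    """True if all columns have the same length along the first dimension."""
    return len({len(column) for column in table.values()}) <= 1


def get_row(table: Dict[str, List], row: int) -> dict:
    """Row-centric access: one lookup per column is needed."""
    return {name: column[row] for name, column in table.items()}


# Column-centric access is a single lookup in the struct-of-arrays layout.
column_scores = struct_of_arrays["score"]
row_1 = get_row(struct_of_arrays, 1)
print(column_scores, row_1, is_aligned(struct_of_arrays))
```

Reading a whole column touches one array in the column-wise layout, while reading one row touches every column, which is the access-pattern trade-off the overview describes.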

DynamicTable extends Container and includes all elements of Container with the following additions or changes.

Table 2.6 Datasets, Links, and Attributes contained in <DynamicTable>

| Id | Type | Description |
|---|---|---|
| <DynamicTable> | Group | Top level Group for <DynamicTable> |
| .colnames | Attribute | The names of the columns in this table. This should be used to specify an order to the columns. • Data Type: text • Dimensions: ['num_columns'] • Shape: [None] • Name: colnames |
| .description | Attribute | Description of what is in this dynamic table. • Data Type: text • Name: description |
| .id | Dataset | Array of unique identifiers for the rows of this dynamic table. • Extends: ElementIdentifiers • Data Type: int • Dimensions: ['num_rows'] • Shape: [None] • Name: id |
| .<VectorData> | Dataset | Vector columns, including index columns, of this dynamic table. • Extends: VectorData • Quantity: 0 or more |

2.2.6. AlignedDynamicTable

Overview: DynamicTable container that supports storing a collection of sub-tables. Each sub-table is a DynamicTable itself that is aligned with the main table by row index. I.e., all DynamicTables stored in this group MUST have the same number of rows. This type effectively defines a 2-level table in which the main data is stored in the main table implemented by this type and additional columns of the table are grouped into categories, with each category being represented by a separate DynamicTable stored within the group.
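A minimal sketch of the alignment rule follows, assuming each table is represented as a dictionary of columns with a required id column. The names Table, num_rows and alignment_problems are hypothetical, as is the sample data; this is not how HDMF itself validates an AlignedDynamicTable.

```python
from typing import Dict, List

Table = Dict[str, list]  # column name -> column values (stand-in for a DynamicTable)


def num_rows(table: Table) -> int:
    """Row count taken from the required id column."""
    return len(table["id"])


def alignment_problems(main: Table, categories: List[str],
                       sub_tables: Dict[str, Table]) -> List[str]:
    """Return human-readable problems; an empty list means the tables are aligned."""
    problems = []
    # Category names must match the names of the stored sub-tables.
    if set(categories) != set(sub_tables):
        problems.append("categories attribute does not match sub-table names")
    # Every sub-table must have the same number of rows as the main table.
    expected = num_rows(main)
    for name, table in sub_tables.items():
        if num_rows(table) != expected:
            problems.append(f"{name}: {num_rows(table)} rows, expected {expected}")
    return problems


main_table: Table = {"id": [0, 1, 2], "name": ["x", "y", "z"]}
good = {"stimulus": {"id": [0, 1, 2], "amplitude": [1, 2, 3]}}
bad = {"stimulus": {"id": [0, 1], "amplitude": [1, 2]}}

print(alignment_problems(main_table, ["stimulus"], good))
print(alignment_problems(main_table, ["stimulus"], bad))
```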

AlignedDynamicTable extends DynamicTable and includes all elements of DynamicTable with the following additions or changes.

Table 2.7 Datasets, Links, and Attributes contained in <AlignedDynamicTable>

| Id | Type | Description |
|---|---|---|
| <AlignedDynamicTable> | Group | Top level Group for <AlignedDynamicTable> |
| .categories | Attribute | The names of the categories in this AlignedDynamicTable. Each category is represented by one DynamicTable stored in the parent group. This attribute should be used to specify an order of categories and the category names must match the names of the corresponding DynamicTable in the group. • Data Type: text • Dimensions: ['num_categories'] • Shape: [None] • Name: categories |

Table 2.8 Groups contained in <AlignedDynamicTable>

| Id | Type | Description |
|---|---|---|
| <AlignedDynamicTable> | Group | Top level Group for <AlignedDynamicTable> |
| .<DynamicTable> | Group | A DynamicTable representing a particular category for columns in the AlignedDynamicTable parent container. The table MUST be aligned with (i.e., have the same number of rows as) all other DynamicTables stored in the AlignedDynamicTable parent container. The name of the category is given by the name of the DynamicTable and its description by the description attribute of the DynamicTable. • Extends: DynamicTable • Quantity: 0 or more |

2.2.6.1. Groups: <DynamicTable>

A DynamicTable representing a particular category for columns in the AlignedDynamicTable parent container. The table MUST be aligned with (i.e., have the same number of rows as) all other DynamicTables stored in the AlignedDynamicTable parent container. The name of the category is given by the name of the DynamicTable and its description by the description attribute of the DynamicTable.

2.3. Sparse data types

data types for different types of sparse matrices

2.3.1. CSRMatrix

Overview: A compressed sparse row matrix. Data are stored in the standard CSR format, where column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]].
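The row-slicing rule above can be illustrated with a small pure-Python sketch that expands a CSR triple into a dense matrix. The helper names csr_row and csr_to_dense and the sample matrix are hypothetical; lists stand in for the shape attribute and the indices, indptr and data datasets.

```python
from typing import List, Sequence, Tuple


def csr_row(indptr: Sequence[int], indices: Sequence[int],
            data: Sequence, i: int) -> List[tuple]:
    """Return (column, value) pairs for the non-zero entries of row i."""
    start, stop = indptr[i], indptr[i + 1]
    return list(zip(indices[start:stop], data[start:stop]))


def csr_to_dense(shape: Tuple[int, int], indptr: Sequence[int],
                 indices: Sequence[int], data: Sequence) -> List[list]:
    """Expand a CSR matrix into a dense list of rows filled with zeros."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for col, value in csr_row(indptr, indices, data, i):
            dense[i][col] = value
    return dense


# Hypothetical 3 x 4 matrix with four non-zero values; row 1 is empty.
shape = (3, 4)
data = [1, 2, 3, 4]
indices = [0, 2, 1, 3]
indptr = [0, 2, 2, 4]  # length is number of rows + 1

dense = csr_to_dense(shape, indptr, indices, data)
print(dense)
```

An empty row appears as two equal consecutive entries in indptr, and the last entry of indptr equals the number of non-zero values.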

CSRMatrix extends Container and includes all elements of Container with the following additions or changes.

Table 2.9 Datasets, Links, and Attributes contained in <CSRMatrix>

| Id | Type | Description |
|---|---|---|
| <CSRMatrix> | Group | Top level Group for <CSRMatrix> |
| .shape | Attribute | The shape (number of rows, number of columns) of this sparse matrix. • Data Type: uint • Dimensions: ['number of rows, number of columns'] • Shape: [2] • Name: shape |
| .indices | Dataset | The column indices. • Data Type: uint • Dimensions: ['number of non-zero values'] • Shape: [None] • Name: indices |
| .indptr | Dataset | The row index pointer. • Data Type: uint • Dimensions: ['number of rows in the matrix + 1'] • Shape: [None] • Name: indptr |
| .data | Dataset | The non-zero values in the matrix. • Dimensions: ['number of non-zero values'] • Shape: [None] • Name: data |

3. Schema Sources

Source Specification: see Section 3.1

3.1. Namespace – HDMF Common

Description: see Section 1.1

YAML Specification:

author:
- Andrew Tritt
- Oliver Ruebel
- Ryan Ly
- Ben Dichter
contact:
- ajtritt@lbl.gov
- oruebel@lbl.gov
- rly@lbl.gov
- bdichter@lbl.gov
doc: Common data structures provided by HDMF
full_name: HDMF Common
name: hdmf-common
schema:
- doc: base data types
  source: base.yaml
  title: Base data types
- doc: data types for a column-based table
  source: table.yaml
  title: Table data types
- doc: data types for different types of sparse matrices
  source: sparse.yaml
  title: Sparse data types
version: 1.8.0

3.2. Base data types

base data types

3.2.1. Data

Description: see Section 2.1.1

YAML Specification:

data_type_def: Data
doc: An abstract data type for a dataset.

3.2.2. Container

Description: see Section 2.1.2

YAML Specification:

data_type_def: Container
doc: An abstract data type for a group storing collections of data and metadata. Base
  type for all data and metadata containers.

3.2.3. SimpleMultiContainer

Extends: Container

Description: see Section 2.1.3

YAML Specification:

data_type_def: SimpleMultiContainer
data_type_inc: Container
datasets:
- data_type_inc: Data
  doc: Data objects held within this SimpleMultiContainer.
  quantity: '*'
doc: A simple Container for holding onto multiple containers.
groups:
- data_type_inc: Container
  doc: Container objects held within this SimpleMultiContainer.
  quantity: '*'

3.3. Table data types

data types for a column-based table

3.3.1. VectorData

Extends: Data

Description: see Section 2.2.1

YAML Specification:

attributes:
- doc: Description of what these vectors represent.
  dtype: text
  name: description
data_type_def: VectorData
data_type_inc: Data
dims:
- - dim0
- - dim0
  - dim1
- - dim0
  - dim1
  - dim2
- - dim0
  - dim1
  - dim2
  - dim3
doc: An n-dimensional dataset representing a column of a DynamicTable. If used without
  an accompanying VectorIndex, first dimension is along the rows of the DynamicTable
  and each step along the first dimension is a cell of the larger table. VectorData
  can also be used to represent a ragged array if paired with a VectorIndex. This
  allows for storing arrays of varying length in a single cell of the DynamicTable
  by indexing into this VectorData. The first vector is at VectorData[0:VectorIndex[0]].
  The second vector is at VectorData[VectorIndex[0]:VectorIndex[1]], and so on.
shape:
- -
- -
  -
- -
  -
  -
- -
  -
  -
  -

3.3.2. VectorIndex

Extends: VectorData

Description: see Section 2.2.2

YAML Specification:

attributes:
- doc: Reference to the target dataset that this index applies to.
  dtype:
    reftype: object
    target_type: VectorData
  name: target
data_type_def: VectorIndex
data_type_inc: VectorData
dims:
- num_rows
doc: Used with VectorData to encode a ragged array. An array of indices into the first
  dimension of the target VectorData, and forming a map between the rows of a DynamicTable
  and the indices of the VectorData. The name of the VectorIndex is expected to be
  the name of the target VectorData object followed by "_index".
dtype: uint8
shape:
-

3.3.3. ElementIdentifiers

Extends: Data

Description: see Section 2.2.3

YAML Specification:

data_type_def: ElementIdentifiers
data_type_inc: Data
default_name: element_id
dims:
- num_elements
doc: A list of unique identifiers for values within a dataset, e.g. rows of a DynamicTable.
dtype: int
shape:
-

3.3.4. DynamicTableRegion

Extends: VectorData

Description: see Section 2.2.4

YAML Specification:

attributes:
- doc: Reference to the DynamicTable object that this region applies to.
  dtype:
    reftype: object
    target_type: DynamicTable
  name: table
- doc: Description of what this table region points to.
  dtype: text
  name: description
data_type_def: DynamicTableRegion
data_type_inc: VectorData
dims:
- num_rows
doc: DynamicTableRegion provides a link from one table to an index or region of another.
  The `table` attribute is a link to another `DynamicTable`, indicating which table
  is referenced, and the data is int(s) indicating the row(s) (0-indexed) of the target
  array. `DynamicTableRegion`s can be used to associate rows with repeated meta-data
  without data duplication. They can also be used to create hierarchical relationships
  between multiple `DynamicTable`s. `DynamicTableRegion` objects may be paired with
  a `VectorIndex` object to create ragged references, so a single cell of a `DynamicTable`
  can reference many rows of another `DynamicTable`.
dtype: int
shape:
-

3.3.5. DynamicTable

Extends: Container

Description: see Section 2.2.5

YAML Specification:

attributes:
- dims:
  - num_columns
  doc: The names of the columns in this table. This should be used to specify an order
    to the columns.
  dtype: text
  name: colnames
  shape:
  -
- doc: Description of what is in this dynamic table.
  dtype: text
  name: description
data_type_def: DynamicTable
data_type_inc: Container
datasets:
- data_type_inc: ElementIdentifiers
  dims:
  - num_rows
  doc: Array of unique identifiers for the rows of this dynamic table.
  dtype: int
  name: id
  shape:
  -
- data_type_inc: VectorData
  doc: Vector columns, including index columns, of this dynamic table.
  quantity: '*'
doc: A group containing multiple datasets that are aligned on the first dimension
  (Currently, this requirement if left up to APIs to check and enforce). These datasets
  represent different columns in the table. Apart from a column that contains unique
  identifiers for each row, there are no other required datasets. Users are free to
  add any number of custom VectorData objects (columns) here. DynamicTable also supports
  ragged array columns, where each element can be of a different size. To add a ragged
  array column, use a VectorIndex type to index the corresponding VectorData type.
  See documentation for VectorData and VectorIndex for more details. Unlike a compound
  data type, which is analogous to storing an array-of-structs, a DynamicTable can
  be thought of as a struct-of-arrays. This provides an alternative structure to choose
  from when optimizing storage for anticipated access patterns. Additionally, this
  type provides a way of creating a table without having to define a compound type
  up front. Although this convenience may be attractive, users should think carefully
  about how data will be accessed. DynamicTable is more appropriate for column-centric
  access, whereas a dataset with a compound type would be more appropriate for row-centric
  access. Finally, data size should also be taken into account. For small tables,
  performance loss may be an acceptable trade-off for the flexibility of a DynamicTable.

3.3.6. AlignedDynamicTable

Extends: DynamicTable

Description: see Section 2.2.6

YAML Specification:

attributes:
- dims:
  - num_categories
  doc: The names of the categories in this AlignedDynamicTable. Each category is represented
    by one DynamicTable stored in the parent group. This attribute should be used
    to specify an order of categories and the category names must match the names
    of the corresponding DynamicTable in the group.
  dtype: text
  name: categories
  shape:
  -
data_type_def: AlignedDynamicTable
data_type_inc: DynamicTable
doc: DynamicTable container that supports storing a collection of sub-tables. Each
  sub-table is a DynamicTable itself that is aligned with the main table by row index.
  I.e., all DynamicTables stored in this group MUST have the same number of rows.
  This type effectively defines a 2-level table in which the main data is stored in
  the main table implemented by this type and additional columns of the table are
  grouped into categories, with each category being represented by a separate DynamicTable
  stored within the group.
groups:
- data_type_inc: DynamicTable
  doc: A DynamicTable representing a particular category for columns in the AlignedDynamicTable
    parent container. The table MUST be aligned with (i.e., have the same number of
    rows) as all other DynamicTables stored in the AlignedDynamicTable parent container.
    The name of the category is given by the name of the DynamicTable and its description
    by the description attribute of the DynamicTable.
  quantity: '*'

3.4. Sparse data types

data types for different types of sparse matrices

3.4.1. CSRMatrix

Extends: Container

Description: see Section 2.3.1

YAML Specification:

attributes:
- dims:
  - number of rows, number of columns
  doc: The shape (number of rows, number of columns) of this sparse matrix.
  dtype: uint
  name: shape
  shape:
  - 2
data_type_def: CSRMatrix
data_type_inc: Container
datasets:
- dims:
  - number of non-zero values
  doc: The column indices.
  dtype: uint
  name: indices
  shape:
  -
- dims:
  - number of rows in the matrix + 1
  doc: The row index pointer.
  dtype: uint
  name: indptr
  shape:
  -
- dims:
  - number of non-zero values
  doc: The non-zero values in the matrix.
  name: data
  shape:
  -
doc: A compressed sparse row matrix. Data are stored in the standard CSR format, where
  column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their
  corresponding values are stored in data[indptr[i]:indptr[i+1]].
32doc: A compressed sparse row matrix. Data are stored in the standard CSR format, where
33  column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their
34  corresponding values are stored in data[indptr[i]:indptr[i+1]].