Version v1.5.0 Jun 03, 2021

1. Format Overview

1.1. Namespace – HDMF Common

  • Description: Common data structures provided by HDMF
  • Name: hdmf-common
  • Full Name: HDMF Common
  • Version: 1.5.0
  • Authors:
    • Andrew Tritt
    • Oliver Ruebel
    • Ryan Ly
    • Ben Dichter
  • Contacts:
  • Schema:
    • doc: base data types
    • source: base.yaml
    • title: Base data types
    • doc: data types for a column-based table
    • source: table.yaml
    • title: Table data types
    • doc: data types for different types of sparse matrices
    • source: sparse.yaml
    • title: Sparse data types

2. Type Specifications

2.1. Base data types

base data types

2.1.1. Data

Overview: An abstract data type for a dataset.

2.1.2. Container

Overview: An abstract data type for a group storing collections of data and metadata. Base type for all data and metadata containers.

2.1.3. SimpleMultiContainer

Overview: A simple Container for holding onto multiple containers.

SimpleMultiContainer extends Container and includes all elements of Container with the following additions or changes.

Table 2.1 Datasets, Links, and Attributes contained in <SimpleMultiContainer>
Id Type Description
<SimpleMultiContainer> Group

Top level Group for <SimpleMultiContainer>

.<Data> Dataset

Data objects held within this SimpleMultiContainer.

  • Extends: Data
  • Quantity: 0 or more
Table 2.2 Groups contained in <SimpleMultiContainer>
Id Type Description
<SimpleMultiContainer> Group

Top level Group for <SimpleMultiContainer>

.<Container> Group

Container objects held within this SimpleMultiContainer.

2.1.3.1. Groups: <Container>

Container objects held within this SimpleMultiContainer.

2.2. Table data types

data types for a column-based table

2.2.1. VectorData

Overview: An n-dimensional dataset representing a column of a DynamicTable. If used without an accompanying VectorIndex, the first dimension is along the rows of the DynamicTable and each step along the first dimension is a cell of the larger table. VectorData can also be used to represent a ragged array if paired with a VectorIndex. This allows for storing arrays of varying length in a single cell of the DynamicTable by indexing into this VectorData. The first vector is at VectorData[0:VectorIndex[0]]. The second vector is at VectorData[VectorIndex[0]:VectorIndex[1]], and so on.
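As a minimal sketch of the ragged-array layout described above, the following Python snippet splits a flat column into per-row cells. Plain lists stand in for the VectorData and VectorIndex datasets; the function name ragged_cells and the sample values are hypothetical and are not part of the schema or of any HDMF API.

```python
def ragged_cells(vector_data, vector_index):
    """Split a flat VectorData-like list into one cell per table row.

    vector_index[i] is the exclusive end offset of row i, so row i spans
    vector_data[vector_index[i - 1]:vector_index[i]], with an implicit
    start offset of 0 for the first row.
    """
    cells = []
    start = 0
    for end in vector_index:
        cells.append(vector_data[start:end])
        start = end
    return cells


# Hypothetical example: three rows holding 2, 0, and 3 values.
data = [10, 11, 20, 21, 22]
index = [2, 2, 5]
rows = ragged_cells(data, index)
print(rows)
```

A repeated offset (here the second 2) encodes an empty cell, which is why the index stores end offsets rather than lengths or start positions.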

VectorData extends Data and includes all elements of Data with the following additions or changes.

  • Extends: Data
  • Primitive Type: Dataset
  • Dimensions: [[‘dim0’], [‘dim0’, ‘dim1’], [‘dim0’, ‘dim1’, ‘dim2’], [‘dim0’, ‘dim1’, ‘dim2’, ‘dim3’]]
  • Shape: [[None], [None, None], [None, None, None], [None, None, None, None]]
  • Inherits from: Data
  • Subtypes: DynamicTableRegion, VectorIndex
  • Source filename: table.yaml
  • Source Specification: see Section 3.3.1
Table 2.3 Datasets, Links, and Attributes contained in <VectorData>
Id Type Description
<VectorData> Dataset

Top level Dataset for <VectorData>

  • Neurodata Type: VectorData
  • Extends: Data
  • Dimensions: [[‘dim0’], [‘dim0’, ‘dim1’], [‘dim0’, ‘dim1’, ‘dim2’], [‘dim0’, ‘dim1’, ‘dim2’, ‘dim3’]]
  • Shape: [[None], [None, None], [None, None, None], [None, None, None, None]]
.description Attribute

Description of what these vectors represent.

  • Data Type: text
  • Name: description

2.2.2. VectorIndex

Overview: Used with VectorData to encode a ragged array. A VectorIndex is an array of indices into the first dimension of the target VectorData and forms a map between the rows of a DynamicTable and the indices of the VectorData. The name of the VectorIndex is expected to be the name of the target VectorData object followed by “_index”.
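The encoding direction can be sketched in the same spirit: given variable-length rows, build the flat data and the matching index, and derive the index name from the target name. This is an illustration using plain Python lists; encode_ragged and the column name "tags" are hypothetical.

```python
def encode_ragged(rows, target_name):
    """Flatten variable-length rows into (data, index, index_name)."""
    flat = []
    index = []
    for row in rows:
        flat.extend(row)
        # Each index entry is the exclusive end offset of that row.
        index.append(len(flat))
    # Naming convention from the overview: target name followed by "_index".
    return flat, index, target_name + "_index"


flat, index, index_name = encode_ragged([["a", "b"], [], ["c"]], "tags")
print(flat, index, index_name)
```

The index has exactly one entry per table row, and its last entry equals the length of the flattened data.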

VectorIndex extends VectorData and includes all elements of VectorData with the following additions or changes.

  • Extends: VectorData
  • Primitive Type: Dataset
  • Data Type: uint8
  • Dimensions: [‘num_rows’]
  • Shape: [None]
  • Inherits from: VectorData, Data
  • Source filename: table.yaml
  • Source Specification: see Section 3.3.2
Table 2.4 Datasets, Links, and Attributes contained in <VectorIndex>
Id Type Description
<VectorIndex> Dataset

Top level Dataset for <VectorIndex>

  • Neurodata Type: VectorIndex
  • Extends: VectorData
  • Data Type: uint8
  • Dimensions: [‘num_rows’]
  • Shape: [None]
.target Attribute

Reference to the target dataset that this index applies to.

  • Data Type: object reference to VectorData
  • Name: target

2.2.3. ElementIdentifiers

Overview: A list of unique identifiers for values within a dataset, e.g. rows of a DynamicTable.

ElementIdentifiers extends Data and includes all elements of Data with the following additions or changes.

  • Extends: Data
  • Primitive Type: Dataset
  • Data Type: int
  • Dimensions: [‘num_elements’]
  • Shape: [None]
  • Default Name: element_id
  • Inherits from: Data
  • Source filename: table.yaml
  • Source Specification: see Section 3.3.3

2.2.4. DynamicTableRegion

Overview: DynamicTableRegion provides a link from one table to an index or region of another. The table attribute is a link to another DynamicTable, indicating which table is referenced, and the data is int(s) indicating the row(s) (0-indexed) of the target array. DynamicTableRegions can be used to associate rows with repeated metadata without data duplication. They can also be used to create hierarchical relationships between multiple DynamicTables. DynamicTableRegion objects may be paired with a VectorIndex object to create ragged references, so a single cell of a DynamicTable can reference many rows of another DynamicTable.
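To illustrate how such references resolve, here is a small sketch in which the target table is a dictionary of equally long column lists and the region data is a list of 0-indexed row numbers. The helper names and the sample table are hypothetical; real object references and the HDMF API are not modeled.

```python
def resolve_region(region_data, target_columns):
    """Look up each referenced row index in the target table's columns."""
    return [
        {name: values[i] for name, values in target_columns.items()}
        for i in region_data
    ]


def resolve_ragged_region(region_data, region_index, target_columns):
    """Resolve a region paired with an index: a list of target rows per cell."""
    cells = []
    start = 0
    for end in region_index:
        cells.append(resolve_region(region_data[start:end], target_columns))
        start = end
    return cells


# Hypothetical target table with three rows, stored column-wise.
target = {"label": ["a", "b", "c"], "depth": [1.0, 2.0, 3.0]}

# Plain region: each source row references exactly one target row.
single = resolve_region([2, 0, 0], target)

# Ragged region: source rows reference 2, 0, and 1 target rows respectively.
ragged = resolve_ragged_region([0, 1, 2], [2, 2, 3], target)
print(single)
print(ragged)
```

In the plain case, target row 0 is referenced twice without its values being copied into the source table, which is the deduplication the overview refers to.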

DynamicTableRegion extends VectorData and includes all elements of VectorData with the following additions or changes.

  • Extends: VectorData
  • Primitive Type: Dataset
  • Data Type: int
  • Dimensions: [‘num_rows’]
  • Shape: [None]
  • Inherits from: VectorData, Data
  • Source filename: table.yaml
  • Source Specification: see Section 3.3.4
Table 2.5 Datasets, Links, and Attributes contained in <DynamicTableRegion>
Id Type Description
<DynamicTableRegion> Dataset

Top level Dataset for <DynamicTableRegion>

  • Neurodata Type: DynamicTableRegion
  • Extends: VectorData
  • Data Type: int
  • Dimensions: [‘num_rows’]
  • Shape: [None]
.table Attribute

Reference to the DynamicTable object that this region applies to.

.description Attribute

Description of what this table region points to.

  • Data Type: text
  • Name: description

2.2.5. DynamicTable

Overview: A group containing multiple datasets that are aligned on the first dimension (currently, this requirement is left up to APIs to check and enforce). These datasets represent different columns in the table. Apart from a column that contains unique identifiers for each row, there are no other required datasets. Users are free to add any number of custom VectorData objects (columns) here. DynamicTable also supports ragged array columns, where each element can be of a different size. To add a ragged array column, use a VectorIndex type to index the corresponding VectorData type. See documentation for VectorData and VectorIndex for more details. Unlike a compound data type, which is analogous to storing an array-of-structs, a DynamicTable can be thought of as a struct-of-arrays. This provides an alternative structure to choose from when optimizing storage for anticipated access patterns. Additionally, this type provides a way of creating a table without having to define a compound type up front. Although this convenience may be attractive, users should think carefully about how data will be accessed. DynamicTable is more appropriate for column-centric access, whereas a dataset with a compound type would be more appropriate for row-centric access. Finally, data size should also be taken into account. For small tables, performance loss may be an acceptable trade-off for the flexibility of a DynamicTable.
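The struct-of-arrays layout and the alignment requirement can be sketched with a plain dictionary. This is an assumption-laden illustration, not the HDMF API: the table representation, the helper names check_alignment and get_row, and the sample columns are all hypothetical, and only non-ragged columns are considered (a ragged column would be aligned through its VectorIndex instead).

```python
def check_alignment(table):
    """Raise ValueError unless every column has one entry per row id."""
    num_rows = len(table["id"])
    for name in table["colnames"]:
        if len(table["columns"][name]) != num_rows:
            raise ValueError(f"column {name!r} is not aligned with id")


def get_row(table, i):
    """Assemble row i by reading position i from every column."""
    row = {"id": table["id"][i]}
    for name in table["colnames"]:  # colnames fixes the column order
        row[name] = table["columns"][name][i]
    return row


table = {
    "id": [0, 1, 2],
    "colnames": ["name", "score"],
    "columns": {"name": ["x", "y", "z"], "score": [0.5, 0.7, 0.9]},
}
check_alignment(table)
scores = table["columns"]["score"]  # column-centric: a single lookup
row = get_row(table, 1)             # row-centric: touches every column
print(scores, row)
```

Reading one column is a single lookup, while assembling one row visits every column. That asymmetry is the access-pattern trade-off against a compound (array-of-structs) dataset.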

DynamicTable extends Container and includes all elements of Container with the following additions or changes.

Table 2.6 Datasets, Links, and Attributes contained in <DynamicTable>
Id Type Description
<DynamicTable> Group

Top level Group for <DynamicTable>

.colnames Attribute

The names of the columns in this table. This should be used to specify an order to the columns.

  • Data Type: text
  • Dimensions: [‘num_columns’]
  • Shape: [None]
  • Name: colnames
.description Attribute

Description of what is in this dynamic table.

  • Data Type: text
  • Name: description
.id Dataset

Array of unique identifiers for the rows of this dynamic table.

  • Extends: ElementIdentifiers
  • Data Type: int
  • Dimensions: [‘num_rows’]
  • Shape: [None]
  • Name: id
.<VectorData> Dataset

Vector columns, including index columns, of this dynamic table.

2.2.6. AlignedDynamicTable

Overview: DynamicTable container that supports storing a collection of sub-tables. Each sub-table is a DynamicTable itself that is aligned with the main table by row index. I.e., all DynamicTables stored in this group MUST have the same number of rows. This type effectively defines a 2-level table in which the main data is stored in the main table implemented by this type and additional columns of the table are grouped into categories, with each category being represented by a separate DynamicTable stored within the group.
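A minimal sketch of the two constraints stated here (category names must match the sub-table names, and every sub-table must have the same number of rows as the main table) might look as follows. The function check_aligned_table and the representation of each sub-table by its list of row ids are assumptions made for illustration only.

```python
def check_aligned_table(main_ids, categories, category_tables):
    """Validate category names and row counts against the main table.

    category_tables maps each category name to that sub-table's row ids.
    """
    if sorted(categories) != sorted(category_tables):
        raise ValueError("categories do not match the sub-table names")
    for name in categories:
        if len(category_tables[name]) != len(main_ids):
            raise ValueError(f"category {name!r} has a different number of rows")


# Hypothetical main table with three rows and two category sub-tables.
main_ids = [0, 1, 2]
categories = ["stimulus", "response"]
category_tables = {"stimulus": [0, 1, 2], "response": [0, 1, 2]}
check_aligned_table(main_ids, categories, category_tables)
print("aligned")
```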

AlignedDynamicTable extends DynamicTable and includes all elements of DynamicTable with the following additions or changes.

Table 2.7 Datasets, Links, and Attributes contained in <AlignedDynamicTable>
Id Type Description
<AlignedDynamicTable> Group

Top level Group for <AlignedDynamicTable>

.categories Attribute

The names of the categories in this AlignedDynamicTable. Each category is represented by one DynamicTable stored in the parent group. This attribute should be used to specify an order of categories and the category names must match the names of the corresponding DynamicTable in the group.

  • Data Type: text
  • Dimensions: [‘num_categories’]
  • Shape: [None]
  • Name: categories
Table 2.8 Groups contained in <AlignedDynamicTable>
Id Type Description
<AlignedDynamicTable> Group

Top level Group for <AlignedDynamicTable>

.<DynamicTable> Group

A DynamicTable representing a particular category for columns in the AlignedDynamicTable parent container. The table MUST be aligned with (i.e., have the same number of rows as) all other DynamicTables stored in the AlignedDynamicTable parent container. The name of the category is given by the name of the DynamicTable and its description by the description attribute of the DynamicTable.

2.2.6.1. Groups: <DynamicTable>

A DynamicTable representing a particular category for columns in the AlignedDynamicTable parent container. The table MUST be aligned with (i.e., have the same number of rows as) all other DynamicTables stored in the AlignedDynamicTable parent container. The name of the category is given by the name of the DynamicTable and its description by the description attribute of the DynamicTable.

2.3. Sparse data types

data types for different types of sparse matrices

2.3.1. CSRMatrix

Overview: A compressed sparse row matrix. Data are stored in the standard CSR format, where column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their corresponding values are stored in data[indptr[i]:indptr[i+1]].
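The slicing rule above can be made concrete with a short sketch that reads one row and expands the whole matrix to a dense list of lists. Plain Python lists stand in for the shape attribute and the data, indices, and indptr datasets; the function names and the sample matrix are hypothetical.

```python
def csr_row(i, data, indices, indptr):
    """Return the (column, value) pairs stored for row i."""
    lo, hi = indptr[i], indptr[i + 1]
    return list(zip(indices[lo:hi], data[lo:hi]))


def csr_to_dense(shape, data, indices, indptr):
    """Expand a CSR matrix into a dense list of rows, filling gaps with 0."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for col, value in csr_row(i, data, indices, indptr):
            dense[i][col] = value
    return dense


# Hypothetical 3 x 3 matrix:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 6]]
shape = (3, 3)
data = [1, 2, 3, 4, 5, 6]
indices = [0, 2, 2, 0, 1, 2]
indptr = [0, 2, 3, 6]

dense = csr_to_dense(shape, data, indices, indptr)
print(dense)
```

Because indptr has one more entry than the matrix has rows, indptr[i + 1] is always defined, and its last entry equals the number of stored values.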

CSRMatrix extends Container and includes all elements of Container with the following additions or changes.

Table 2.9 Datasets, Links, and Attributes contained in <CSRMatrix>
Id Type Description
<CSRMatrix> Group

Top level Group for <CSRMatrix>

.shape Attribute

The shape (number of rows, number of columns) of this sparse matrix.

  • Data Type: uint
  • Dimensions: [‘number of rows, number of columns’]
  • Shape: [2]
  • Name: shape
.indices Dataset

The column indices.

  • Data Type: uint
  • Dimensions: [‘number of non-zero values’]
  • Shape: [None]
  • Name: indices
.indptr Dataset

The row index pointer.

  • Data Type: uint
  • Dimensions: [‘number of rows in the matrix + 1’]
  • Shape: [None]
  • Name: indptr
.data Dataset

The non-zero values in the matrix.

  • Dimensions: [‘number of non-zero values’]
  • Shape: [None]
  • Name: data

3. Schema Sources

Source Specification: see Section 3.1

3.1. Namespace – HDMF Common

Description: see Section 1.1

YAML Specification:

author:
- Andrew Tritt
- Oliver Ruebel
- Ryan Ly
- Ben Dichter
contact:
- ajtritt@lbl.gov
- oruebel@lbl.gov
- rly@lbl.gov
- bdichter@lbl.gov
doc: Common data structures provided by HDMF
full_name: HDMF Common
name: hdmf-common
schema:
- doc: base data types
  source: base.yaml
  title: Base data types
- doc: data types for a column-based table
  source: table.yaml
  title: Table data types
- doc: data types for different types of sparse matrices
  source: sparse.yaml
  title: Sparse data types
version: 1.5.0

3.2. Base data types

base data types

3.2.1. Data

Description: see Section 2.1.1

YAML Specification:

data_type_def: Data
doc: An abstract data type for a dataset.

3.2.2. Container

Description: see Section 2.1.2

YAML Specification:

data_type_def: Container
doc: An abstract data type for a group storing collections of data and metadata. Base
  type for all data and metadata containers.

3.2.3. SimpleMultiContainer

Extends: Container

Description: see Section 2.1.3

YAML Specification:

data_type_def: SimpleMultiContainer
data_type_inc: Container
datasets:
- data_type_inc: Data
  doc: Data objects held within this SimpleMultiContainer.
  quantity: '*'
doc: A simple Container for holding onto multiple containers.
groups:
- data_type_inc: Container
  doc: Container objects held within this SimpleMultiContainer.
  quantity: '*'

3.3. Table data types

data types for a column-based table

3.3.1. VectorData

Extends: Data

Description: see Section 2.2.1

YAML Specification:

attributes:
- doc: Description of what these vectors represent.
  dtype: text
  name: description
data_type_def: VectorData
data_type_inc: Data
dims:
- - dim0
- - dim0
  - dim1
- - dim0
  - dim1
  - dim2
- - dim0
  - dim1
  - dim2
  - dim3
doc: An n-dimensional dataset representing a column of a DynamicTable. If used without
  an accompanying VectorIndex, first dimension is along the rows of the DynamicTable
  and each step along the first dimension is a cell of the larger table. VectorData
  can also be used to represent a ragged array if paired with a VectorIndex. This
  allows for storing arrays of varying length in a single cell of the DynamicTable
  by indexing into this VectorData. The first vector is at VectorData[0:VectorIndex[0]].
  The second vector is at VectorData[VectorIndex[0]:VectorIndex[1]], and so on.
shape:
- - null
- - null
  - null
- - null
  - null
  - null
- - null
  - null
  - null
  - null

3.3.2. VectorIndex

Extends: VectorData

Description: see Section 2.2.2

YAML Specification:

attributes:
- doc: Reference to the target dataset that this index applies to.
  dtype:
    reftype: object
    target_type: VectorData
  name: target
data_type_def: VectorIndex
data_type_inc: VectorData
dims:
- num_rows
doc: Used with VectorData to encode a ragged array. An array of indices into the first
  dimension of the target VectorData, and forming a map between the rows of a DynamicTable
  and the indices of the VectorData. The name of the VectorIndex is expected to be
  the name of the target VectorData object followed by "_index".
dtype: uint8
shape:
- null

3.3.3. ElementIdentifiers

Extends: Data

Description: see Section 2.2.3

YAML Specification:

data_type_def: ElementIdentifiers
data_type_inc: Data
default_name: element_id
dims:
- num_elements
doc: A list of unique identifiers for values within a dataset, e.g. rows of a DynamicTable.
dtype: int
shape:
- null

3.3.4. DynamicTableRegion

Extends: VectorData

Description: see Section 2.2.4

YAML Specification:

attributes:
- doc: Reference to the DynamicTable object that this region applies to.
  dtype:
    reftype: object
    target_type: DynamicTable
  name: table
- doc: Description of what this table region points to.
  dtype: text
  name: description
data_type_def: DynamicTableRegion
data_type_inc: VectorData
dims:
- num_rows
doc: DynamicTableRegion provides a link from one table to an index or region of another.
  The `table` attribute is a link to another `DynamicTable`, indicating which table
  is referenced, and the data is int(s) indicating the row(s) (0-indexed) of the target
  array. `DynamicTableRegion`s can be used to associate rows with repeated meta-data
  without data duplication. They can also be used to create hierarchical relationships
  between multiple `DynamicTable`s. `DynamicTableRegion` objects may be paired with
  a `VectorIndex` object to create ragged references, so a single cell of a `DynamicTable`
  can reference many rows of another `DynamicTable`.
dtype: int
shape:
- null

3.3.5. DynamicTable

Extends: Container

Description: see Section 2.2.5

YAML Specification:

attributes:
- dims:
  - num_columns
  doc: The names of the columns in this table. This should be used to specify an order
    to the columns.
  dtype: text
  name: colnames
  shape:
  - null
- doc: Description of what is in this dynamic table.
  dtype: text
  name: description
data_type_def: DynamicTable
data_type_inc: Container
datasets:
- data_type_inc: ElementIdentifiers
  dims:
  - num_rows
  doc: Array of unique identifiers for the rows of this dynamic table.
  dtype: int
  name: id
  shape:
  - null
- data_type_inc: VectorData
  doc: Vector columns, including index columns, of this dynamic table.
  quantity: '*'
doc: A group containing multiple datasets that are aligned on the first dimension
  (Currently, this requirement is left up to APIs to check and enforce). These datasets
  represent different columns in the table. Apart from a column that contains unique
  identifiers for each row, there are no other required datasets. Users are free to
  add any number of custom VectorData objects (columns) here. DynamicTable also supports
  ragged array columns, where each element can be of a different size. To add a ragged
  array column, use a VectorIndex type to index the corresponding VectorData type.
  See documentation for VectorData and VectorIndex for more details. Unlike a compound
  data type, which is analogous to storing an array-of-structs, a DynamicTable can
  be thought of as a struct-of-arrays. This provides an alternative structure to choose
  from when optimizing storage for anticipated access patterns. Additionally, this
  type provides a way of creating a table without having to define a compound type
  up front. Although this convenience may be attractive, users should think carefully
  about how data will be accessed. DynamicTable is more appropriate for column-centric
  access, whereas a dataset with a compound type would be more appropriate for row-centric
  access. Finally, data size should also be taken into account. For small tables,
  performance loss may be an acceptable trade-off for the flexibility of a DynamicTable.

3.3.6. AlignedDynamicTable

Extends: DynamicTable

Description: see Section 2.2.6

YAML Specification:

attributes:
- dims:
  - num_categories
  doc: The names of the categories in this AlignedDynamicTable. Each category is represented
    by one DynamicTable stored in the parent group. This attribute should be used
    to specify an order of categories and the category names must match the names
    of the corresponding DynamicTable in the group.
  dtype: text
  name: categories
  shape:
  - null
data_type_def: AlignedDynamicTable
data_type_inc: DynamicTable
doc: DynamicTable container that supports storing a collection of sub-tables. Each
  sub-table is a DynamicTable itself that is aligned with the main table by row index.
  I.e., all DynamicTables stored in this group MUST have the same number of rows.
  This type effectively defines a 2-level table in which the main data is stored in
  the main table implemented by this type and additional columns of the table are
  grouped into categories, with each category being represented by a separate DynamicTable
  stored within the group.
groups:
- data_type_inc: DynamicTable
  doc: A DynamicTable representing a particular category for columns in the AlignedDynamicTable
    parent container. The table MUST be aligned with (i.e., have the same number of
    rows) as all other DynamicTables stored in the AlignedDynamicTable parent container.
    The name of the category is given by the name of the DynamicTable and its description
    by the description attribute of the DynamicTable.
  quantity: '*'

3.4. Sparse data types

data types for different types of sparse matrices

3.4.1. CSRMatrix

Extends: Container

Description: see Section 2.3.1

YAML Specification:

attributes:
- dims:
  - number of rows, number of columns
  doc: The shape (number of rows, number of columns) of this sparse matrix.
  dtype: uint
  name: shape
  shape:
  - 2
data_type_def: CSRMatrix
data_type_inc: Container
datasets:
- dims:
  - number of non-zero values
  doc: The column indices.
  dtype: uint
  name: indices
  shape:
  - null
- dims:
  - number of rows in the matrix + 1
  doc: The row index pointer.
  dtype: uint
  name: indptr
  shape:
  - null
- dims:
  - number of non-zero values
  doc: The non-zero values in the matrix.
  name: data
  shape:
  - null
doc: A compressed sparse row matrix. Data are stored in the standard CSR format, where
  column indices for row i are stored in indices[indptr[i]:indptr[i+1]] and their
  corresponding values are stored in data[indptr[i]:indptr[i+1]].