Dataset

class opencosmo.Dataset(handler, header, builders, unit_transformations, index)

Parameters:

handler (OpenCosmoDataHandler)
header (OpenCosmoHeader)
builders (dict[str, ColumnBuilder])
unit_transformations (dict[t.TransformationType, list[t.Transformation]])
index (DataIndex)

property cosmology: Cosmology

The cosmology of the simulation this dataset is drawn from as an astropy.cosmology.Cosmology object.

Returns:: cosmology
Return type:: astropy.cosmology.Cosmology

property dtype: str

The data type of this dataset.

Returns:: dtype
Return type:: str

property redshift: float

The redshift slice this dataset was drawn from.

Returns:: redshift
Return type:: float

property simulation: SimulationParameters

The parameters of the simulation this dataset is drawn from.

Returns:: parameters
Return type:: opencosmo.parameters.SimulationParameters

property data: Table | Column

The data in the dataset. This will be an astropy.table.Table or astropy.table.Column (if there is only one column selected).

Returns:: data – The data in the dataset.
Return type:: astropy.table.Table or astropy.table.Column

filter(*masks)

Filter the dataset based on some criteria.

Parameters:: *masks (Mask) – The masks to apply to the dataset, constructed with opencosmo.col()
Returns:: dataset – The new dataset with the masks applied.
Return type:: Dataset
Raises:: ValueError – If the given masks refer to columns that are not in the dataset, or if the filter would return zero rows.
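The two error conditions can be illustrated with a minimal pure-Python sketch. It does not use opencosmo: `filter_rows` and the `(column, predicate)` mask pairs are hypothetical stand-ins for the real mask objects, and combining several masks with a logical AND is an assumption, not documented behavior.

```python
# Illustrative stand-in only: NOT opencosmo's implementation.
# A "mask" here is a hypothetical (column_name, predicate) pair.

def filter_rows(table, *masks):
    """Return a new column dict holding only the rows that pass every mask."""
    # Reject masks that refer to columns the table does not have.
    missing = [name for name, _ in masks if name not in table]
    if missing:
        raise ValueError(f"Columns not in dataset: {missing}")

    n_rows = len(next(iter(table.values()), []))
    # Assumption: multiple masks are combined with a logical AND.
    keep = [
        i for i in range(n_rows)
        if all(pred(table[name][i]) for name, pred in masks)
    ]
    if not keep:
        raise ValueError("Filter would return zero rows")

    return {name: [values[i] for i in keep] for name, values in table.items()}


halos = {
    "sod_halo_mass": [1.0e12, 5.0e13, 2.0e14],
    "sod_halo_radius": [0.2, 0.6, 1.1],
}
massive = filter_rows(halos, ("sod_halo_mass", lambda m: m > 1.0e13))
```

The input table is left untouched and a new one is returned, mirroring the way filter() is documented to return a new dataset.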

rows()

Iterate over the rows in the dataset. Yields a dictionary for each row, with associated units. For performance, it is recommended that you first select the columns you need to work with.

Yields:: row (dict) – A dictionary of values for each row in the dataset with units.
Return type:: Generator[dict[str, float | Quantity], None, None]
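The row-by-row iteration pattern, and why selecting columns first helps, can be sketched in plain Python. This is an illustration under stated assumptions: `iter_rows` is a hypothetical helper, the table is a plain dict of lists, and values are bare floats rather than the unit-carrying quantities the real method yields.

```python
# Illustrative stand-in only: NOT opencosmo's implementation.

def iter_rows(table):
    """Yield one {column: value} dict per row of a column-oriented table."""
    names = list(table)
    # zip(*columns) walks all columns in lockstep, one row at a time.
    for values in zip(*(table[name] for name in names)):
        yield dict(zip(names, values))


halos = {
    "sod_halo_mass": [1.0e12, 5.0e13, 2.0e14],
    "sod_halo_radius": [0.2, 0.6, 1.1],
}

# Keeping only the needed column first means each yielded dict is smaller
# and less work is done per row.
needed = {"sod_halo_mass": halos["sod_halo_mass"]}
rows = list(iter_rows(needed))
```

Because `iter_rows` is a generator, rows are produced lazily; wrapping it in `list()` here is only for demonstration.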

select(columns)

Select a subset of columns from the dataset.

Parameters:: columns (str or list[str]) – The column or columns to select.
Returns:: dataset – The new dataset with only the selected columns.
Return type:: Dataset
Raises:: ValueError – If any of the given columns are not in the dataset.
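A short sketch of the argument handling described here, accepting either a single name or a list and rejecting unknown columns. `select_columns` is a hypothetical helper operating on a plain dict of lists; it is not the library's code.

```python
# Illustrative stand-in only: NOT opencosmo's implementation.

def select_columns(table, columns):
    """Return a new table containing only the requested columns."""
    # Accept a single column name as well as a list of names.
    if isinstance(columns, str):
        columns = [columns]

    missing = [name for name in columns if name not in table]
    if missing:
        raise ValueError(f"Columns not in dataset: {missing}")

    return {name: table[name] for name in columns}


halos = {
    "sod_halo_mass": [1.0e12, 5.0e13],
    "sod_halo_radius": [0.2, 0.6],
    "fof_halo_tag": [7, 8],
}
one = select_columns(halos, "sod_halo_mass")
two = select_columns(halos, ["sod_halo_mass", "sod_halo_radius"])
```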

take(n, at='random')

Take some number of rows from the dataset.

Can take the first n rows, the last n rows, or n random rows depending on the value of ‘at’.

Parameters:

n (int) – The number of rows to take.
at (str) – Where to take the rows from. One of “start”, “end”, or “random”. The default is “random”.

Returns:

dataset – The new dataset with only the selected rows.

Return type:

Dataset

Raises:

ValueError – If n is negative or greater than the number of rows in the dataset, or if ‘at’ is invalid.
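The three modes and the validation rules above can be sketched as an index computation. `take_indices` is a hypothetical helper, not part of opencosmo; sampling without replacement and returning the random indices in ascending order are assumptions made for this illustration.

```python
# Illustrative stand-in only: NOT opencosmo's implementation.
import random


def take_indices(n_rows, n, at="random", rng=None):
    """Return the row indices that a take of n rows would keep."""
    if n < 0 or n > n_rows:
        raise ValueError(f"n must be between 0 and {n_rows}, got {n}")

    if at == "start":
        return list(range(n))
    if at == "end":
        return list(range(n_rows - n, n_rows))
    if at == "random":
        rng = rng or random.Random()
        # Assumption: sample without replacement and keep the original order.
        return sorted(rng.sample(range(n_rows), n))

    raise ValueError(f"at must be 'start', 'end', or 'random', got {at!r}")


first_three = take_indices(10, 3, at="start")
last_three = take_indices(10, 3, at="end")
random_three = take_indices(10, 3, at="random", rng=random.Random(0))
```

Passing a seeded `random.Random` makes the random mode reproducible in this sketch; whether the real method exposes a seed is not stated in this section.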

take_range(start, end)

Get a range of rows from the dataset.

Parameters:

start (int) – The first row to get.
end (int) – The last row to get.

Returns:

table – The table with only the rows from start to end.

Return type:

astropy.table.Table

Raises:

ValueError – If start or end is negative or greater than the length of the dataset, or if end is less than start.
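A minimal sketch of the bounds checking for a row range, on a plain dict of lists. `take_range` here is a hypothetical stand-in, not the library's code. It assumes a range is invalid when end precedes start, and it treats the range as half-open, like a Python slice; this section does not state whether the row at `end` is included.

```python
# Illustrative stand-in only: NOT opencosmo's implementation.

def take_range(table, start, end):
    """Return the rows from start up to (but not including) end."""
    n_rows = len(next(iter(table.values()), []))

    if start < 0 or end < 0 or start > n_rows or end > n_rows:
        raise ValueError(f"start and end must lie between 0 and {n_rows}")
    if end < start:
        raise ValueError("end must not be less than start")

    # Assumption: half-open interval [start, end), as with list slicing.
    return {name: values[start:end] for name, values in table.items()}


halos = {
    "sod_halo_mass": [1.0e12, 5.0e13, 2.0e14, 3.0e14],
    "sod_halo_radius": [0.2, 0.6, 1.1, 1.3],
}
middle = take_range(halos, 1, 3)
```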

with_units(convention)

Create a new dataset from this one with a different unit convention.

Parameters:: convention (str) – The unit convention to use. One of “physical”, “comoving”, “scalefree”, or “unitless”.
Returns:: dataset – The new dataset with the requested unit convention.
Return type:: Dataset

collect()

Given a dataset that was originally opened with opencosmo.open, return a dataset that is in memory, as though it had been read with opencosmo.read.

This is useful if you have a very large dataset on disk, and you want to filter it down and then close the file.

For example:

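import opencosmo as oc
with oc.open("path/to/file.hdf5") as file:
    # Build the mask with oc.col() and keep only rows with positive mass.
    ds = file.filter(oc.col("sod_halo_mass") > 0)
    ds = ds.select(["sod_halo_mass", "sod_halo_radius"])
    # Load the filtered selection into memory before the file is closed.
    ds = ds.collect()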

The selected data will now be in memory, and the file will be closed.

If working in an MPI context, all ranks will receive the same data.

Return type:: Dataset