Dataset
- class opencosmo.Dataset(handler, header, builders, unit_transformations, index)
- Parameters:
handler (OpenCosmoDataHandler)
header (OpenCosmoHeader)
builders (dict[str, ColumnBuilder])
unit_transformations (dict[t.TransformationType, list[t.Transformation]])
index (DataIndex)
- property cosmology: Cosmology
The cosmology of the simulation this dataset is drawn from as an astropy.cosmology.Cosmology object.
- Returns:
cosmology
- Return type:
astropy.cosmology.Cosmology
- property dtype: str
The data type of this dataset.
- Returns:
dtype
- Return type:
str
- property redshift: float
The redshift slice this dataset was drawn from.
- Returns:
redshift
- Return type:
float
- property simulation: SimulationParameters
The parameters of the simulation this dataset is drawn from.
- Returns:
parameters
- Return type:
SimulationParameters
- property data: Table | Column
The data in the dataset. This will be an astropy.table.Table or astropy.table.Column (if there is only one column selected).
- Returns:
data – The data in the dataset.
- Return type:
astropy.table.Table or astropy.table.Column
- filter(*masks)
Filter the dataset based on some criteria.
- Parameters:
*masks (Mask) – The masks to apply to the dataset, constructed with opencosmo.col()
- Returns:
dataset – The new dataset with the masks applied.
- Return type:
opencosmo.Dataset
- Raises:
ValueError – If the given masks refer to columns that are not in the dataset, or the filter would return zero rows.
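The filtering behaviour described above can be illustrated with a small, library-independent sketch. `ToyMask` and `apply_masks` are hypothetical stand-ins, not part of opencosmo, and a plain dictionary of lists stands in for the dataset. The sketch assumes that multiple masks combine with a logical AND, and it mirrors the two documented error cases: an unknown column and an empty result.

```python
from typing import Callable, NamedTuple


class ToyMask(NamedTuple):
    """Hypothetical stand-in for a mask: a column name plus a per-value test."""

    column: str
    test: Callable[[float], bool]


def apply_masks(table: dict, *masks: ToyMask) -> dict:
    """Return a new table keeping only the rows that pass every mask."""
    # Documented error case 1: a mask refers to a column that does not exist.
    for mask in masks:
        if mask.column not in table:
            raise ValueError(f"Column {mask.column!r} is not in the dataset")

    n_rows = len(next(iter(table.values()))) if table else 0
    # Assumption: masks are combined with AND, so a row must pass all of them.
    keep = [
        i
        for i in range(n_rows)
        if all(mask.test(table[mask.column][i]) for mask in masks)
    ]
    # Documented error case 2: nothing survives the filter.
    if not keep:
        raise ValueError("The masks would return zero rows")
    return {name: [values[i] for i in keep] for name, values in table.items()}


halos = {
    "sod_halo_mass": [-1.0, 2.0e13, 5.0e14],
    "sod_halo_radius": [0.0, 0.6, 1.8],
}
massive = apply_masks(
    halos,
    ToyMask("sod_halo_mass", lambda m: m > 0),
    ToyMask("sod_halo_radius", lambda r: r > 1.0),
)
```

Only the last row passes both tests, so `massive` holds a single row. The input table is left unchanged, in keeping with the documented behaviour of returning a new dataset.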
- rows()
Iterate over the rows in the dataset. Yields a dictionary for each row, with associated units. For performance, it is recommended that you first select the columns you need to work with.
- Yields:
row (dict) – A dictionary of values for each row in the dataset with units.
- Return type:
Generator[dict[str, float | Quantity], None, None]
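As a rough illustration of the row-by-row shape described above, the sketch below yields one dictionary per row from a dictionary of lists. `iter_rows` is a hypothetical helper, not the opencosmo implementation, and plain floats stand in for values that would carry units in the real library. It also shows why selecting columns first helps: each yielded dictionary only contains the columns that were kept.

```python
from typing import Iterator


def iter_rows(table: dict) -> Iterator[dict]:
    """Yield one dictionary per row, keyed by column name."""
    names = list(table)
    # zip(*columns) walks all columns in lockstep, one row at a time.
    for values in zip(*(table[name] for name in names)):
        yield dict(zip(names, values))


table = {
    "sod_halo_mass": [1.0e13, 4.0e14],
    "sod_halo_radius": [0.5, 1.7],
}

# Keep only the column that is needed before iterating, so each
# per-row dictionary stays small.
needed = {"sod_halo_mass": table["sod_halo_mass"]}
rows = list(iter_rows(needed))
```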
- select(columns)
Select a subset of columns from the dataset.
- Parameters:
columns (str or list[str]) – The column or columns to select.
- Returns:
dataset – The new dataset with only the selected columns.
- Return type:
opencosmo.Dataset
- Raises:
ValueError – If any of the given columns are not in the dataset.
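A minimal sketch of the selection semantics, using a dictionary of lists in place of a dataset. `select_columns` is a hypothetical helper written for illustration only; it accepts either a single name or a list of names, as documented, and raises ValueError when a requested column is missing.

```python
def select_columns(table: dict, columns) -> dict:
    """Return a new table containing only the requested columns."""
    # Accept a single column name as well as a list of names.
    if isinstance(columns, str):
        columns = [columns]

    missing = [name for name in columns if name not in table]
    if missing:
        raise ValueError(f"Columns not in the dataset: {missing}")

    # Copy the lists so the result is independent of the input table.
    return {name: list(table[name]) for name in columns}


halos = {
    "sod_halo_mass": [1.0e13, 4.0e14],
    "sod_halo_radius": [0.5, 1.7],
    "fof_halo_tag": [11, 12],
}
subset = select_columns(halos, ["sod_halo_mass", "sod_halo_radius"])
single = select_columns(halos, "fof_halo_tag")
```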
- take(n, at='random')
Take some number of rows from the dataset.
Can take the first n rows, the last n rows, or n random rows depending on the value of ‘at’.
- Parameters:
n (int) – The number of rows to take.
at (str) – Where to take the rows from. One of “start”, “end”, or “random”. The default is “random”.
- Returns:
dataset – The new dataset with only the selected rows.
- Return type:
opencosmo.Dataset
- Raises:
ValueError – If n is negative or greater than the number of rows in the dataset, or if ‘at’ is invalid.
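The three values of 'at' can be pictured as three ways of choosing row indices. The sketch below is a hypothetical helper, not opencosmo code: `take_indices` and its `seed` parameter are introduced here for illustration, and returning the random indices in sorted order is an assumption rather than documented behaviour.

```python
import random
from typing import List, Optional


def take_indices(
    length: int, n: int, at: str = "random", seed: Optional[int] = None
) -> List[int]:
    """Return the indices of the n rows that would be taken."""
    # Documented error case: n is negative or exceeds the number of rows.
    if n < 0 or n > length:
        raise ValueError(f"n must be between 0 and {length}, got {n}")

    if at == "start":
        return list(range(n))
    if at == "end":
        return list(range(length - n, length))
    if at == "random":
        # Sample without replacement; sorting keeps the original row order
        # (an assumption made for this sketch).
        rng = random.Random(seed)
        return sorted(rng.sample(range(length), n))

    # Documented error case: 'at' is not one of the accepted values.
    raise ValueError(f"Unknown value for 'at': {at!r}")


first = take_indices(10, 3, at="start")  # [0, 1, 2]
last = take_indices(10, 3, at="end")  # [7, 8, 9]
picked = take_indices(10, 3, at="random", seed=0)
```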
- take_range(start, end)
Get a range of rows from the dataset.
- Parameters:
start (int) – The first row to get.
end (int) – The last row to get.
- Returns:
table – The table with only the rows from start to end.
- Return type:
astropy.table.Table
- Raises:
ValueError – If start or end are negative or greater than the length of the dataset, or if end is less than start.
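The bounds checks can be sketched as follows. `check_range` is a hypothetical helper, not part of opencosmo. Two assumptions are made: the range is treated as half-open, like a Python slice, so the row at `end` itself is excluded, and the invalid ordering is taken to be an `end` that precedes `start`. The documentation above does not state whether `end` is inclusive.

```python
def check_range(length: int, start: int, end: int) -> range:
    """Validate (start, end) against a dataset length and return the row range."""
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    if start > length or end > length:
        raise ValueError("start and end must not exceed the dataset length")
    if end < start:
        raise ValueError("end must not be less than start")
    # Assumption: half-open interval [start, end), as with Python slicing.
    return range(start, end)


rows = list(check_range(10, 2, 5))  # [2, 3, 4]
```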
- with_units(convention)
Create a new dataset from this one with a different unit convention.
- Parameters:
convention (str) – The unit convention to use. One of “physical”, “comoving”, “scalefree”, or “unitless”.
- Returns:
dataset – The new dataset with the requested unit convention.
- Return type:
opencosmo.Dataset
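For orientation, the sketch below shows how a length would change between conventions under the usual cosmology definitions. It is an assumption that the convention names map onto these definitions: "scalefree" is read as lengths in Mpc/h, "comoving" as the same lengths with the factor of h divided out, and "physical" as comoving lengths scaled by 1 / (1 + z). Both functions are hypothetical helpers, and this section does not specify how opencosmo performs the conversion internally.

```python
def scalefree_to_comoving(length_mpc_per_h: float, h: float) -> float:
    """Convert a length in Mpc/h to comoving Mpc by dividing out h."""
    return length_mpc_per_h / h


def comoving_to_physical(length_comoving_mpc: float, redshift: float) -> float:
    """Convert a comoving length to a physical length at the given redshift."""
    # The scale factor is a = 1 / (1 + z); physical = comoving * a.
    return length_comoving_mpc / (1.0 + redshift)


h = 0.7  # illustrative value of the dimensionless Hubble parameter
radius_scalefree = 1.4  # Mpc/h
radius_comoving = scalefree_to_comoving(radius_scalefree, h)
radius_physical = comoving_to_physical(radius_comoving, redshift=1.0)
```

At redshift 0 the comoving and physical values coincide, which is a quick sanity check when comparing the output of two conventions.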
- collect()
Given a dataset that was originally opened with opencosmo.open, return a dataset that is in memory, as though it had been read with opencosmo.read.
This is useful if you have a very large dataset on disk, and you want to filter it down and then close the file.
For example:
    import opencosmo as oc

    with oc.open("path/to/file.hdf5") as file:
        ds = file.filter(oc.col("sod_halo_mass") > 0)
        ds = ds.select(["sod_halo_mass", "sod_halo_radius"])
        ds = ds.collect()
The selected data will now be in memory, and the file will be closed.
If working in an MPI context, all ranks will receive the same data.
- Return type:
opencosmo.Dataset