Collections

Collections within a single file can always be loaded with opencosmo.open(). Collections can be treated like read-only dictionaries. Dataset names can be retrieved with keys(), the datasets can be accessed with values() or Collection[key], and iteration can be done with items().
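
Because a collection exposes the read-only dictionary interface described above, helper code can be written against keys(), values() and items() alone. The sketch below uses a plain dict as a stand-in for a collection returned by opencosmo.open(). The dataset names and the helper describe_collection are hypothetical and are not part of the library.

```python
from collections.abc import Mapping


def describe_collection(collection: Mapping) -> list[str]:
    """Return one summary line per dataset in a dictionary-like collection."""
    lines = []
    for name, dataset in collection.items():
        lines.append(f"{name}: {type(dataset).__name__}")
    return lines


# Stand-in for a real collection: keys are dataset names, values are datasets.
stand_in = {"halo_properties": [1, 2, 3], "dm_particles": (4, 5)}

names = list(stand_in.keys())          # dataset names
first = stand_in["halo_properties"]    # access by key
summary = describe_collection(stand_in)
```

The same helper should work unchanged on any object that follows the mapping protocol.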

class opencosmo.Lightcone(datasets, z_range=None, hidden=None, ordered_by=None)

A lightcone contains two or more datasets that are part of a lightcone. Typically each dataset will cover a specific redshift range. The Lightcone object hides these details, providing an API that is identical to the standard Dataset API. Additionally, the lightcone contains some convenience functions for standard operations.

Lightcones can be nested. In this case, the top level will split the datasets up by step, while the second level will split the datasets up by type. At present, this nested scheme is used for Diffsky catalogs, which may contain both cores and synthetic cores that need to be addressed (and, more importantly, written) separately from one another.

Parameters:
  • datasets (Mapping[Any, Dataset | Lightcone])

  • z_range (Optional[tuple[float, float]])

  • hidden (Optional[set[str]])

  • ordered_by (Optional[tuple[str, bool]])

property header: OpenCosmoHeader

The header associated with this dataset.

OpenCosmo headers generally contain information about the original data this dataset was produced from, as well as any analysis that was done along the way.

Returns:

header

Return type:

opencosmo.header.OpenCosmoHeader

property columns: list[str]

The names of the columns in this dataset.

Returns:

columns

Return type:

list[str]

property descriptions: dict[str, str | None]

Return the descriptions (if any) of the columns in this lightcone as a dictionary. Columns without a description will be included in the dictionary with a value of None.

Returns:

descriptions – The column descriptions

Return type:

dict[str, str | None]

property cosmology: Cosmology

The cosmology of the simulation this dataset is drawn from as an astropy.cosmology.Cosmology object.

Returns:

cosmology

Return type:

astropy.cosmology.Cosmology

property dtype: str

The data type of this dataset.

Returns:

dtype

Return type:

str

property region: Region

The region this dataset is contained in. If no spatial queries have been performed, this will be the entire simulation box for snapshots or the full sky for lightcones.

Returns:

region

Return type:

opencosmo.spatial.Region

property simulation: HaccSimulationParameters

The parameters of the simulation this dataset is drawn from.

Returns:

parameters

Return type:

opencosmo.parameters.hacc.HaccSimulationParameters

property z_range

The redshift range of this lightcone.

Returns:

z_range

Return type:

tuple[float, float]

get_data(output='astropy', unpack=False)

Get the data in this dataset as an astropy table/column or as numpy array(s). Note that a dataset does not load data from disk into memory until this function is called. As a result, you should not call this function until you have performed any transformations you plan to on the data.

Two of the supported formats are “astropy” (the default) and “numpy”. The “astropy” format returns the data as an astropy table with associated units. The “numpy” format returns the data as a dictionary of numpy arrays. The numpy values will be in the associated unit convention, but no actual units will be attached. See the output parameter for the full list of supported formats.

If the dataset only contains a single column, it will be returned as an astropy.table.Column or a single numpy array.

Parameters:
  • output (str, default="astropy") – The format to output the data in. Currently supported are “astropy”, “numpy”, “pandas”, “polars”, and “arrow”

  • unpack (bool)

Returns:

data – The data in this dataset.

Return type:

Table | Column | dict[str, ndarray] | ndarray

property data

Return the data in the dataset in astropy format. The value of this attribute is equivalent to the return value of Dataset.get_data("astropy").

Returns:

data – The data in the dataset.

Return type:

astropy.table.Table or astropy.table.Column

with_redshift_range(z_low, z_high)

Restrict this lightcone to a specific redshift range. Lightcone datasets will always contain a column titled “redshift”. This function always operates on this column.

This function also updates the value in Lightcone.z_range, so you should always use it rather than filtering on the column directly.

Parameters:
  • z_low (float)

  • z_high (float)

bound(region, select_by=None)

Restrict the dataset to some subregion. The subregion will always be evaluated in the same units as the current dataset. For example, if the dataset is in the default “comoving” unit convention, positions are always in units of comoving Mpc. However, Region objects themselves do not carry units. See Regions for details of how to construct regions.

Parameters:
  • region (opencosmo.spatial.Region) – The region to query.

  • select_by (Optional[str])

Returns:

dataset – The portion of the dataset inside the selected region

Return type:

opencosmo.Dataset

Raises:
  • ValueError – If the query region does not overlap with the region this dataset resides in

  • AttributeError – If the dataset does not contain a spatial index

Perform a search for objects within some angular distance of some given point on the sky. This is a convenience function around bound and is exactly equivalent to:

region = oc.make_cone(center, radius)
ds = ds.bound(region)
Parameters:
  • center (tuple | SkyCoord) – The center of the region to search. If a tuple and no units are provided assumed to be RA and Dec in degrees.

  • radius (float | astropy.units.Quantity) – The angular radius of the region to query. If no units are provided, assumed to be degrees.

Returns:

new_lightcone – The rows in this lightcone that fall within the given region.

Return type:

opencosmo.Lightcone
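
The geometric test behind a cone search can be sketched with the standard library alone. This is an illustration of the angular-distance criterion, not the library's implementation. The helpers angular_separation_deg and in_cone are hypothetical, and treating the boundary as inclusive is an assumption.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(point, center, radius_deg):
    """True if point lies within radius_deg of center, both as (RA, Dec) in degrees.

    Assumption: points exactly on the boundary count as inside.
    """
    return angular_separation_deg(*point, *center) <= radius_deg


quarter_turn = angular_separation_deg(0.0, 0.0, 90.0, 0.0)
inside = in_cone((10.5, 0.0), (10.0, 0.0), 1.0)
outside = in_cone((12.0, 0.0), (10.0, 0.0), 1.0)
```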

evaluate(func, vectorize=False, insert=True, format='astropy', batch_size=-1, **evaluate_kwargs)

Iterate over the rows in this collection, apply func to each, and collect the results as new columns in the dataset. You may also choose to simply return the values instead of inserting them as columns.

This function is the equivalent of with_new_columns for cases where the new column is not a simple algebraic combination of existing columns. Unlike with_new_columns, this method will evaluate the results immediately and the resulting columns will not change under unit transformations.

The function should take in arguments with the same name as the columns in this dataset that are needed for the computation, and should return a dictionary of output values. The dataset will automatically select the needed columns to avoid unnecessarily reading data from disk. The new columns will have the same names as the keys of the output dictionary. See Evaluating on Datasets for more details.

If vectorize is set to True, the full columns will be passed to the function. Otherwise, rows will be passed to the function one at a time.

If a batch_size is set, opencosmo will pass data to your function in batches of rows. In a lightcone, batches may be smaller than the given batch size but will never be larger. Exact batch sizes will depend on the layout of the lightcone. Setting a batch size overrides the vectorize flag.

This function behaves (mostly) identically to Dataset.evaluate.

Parameters:
  • func (Callable) – The function to evaluate on the rows in the dataset.

  • format (str, default = "astropy") – The format of the data that is provided to your function. If “astropy”, will be a dictionary of astropy quantities. If “numpy”, will be a dictionary of numpy arrays. Note that this method does not support all the formats available in get_data

  • vectorize (bool, default = False) – Whether to provide the values as full columns (True) or one row at a time (False)

  • insert (bool, default = True) – If true, the data will be inserted as a column in this dataset. Otherwise the data will be returned.

  • batch_size (int)

Returns:

dataset – The new lightcone dataset with the evaluated column(s)

Return type:

Lightcone
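
A function suitable for row-by-row evaluation might look like the sketch below. The argument names are assumed to match columns in the dataset (here the fof_halo_center_* columns, which may not exist in every lightcone), and the function name radial_distance is hypothetical. Because math.sqrt expects plain numbers, this sketch is written with format="numpy" in mind.

```python
import math


def radial_distance(fof_halo_center_x, fof_halo_center_y, fof_halo_center_z):
    """Distance of a halo center from the origin, computed for one row at a time."""
    r = math.sqrt(
        fof_halo_center_x**2 + fof_halo_center_y**2 + fof_halo_center_z**2
    )
    # The dictionary key becomes the name of the new column.
    return {"radial_distance": r}


# Intended usage (requires an opened lightcone `lc`, so it is not run here):
# lc = lc.evaluate(radial_distance, format="numpy")

# The function can be checked on its own with plain numbers.
result = radial_distance(3.0, 4.0, 12.0)
```

Keeping the function free of any opencosmo imports makes it easy to test before running it over a large dataset.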

filter(*masks, **kwargs)

Filter the dataset based on some criteria. See Querying Based on Column Values for more information.

Parameters:

*masks (Mask) – The masks to apply to dataset, constructed with opencosmo.col()

Returns:

dataset – The new dataset with the masks applied.

Return type:

Dataset

Raises:

ValueError – If the given masks refer to columns that are not in the dataset, or if the filter would return zero rows.

rows()

Iterate over the rows in the dataset. Rows are returned as dictionaries. For performance, it is recommended to first select the columns you need to work with.

Yields:

row (dict) – A dictionary of values for each row in the dataset with units.

Return type:

Generator[dict[str, float | u.Quantity], None, None]
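
The shape of the values yielded by rows() can be illustrated with a stand-in built from plain Python lists. The helper iter_rows and the sample values are hypothetical, and the stand-in yields bare floats rather than values with units.

```python
from typing import Iterator


def iter_rows(columns: dict[str, list]) -> Iterator[dict]:
    """Yield one dictionary per row from a dictionary of equal-length columns."""
    names = list(columns)
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


# Two columns with two rows each, standing in for an already-selected dataset.
table = {"fof_halo_mass": [1.0e12, 5.0e13], "redshift": [0.1, 0.2]}
rows = list(iter_rows(table))
```

Each row carries one entry per column, which is why selecting only the needed columns first keeps the per-row dictionaries small.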

select(columns)

Create a new dataset from a subset of columns in this dataset.

Parameters:

columns (str or list[str]) – The column or columns to select.

Returns:

dataset – The new dataset with only the selected columns.

Return type:

Dataset

Raises:

ValueError – If any of the given columns are not in the dataset.

drop(columns)

Produce a new dataset by dropping columns from this dataset.

Parameters:

columns (str or list[str]) – The column or columns to drop.

Returns:

dataset – The new dataset without the dropped columns

Return type:

Dataset

Raises:

ValueError – If any of the given columns are not in the dataset.

take(n, at='random')

Create a new dataset from some number of rows from this dataset.

Can take the first n rows, the last n rows, or n random rows depending on the value of ‘at’.

Parameters:
  • n (int) – The number of rows to take.

  • at (str) – Where to take the rows from. One of “start”, “end”, or “random”. The default is “random”.

Returns:

dataset – The new dataset with only the selected rows.

Return type:

Dataset

Raises:

ValueError – If n is negative or greater than the number of rows in the dataset, or if ‘at’ is invalid.
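
The three values of ‘at’ can be sketched in terms of row indices. This is an illustration of the documented behavior rather than the library's implementation. The helper take_indices is hypothetical, and returning the random rows in their original order is an assumption.

```python
import random


def take_indices(length, n, at="random", seed=None):
    """Return the row indices chosen by a take-style operation."""
    if n < 0 or n > length:
        raise ValueError("n must be between 0 and the number of rows")
    if at == "start":
        return list(range(n))
    if at == "end":
        return list(range(length - n, length))
    if at == "random":
        # Assumption: sampled rows keep their original relative order.
        return sorted(random.Random(seed).sample(range(length), n))
    raise ValueError(f"unknown value for 'at': {at!r}")


first_three = take_indices(10, 3, at="start")
last_three = take_indices(10, 3, at="end")
random_three = take_indices(10, 3, at="random", seed=0)
```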

take_range(start, end)

Create a new lightcone from a row range in this lightcone. We use standard indexing conventions, so the rows included will be start -> end - 1. Because lightcones are stacked by redshift, this operation effectively takes a redshift range. If you know the exact redshift range you want, use with_redshift_range.

Parameters:
  • start (int) – The beginning of the range

  • end (int) – The end of the range

Returns:

lightcone – The lightcone with only the specified range of rows.

Return type:

opencosmo.Lightcone

Raises:

ValueError – If start or end are negative or greater than the length of the dataset, or if end is less than start.
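
The indexing convention and the error conditions can be sketched as a small validation helper. The function check_row_range is hypothetical. It assumes a range is invalid when end precedes start, and it is not the library's implementation.

```python
def check_row_range(start, end, length):
    """Validate a half-open row range [start, end) against a dataset length."""
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    if start > length or end > length:
        raise ValueError("start and end must not exceed the dataset length")
    if end < start:
        raise ValueError("end must not be less than start")
    # Rows start, start + 1, ..., end - 1 are included.
    return range(start, end)


selected = list(check_row_range(2, 5, length=10))
```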

take_rows(rows)

Take the rows of a lightcone specified by the rows argument. rows should be an array of integers.

Parameters:

rows (np.ndarray[int]) – The indices of the rows to take.

Returns:

dataset – The dataset with only the specified rows included

Raises:

ValueError – If any of the indices is less than 0 or greater than the length of the lightcone.

with_new_columns(descriptions={}, **columns)

Create a new dataset with additional columns. These new columns can be derived from columns already in the dataset, a numpy array, or an Astropy quantity array. When a column is derived from other columns, it will behave appropriately under unit transformations. See Adding Custom Columns and Dataset.with_new_columns for examples.

Parameters:
  • descriptions (str | dict[str, str], optional) – A description for the new columns. These descriptions will be accessible through Lightcone.descriptions. If a dictionary, should have keys matching the column names.

  • columns (**) – The new columns

Returns:

dataset – This dataset with the columns added

Return type:

opencosmo.Dataset

sort_by(column, invert=False)

Sort this dataset by the values in a given column. By default sorting is in ascending order (least to greatest). Pass invert = True to sort in descending order (greatest to least).

This can be used to, for example, select largest halos in a given dataset:

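import opencosmo as oc
dataset = oc.open("haloproperties.hdf5")
# Sort in descending order so the most massive halos come first.
dataset = (
    dataset
    .sort_by("fof_halo_mass", invert=True)
    .take(100, at="start")
)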
Parameters:
  • column (str) – The column in the halo_properties or galaxy_properties dataset to order the collection by.

  • invert (bool, default = False) – If False (the default), ordering will be from least to greatest. Otherwise greatest to least.

Returns:

result – A new Dataset ordered by the given column.

Return type:

Dataset
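
The interaction between sort order and take can be illustrated with plain Python lists. This is a stand-in for the ordering semantics described above, with made-up mass values, and it does not call the library.

```python
masses = [3.0e12, 8.0e14, 5.0e13, 1.0e11]

ascending = sorted(masses)                 # like sort_by(column)
descending = sorted(masses, reverse=True)  # like sort_by(column, invert=True)

two_smallest = ascending[:2]        # ascending order, then take(2, at="start")
two_largest = descending[:2]        # descending order, then take(2, at="start")
also_two_largest = ascending[-2:]   # ascending order, then take(2, at="end")
```

Selecting the largest values therefore requires either a descending sort followed by taking from the start, or an ascending sort followed by taking from the end.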

with_units(convention=None, conversions={}, **columns)

Create a new lightcone from this one with a different unit convention or with certain columns converted to a different compatible unit.

Unit conversions are always performed after a change of convention, and changing conventions clears any existing unit conversions.

For more, see Working with Units.

import astropy.units as u

# this works
lc = lc.with_units(fof_halo_mass=u.kg)

# this clears the previous conversion
lc = lc.with_units("scalefree")

# This now fails, because the units of masses
# are Msun / h, which cannot be converted to kg
lc = lc.with_units(fof_halo_mass=u.kg)

# this will now work, since the units of halo mass in the "physical"
# convention are Msun (no h).
lc = lc.with_units("physical", fof_halo_mass=u.kg, fof_halo_center_x=u.lyr)

# Suppose you want your distances in lightyears, but the x coordinate of your
# halo center in kilometers, for some reason ¯\_(ツ)_/¯
blanket_conversions = {u.Mpc: u.lyr}
lc = lc.with_units(conversions = blanket_conversions, fof_halo_center_x = u.km)
Parameters:
  • convention (str, optional) – The unit convention to use. One of “physical”, “comoving”, “scalefree”, or “unitless”.

  • conversions (dict[astropy.units.Unit, astropy.units.Unit]) – Conversions that apply to all columns in the lightcone with the unit given by the key.

  • **column_conversions (astropy.units.Unit) – Custom unit conversions for specific columns in this dataset.

  • columns (u.Unit)

Returns:

lightcone – The new lightcone with the requested unit convention and/or conversions.

Return type:

Lightcone
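
The last example above combines a blanket conversion with a conversion for a single column. One reading of that example, assumed here, is that a per-column conversion takes precedence over a blanket conversion for the same column. The sketch below models units as plain strings. The helper resolve_target_unit is hypothetical and does not reflect how the library resolves units internally.

```python
def resolve_target_unit(column, current_unit, conversions=None, column_conversions=None):
    """Pick the unit a column ends up in under the assumed precedence rules."""
    conversions = conversions or {}
    column_conversions = column_conversions or {}
    # Assumption: a conversion named for this column wins over a blanket one.
    if column in column_conversions:
        return column_conversions[column]
    # Otherwise apply the blanket conversion for the column's unit, if any.
    return conversions.get(current_unit, current_unit)


units = {
    "fof_halo_center_x": "Mpc",
    "fof_halo_center_y": "Mpc",
    "fof_halo_mass": "Msun",
}
resolved = {
    name: resolve_target_unit(
        name, unit, conversions={"Mpc": "lyr"}, column_conversions={"fof_halo_center_x": "km"}
    )
    for name, unit in units.items()
}
```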

class opencosmo.SimulationCollection(datasets)

A collection of datasets of the same type from different simulations. In general this exposes the exact same API as the individual datasets, but maps the results across all of them.

Parameters:

datasets (Mapping[str, Dataset | Collection])
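
The idea of mapping one operation across every member can be sketched with a dictionary of stand-in datasets. The helper map_over and the simulation names are hypothetical. A real SimulationCollection performs this mapping for you when you call its methods.

```python
def map_over(datasets, operation):
    """Apply operation to every dataset and keep the results under the same keys."""
    return {name: operation(dataset) for name, dataset in datasets.items()}


# Plain lists stand in for the datasets of two simulations.
simulations = {"sim_a": [5, 1, 3], "sim_b": [2, 4]}

sorted_sims = map_over(simulations, sorted)
lengths = map_over(simulations, len)
```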

make_schema()
Return type:

Schema

property dtype: dict[str, str]
property header: dict[str, OpenCosmoHeader]
property cosmology: dict[str, Cosmology]

Get the cosmologies of the simulations in the collection

Returns:

cosmologies

Return type:

dict[str, astropy.cosmology.Cosmology]

property redshift: dict[str, float | tuple[float, float]]

Get the redshift slices or ranges for the simulations in the collection

Returns:

redshifts

Return type:

dict[str, float | tuple[float,float]]

property simulation: dict[str, HaccSimulationParameters]

Get the simulation parameters for the simulations in the collection

Returns:

simulation_parameters

Return type:

dict[str, opencosmo.parameters.HaccSimulationParameters]

bound(region, select_by=None)

Restrict the datasets to some region. Note that the SimulationCollection does not do any checking to ensure its members have identical boxes. As a result this method can in principle fail for some of the simulations in the collection and not others. This should never happen when working with official OpenCosmo data products.

See Regions for details of how to construct regions.

Parameters:
  • region (opencosmo.spatial.Region) – The region to query

  • select_by (Optional[str])

Returns:

dataset – The portion of each dataset inside the selected region

Return type:

opencosmo.SimulationCollection

filter(*masks, **kwargs)

Filter the datasets in the collection. This method behaves exactly like opencosmo.Dataset.filter() or opencosmo.StructureCollection.filter(), but it applies the filter to all the datasets or collections within this collection. The result is a new collection.

Parameters:
  • filters – The filters constructed with opencosmo.col()

  • masks (ColumnMask)

Returns:

A new collection with the same datasets, but only the particles that pass the filter.

Return type:

SimulationCollection

select(*args, **kwargs)

Select a set of columns in the datasets in this collection. This method calls the underlying method in opencosmo.Dataset, or opencosmo.Collection depending on the context. As such its behavior and arguments can vary depending on what this collection contains.

Parameters:
  • args – The arguments to pass to the select method. This is usually a list of column names to select.

  • kwargs – The keyword arguments to pass to the select method. This is usually a dictionary of column names to select.

Returns:

A new collection with only the specified columns

Return type:

SimulationCollection

drop(*args, **kwargs)

Drop a set of columns from the datasets in the collection. This method calls the underlying method in opencosmo.Dataset, or opencosmo.Collection depending on the context. As such its behavior and arguments can vary depending on what this collection contains.

Parameters:
  • args – The arguments to pass to the drop method. This is usually a list of column names to drop.

  • kwargs – The keyword arguments to pass to the drop method. This is usually a dictionary of column names to drop.

Return type:

Self

take(n, at='random')

Take a subset of rows from all datasets or collections in this collection. This method will delegate to the underlying method in opencosmo.Dataset, or opencosmo.StructureCollection depending on the context. As such, behavior may vary depending on what this collection contains. See their documentation for more info.

Parameters:
  • n (int) – The number of rows to take

  • at (str, default = "random") – The method to use to take rows. Must be one of “start”, “end”, “random”.

Return type:

Self

take_range(start, end)

Take a range of rows from all datasets or collections in this collection. This method will fail if start < 0, or any of the datasets are not at least end long.

Parameters:
  • start (int) – The beginning of the range

  • end (int) – The end of the range

Returns:

The new simulation collection with only the specified rows.

Return type:

SimulationCollection

with_new_columns(*args, datasets=None, descriptions={}, **kwargs)

Update the datasets within this collection with a set of new columns. This method simply calls opencosmo.Dataset.with_new_columns() or opencosmo.StructureCollection.with_new_columns(), as appropriate.

You can also optionally pass the “datasets” keyword argument to specify that the operation should only be performed on a subset of the datasets.

If passing in numpy arrays or astropy quantities, they should be provided as a dictionary where the keys are the same as the keys in this dataset.

Parameters:
  • datasets (str | list[str], optional) – The datasets to add the columns to.

  • descriptions (str | dict[str, str], optional) – A description for the new columns. These descriptions will be accessible through SimulationCollection(datasets).descriptions. If a dictionary, should have keys matching the column names.

  • columns (**) – The new columns

evaluate(func, datasets=None, format='astropy', vectorize=False, insert=False, **evaluate_kwargs)

Evaluate the function func on each of the datasets or collections held by this SimulationCollection. This function simply delegates to either StructureCollection.evaluate or Dataset.evaluate, as appropriate. Refer to Evaluating Complex Expressions on Datasets and Collections for more details.

If “datasets” is provided, the evaluation will only be performed on the provided datasets.

Parameters:
  • func (Callable) – The function to evaluate

  • datasets (str | list[str], optional) – The datasets to evaluate on. If not provided, will be evaluated on all datasets

  • format (str, default = "astropy") – The format of the data that is provided to your function. If “astropy”, will be a dictionary of astropy quantities. If “numpy”, will be a dictionary of numpy arrays. Note that this method does not support all the formats available in get_data

  • vectorize (bool, default = False) – Whether to vectorize the computation. See StructureCollection.evaluate and/or Dataset.evaluate for more details.

  • insert (bool, default = True) – Whether or not to insert the results as columns in the datasets. If false, the results will be returned directly. If true, this method will return a new Simulation Collection.

Returns:

results – The results of the computation, or a new simulation collection with the results inserted.

Return type:

SimulationCollection | dict[str, np.ndarray] | dict[str, astropy.units.Quantity]

sort_by(column, invert=False)

Re-order the individual datasets in the collection based on a column. See Dataset.sort_by for usage details.

Parameters:
  • column (str) – The column in the halo_properties or galaxy_properties dataset to order the collection by.

  • invert (bool, default = False) – If False (the default) ordering will be done from least to greatest. Otherwise greatest to least.

Returns:

result – A new SimulationCollection with the datasets ordered by the given column.

Return type:

SimulationCollection

with_units(convention=None, conversions={}, **columns)

Transform all datasets or collections to use the given unit convention, convert all columns with a given unit into a different unit, and/or convert specific column(s) to a compatible unit. This method behaves exactly like opencosmo.Dataset.with_units().

Parameters:
  • convention (str) – The unit convention to use. One of “unitless”, “scalefree”, “comoving”, or “physical”.

  • conversions (dict[astropy.units.Unit, astropy.units.Unit]) – Conversions that apply to all columns in the collection with the unit given by the key.

  • **column_conversions (astropy.units.Unit) – Custom unit conversions for any column with a specific name in the datasets in this collection.

  • columns (u.Unit)

Returns:

A new simulation collection with the requested unit conventions and conversions.

Return type:

SimulationCollection

class opencosmo.StructureCollection(source, header, datasets, hide_source=False, link_handler=None, derived_columns=None, **kwargs)

A collection of datasets that contain both high-level properties and lower level information (such as particles) for structures in the simulation. Currently these structures include halos and galaxies.

Every structure collection has a halo_properties or galaxy_properties dataset that contains the high-level measured attributes of the structures. Certain operations (e.g. sort_by) operate on this dataset.

Parameters:
  • source (oc.Dataset)

  • header (oc.header.OpenCosmoHeader)

  • datasets (Mapping[str, oc.Dataset | StructureCollection])

  • hide_source (bool)

  • link_handler (Optional[LinkHandler])

  • derived_columns (Optional[set[str]])

property header
property dtype
property cosmology: astropy.cosmology.Cosmology

The cosmology of the structure collection

property properties: list[str]

The high-level properties that are available as part of the halo_properties or galaxy_properties dataset.

property redshift: float | tuple[float, float] | None

For snapshots, return the redshift or redshift range this dataset was drawn from.

Returns:

redshift

Return type:

float | tuple[float, float]

property simulation: HaccSimulationParameters

Get the parameters of the simulation this dataset is drawn from.

Returns:

parameters

Return type:

opencosmo.parameters.HaccSimulationParameters

keys()

Return the names of the datasets in this collection.

Return type:

list[str]

values()

Return the datasets in this collection.

Return type:

list[Dataset | StructureCollection]

items()

Return the names and datasets as key-value pairs.

Return type:

Generator[tuple[str, Dataset | StructureCollection], None, None]

property region
bound(region, select_by=None)

Restrict this collection to only contain structures in the specified region. Querying will be done based on the halo or galaxy centers, meaning some particles may fall outside the given region.

See Regions for details of how to construct regions.

Parameters:
  • region (opencosmo.spatial.Region)

  • select_by (Optional[str])

Returns:

dataset – The portion of the dataset inside the selected region

Return type:

opencosmo.Dataset

Raises:
  • ValueError – If the query region does not overlap with the region this dataset resides in

  • AttributeError – If the dataset does not contain a spatial index

evaluate(func, dataset=None, format='astropy', insert=True, **evaluate_kwargs)

Iterate over the structures in this collection and apply func to each, collecting the results into a new column. These values will be computed immediately rather than lazily. If your new column can be created from a simple algebraic combination of existing columns, use with_new_columns.

You can substantially improve the performance of this method by specifying which data is actually needed to do the computation. This method will automatically select the requested data, avoiding reading unneeded data from disk. The semantics for specifying the columns is identical to select.

The function passed to this method must take arguments that match the names of datasets that are stored in this collection. You can specify specific columns that are needed with keyword arguments to this function. For example:

import opencosmo as oc
import numpy as np
collection = oc.open("haloproperties.hdf5", "haloparticles.hdf5")

def computation(halo_properties, dm_particles):
    dx = np.mean(dm_particles.data["x"]) - halo_properties["fof_halo_center_x"]
    dy = np.mean(dm_particles.data["y"]) - halo_properties["fof_halo_center_y"]
    dz = np.mean(dm_particles.data["z"]) - halo_properties["fof_halo_center_z"]
    offset = np.sqrt(dx**2 + dy**2 + dz**2)
    return offset / halo_properties["sod_halo_radius"]

collection = collection.evaluate(
    computation,
    name="offset",
    halo_properties=[
        "fof_halo_center_x",
        "fof_halo_center_y",
        "fof_halo_center_z",
        "sod_halo_radius"
    ],
    dm_particles=["x", "y", "z"]
)

The collection will now contain a column named “offset” with the results of the computation applied to each halo in the collection.

It is not required to pass a list of column names for a given dataset. If a list is not provided, all columns will be passed to the computation function. Data will be passed into the function as numpy arrays or astropy tables, depending on the value of the “format” argument. However, if the evaluation involves a nested structure collection (e.g. a galaxy collection inside a structure collection) in addition to other datasets, the nested collection will be passed to your function as a StructureCollection.

For more details and advanced usage see Evaluating on Structure Collections.

Parameters:
  • func (Callable) – The function to evaluate on the rows in the dataset.

  • dataset (Optional[str], default = None) – The dataset inside this collection to evaluate the function on. If none, assumes the function requires data from multiple datasets. You can visit a dataset inside a nested structure collection by passing the path separated by dots, for example “galaxies.star_particles”. Data will be fed to the function on a structure-by-structure basis, and the output should be the same length as the input data.

  • insert (bool, default = True) – If true, the data will be inserted as a column in the specified dataset. If no dataset is specified, insert into the “halo_properties” dataset if this collection contains halos, or the “galaxy_properties” dataset if this collection contains galaxies. If False, simply return the data.

  • format (str, default = astropy) – Whether to provide data to your function as “astropy” quantities or “numpy” arrays/scalars. Default “astropy”. Note that this method does not support all the formats available in get_data

  • **evaluate_kwargs (any,) – Any additional arguments that are required for your function to run. These will be passed directly to the function as keyword arguments. If a kwarg is an array of values with the same length as the dataset, it will be treated as an additional column.

evaluate_on_dataset(func, dataset=None, vectorize=False, format='astropy', insert=True, batch_size=-1, **evaluate_kwargs)

Evaluate an expression on a specific dataset in this collection. This method is different from calling evaluate with a dataset argument in that this method does not apply the function on a per-structure basis. It is roughly equivalent to the following code:

results = collection[dataset_name].evaluate(func, format, vectorize, insert=False)
collection = collection.with_new_columns(dataset_name, my_computed_value = results)

Keep in mind that the following code:

collection[dataset_name].evaluate(func, format, vectorize, insert=True)

does produce a new dataset with the given new column, but this dataset will not be a part of the original collection.

Parameters:
  • func (Callable) – The function to evaluate on the rows in the dataset.

  • dataset (Optional[str] = None,) – The dataset to perform the evaluation on. If None, defaults to the halo_properties or galaxy_properties dataset.

  • vectorize (bool, default = False) – Whether to provide the values as full columns (True) or one row at a time (False). Ignored if batch_size is set.

  • format (str, default = astropy) – Whether to provide data to your function as “astropy” quantities or “numpy” arrays/scalars. Default “astropy”. Note that this method does not support all the formats available in get_data

  • insert (bool, default = True) – If true, the data will be inserted as a column in this dataset. The new column will have the same name as the function. Otherwise the data will be returned directly.

  • batch_size (int, default = -1) – If set, feed data to the function in batches of the specified size. Default is -1, which disables batching. If set to another value, the vectorize flag is ignored.

  • **evaluate_kwargs (any,) – Any additional arguments that are required for your function to run. These will be passed directly to the function as keyword arguments. If a kwarg is an array of values with the same length as the dataset, it will be treated as an additional column.

filter(*masks, on_galaxies=False)

Apply a filter to the halo or galaxy properties. Filters are constructed with opencosmo.col() and behave exactly as they would in opencosmo.Dataset.filter().

If the collection contains both halos and galaxies, the filter can be applied to the galaxy properties dataset by setting on_galaxies=True. However, this will filter for halos that host galaxies that match this filter. As a result, galaxies that do not match this filter will remain if another galaxy in their host halo does match.

See Querying In Collections for some examples.

Parameters:
  • *filters (Mask) – The filters to apply to the properties dataset constructed with opencosmo.col().

  • on_galaxies (bool, optional) – If True, the filter is applied to the galaxy properties dataset.

Returns:

A new collection filtered by the given masks.

Return type:

StructureCollection

Raises:

ValueError – If on_galaxies is True but the collection does not contain a galaxy properties dataset.
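
The on_galaxies behavior described above can be sketched with plain dictionaries: a halo is kept if any of its galaxies matches, and every galaxy in a kept halo is retained. The helper filter_halos_by_galaxies, the halo names and the mass values are hypothetical, and the sketch does not call the library.

```python
def filter_halos_by_galaxies(halos, predicate):
    """Keep halos that host at least one galaxy satisfying predicate."""
    return {
        halo_id: galaxies
        for halo_id, galaxies in halos.items()
        if any(predicate(galaxy) for galaxy in galaxies)
    }


halos = {
    "halo_1": [{"gal_mass_star": 5.0e10}, {"gal_mass_star": 2.0e8}],
    "halo_2": [{"gal_mass_star": 1.0e9}],
}

kept = filter_halos_by_galaxies(halos, lambda g: g["gal_mass_star"] > 1.0e10)
```

Here halo_1 survives with both of its galaxies, including the low-mass one that does not pass the filter itself.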

select(**column_selections)

Update a dataset in the collection to only include the columns specified. The names of the arguments to this function should be dataset names. For example:

collection = collection.select(
    halo_properties = ["fof_halo_mass", "sod_halo_mass", "sod_halo_cdelta"],
    dm_particles = ["x", "y", "z"]
)

Datasets that do not appear in the argument list will not be modified. You can remove entire datasets from the collection with with_datasets.

For nested structure collections, such as galaxies within halos, you can pass a nested dictionary:

import opencosmo as oc

collection = oc.open("haloproperties.hdf5", "haloparticles.hdf5", "galaxyproperties.hdf5", "galaxyparticles.hdf5")

collection = collection.select(
    halo_properties = ["fof_halo_mass", "sod_halo_mass", "sod_halo_cdelta"],
    dm_particles = ["x", "y", "z"],
    # Nested collections take a dictionary keyed by dataset name
    galaxies = {
        "galaxy_properties": ["gal_mass_bar", "gal_mass_star"],
        "star_particles": ["x", "y", "z"]
    }
)
Parameters:
  • **column_selections (str | Iterable[str] | dict[str, Iterable[str]]) – The columns to select from a given dataset or sub-collection

  • dataset (str) – The dataset to select from.

Returns:

A new collection with only the selected columns for the specified dataset.

Return type:

StructureCollection

Raises:

ValueError – If the specified dataset is not found in the collection.

drop(**columns_to_drop)

Update the linked collection by dropping the specified columns in the specified datasets. This method follows the exact same semantics as StructureCollection.select. Argument names should be datasets in this collection, and the argument values should be a string, list of strings, or dictionary.

Datasets that are not included will not be modified. You can drop entire datasets with with_datasets.

Parameters:
  • **columns_to_drop (str | Iterable[str]) – The columns to drop from the dataset.

  • dataset (str, optional) – The dataset to select from. If None, the properties dataset is used.

Returns:

A new collection without the dropped columns in the specified datasets.

Return type:

StructureCollection

Raises:

ValueError – If the specified dataset is not found in the collection.

sort_by(column, invert=False)

Re-order the collection based on one of the structure collection’s properties. Each StructureCollection contains a halo_properties or galaxy_properties dataset that contains the high-level measured properties of the structures in this collection. This method always operates on that dataset.

Parameters:
  • column (str) – The column in the halo_properties or galaxy_properties dataset to order the collection by.

  • invert (bool, default = False) – If False (the default), ordering will be from least to greatest. Otherwise greatest to least.

Returns:

result – A new StructureCollection ordered by the given column.

Return type:

StructureCollection

with_units(convention=None, conversions={}, **dataset_conversions)

Apply the given unit convention to the collection, or convert a subset of the columns in one or more of these datasets into a compatible unit.

Because this collection contains several datasets, you must specify the dataset when performing conversions. For example, the equivalent unit conversion to the final one in the example in opencosmo.Dataset.with_units() looks like this:

import astropy.units as u

structures = structures.with_units(
    "physical",
    halo_properties={"fof_halo_mass": u.kg, "fof_halo_center_x": u.lyr}
)

You can use conversions to specify a conversion that applies to all columns in the collection with the given unit, or specify per-dataset conversions. Per-dataset conversions always take precedence over collection-wide conversions. For example:

import astropy.units as u

conversions = {u.Mpc: u.lyr}
structures = structures.with_units(
    conversions=conversions,
    halo_properties = {
        "conversions": {u.Mpc: u.km},
        "fof_halo_center_x": u.m
    }
)

In this example, all values in Mpc will be converted to lightyears, except in the “halo_properties” dataset, where they will be converted to kilometers. The column “fof_halo_center_x” in “halo_properties” will be converted to meters instead.

For more information, see Working with Units.
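The precedence rules described above can be summarised as a lookup from most to least specific. The following is a sketch only: it uses strings in place of astropy units, and `resolve_target_unit` is a hypothetical function that is not part of opencosmo.

```python
def resolve_target_unit(column, current_unit, collection_conversions, dataset_spec):
    """Pick the unit a column ends up in, from most to least specific rule."""
    # 1. A conversion that names the column directly.
    if column in dataset_spec:
        return dataset_spec[column]
    # 2. A per-dataset unit-to-unit conversion.
    dataset_conversions = dataset_spec.get("conversions", {})
    if current_unit in dataset_conversions:
        return dataset_conversions[current_unit]
    # 3. A collection-wide unit-to-unit conversion.
    # 4. Otherwise the column keeps its current unit.
    return collection_conversions.get(current_unit, current_unit)


collection_wide = {"Mpc": "lyr"}
halo_properties = {"conversions": {"Mpc": "km"}, "fof_halo_center_x": "m"}

center_x = resolve_target_unit("fof_halo_center_x", "Mpc", collection_wide, halo_properties)
center_y = resolve_target_unit("fof_halo_center_y", "Mpc", collection_wide, halo_properties)
particle_x = resolve_target_unit("x", "Mpc", collection_wide, {})
halo_mass = resolve_target_unit("fof_halo_mass", "Msun", collection_wide, halo_properties)
```

With these inputs, fof_halo_center_x resolves to meters, other Mpc columns in halo_properties resolve to kilometers, Mpc columns in other datasets resolve to lightyears, and columns in units with no matching conversion are left alone.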

Parameters:
  • convention (str) – The unit convention to apply. One of “unitless”, “scalefree”, “comoving”, or “physical”.

  • conversions (dict[astropy.units.Unit, astropy.units.Unit]) – Unit conversions to apply across all columns in the collection

  • **dataset_conversions (dict) – Unit conversions to apply to specific datasets in the collection.

Returns:

A new collection with the unit convention applied.

Return type:

StructureCollection

take(n, at='random')

Take some number of structures from the collection. See opencosmo.Dataset.take().

Parameters:
  • n (int) – The number of structures to take from the collection.

  • at (str, optional) – The method to use to take the structures. One of “random”, “first”, or “last”. Default is “random”.

Returns:

A new collection with the structures taken from the original.

Return type:

StructureCollection

take_range(start, end)

Create a new collection from a row range in this collection. We use standard indexing conventions, so the rows included will be start -> end - 1.

Parameters:
  • start (int) – The first row to get.

  • end (int) – The last row to get.

Returns:

table – The table with only the rows from start to end.

Return type:

astropy.table.Table

Raises:

ValueError – If start or end is negative or greater than the length of the dataset, or if end is less than start.
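As a quick illustration of the half-open range convention, here is a minimal sketch on a plain list. The `take_range` function below is a stand-in, not the library implementation, and it assumes a range is rejected when end falls before start.

```python
def take_range(rows, start, end):
    """Return rows[start:end] after checking the bounds described above."""
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    if start > len(rows) or end > len(rows):
        raise ValueError("start and end must not exceed the number of rows")
    if end < start:
        raise ValueError("end must not be less than start")
    return rows[start:end]


halo_ids = [100, 101, 102, 103, 104, 105]
subset = take_range(halo_ids, 2, 5)  # rows 2, 3 and 4; row 5 is excluded
```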

take_rows(rows)

Take the rows of this collection specified by the rows argument. rows should be an array of integers.

Parameters:

rows (np.ndarray | DataIndex) – The indices of the rows to take.

Returns:

dataset – The dataset with only the specified rows included.

Raises:

ValueError – If any of the indices is less than 0 or greater than the length of the dataset.

with_new_columns(dataset, descriptions={}, **new_columns)

Add new column(s) to one of the datasets in this collection. This behaves exactly like oc.Dataset.with_new_columns(), except that you must specify which dataset the columns should be added to.

pe = oc.col("phi") * oc.col("mass")
collection = collection.with_new_columns("dm_particles", pe=pe)

Structure collections can hold other structure collections. For example, a collection of halos may hold a structure collection that contains the galaxies of those halos. To update datasets within these collections, use dot syntax to specify a path:

pe = oc.col("phi") * oc.col("mass")
collection = collection.with_new_columns("galaxies.star_particles", pe=pe)

You can also pass numpy arrays or astropy quantities:

random_value = np.random.randint(0, 90, size=len(collection))
random_quantity = random_value*u.deg

collection = collection.with_new_columns("halo_properties",
    random_quantity=random_quantity)

See Adding Custom Columns for more examples.

Parameters:
  • dataset (str) – The name of the dataset to add columns to

  • descriptions (str | dict[str, str], optional) – Descriptions for the new columns. These descriptions will be accessible through Dataset.descriptions. If a dictionary, should have keys matching the column names.

  • **new_columns (DerivedColumn) – The new columns

Returns:

new_collection – This collection with the additional columns added

Return type:

opencosmo.StructureCollection

Raises:

ValueError – If the dataset is not found in this collection
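The dot syntax for nested collections can be thought of as a walk through nested mappings. The sketch below models a collection as nested dictionaries; `resolve_dataset` is a hypothetical helper used only to illustrate the path lookup, and the library's internal handling may differ.

```python
def resolve_dataset(collection, path):
    """Follow a dot-separated path through nested mappings and return the target."""
    target = collection
    for part in path.split("."):
        if not isinstance(target, dict) or part not in target:
            raise ValueError(f"Dataset '{path}' not found in this collection")
        target = target[part]
    return target


# Stand-in for a halo collection that holds a nested galaxy collection.
collection = {
    "halo_properties": ["fof_halo_mass"],
    "dm_particles": ["phi", "mass"],
    "galaxies": {
        "galaxy_properties": ["gal_mass_star"],
        "star_particles": ["phi", "mass"],
    },
}

top_level = resolve_dataset(collection, "dm_particles")
nested = resolve_dataset(collection, "galaxies.star_particles")
```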

objects(data_types=None, ignore_empty=True)

Iterate over the objects in this collection as pairs of (properties, datasets). For example, a halo collection could yield the halo properties and datasets for each of the associated particles.

If you don’t need all the datasets, you can specify a list of data types. For example:

for halo in collection.objects(data_types=["halo_properties", "gas_particles", "star_particles"]):
    pass  # do work

At each iteration, halo will be a dictionary with halo properties, gas_particles, and star particles. The “halo_properties” entry will itself be a dictionary with the halo’s properties, while “gas_particles” and “star_particles” will be full Datasets.

Parameters:

data_types (Iterable[str] | None)

Return type:

Iterable[dict[str, Any]]
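A loop body over these per-object dictionaries might look like the sketch below. It runs on stand-in data: the particle entries are plain dictionaries of lists rather than real Datasets, and the column names fof_halo_tag and mass are assumptions chosen for illustration. With real Datasets you would first materialise the column you need.

```python
def star_mass_fraction(objects):
    """Yield (halo tag, stellar mass / halo mass) for each object."""
    for halo in objects:
        properties = halo["halo_properties"]  # plain dictionary of values
        star_masses = halo["star_particles"]["mass"]  # stand-in for a Dataset column
        yield properties["fof_halo_tag"], sum(star_masses) / properties["fof_halo_mass"]


# Stand-in for collection.objects(data_types=["halo_properties", "star_particles"]).
fake_objects = [
    {"halo_properties": {"fof_halo_tag": 7, "fof_halo_mass": 100.0},
     "star_particles": {"mass": [1.0, 2.0, 2.0]}},
    {"halo_properties": {"fof_halo_tag": 9, "fof_halo_mass": 50.0},
     "star_particles": {"mass": [5.0]}},
]

fractions = dict(star_mass_fraction(fake_objects))
```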

with_datasets(datasets)

Create a new collection out of a subset of the datasets in this collection. It is also possible to do this when you iterate over the collection with StructureCollection.objects. However, doing it up front may be preferable if you don’t plan to use the dropped datasets at any point.

Parameters:

datasets (list[str])

halos(*args, **kwargs)

Alias for “objects” in the case that this StructureCollection contains halos.

galaxies(*args, **kwargs)

Alias for “objects” in the case that this StructureCollection contains galaxies.

make_schema(name=None)
Parameters:

name (Optional[str])

Return type:

Schema

class opencosmo.HealpixMap(datasets, nside, nside_lr, ordering, full_sky, z_range, hidden=None, ordered_by=None, region=None)

A HealpixMap contains one or more datasets in map format. Each dataset will typically contain a different type of data over a specified integrated redshift range. The HealpixMap object provides an API identical to the standard Dataset API. However, the data is returned in healpix or healsparse format, which differs from the formats used by other opencosmo datasets. The HealpixMap also contains some convenience functions for standard operations.

Parameters:
  • datasets (dict[str, Dataset])

  • nside (int)

  • nside_lr (int)

  • ordering (str)

  • full_sky (bool)

  • z_range (tuple[float, float])

  • hidden (Optional[set[str]])

  • ordered_by (Optional[tuple[str, bool]])

  • region (Optional[Region])

property nside

The healpix nside resolution parameter for this map

Returns:

dtype

Return type:

int

property pixels

The healpix pixels that are included in this map

property nside_lr

The low-resolution nside parameter used to access this map with healsparse.

Returns:

dtype

Return type:

int

property ordering

The order of pixelization for the map. Either NESTED or RING. Maps are currently always saved in NESTED format.

Returns:

dtype

Return type:

str

property full_sky

Whether the map has full-sky coverage. If it does not, you must request the data in healsparse format rather than full healpix format.

Returns:

dtype

Return type:

bool

property header: OpenCosmoHeader

The header associated with this dataset.

OpenCosmo headers generally contain information about the original data this dataset was produced from, as well as any analysis that was done along the way.

Returns:

header

Return type:

opencosmo.header.OpenCosmoHeader

property columns: list[str]

The names of the columns in this map.

Returns:

columns

Return type:

list[str]

property descriptions: dict[str, str | None]

Return the descriptions (if any) of the columns in this map as a dictionary. Columns without a description will be included in the dictionary with a value of None.

Returns:

descriptions – The column descriptions

Return type:

dict[str, str | None]

property cosmology: Cosmology

The cosmology of the simulation this dataset is drawn from as an astropy.cosmology.Cosmology object.

Returns:

cosmology

Return type:

astropy.cosmology.Cosmology

property region: Region

The region this dataset is contained in. If no spatial queries have been performed, this will be the full sky for lightcone maps.

Returns:

region

Return type:

opencosmo.spatial.Region

property simulation: HaccSimulationParameters

The parameters of the simulation this dataset is drawn from.

Returns:

parameters

Return type:

opencosmo.parameters.hacc.HaccSimulationParameters

property z_range

The redshift range of the data which created this map.

Returns:

z_range

Return type:

tuple[float, float]

get_data(output='healsparse', nside_out=None)

Get the data in this dataset as a healsparse map or as healpix maps (nest-ordered numpy arrays). Note that a dataset does not load data from disk into memory until this function is called. As a result, you should not call this function until you have performed any transformations you plan to apply to the data.

You can get the data in two formats, “healsparse” (the default) and “healpix”. “healsparse” format will return the data as a healsparse sparse map. “healpix” will return the data as a dictionary of numpy arrays. For map data, due to format requirements, no units will be attached to the data itself, although these will match the units from the data attributes.

Parameters:
  • output (str, default="healsparse") – The format to output the data in

  • nside_out (int | None)

Returns:

data – The data in this dataset.

Return type:

HealsparseMap | Column | dict[str, ndarray] | ndarray

property data

Return the data in the dataset in healsparse format. The value of this attribute is equivalent to the return value of Dataset.get_data("healsparse").

Returns:

data – The data in the dataset.

Return type:

HealsparseMap

with_resolution(nside)

Return a copy of the map with a new nside resolution.

The new resolution must be strictly less than the current resolution.

Return type:

HealpixMap
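To see why only coarser resolutions are available, it helps to recall that a full-sky HEALPix map has 12 * nside**2 pixels and that, in NESTED ordering, the children of one coarse pixel are stored contiguously. The sketch below degrades a plain list by averaging each group of children. Averaging is an assumption made here for illustration; how with_resolution combines pixels is not stated above, and `degrade_nested` is a hypothetical helper.

```python
def degrade_nested(values, nside_in, nside_out):
    """Average a NESTED-ordered full-sky map down to a coarser nside."""
    if nside_out >= nside_in:
        raise ValueError("the new resolution must be strictly less than the current one")
    if len(values) != 12 * nside_in**2:
        raise ValueError("a full-sky map has 12 * nside**2 pixels")
    # In NESTED ordering the children of one coarse pixel are stored contiguously.
    children = (nside_in // nside_out) ** 2
    return [
        sum(values[i:i + children]) / children
        for i in range(0, len(values), children)
    ]


fine = [float(i) for i in range(12 * 2**2)]  # nside = 2, 48 pixels
coarse = degrade_nested(fine, nside_in=2, nside_out=1)  # nside = 1, 12 pixels
```

Going the other way would require inventing information for the finer pixels, which is consistent with the restriction to strictly lower resolutions.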

make_schema()
Return type:

Schema

bound(region, inclusive=False)

Restrict this map to some subregion. By default this will include all pixels whose centers fall within the subregion. You can additionally include pixels that overlap the region without their centers being inside it by passing inclusive=True.

If trying to query in a circular region, consider using cone_search for simplicity.

Parameters:
  • region (opencosmo.spatial.Region) – The region to query.

  • inclusive (bool, default = False) – Whether to include pixels that overlap the region but whose centers are not in the region.

Returns:

new_map – The map including the pixels within the region.

Return type:

opencosmo.HealpixMap

Raises:

ValueError – If the query region does not overlap with the coverage of this map.
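The difference between the default and inclusive=True can be illustrated on a flat grid of unit-square pixels. This is a simplified analogy and not HEALPix geometry: `pixels_in_circle` is a hypothetical function that selects either the pixels whose centers lie inside a circle, or every pixel that the circle touches.

```python
import math


def pixels_in_circle(grid_size, center, radius, inclusive=False):
    """Select unit-square pixels on a flat grid that fall within a circle."""
    cx, cy = center
    selected = []
    for ix in range(grid_size):
        for iy in range(grid_size):
            if inclusive:
                # Closest point of the pixel [ix, ix + 1] x [iy, iy + 1] to the circle centre.
                px = min(max(cx, ix), ix + 1)
                py = min(max(cy, iy), iy + 1)
            else:
                # Only the pixel centre is tested.
                px, py = ix + 0.5, iy + 0.5
            if math.hypot(px - cx, py - cy) <= radius:
                selected.append((ix, iy))
    return selected


centers_only = pixels_in_circle(4, center=(2.0, 2.0), radius=1.2)
with_overlap = pixels_in_circle(4, center=(2.0, 2.0), radius=1.2, inclusive=True)
```

The inclusive selection is always a superset of the default one, since a pixel whose center is inside the region necessarily overlaps it.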

cone_search(center, radius)

Perform a search for objects within some angular distance of a given point on the sky. This is a convenience function around bound and is exactly equivalent to:

region = oc.make_cone(center, radius)
ds = ds.bound(region)
Parameters:
  • center (tuple | SkyCoord) – The center of the region to search. If a tuple and no units are provided assumed to be RA and Dec in degrees.

  • radius (float | astropy.units.Quantity) – The angular radius of the region to query. If no units are provided, assumed to be degrees.

Returns:

new_map – The pixels in these maps that fall within the given region.

Return type:

opencosmo.HealpixMap
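For intuition about what falls inside a cone, the test reduces to a great-circle separation check. The sketch below works on bare (RA, Dec) tuples in degrees, matching the default assumed for unitless inputs. `angular_separation_deg` and `in_cone` are hypothetical helpers; the library itself selects pixels through a region object rather than a per-point test.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, all in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for small separations.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def in_cone(point, center, radius_deg):
    """True if `point` (RA, Dec in degrees) lies within `radius_deg` of `center`."""
    return angular_separation_deg(*point, *center) <= radius_deg


center = (45.0, -30.0)  # RA and Dec in degrees, as assumed for a bare tuple
inside = in_cone((45.0, -29.0), center, radius_deg=2.0)  # 1 degree away in Dec
outside = in_cone((50.0, -30.0), center, radius_deg=2.0)  # roughly 4.3 degrees away
```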

evaluate(func, format='numpy', vectorize=False, insert=True, **evaluate_kwargs)

Iterate over the rows in this collection, apply func to each, and collect the result as new columns in the dataset. You may also choose to simply return the values instead of inserting them as a column.

This function is the equivalent of with_new_columns for cases where the new column is not a simple algebraic combination of existing columns. Unlike with_new_columns, this method will evaluate the results immediately and the resulting columns will not change under unit transformations.

The function should take in arguments with the same name as the columns in this dataset that are needed for the computation, and should return a dictionary of output values. The dataset will automatically select the needed columns to avoid unnecessarily reading data from disk. The new columns will have the same names as the keys of the output dictionary. See Evaluating on Datasets for more details.

If vectorize is set to True, the full columns will be passed to the function. Otherwise, rows will be passed to the function one at a time.

This function behaves identically to Dataset.evaluate.

Parameters:
  • func (Callable) – The function to evaluate on the rows in the dataset.

  • format (str, default = "numpy") – The format of the data that is provided to your function. If “astropy”, will be a dictionary of astropy quantities. If “numpy”, will be a dictionary of numpy arrays.

  • vectorize (bool, default = False) – Whether to provide the values as full columns (True) or one row at a time (False)

  • insert (bool, default = True) – If true, the data will be inserted as a column in this dataset. Otherwise the data will be returned.

Returns:

dataset – The new map with the evaluated column(s)

Return type:

HealpixMap
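The way argument names select columns, and the difference between row-wise and vectorized calls, can be sketched with the standard library alone. The `evaluate` function below is a stand-in operating on plain lists, not the opencosmo implementation, and `offset` and `offset_vectorized` are hypothetical user functions.

```python
import inspect


def evaluate(func, columns, vectorize=False):
    """Call `func` with only the columns named in its signature."""
    needed = list(inspect.signature(func).parameters)
    selected = {name: columns[name] for name in needed}
    if vectorize:
        # The function sees whole columns and returns whole columns.
        return func(**selected)
    # Otherwise the function is called once per row and the results are collected.
    n_rows = len(next(iter(selected.values())))
    output = {}
    for i in range(n_rows):
        row_result = func(**{name: values[i] for name, values in selected.items()})
        for key, value in row_result.items():
            output.setdefault(key, []).append(value)
    return output


def offset(x, y):
    # Works on a single row: x and y are scalars here.
    return {"dx": x - y}


def offset_vectorized(x, y):
    # Works on full columns: x and y are lists here.
    return {"dx": [a - b for a, b in zip(x, y)]}


columns = {"x": [3.0, 5.0], "y": [1.0, 1.0], "unused": [0, 0]}
row_by_row = evaluate(offset, columns)
all_at_once = evaluate(offset_vectorized, columns, vectorize=True)
```

Both calls produce the same "dx" column, and the "unused" column is never read because neither function names it.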

filter(*masks, **kwargs)

Filter the map based on some criteria. See Querying Based on Column Values for more information.

Parameters:

*masks (Mask) – The masks to apply to the dataset, constructed with opencosmo.col()

Returns:

dataset – The new dataset with the masks applied.

Return type:

Dataset

Raises:

ValueError – If a given mask refers to columns that are not in the dataset, or the filter would return zero rows.

rows()

Iterate over the pixels in the map, returning their individual values. Rows are returned as a dictionary. For performance, it is recommended to first select the columns you need to work with.

Yields:

row (dict) – A dictionary of values for each row in the dataset with units.

Return type:

Generator[dict[str, float | Quantity], None, None]

select(columns)

Create a new map from a subset of columns in this map.

Parameters:

columns (str or list[str]) – The column or columns to select.

Returns:

dataset – The new dataset with only the selected columns.

Return type:

Dataset

Raises:

ValueError – If any of the given columns are not in the dataset.

drop(columns)

Produce a new dataset by dropping columns from this map.

Parameters:

columns (str or list[str]) – The column or columns to drop.

Returns:

dataset – The new dataset without the dropped columns

Return type:

Dataset

Raises:

ValueError – If any of the given columns are not in the dataset.

take(n, at='random')

Create a new dataset from some number of rows from this map.

Can take the first n rows, the last n rows, or n random rows depending on the value of ‘at’.

Parameters:
  • n (int) – The number of rows to take.

  • at (str) – Where to take the rows from. One of “start”, “end”, or “random”. The default is “random”.

Returns:

dataset – The new dataset with only the selected rows.

Return type:

Dataset

Raises:

ValueError – If n is negative or greater than the number of rows in the dataset, or if ‘at’ is invalid.

take_range(start, end)
Parameters:
  • start (int)

  • end (int)

take_rows(rows)

Take the rows of a map specified by the rows argument. rows should be an array of integers. Note that for healpix maps the rows refers to the pixel indices.

Parameters:

rows (np.ndarray[int]) – The indices of the rows to take.

Returns:

dataset – The dataset with only the specified rows included.

Raises:

ValueError – If any of the indices is less than 0 or greater than the length of the map.

with_new_columns(descriptions={}, **columns)

Create a new map with additional columns. These new columns can be derived from columns already in the dataset, or a numpy array. See Adding Custom Columns and Dataset.with_new_columns for examples.

Parameters:
  • descriptions (str | dict[str, str], optional) – A description for the new columns. These descriptions will be accessible through HealpixMap.descriptions. If a dictionary, should have keys matching the column names.

  • **columns – The new columns

Returns:

dataset – This dataset with the columns added

Return type:

opencosmo.Dataset

sort_by(column, invert=False)

Sort this map by the values in a given column. By default sorting is in ascending order (least to greatest). Pass invert = True to sort in descending order (greatest to least).

This is not generally useful in map queries, but it can be used to enforce ordering schemes or find outlier pixels.

Parameters:
  • column (str) – The column in the map dataset to order the collection by.

  • invert (bool, default = False) – If False (the default), ordering will be from least to greatest. Otherwise greatest to least.

Returns:

result – A new Dataset ordered by the given column.

Return type:

Dataset

with_units(convention=None, conversions={}, **columns)

Unit conversion is generally supported for OpenCosmo datasets. Maps, however, tend to hold quantities integrated over a range of redshifts and expressed in observed units, so applying unit conversions is generally neither easy nor appropriate.

Parameters:
  • convention (str | None)

  • conversions (dict[Unit, Unit])

  • columns (Unit)

Return type:

Self