{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/christopherlovell/28900542b249c77699a6eef0ece581e7/synthesizer_tutorial.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# Introduction\n",
"\n"
],
"metadata": {
"id": "xobvNnuhhdKa"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "c3954dbf"
},
"source": [
"This notebook demonstrates how to use the `Synthesizer` code to generate photometry from cosmological galaxy formation simulations. We will explore how to load simulation data, calculate the stellar emission, include the effect of dust attenuation, and utilize simulation-based inference techniques to calibrate the dust model.\n",
"\n",
"Throughout we use a simulation from the CAMELS suite as an example, but everything after data loading should be agnostic to the simulation shown."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "113b3aba"
},
"source": [
"## Install synthesizer\n",
"\n",
"Installation instructions for synthesizer. Some of these instructions are needed to get this working on colab, please see the [documentation](https://synthesizer-project.github.io/synthesizer/) for the most up to date and simple install intructinos for your machine.\n",
"\n",
"Run everything up to *\"Generating Photometry*\" at the start of the session. Do not restart the session when prompted, just press cancel!"
]
},
{
"cell_type": "code",
"source": [
"%env SYNTHESIZER_DIR=/content"
],
"metadata": {
"id": "xzPGfBPtHCoJ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "8a37d396",
"collapsed": true
},
"source": [
"!WITH_OPENMP=1 pip install cosmos-synthesizer[test]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!synthesizer-init"
],
"metadata": {
"id": "bQOsNV_MA-jU",
"collapsed": true
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Downlaod synthesizer test data"
],
"metadata": {
"id": "JVuTeGid31wM"
}
},
{
"cell_type": "code",
"source": [
"!synthesizer-download --test-grids --destination grids/."
],
"metadata": {
"id": "AmfxD1qyLMH6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Install LtU-ILI"
],
"metadata": {
"id": "NSGZWn0m3xL6"
}
},
{
"cell_type": "code",
"source": [
"!git clone https://github.com/maho3/ltu-ili.git"
],
"metadata": {
"id": "wgF8Azx-kfdX"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"cd ltu-ili/"
],
"metadata": {
"id": "IWOreZwTkv9u"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!pip install \".[pytorch]\""
],
"metadata": {
"collapsed": true,
"id": "g5MdpKdekxYL"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"cd ../"
],
"metadata": {
"id": "GY9Faug_4xt3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!pip install schwimmbad"
],
"metadata": {
"id": "6yfBxTqIZ2gG"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "c4c51486"
},
"source": [
"Download CAMELS simulation data.\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5eb500bf",
"collapsed": true
},
"source": [
"sim = 'IllustrisTNG' # 'Swift-EAGLE'\n",
"\n",
"snapshot_url = f\"https://users.flatironinstitute.org/~camels/Sims/{sim}/L25n256/CV/CV_1/snapshot_082.hdf5\"\n",
"subfind_url = f\"https://users.flatironinstitute.org/~camels/Sims/{sim}/L25n256/CV/CV_1/groups_082.hdf5\"\n",
"\n",
"!wget {snapshot_url}\n",
"!wget {subfind_url}\n",
"\n",
"!ls"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "c19c02cc"
},
"source": [
"# Generating photometry\n",
"\n",
"We first load the CAMELS simulation data into synthesizer. We do this with a dedicated method for IllustrisTNG\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ea6706b9"
},
"source": [
"from synthesizer.load_data.load_camels import load_CAMELS_IllustrisTNG, load_CAMELS_SwiftEAGLE_subfind\n",
"from synthesizer.conversions import lnu_to_absolute_mag\n",
"from synthesizer.emission_models import IncidentEmission\n",
"from synthesizer.instruments import FilterCollection\n",
"from synthesizer.grid import Grid\n",
"\n",
"# Assuming the files are in the current directory\n",
"snapshot_path = 'snapshot_082.hdf5'\n",
"subfind_path = 'groups_082.hdf5'\n",
"\n",
"gals = load_CAMELS_IllustrisTNG(\n",
"# gals = load_CAMELS_SwiftEAGLE_subfind(\n",
" './',\n",
" snap_name=snapshot_path,\n",
" group_name=subfind_path\n",
")"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from unyt import Myr\n",
"\n",
"mstar = np.array([g.stellar_mass for g in gals])\n",
"sfr = np.array([g.stars.get_sfr(timescale=100*Myr) for g in gals])\n",
"\n",
"sfr[sfr == 0] = 5e-3 #* np.random.rand(np.sum(sfr==0))\n",
"\n",
"plt.scatter(np.log10(mstar), np.log10(sfr))\n",
"plt.xlabel(r\"$\\rm log_{10}(M_{\\star} \\,/\\, M_{\\odot})$\")\n",
"plt.ylabel(r\"$\\rm log_{10}(SFR \\,/\\, M_{\\odot} \\, yr^{-1})$\")"
],
"metadata": {
"id": "1CQ5qJ5IP_7-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We now load a grid. [Grids](https://synthesizer-project.github.io/synthesizer/emission_grids/grids.html) are an important concept in Synthesizer, that describe the emission from some source (stellar, AGN, etc.) as a fucntion of some simple axes. In this case, we load a test grid provided with synthesizer, based on the BPASS v2.2.1 binary stellar population synthesis models."
],
"metadata": {
"id": "774UoZnM8f0_"
}
},
{
"cell_type": "code",
"source": [
"from unyt import angstrom\n",
"\n",
"grid_name = \"test_grid\"\n",
"grid = Grid(grid_name)\n",
"\n",
"print(grid)"
],
"metadata": {
"id": "gBH93cUm0Gqd"
},
"execution_count": null,
"outputs": []
},
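{
"cell_type": "markdown",
"source": [
"As a quick orientation, we can inspect the grid axes directly. This is just a sketch; it only uses attributes (`lam`, `metallicity`) that appear elsewhere in this notebook."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Inspect the wavelength coverage and metallicity axis of the grid.\n",
"# These attributes are used later in this notebook; anything else\n",
"# should be checked against the synthesizer documentation.\n",
"print(grid.lam.min(), grid.lam.max())\n",
"print(grid.metallicity)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},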
{
"cell_type": "markdown",
"source": [
"[Emission models](https://synthesizer-project.github.io/synthesizer/emission_models/emission_models.html) are another important concept in Synthesizer, that describe how simulation properties are mapped to emissions. Emission models consist of four main operations: *extraction*, *generation*, *transformation* and *combination*, that can be combined to form arbitrarily complex models. Below we use a very simple premade model, *IncidentEmission*, that just extracts the incident emission directly from a grid."
],
"metadata": {
"id": "QtxEjMjS8yOl"
}
},
{
"cell_type": "code",
"source": [
"incident = IncidentEmission(grid)\n",
"print(incident)"
],
"metadata": {
"id": "I1Vroq4E8xln"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We now combine these ingredients to generate some spectra. We first filter our galaxies by stellar mass, then call the `get_spectra` method on the Stars component on each galaxy, extracting the rest-frame luminosity stored on the `lnu` attribute."
],
"metadata": {
"id": "RGqiBV-794Wh"
}
},
{
"cell_type": "code",
"source": [
"# Filter by stellar mass\n",
"mstar_mask = np.array([np.sum(g.stars.masses) > 1e9 for g in gals])\n",
"gals = [g for g, _m in zip(gals, mstar_mask) if _m]\n",
"len(gals)\n",
"\n",
"specs = [g.stars.get_spectra(incident, nthreads=1) for g in gals]"
],
"metadata": {
"id": "xRELvJhVMNhP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's visualise some of these spectra, using the `plot_spectra` method on a galaxy object."
],
"metadata": {
"id": "IHTw4Kp0J-5x"
}
},
{
"cell_type": "code",
"source": [
"fig, ax = specs[0].plot_spectra()\n",
"for i in np.arange(10):\n",
" specs[i].plot_spectra(fig=fig, ax=ax)\n",
"\n",
"ax.set_xlim(5e2, 5e4)\n",
"ax.set_ylim(1e24, 1e31)"
],
"metadata": {
"id": "o2LvOXn9KAsn"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We now use these spectra to generate photometry. We first load some filters, downloaded directly from the SVO [database](https://svo2.cab.inta-csic.es/theory/fps/index.php), then call the `get_photo_lnu` method on each galaxy Stars component. Finally, we convert this photometry from spectral luminosity units to absolute magnitudes. We don't necessarily have to use the spectra returned in `specs`, but this can be useful, as we will demonstrate below."
],
"metadata": {
"id": "txPV0wmd9bVg"
}
},
{
"cell_type": "code",
"source": [
"filter_codes = [f\"SLOAN/SDSS.{f}\" for f in [\"g\", \"r\"]]\n",
"fc = FilterCollection(filter_codes=filter_codes, new_lam=grid.lam)"
],
"metadata": {
"id": "yWRfV5tQ0IPz"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Calculate g-band magnitudes\n",
"phot = np.vstack(\n",
" [g.stars.get_photo_lnu(fc)[\"incident\"][\"SLOAN/SDSS.g\"] for g in gals]\n",
")\n",
"mags = lnu_to_absolute_mag(phot)"
],
"metadata": {
"id": "eS4G87dfNjtK"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can now plot our g-band luminosity function."
],
"metadata": {
"id": "7w3I4TJF-hCO"
}
},
{
"cell_type": "code",
"source": [
"def calc_df(x, volume, binLimits):\n",
" hist, dummy = np.histogram(x, bins=binLimits)\n",
" hist = np.float64(hist)\n",
" phi = (hist / volume) / (binLimits[1] - binLimits[0])\n",
"\n",
" phi_sigma = (np.sqrt(hist) / volume) / (\n",
" binLimits[1] - binLimits[0]\n",
" ) # Poisson errors\n",
"\n",
" return phi, phi_sigma, hist\n",
"\n",
"# Calculate g-band luminosity function\n",
"binlims = np.linspace(-25, -16, 12)\n",
"bins = binlims[:-1] + (binlims[1] - binlims[0]) / 2\n",
"phi, phi_sigma, _ = calc_df(mags, (25 / h) ** 3, binlims)"
],
"metadata": {
"id": "c9LfFzK2Mt4L"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.plot(bins, np.log10(phi), label=\"Incident\")\n",
"ax.legend()\n",
"ax.set_xlabel(\"$m_{g}^{AB}$\")\n",
"ax.set_ylabel(r\"$\\phi \\,/\\, \\mathrm{Mpc^{-3} \\; dex^{-1}}$\")\n",
"ax.set_xlim(-18, -24)"
],
"metadata": {
"id": "mlL6LsqbMYDD"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Can we speed this up?\n",
"\n",
"The main limitation on the speed of the spectra calculation is the resolution of the wavelength grid. We can resample our grid to only include the wavelengths we are interested in, here the rest-frame optical, $3000 < \\lambda \\,/\\, Å < 10000$"
],
"metadata": {
"id": "VdN0dRjpR_s6"
}
},
{
"cell_type": "code",
"source": [
"mask = (grid.lam > 3000 * angstrom) & (grid.lam < 1e4 * angstrom)\n",
"grid.interp_spectra(new_lam=grid.lam[mask])"
],
"metadata": {
"id": "AysexO_OW8q3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"incident = IncidentEmission(grid)\n",
"specs = [g.stars.get_spectra(incident, nthreads=1) for g in gals]"
],
"metadata": {
"id": "pEp60MwHYcGt"
},
"execution_count": null,
"outputs": []
},
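{
"cell_type": "markdown",
"source": [
"To check the speed-up, we can time the spectra loop with the standard library. This is a rough wall-clock sketch; absolute numbers will depend on your runtime."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import time\n",
"\n",
"# Rough wall-clock timing of the spectra calculation on the resampled grid.\n",
"start = time.perf_counter()\n",
"_ = [g.stars.get_spectra(incident, nthreads=1) for g in gals]\n",
"print(f\"{len(gals)} galaxies in {time.perf_counter() - start:.2f} s\")"
],
"metadata": {},
"execution_count": null,
"outputs": []
},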
{
"cell_type": "markdown",
"source": [
"# What about dust?\n",
"\n",
"If we want to include dust we simply need to update our emission model. Below, we add a transformation emission model component, `StellarEmissionModel`, to apply a power law dust curve with a fixed optical depth. We print this model, and plot the tree; this is still a very simple model, with only an extraction and transformation, so the tree only consists of two leaves connected by a dashed line."
],
"metadata": {
"id": "O_3Z2MtlOB8G"
}
},
{
"cell_type": "code",
"source": [
"from synthesizer.emission_models import StellarEmissionModel\n",
"from synthesizer.emission_models.attenuation import PowerLaw\n",
"\n",
"incident = StellarEmissionModel(\n",
" \"incident\",\n",
" grid=grid,\n",
" extract=\"incident\",\n",
" emitter=\"stellar\",\n",
" save=False,\n",
")\n",
"\n",
"attenuated = StellarEmissionModel(\n",
" \"attenuated\",\n",
" dust_curve=PowerLaw(slope=-1),\n",
" apply_to=incident,\n",
" tau_v=0.33\n",
")\n",
"print(attenuated)\n",
"attenuated.plot_emission_tree()"
],
"metadata": {
"id": "r4hzbaxBi5el"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"To generate new spectra, we just pass this new emission model to the `get_spectra` method."
],
"metadata": {
"id": "XiMKxUZc_IC9"
}
},
{
"cell_type": "code",
"source": [
"specs_attenuated = [g.stars.get_spectra(attenuated, nthreads=4) for g in gals]\n",
"\n",
"phot = np.vstack(\n",
" [g.stars.get_photo_lnu(fc)[\"attenuated\"][\"SLOAN/SDSS.g\"] for g in gals]\n",
")\n",
"mags = lnu_to_absolute_mag(phot)\n",
"\n",
"phi_att, phi_att_sigma, _ = calc_df(mags, (25 / h) ** 3, binlims)"
],
"metadata": {
"id": "_xl7biFYjX62"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"fig, ax = plt.subplots(1, 1)\n",
"\n",
"ax.plot(bins, np.log10(phi), label=\"Intrinsic\")\n",
"ax.plot(bins, np.log10(phi_att), label=\"Attenuated\")\n",
"\n",
"ax.legend()\n",
"ax.set_xlabel(\"$m_{g}^{AB}$\")\n",
"ax.set_ylabel(r\"$\\phi \\,/\\, \\mathrm{Mpc^{-3} \\; dex^{-1}}$\")\n",
"ax.set_xlim(-18, -24)"
],
"metadata": {
"id": "9dhE8SjOOGHI"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Changing the SPS model\n",
"\n",
"We can test how sensitive our results are to various ingredients in our model. We will show this by swapping out the Stellar Population Synthesis (SPS) model.\n",
"\n",
"First we need to download a new grid. We have processed a number of these grids, and they are available on **Box** [here](https://sussex.app.box.com/v/SynthesizerProductionGrids/folder/316833654105). You should also be able to run the cell below, which should download a BC03 grid to the current directory."
],
"metadata": {
"id": "aYRKBmmIWGwW"
}
},
{
"cell_type": "code",
"source": [
"!wget 'https://euc1.boxcloud.com/d/1/a1!ewPFlp3-Goqt_p80Mz6o2i_wDYVwmQjjSsSeU9JyMAb7akWbJ9p-cQb4fioHuB_CWyU10nBTT9ZxuNzpPmPY3aT4zPUE9Yb_U4X6r_hYei439veZfFycZsNRjE3d4fxEAXecH4pnC_0GSX12d1WQFZF6w79-KkI6sbbTTRJf-Lsl1x1yw-HlW_hptG2iLdiVKeNmZeVljssq_xz_UmO3-vSZDVqmau4eUy9NWu9ojwmarxppOo6pI5nT2y_GPjN_id4nsMc3Qk4Kn-zJzP8fp5fpHT4KxI4xws8ksys-Q9_yn_H0ermvjENEV_yW1qb1BYBFg7Y1_Z1B87wgLtUt9FxvmbMmqJ7VJGf-bSmpmYDEJxrORQhgEf0N8LtJc1pXiccUtgT90OWPnciIhJ9booQmFtqj8F_lulr3MKrUkIEBNZRXg2YgMBBtN7ZjCRSb4wAWb77yfY2IpbpNvvan2whsh-1qAjOzn9WaAVRpOQKFA9sLXf1Cg8Mgkhdj4faQt0k3KCACMb8huZIOiSCRb9IeiWmXVOPmpxOViKwhR-10sN54gdXPeAbsJxbWzkuPiG7YF2LAIQvYnQUqpuWur92NJZR-r9c4NT8sctsZdj8LAb08qO-YDhEf5EHIt5cpcFENq6wArdz7GhPN_t5B4ieFm31NJZIK-c1YVK0v0G7WlDqRNdtiiBHfv4dWJZBbSNDGlvpXGKFDwa323bHB1kyCgb370mEUikvRp2fL1rNX5jBHDu6RC_KyV5xvOrB2xs8sHnnuQkH4pI9BszbKa6x486iBRuA-IZOg0EtT5PEPQ5ODm_Jto2UdcHyj0coh5PEq_1XmTvMblq_9GGzNirQyVlrsPPik6SGj8NsBm1m_1yPzcO6epHhlI-BXhzQYZhSI-hcO2EZokStxBmvHUn_ohMCcRo_cogh2WNUg7MWgzxQqr46aDt7NPGGXj5ccEbVCcc953aVkx4tP0kiaLATrNgGUbL5GnO24T5OXwwVe_6MLY0qZ5zmLx4PoGsOvPOmf0ExzYo_Yc8BdqCrCr4uXW08bQxFBjRg-xT0wakGc7o3DXtykwpGZJMv2Ck_5-EzOKJmn1Tg35uRWjNDhaLS64i-qSto2OG05aoBehPRRpdhDknqCWr81sswQQxxs2ib83Znclxvh35bouQx5_MqjRoTaqUZc5JJuMI4EJ7k5TfO2omJtFx1o-e7Rhc2Cqp1OK_QNSdTxgRSg2LcULkrbos1DKJzhMdw1Fq7SP6WkpSl8xiUPzF9LkshUwrlJgoFQoSctZmyKip6X-qdMTnUrPB-XWdP4zz4plRR8nRt4dQo2XspE3GZ_57ZJear3N72fZpFb4m4dnE7XL2Yle4CYU8sF1YO3brGjJhKgXLclmOOg7dsq2Gr2-DMx7qbXb9XGjHqGdflv7KM4YqbTeczozF9HOUuou0wJ8JXfGRVD0DKUg_irRTG47ZV9uv6x7SEOPFnRq9QxVwD7XwJA2s2qRp-X03a4csUOmAqd5eTOghY./download' --output-document 'grids/bc03-2016-Miles_chabrier-0.1,100_cloudy-c23.01-sps.hdf5'"
],
"metadata": {
"id": "g_CmbFQZWGUf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"grid_name = \"bc03-2016-Miles_chabrier-0.1,100_cloudy-c23.01-sps.hdf5\"\n",
"grid = Grid(grid_name)"
],
"metadata": {
"id": "lZ746OgYKWHk"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"This is a grid that has been reprocessed through CLOUDY, a photoionisation code that predicts the nebular line and continuum emission. As such, we have a number of additional attributes, spectra and also *lines* that we can explore. A summary of the fiducial naming system for these emissions is provided [here](https://synthesizer-project.github.io/synthesizer/emissions/emissions.html)."
],
"metadata": {
"id": "DctXFpFI3KFy"
}
},
{
"cell_type": "code",
"source": [
"grid.available_emissions"
],
"metadata": {
"id": "PvatSooV3gM5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's update our emission model to use the *total* emission (including nebular line and continuum emission)."
],
"metadata": {
"id": "IMZQenjb4VCy"
}
},
{
"cell_type": "code",
"source": [
"intrinsic = StellarEmissionModel(\n",
" \"intrinsic\",\n",
" grid=grid,\n",
" extract=\"total\",\n",
" emitter=\"stellar\",\n",
" save=False,\n",
")\n",
"\n",
"attenuated = StellarEmissionModel(\n",
" \"attenuated\",\n",
" dust_curve=PowerLaw(slope=-1),\n",
" apply_to=intrinsic,\n",
" tau_v=0.33\n",
")"
],
"metadata": {
"id": "bC9WlpoV4jN8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"specs_bc03 = [g.stars.get_spectra(attenuated, nthreads=1) for g in gals]"
],
"metadata": {
"id": "khLhv9-P4wd8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"fig, ax = specs_bc03[0].plot_spectra()\n",
"for i in np.arange(1, 10):\n",
" specs_bc03[i].plot_spectra(fig=fig, ax=ax)\n",
"\n",
"ax.set_xlim(5e2, 5e4)\n",
"ax.set_ylim(1e25, 1e31)"
],
"metadata": {
"id": "MTDQM-iv3kS5"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Can clearly see that for some galaxies there are strong emission lines, including Lyman-$\\alpha$.\n",
"\n",
"We can also look at the emission from individual lines. These grids have a large number of different lines, following the CLOUDY naming convention, for which luminosities, equivalent widths and other properties are available."
],
"metadata": {
"id": "gfNhSUjv5SoX"
}
},
{
"cell_type": "code",
"source": [
"print(grid.available_lines)"
],
"metadata": {
"id": "q6a_Tnok3dWv"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"lc = grid.get_lines_for_full_grid('H 1 1215.67A')\n",
"plt.imshow(\n",
" lc.get_flux0()[0],\n",
" origin='lower',\n",
" aspect='auto',\n",
" extent=[\n",
" grid.axes_values['metallicities'][0],\n",
" grid.axes_values['metallicities'][-1],\n",
" np.log10(grid.axes_values['ages'][0]),\n",
" np.log10(grid.axes_values['ages'][-1])\n",
" ]\n",
")\n",
"plt.xlabel('Metallicity')\n",
"plt.ylabel(f'log10 Age')\n",
"plt.colorbar(label=f'Luminosity ({grid.line_lums[\"nebular\"].units})')"
],
"metadata": {
"id": "gsvIeCqA6Tho"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We also provide functionality for producing common line diagnostic plots, such as the BPT diagram. Here we plot the values from the grid at fixed age, for a range of metallicities."
],
"metadata": {
"id": "PC_Spip-6aMO"
}
},
{
"cell_type": "code",
"source": [
"from synthesizer.emissions import line_ratios\n",
"\n",
"diagram_id = \"BPT-NII\"\n",
"ia = 0 # 1 Myr old for test grid\n",
"x = []\n",
"y = []\n",
"for iZ, Z in enumerate(grid.metallicity):\n",
" grid_point = (ia, iZ)\n",
" lines = grid.get_lines(grid_point)\n",
" x_, y_ = lines.get_diagram(diagram_id)\n",
" x.append(x_)\n",
" y.append(y_)\n",
"\n",
"\n",
"# Plot the Kewley SF/AGN dividing line\n",
"fig, ax = plt.subplots()\n",
"ax.plot(x, y)\n",
"logNII_Ha = np.arange(-2.0, 1.0, 0.01)\n",
"logOIII_Hb = line_ratios.plot_bpt_kewley01(\n",
" logNII_Ha, fig=fig, ax=ax, show=True, c=\"k\", lw=\"2\", alpha=0.3\n",
")"
],
"metadata": {
"id": "MvMY1u5a6Sw-"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's reduce the wavelength range again, and try reproducing the g-band luminosity function."
],
"metadata": {
"id": "sw-gaZLf8MAl"
}
},
{
"cell_type": "code",
"source": [
"mask = (grid.lam > 3000 * angstrom) & (grid.lam < 1e4 * angstrom)\n",
"grid.interp_spectra(new_lam=grid.lam[mask])"
],
"metadata": {
"id": "ata2UefE3JFP"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Update our attenuated emission model with our new grid."
],
"metadata": {
"id": "5M6SeaTtLDNt"
}
},
{
"cell_type": "code",
"source": [
"attenuated['intrinsic'].set_grid(grid)\n",
"print(attenuated)"
],
"metadata": {
"id": "bZhLUq1KKIs_"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"specs_BC03 = [g.stars.get_spectra(attenuated) for g in gals]\n",
"\n",
"phot = np.vstack(\n",
" [g.stars.get_photo_lnu(fc)[\"attenuated\"][\"SLOAN/SDSS.g\"] for g in gals]\n",
")\n",
"mags = lnu_to_absolute_mag(phot)\n",
"\n",
"phi_bc03, _, _ = calc_df(mags, (25 / h) ** 3, binlims)\n",
"\n",
"fig, ax = plt.subplots(1, 1)\n",
"ax.plot(bins, np.log10(phi_bc03), label=\"BC03\")\n",
"ax.plot(bins, np.log10(phi_att), label=\"BPASS\")\n",
"ax.legend()\n",
"ax.set_xlabel(\"$m_{g}^{AB}$\")\n",
"ax.set_ylabel(r\"$\\phi \\,/\\, \\mathrm{Mpc^{-3} \\; dex^{-1}}$\")\n",
"ax.set_xlim(-18, -24)"
],
"metadata": {
"id": "OM9HV2GtLQBC"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"# Calibrating the dust model\n",
"\n",
"Can we fit some of these parameters in our dust model, such as the optical depth and slope of the attenuation law? Here we present a somewhat over engineered example, using Simulation Based Inference (SBI; *waves hands*) to infer posteriors on these parameters, derived from our forward modelled data. SBI has a number of neat advantages over traditional inference approaches, utilising neural density estimators to directly model the relationship between data and parameters.\n",
"\n",
"We will use the [LtU-ILI package](https://github.com/maho3/ltu-ili/tree/main) to build a Neural Posterior Estimation (NPE) model."
],
"metadata": {
"id": "725BSpnDkLwt"
}
},
{
"cell_type": "code",
"source": [
"import torch\n",
"\n",
"import ili\n",
"from ili.dataloaders import NumpyLoader\n",
"from ili.inference import InferenceRunner\n",
"from ili.validation.metrics import PosteriorCoverage, PlotSinglePosterior\n",
"\n",
"device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
"print('Device:', device)"
],
"metadata": {
"id": "aAqHnkPinj6H"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We create a slightly more complex emission model for this example, utilising a two component screen inspired by Charlot & Fall 2000. We start by precomputing the intrinsic spectra for young and old stars, where the division is made at 100 Myr."
],
"metadata": {
"id": "zDP53xdX96T6"
}
},
{
"cell_type": "code",
"source": [
"from synthesizer.emission_models import EmissionModel\n",
"from unyt import Myr\n",
"\n",
"intrinsic_old = StellarEmissionModel(\n",
" \"intrinsic_old\",\n",
" grid=grid,\n",
" extract=\"total\",\n",
" emitter=\"stellar\",\n",
" mask_attr=\"ages\",\n",
" mask_thresh=10 * Myr,\n",
" mask_op='>',\n",
" save=True,\n",
")\n",
"\n",
"intrinsic_young = StellarEmissionModel(\n",
" \"intrinsic_young\",\n",
" grid=grid,\n",
" extract=\"total\",\n",
" emitter=\"stellar\",\n",
" mask_attr=\"ages\",\n",
" mask_thresh=10 * Myr,\n",
" mask_op='<=',\n",
" save=True,\n",
")\n",
"\n",
"intrinsic = EmissionModel(\n",
" label=\"intrinsic\",\n",
" combine=(intrinsic_young, intrinsic_old),\n",
" emitter=\"stellar\",\n",
")\n",
"\n",
"specs = [g.stars.get_spectra(intrinsic, nthreads=1) for g in gals]"
],
"metadata": {
"id": "frCxn8qpCdOy"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We then create a latin hypercube of four parameters, the ISM optical depth, the birth cloud optical depth, and the slope of the attenuation curve for the ISM and birth cloud dust."
],
"metadata": {
"id": "EjPHSojy-kO8"
}
},
{
"cell_type": "code",
"source": [
"from scipy.stats import qmc\n",
"from tqdm import tqdm\n",
"\n",
"sampler = qmc.LatinHypercube(d=4)\n",
"sample = sampler.random(n=100)\n",
"\n",
"l_bounds = [0.1, 0.5, -2, -2]\n",
"u_bounds = [0.5, 1.0, -0.5, -0.5]\n",
"theta = qmc.scale(sample, l_bounds, u_bounds)"
],
"metadata": {
"id": "E_XYnMtjK60y"
},
"execution_count": null,
"outputs": []
},
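{
"cell_type": "markdown",
"source": [
"A quick sanity check that the scaled samples span the intended bounds in each dimension (pure `numpy`, nothing synthesizer-specific)."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# The sample minima/maxima should approach l_bounds/u_bounds.\n",
"print(theta.shape)\n",
"print(theta.min(axis=0))\n",
"print(theta.max(axis=0))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},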
{
"cell_type": "markdown",
"source": [
"We now loop over these parameters, and calculate the photometry for each combinatino of parameters."
],
"metadata": {
"id": "DilBEG5D-2ev"
}
},
{
"cell_type": "code",
"source": [
"phi_att = [None] * len(theta)\n",
"for i, _params in tqdm(enumerate(theta), total=len(theta)):\n",
" for g in gals:\n",
" int_young = g.stars.spectra['intrinsic_young']\n",
" int_old = g.stars.spectra['intrinsic_old']\n",
"\n",
" att_young = int_young.apply_attenuation(tau_v=_params[0], dust_curve=PowerLaw(slope=_params[2]))\n",
" att_old = int_old.apply_attenuation(tau_v=_params[1], dust_curve=PowerLaw(slope=_params[3]))\n",
" g.stars.spectra['attenuated'] = att_young + att_old\n",
"\n",
" phot = np.vstack(\n",
" [g.stars.get_photo_lnu(fc)[\"attenuated\"][\"SLOAN/SDSS.g\"] for g in gals]\n",
" )\n",
" mags = lnu_to_absolute_mag(phot)\n",
" phi_att[i], phi_att_sigma, _ = calc_df(mags, (25 / h) ** 3, binlims)"
],
"metadata": {
"collapsed": true,
"id": "yCVAC9eqARkW"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"The figure below shows some of our example luminosity functions"
],
"metadata": {
"id": "1EHnR-OU_ugZ"
}
},
{
"cell_type": "code",
"source": [
"fig, ax = plt.subplots(1, 1)\n",
"\n",
"ax.plot(bins, np.log10(phi_bc03), label=\"Fiducial\")\n",
"ax.fill_between(bins, np.log10(np.percentile(phi_att, 10, axis=0)), np.log10(np.percentile(phi_att, 90, axis=0)), color='grey', alpha=0.6)\n",
"\n",
"ax.legend()\n",
"ax.set_xlabel(\"$m_{g}^{AB}$\")\n",
"ax.set_ylabel(r\"$\\phi \\,/\\, \\mathrm{Mpc^{-3} \\; dex^{-1}}$\")\n",
"ax.set_xlim(-18, -24)\n",
"\n",
"plt.show()"
],
"metadata": {
"id": "Qi0U8uLlpIbO"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Simulation Based Inference\n",
"\n",
"We load our data x (binned luminosity function) and parameters theta, set our priors to be flat over the same range as the proposal distribution, and combine two Masked Autoregressive Flows (MAF) with a Mixture Density Network (MDN) to model the posterior. We set some hyperparameters for the training, such as batch size and learning rate. In a real science case we would optimise these parameters."
],
"metadata": {
"id": "7wpLjMd5M7Xf"
}
},
{
"cell_type": "code",
"source": [
"# make a dataloader\n",
"loader = NumpyLoader(x=phi_att, theta=theta)\n",
"\n",
"# define a prior\n",
"prior = ili.utils.Uniform(low=l_bounds, high=u_bounds, device=device)\n",
"\n",
"# instantiate your neural networks to be used as an ensemble\n",
"repeats_maf = 2 # not to have to duplicate code for a large ensemble of identical architectures\n",
"nets = [\n",
" ili.utils.load_nde_sbi(engine='NPE', model='maf', hidden_features=50, num_transforms=5, repeats = repeats_maf),\n",
" ili.utils.load_nde_sbi(engine='NPE', model='mdn', hidden_features=50, num_components=6)\n",
"]\n",
"\n",
"# define training arguments\n",
"train_args = {\n",
" 'training_batch_size': 32,\n",
" 'learning_rate': 1e-4\n",
"}\n",
"\n",
"# initialize the trainer\n",
"runner = InferenceRunner.load(\n",
" backend='sbi',\n",
" engine='NPE',\n",
" prior=prior,\n",
" nets=nets,\n",
" device=device,\n",
" train_args=train_args,\n",
" proposal=None,\n",
" out_dir=None\n",
")\n"
],
"metadata": {
"id": "0RTxBjJKM-X7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# train the model\n",
"posterior_ensemble, summaries = runner(loader=loader)"
],
"metadata": {
"id": "DFBFuvVGM48v"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can visualise our training curves to make sure we're not overfitting."
],
"metadata": {
"id": "_lMEpPWGIEpP"
}
},
{
"cell_type": "code",
"source": [
"import matplotlib.colors as mcolors\n",
"\n",
"# plot train/validation loss\n",
"fig, ax = plt.subplots(1, 1, figsize=(6,4))\n",
"c = list(mcolors.TABLEAU_COLORS)\n",
"for i, m in enumerate(summaries):\n",
" ax.plot(-1.0*np.array(m['training_loss']), ls='-', label=f\"{i}_train\", c=c[i])\n",
" ax.plot(-1.0*np.array(m['validation_loss']), ls='--', label=f\"{i}_val\", c=c[i])\n",
"ax.set_xlim(0)\n",
"ax.set_xlabel('Epoch')\n",
"ax.set_ylabel('Log probability')\n",
"ax.legend()"
],
"metadata": {
"id": "GZfsHvt_OKEm"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can now visualise our results predicting the posterior for a given simulation. We choose a random index (0) and generate our posterior, and compare to the true value."
],
"metadata": {
"id": "q2ZOrq9kIJ7C"
}
},
{
"cell_type": "code",
"source": [
"ind = 0\n",
"\n",
"# use ltu-ili's built-in validation metrics to plot the posterior for this point\n",
"metric = PlotSinglePosterior(\n",
" num_samples=1000, sample_method='direct',\n",
" labels=[f'$\\\\theta_{i}$' for i in range(4)]\n",
")\n",
"fig = metric(\n",
" posterior=posterior_ensemble,\n",
" x_obs = phi_att[ind], theta_fid=theta[ind],\n",
" plot_kws=dict(fill=True)\n",
")"
],
"metadata": {
"id": "SSRb1EuUOM9j"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"We can also generate some coverage and posterior predictive checks. These show how well our NPE is learning an unbiased and accurate representation of the posterior."
],
"metadata": {
"id": "ynhWYCV8Ieru"
}
},
{
"cell_type": "code",
"source": [
"# Drawing samples from the ensemble posterior\n",
"\n",
"metric = PosteriorCoverage(\n",
" num_samples=1000, sample_method='direct',\n",
" labels=[f'$\\\\theta_{i}$' for i in range(4)],\n",
" plot_list = [\"coverage\", \"histogram\", \"predictions\", \"tarp\", \"logprob\"],\n",
" out_dir=None\n",
")\n",
"\n",
"fig = metric(\n",
" posterior=posterior_ensemble, # NeuralPosteriorEnsemble instance from sbi package\n",
" x=phi_att, theta=theta\n",
")"
],
"metadata": {
"id": "8UUHfrqaOlOc"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Let's sample from the posterior and show the range of predicted luminosity functions against the 'true' data."
],
"metadata": {
"id": "PN-JOQ9nLB2u"
}
},
{
"cell_type": "code",
"source": [
"samples = posterior_ensemble.sample((100,), torch.Tensor(phi_att[ind]).to(device))\n",
"\n",
"phi_samp = [None] * len(theta)\n",
"for i, _params in tqdm(enumerate(samples), total=len(samples)):\n",
" for g in gals:\n",
" int_young = g.stars.spectra['incident_young']\n",
" int_old = g.stars.spectra['incident_old']\n",
"\n",
" att_young = int_young.apply_attenuation(tau_v=float(_params[0]), dust_curve=PowerLaw(slope=float(_params[2])))\n",
" att_old = int_old.apply_attenuation(tau_v=float(_params[1]), dust_curve=PowerLaw(slope=float(_params[3])))\n",
" g.stars.spectra['attenuated'] = att_young + att_old\n",
"\n",
" phot = np.vstack(\n",
" [g.stars.get_photo_lnu(fc)[\"attenuated\"][\"SLOAN/SDSS.g\"] for g in gals]\n",
" )\n",
" mags = lnu_to_absolute_mag(phot)\n",
" phi_samp[i], phi_att_sigma, _ = calc_df(mags, (25 / h) ** 3, binlims)"
],
"metadata": {
"collapsed": true,
"id": "cqjXaWJHQqwT"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"fig, ax = plt.subplots(1, 1)\n",
"\n",
"ax.plot(bins, np.log10(phi_att[ind]), label=\"True\")\n",
"\n",
"for i in np.arange(len(phi_samp)):\n",
" ax.plot(bins, np.log10(phi_samp[i]), ls='dashed', color='grey')\n",
"\n",
"ax.legend()\n",
"ax.set_xlabel(\"$m_{g}^{AB}$\")\n",
"ax.set_ylabel(r\"$\\phi \\,/\\, \\mathrm{Mpc^{-3} \\; dex^{-1}}$\")\n",
"ax.set_xlim(-18, -24)\n",
"\n",
"plt.show()"
],
"metadata": {
"id": "OCDPBmB4Tw7i"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Can we improve the constraints on the slope by adding in colour information? Try adding in the $g-r$ colour to the data vector, and see how the posterior constraints improve."
],
"metadata": {
"id": "hFTyHcDCLOXO"
}
},
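{
"cell_type": "markdown",
"source": [
"The sketch below computes photometry in both bands, summarises the $g-r$ colour (here as a median over galaxies), and shows how it could be appended to the luminosity function data vector inside the training loop. The function name and summary statistic are placeholders, not part of synthesizer."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: a g-r colour summary to append to the data vector.\n",
"# Assumes the 'attenuated' spectra for the current parameter set are\n",
"# already stored on each galaxy, as in the training loop above.\n",
"def gr_colour(gals, fc):\n",
"    g_band = np.hstack([g.stars.get_photo_lnu(fc)[\"attenuated\"][\"SLOAN/SDSS.g\"] for g in gals])\n",
"    r_band = np.hstack([g.stars.get_photo_lnu(fc)[\"attenuated\"][\"SLOAN/SDSS.r\"] for g in gals])\n",
"    return np.median(lnu_to_absolute_mag(g_band) - lnu_to_absolute_mag(r_band))\n",
"\n",
"# Inside the loop, the extended data vector would then be, e.g.\n",
"# x_i = np.concatenate([phi_att[i], [gr_colour(gals, fc)]])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},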
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "zJ_DEP0dLVng"
},
"execution_count": null,
"outputs": []
}
]
}