{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000001",
   "metadata": {},
   "source": "# Tutorial 04: Simplified Metastatistical Extreme Value formulation (SMEV)\n\nThis tutorial introduces the **SMEV** (Simplified Metastatistical Extreme Value formulation) framework as implemented in `pyTENAX.smev`.  \nSMEV is an extreme-value model that fits a **Weibull distribution** to ordinary precipitation events (not only annual maxima) and extrapolates return levels for multiple durations.\n\n> **Reference:**  \n> Francesco Marra, Davide Zoccatelli, Moshe Armon, Efrat Morin.  \n> *A simplified MEV formulation to model extremes emerging from multiple nonstationary underlying processes.*  \n> Advances in Water Resources, 127, 280–290, 2019.  \n> https://doi.org/10.1016/j.advwatres.2019.04.002\n\n### Key steps covered:\n1. Initialise the SMEV class\n2. Load and preprocess precipitation data\n3. Extract ordinary events (storm separation)\n4. Assign maximum depths per duration\n5. Fit Weibull parameters and compute return levels\n6. Quantify uncertainty via bootstrapping\n7. Visualise results for all durations"
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000002",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000003",
   "metadata": {},
   "outputs": [],
   "source": [
    "from importlib.resources import files\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from pyTENAX import smev"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000004",
   "metadata": {},
   "source": [
    "## 1. Initialise the SMEV class\n",
    "\n",
    "The `SMEV` class is configured with:\n",
    "- `return_period` — list of return periods of interest [years]\n",
    "- `durations` — list of accumulation durations [minutes]\n",
    "- `time_resolution` — temporal resolution of the input data [minutes]\n",
    "- `min_rain` — minimum rainfall threshold; only timesteps at or above this value are considered part of a storm\n",
    "- `storm_separation_time` — dry-spell duration [hours] that separates two independent storms\n",
    "- `min_event_duration` — minimum storm length [minutes]; shorter events are discarded\n",
    "- `left_censoring` — probability range used for Weibull fitting; `[0.9, 1]` means only the top 10% of events are used"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000005",
   "metadata": {},
   "outputs": [],
   "source": [
    "S_SMEV = smev.SMEV(\n",
    "    return_period=[2, 5, 10, 20, 50, 100, 200],\n",
    "    durations=[10, 60, 180, 360, 720, 1440],\n",
    "    time_resolution=10,\n",
    "    min_rain=0.1,\n",
    "    storm_separation_time=24,\n",
    "    min_event_duration=30,\n",
    "    left_censoring=[0.9, 1],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000006",
   "metadata": {},
   "source": [
    "## 2. Load and preprocess precipitation data\n",
    "\n",
    "We use the Aadorf precipitation dataset included in the package resources.  \n",
    "`remove_incomplete_years` removes years with more than 10% missing data (controlled by the `tolerance` parameter). This ensures the ordinary-event sample is not biased by data gaps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000007",
   "metadata": {},
   "outputs": [],
   "source": [
    "file_path_input = files('pyTENAX.res').joinpath('prec_data_Aadorf.parquet')\n",
    "data = pd.read_parquet(file_path_input)\n",
    "data[\"prec_time\"] = pd.to_datetime(data[\"prec_time\"])\n",
    "data.set_index(\"prec_time\", inplace=True)\n",
    "name_col = \"prec_values\"\n",
    "\n",
    "data = S_SMEV.remove_incomplete_years(data, name_col)\n",
    "\n",
    "# Convert to numpy — SMEV functions require numpy input\n",
    "df_arr   = np.array(data[name_col])\n",
    "df_dates = np.array(data.index)\n",
    "\n",
    "print(f\"Data loaded: {len(df_arr)} timesteps, {data.index.year.nunique()} complete years\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000008",
   "metadata": {},
   "source": [
    "## 3. Extract ordinary events\n",
    "\n",
    "### Storm separation\n",
    "`get_ordinary_events` groups consecutive timesteps at or above `min_rain` into independent storm events.  \n",
    "Two events are considered independent if they are separated by at least `storm_separation_time` hours (24 h here).\n",
    "\n",
    "### Removing short events\n",
    "`remove_short` discards events shorter than `min_event_duration` (30 min).  \n",
    "It returns:\n",
    "- `arr_vals` — boolean array of kept events\n",
    "- `arr_dates` — (end, start) timestamps of each kept event\n",
    "- `n_ordinary_per_year` — count of ordinary events per year, used later as the SMEV rate parameter **n**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000009",
   "metadata": {},
   "outputs": [],
   "source": [
    "idx_ordinary = S_SMEV.get_ordinary_events(\n",
    "    data=df_arr, dates=df_dates, check_gaps=False\n",
    ")\n",
    "print(f\"Storms detected: {len(idx_ordinary)}\")\n",
    "\n",
    "arr_vals, arr_dates, n_ordinary_per_year = S_SMEV.remove_short(idx_ordinary)\n",
    "print(f\"Storms kept after duration filter: {len(arr_vals)}\")\n",
    "\n",
    "# n = mean number of ordinary events per year (SMEV rate parameter)\n",
    "n = (n_ordinary_per_year.sum() / len(n_ordinary_per_year)).item()\n",
    "print(f\"Mean ordinary events per year (n): {n:.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000010",
   "metadata": {},
   "source": [
    "## 4. Assign maximum depths per duration\n",
    "\n",
    "`get_ordinary_events_values` computes, for each ordinary event and each duration in `S_SMEV.durations`, the maximum accumulated depth within a sliding window of that length.  \n",
    "\n",
    "The result is two dictionaries keyed by duration string (e.g. `\"10\"`, `\"60\"`, …):\n",
    "- `dict_ordinary` — DataFrame with columns `year`, `oe_time`, `ordinary` (depth in mm)\n",
    "- `dict_AMS` — DataFrame with annual maximum series (AMS) per year\n",
    "\n",
    "The default `method=\"vectorized\"` uses `numpy.convolve`. For large datasets, `method=\"njit_parallel\"` (requires `numba`) is significantly faster — see the docstring for warmup instructions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000011",
   "metadata": {},
   "outputs": [],
   "source": [
    "dict_ordinary, dict_AMS = S_SMEV.get_ordinary_events_values(\n",
    "    data=df_arr, dates=df_dates, arr_dates_oe=arr_dates\n",
    ")\n",
    "\n",
    "# Quick look at the 10-min ordinary events\n",
    "dict_ordinary[\"10\"].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000012",
   "metadata": {},
   "source": "## 5. Fit SMEV and compute return levels\n\nSMEV fits a **Weibull distribution** to the ordinary event depths.  \nLeft censoring (`[0.9, 1]`) means only the top 10% of events enter the OLS regression — this focuses the fit on the upper tail, which is what matters for extreme return levels.\n\n### Statistical framework\n\nSMEV models the ordinary event depths as Weibull-distributed:\n\n$$F(x) = 1 - \\exp\\left[-\\left(\\frac{x}{\\lambda}\\right)^{\\kappa}\\right]$$\n\nwhere $\\lambda > 0$ is the **scale** parameter and $\\kappa > 0$ is the **shape** parameter.\n\nThe annual maximum distribution is derived by raising the ordinary-event CDF to the power of $n$ (mean number of events per year), which follows from the assumption that events in a year are independent:\n\n$$\\text{SMEV}(x) = \\left[F(x)\\right]^n = \\left(1 - \\exp\\left[-\\left(\\frac{x}{\\lambda}\\right)^{\\kappa}\\right]\\right)^n$$\n\nSetting $\\text{SMEV}(x_T) = 1 - 1/T$ and inverting yields the **return level formula** implemented in `smev_return_values`:\n\n$$\\boxed{x_T = \\lambda \\left(-\\ln\\left(1 - \\left(1 - \\frac{1}{T}\\right)^{1/n}\\right)\\right)^{1/\\kappa}}$$\n\nwhere $\\lambda$ is the scale, $\\kappa$ is the shape, $n$ is the mean number of ordinary events per year, and $T$ is the return period [years].\n\n### Running SMEV\n\nYou can run SMEV for a **single duration** directly:\n\n```python\nP = dict_ordinary[\"10\"][\"ordinary\"].to_numpy()\nshape, scale = S_SMEV.estimate_smev_parameters(P, S_SMEV.left_censoring)\nRL = S_SMEV.smev_return_values(S_SMEV.return_period, shape, scale, n)\n```\n\nAlternatively, `_run_smev_all_durations` is a convenience wrapper that iterates over all configured durations at once and returns a summary DataFrame."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000013",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_results = S_SMEV._run_smev_all_durations(dict_ordinary, n)\n",
    "print(df_results.to_string())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000014",
   "metadata": {},
   "source": [
    "## 6. Bootstrap uncertainty\n",
    "\n",
    "`smev_bootstrap_uncertainty` estimates the sampling uncertainty of the return levels by block bootstrapping over years.  \n",
    "In each iteration, years are resampled with replacement, the Weibull is refitted, and return levels are recomputed.  \n",
    "The 5th and 95th percentiles of the resulting distribution form the **90% confidence interval (CI)**.\n",
    "\n",
    "> **Note:** 1000 iterations × 6 durations takes ~30–60 seconds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000015",
   "metadata": {},
   "outputs": [],
   "source": [
    "boot_results = {}\n",
    "for dur in [str(d) for d in S_SMEV.durations]:\n",
    "    P      = dict_ordinary[dur][\"ordinary\"].to_numpy()\n",
    "    blocks = dict_ordinary[dur][\"year\"].to_numpy()\n",
    "    shape, scale = S_SMEV.estimate_smev_parameters(P, S_SMEV.left_censoring)\n",
    "    RL     = S_SMEV.smev_return_values(S_SMEV.return_period, shape, scale, n)\n",
    "    RL_unc = S_SMEV.smev_bootstrap_uncertainty(P, blocks, 1000, n)\n",
    "    boot_results[dur] = {\"shape\": shape, \"scale\": scale, \"RL\": RL, \"RL_unc\": RL_unc}\n",
    "\n",
    "print(\"Bootstrap complete.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-0001-0001-0001-000000000016",
   "metadata": {},
   "source": [
    "## 7. Visualise: SMEV fit for all durations\n",
    "\n",
    "Each subplot shows:\n",
    "- **Green crosses** — observed annual maxima (AMS) at their empirical return periods\n",
    "- **Red dashed line** — SMEV fitted return level curve\n",
    "- **Red shading** — 90% bootstrap confidence interval"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-0001-0001-0001-000000000017",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(2, 3, figsize=(15, 8))\n",
    "axes = axes.flatten()\n",
    "\n",
    "for ax, dur in zip(axes, [str(d) for d in S_SMEV.durations]):\n",
    "    AMS    = dict_AMS[dur]\n",
    "    shape  = boot_results[dur][\"shape\"]\n",
    "    scale  = boot_results[dur][\"scale\"]\n",
    "    RL     = boot_results[dur][\"RL\"]\n",
    "    RL_unc = boot_results[dur][\"RL_unc\"]\n",
    "\n",
    "    RL_up  = np.quantile(RL_unc, 0.95, axis=0)\n",
    "    RL_low = np.quantile(RL_unc, 0.05, axis=0)\n",
    "\n",
    "    AMS_sort = AMS.sort_values(by=[\"AMS\"])[\"AMS\"]\n",
    "    plot_pos = np.arange(1, len(AMS_sort) + 1) / (1 + len(AMS_sort))\n",
    "    eRP = 1 / (1 - plot_pos)\n",
    "\n",
    "    ax.fill_between(S_SMEV.return_period, RL_low, RL_up, color=\"r\", alpha=0.2, label=\"90% CI\")\n",
    "    ax.plot(eRP, AMS_sort, \"g+\", label=\"Observed AMS\")\n",
    "    ax.plot(S_SMEV.return_period, RL, \"--r\", linewidth=2, label=\"SMEV\")\n",
    "    ax.set_xscale(\"log\")\n",
    "    ax.set_xlim(1, 200)\n",
    "    ax.set_xticks(S_SMEV.return_period)\n",
    "    ax.set_xticklabels(S_SMEV.return_period)\n",
    "    ax.set_xlabel(\"Return period (years)\")\n",
    "    ax.set_ylabel(\"Depth (mm)\")\n",
    "    ax.set_title(f\"{dur} min  |  shape={shape:.3f}  scale={scale:.3f}\")\n",
    "    ax.legend(fontsize=8)\n",
    "\n",
    "plt.suptitle(\"SMEV fit — Aadorf, all durations\", fontsize=13)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}