{ "cells": [ { "cell_type": "markdown", "id": "48e48d7f-053e-4a1b-8f95-f2c083c949cd", "metadata": {}, "source": [ "# Tutorial 01: Start with pyTENAX" ] }, { "cell_type": "markdown", "id": "24266dc5-298b-47a7-8884-6b391dd78400", "metadata": {}, "source": [ "Import all libraries needed for running basic TENAX" ] }, { "cell_type": "code", "execution_count": null, "id": "9fb356f4-980b-473d-bdc7-1d4b0a3d8d3a", "metadata": {}, "outputs": [], "source": [ "from importlib.resources import files\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "# Import pyTENAX\n", "from pyTENAX import tenax, plotting" ] }, { "cell_type": "markdown", "id": "d075c82d-500c-46a9-ab7c-bdd65283d44b", "metadata": {}, "source": [ "Let's initiate TENAX class with given setup. For more information about TENAX class, please refer to API documention." ] }, { "cell_type": "code", "execution_count": null, "id": "2f471a9b-98b3-450e-b326-60ad6117cd08", "metadata": {}, "outputs": [], "source": [ "# Initiate TENAX class with customized setup\n", "S = tenax.TENAX(\n", " return_period=[\n", " 2,\n", " 5,\n", " 10,\n", " 20,\n", " 50,\n", " 100,\n", " 200,\n", " ],\n", " durations=[10, 60, 180, 360, 720, 1440], #durations are in minutes and they refer to depth of rainfall within given duration\n", " time_resolution=10, # time resolution in minutes\n", " left_censoring=[0, 0.90], # left censoring threshold \n", " alpha=0.05, # dependence of shape on T depends on statistical significance at the alpha-level.\n", " min_rain = 0.1, # minimum rainfall depth threshold\n", ")" ] }, { "cell_type": "markdown", "id": "1641d7d1-9551-4417-b039-c9b1d10a5660", "metadata": {}, "source": [ "Load test data included in the resources directory. \n", "The column names of the precipitation and temperature datasets need to be specified for the TENAX preprocessing stage. The indexes also need to be set to datetime." ] }, { "cell_type": "code", "execution_count": null, "id": "b43a02dd-66fc-4007-ba4d-f16f10247962", "metadata": {}, "outputs": [], "source": [ "# Load precipitation data\n", "# Create input path file for the test file\n", "file_path_input = files('pyTENAX.res').joinpath('prec_data_Aadorf.parquet')\n", "# Load data from csv file\n", "data = pd.read_parquet(file_path_input)\n", "# Convert 'prec_time' column to datetime, if it's not already\n", "data[\"prec_time\"] = pd.to_datetime(data[\"prec_time\"])\n", "# Set 'prec_time' as the index\n", "data.set_index(\"prec_time\", inplace=True)\n", "name_col = \"prec_values\" # name of column containing data to extract\n", "\n", "# Load temperature data\n", "file_path_temperature = files('pyTENAX.res').joinpath('temp_data_Aadorf.parquet')\n", "t_data = pd.read_parquet(file_path_temperature)\n", "# Convert 'temp_time' column to datetime if it's not already in datetime format\n", "t_data[\"temp_time\"] = pd.to_datetime(t_data[\"temp_time\"])\n", "# Set 'temp_time' as the index\n", "t_data.set_index(\"temp_time\", inplace=True)\n", "temp_name_col = \"temp_values\"" ] }, { "cell_type": "markdown", "id": "d716d3c5-4bff-4465-bd59-a97e9d9c242a", "metadata": {}, "source": [ "## Preprocessing " ] }, { "cell_type": "markdown", "id": "bfbe267d-e6ca-4715-9298-8f81eaef069e", "metadata": {}, "source": [ "Remove incomplete years. Default is 10%. \n", "This can be changed in TENAX class initiation by including \"tolerance\" parameter. \n", "tolerance (float, optional): Maximum allowed fraction of missing data in one year. If exceeded, year will be disregarded from samples." ] }, { "cell_type": "code", "execution_count": null, "id": "f4240b96-48dd-4767-ac21-644fd9e82fe3", "metadata": {}, "outputs": [], "source": [ "data = S.remove_incomplete_years(data, name_col)" ] }, { "cell_type": "markdown", "id": "239a3904-bb1a-4c56-883b-955627250798", "metadata": {}, "source": [ "Transfer data from pandas to numpy as some parts of TENAX class doesn't support pandas, also numpy is faster :) " ] }, { "cell_type": "code", "execution_count": null, "id": "8fbcf5b8-1eb9-46d2-b465-55b68afb8167", "metadata": {}, "outputs": [], "source": [ "# get data from pandas to numpy array\n", "df_arr = np.array(data[name_col])\n", "df_dates = np.array(data.index)\n", "df_arr_t_data = np.array(t_data[temp_name_col])\n", "df_dates_t_data = np.array(t_data.index)" ] }, { "cell_type": "markdown", "id": "25e45f6d-86ae-4dfe-8d01-d2c4348d718f", "metadata": {}, "source": [ "Separate rainfall events by dry spell duration of 24h. \n", "If shorter or longer separation time is needed, it can be changed by inlcuding \"storm_separation_time\" in TENAX class initiation. " ] }, { "cell_type": "code", "execution_count": null, "id": "a36a34d8-c18d-40ac-ab20-6efbf9283ad1", "metadata": {}, "outputs": [], "source": [ "# extract indexes of ordinary events\n", "# these are time-wise indexes =>returns list of np arrays with np.timeindex\n", "idx_ordinary = S.get_ordinary_events(data=df_arr, \n", " dates=df_dates, \n", " name_col=name_col,\n", " check_gaps=False)" ] }, { "cell_type": "markdown", "id": "256002dd-f234-4dec-a639-952d6adb472b", "metadata": {}, "source": [ "Now we know the times of rainfall events, we remove events that are too short. \n", "This is done by min_event_duration (int, optional): Minimum event duration [min]. \n", "Defaults to 30 (minutes)." ] }, { "cell_type": "code", "execution_count": null, "id": "e4331edd-08c3-45f9-8a0e-8980702209f3", "metadata": {}, "outputs": [], "source": [ "# get ordinary events by removing too short events\n", "# returns boolean array, dates of OE in TO, FROM format, and count of OE in each years\n", "arr_vals, arr_dates, n_ordinary_per_year = S.remove_short(idx_ordinary)" ] }, { "cell_type": "markdown", "id": "14b11f65-01a4-4f2b-b1e4-2224846d3ef3", "metadata": {}, "source": [ "Finally, we assign the ordinary events values (maximum depth for given durations) " ] }, { "cell_type": "code", "execution_count": null, "id": "a0dfb163-a028-4576-8479-f49ac00d0464", "metadata": {}, "outputs": [], "source": [ "# assign ordinary events values by given durations, values are in depth per duration, NOT in intensity mm/h\n", "dict_ordinary, dict_AMS = S.get_ordinary_events_values(data=df_arr, \n", " dates=df_dates, \n", " arr_dates_oe=arr_dates)" ] }, { "cell_type": "markdown", "id": "ac77eb0a-23d6-448e-81ab-db2ee251249e", "metadata": {}, "source": [ "After we know ordinary events values and time of maximum depth during the storms, \n", "we can associate temperatures to these events by calculating the temperature averaged during the 24 hours preceeding the rainfall event peak. \n", "The duration averaged over can be changed by including \"temp_time_hour\" parameter in TENAX class initiation. " ] }, { "cell_type": "code", "execution_count": null, "id": "8f602b10-d482-45f2-8f2b-4c79e79ffff1", "metadata": {}, "outputs": [], "source": [ "dict_ordinary, _, n_ordinary_per_year = S.associate_vars(dict_ordinary, \n", " df_arr_t_data, \n", " df_dates_t_data)" ] }, { "cell_type": "markdown", "id": "76aeb3b9-24f9-499c-a16e-8b73626557bd", "metadata": {}, "source": [ "The dictionary of ordinary events contains ordinary events and temperature values for all the specified durations. \n", "**Here we focus only on 10 minutes depth and extract such from dictionary.** \n", "We also calculate the left-censoring threshold." ] }, { "cell_type": "code", "execution_count": null, "id": "b34c9782-d90f-4166-bb2f-4d817b38453b", "metadata": {}, "outputs": [], "source": [ "P = dict_ordinary[\"10\"][\"ordinary\"].to_numpy() \n", "T = dict_ordinary[\"10\"][\"T\"].to_numpy() \n", "blocks_id = dict_ordinary[\"10\"][\"year\"].to_numpy() \n", "# Number of threshold\n", "thr = dict_ordinary[\"10\"][\"ordinary\"].quantile(S.left_censoring[1])" ] }, { "cell_type": "markdown", "id": "6c3dd66e-0287-438d-a10a-3fbdb731cd67", "metadata": {}, "source": [ "## TENAX model" ] }, { "cell_type": "markdown", "id": "bdf70a0b-a461-4201-a6df-8378918fd4ac", "metadata": {}, "source": [ "Here we create temperature intervals for following plots." ] }, { "cell_type": "code", "execution_count": null, "id": "84b1fa4f-abd1-4655-8026-f4625d9e8b11", "metadata": {}, "outputs": [], "source": [ "eT = np.arange(\n", " np.min(T)-4, np.max(T) + 4, 1\n", ") # define T values to calculate distributions. +4 to go beyond graph end" ] }, { "cell_type": "markdown", "id": "90f2fc8d-9a8b-4a64-adbc-3ad3b480119c", "metadata": {}, "source": [ "### Magnitude model " ] }, { "cell_type": "code", "execution_count": null, "id": "d0e16303-92e1-4c47-97c8-0fde30170f85", "metadata": {}, "outputs": [], "source": [ "# magnitude model\n", "F_phat, loglik, _, _ = S.magnitude_model(P, T, thr)\n", "\n", "# Plot the magnitude model\n", "qs = [0.85, 0.95, 0.99, 0.999]\n", "plotting.TNX_FIG_magn_model(P, T, F_phat, thr, eT, qs)\n", "plt.ylabel(\"10-minute precipitation (mm)\")\n", "plt.title(\"TENAX Magnitude model\")\n", "plt.legend(loc=\"upper center\", bbox_to_anchor=(0.5, -0.2))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "a3fe70e0-9be8-47ed-bb60-3db977842155", "metadata": {}, "source": [ "### Temperature model" ] }, { "cell_type": "code", "execution_count": null, "id": "994436df-7710-4f73-9fe8-e0675a78d429", "metadata": {}, "outputs": [], "source": [ "# temperature model\n", "g_phat = S.temperature_model(T)\n", "\n", "# Plot the temperature model\n", "hist, pdf_values = plotting.TNX_FIG_temp_model(T, g_phat, S.beta, eT)\n", "plt.title(\"fig 2b\")\n", "plt.legend(loc=\"upper center\", bbox_to_anchor=(0.5, -0.2))\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "2922b3c8-c360-402a-b1c8-d94d028aa84e", "metadata": {}, "source": [ "### Estimate return levels " ] }, { "cell_type": "markdown", "id": "aeee8617-0802-4c5e-ba0e-0bb2d32cd6eb", "metadata": {}, "source": [ "We need to create sampling intervals for the Monte Carlo approximation of F(x)." ] }, { "cell_type": "code", "execution_count": null, "id": "861082c8-ca6b-443d-9390-c981d437e910", "metadata": {}, "outputs": [], "source": [ "# Sampling intervals for the Montecarlo\n", "Ts = np.arange(np.min(T) - S.temp_delta, \n", " np.max(T) + S.temp_delta, \n", " S.temp_res_monte_carlo)\n" ] }, { "cell_type": "markdown", "id": "bbfb305c-2637-4f56-b77b-42ecdd383fd7", "metadata": {}, "source": [ "After that we simply call the model_inversion function to estimate return levels based on our Magnitude and Temperature model. \n", "We also estimate TENAX uncertainty using bootstrapping, by creating 100 samples with mixed years. \n", "**Estimating TENAX uncertainty can take a while as it runs TENAX 100x**" ] }, { "cell_type": "code", "execution_count": null, "id": "4d74c126-a13a-4c5b-8147-70b069a8dcb8", "metadata": {}, "outputs": [], "source": [ "# mean n of ordinary events\n", "n = n_ordinary_per_year.sum() / len(n_ordinary_per_year)\n", "\n", "# estimates return levels using MC samples\n", "RL, _, P_check = S.model_inversion(F_phat, g_phat, n, Ts)\n", "\n", "# TENAX uncertainty\n", "S.n_monte_carlo = 20000\n", "F_phat_unc, g_phat_unc, RL_unc, n_unc, n_err = S.TNX_tenax_bootstrap_uncertainty(\n", " P, T, blocks_id, Ts\n", ")\n", "\n", "AMS = dict_AMS[\"10\"] # yet the annual maxima\n", "plotting.TNX_FIG_valid(AMS, S.return_period, RL=RL, RL_unc=RL_unc )\n", "plt.title(\"Estimated return levels by TENAX model vs Annual Maximas\")\n", "plt.ylabel(\"10-minute precipitation (mm)\")\n", "plt.legend(loc=\"upper center\", bbox_to_anchor=(0.5, -0.2))\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python (Pyodide)", "language": "python", "name": "python" }, "language_info": { "codemirror_mode": { "name": "python", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8" } }, "nbformat": 4, "nbformat_minor": 5 }