{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick Start\n", "\n", "This quick start tutorial demonstrates the basic usage of dnamite. For more detailed usage, see the user guides." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import seaborn as sns\n", "from sklearn.model_selection import train_test_split\n", "\n", "sns.set_theme()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why Use dnamite?\n", "\n", "Given a set of $p$ features $X = (X_1, \\dots, X_p)$, dnamite trains additive models of the form $f(X) = \\sum_{j=1}^{p} f_j(X_j)$.\n", "Such additive models keep a structure similar to linear models but allow each feature function $f_j$ (also known as a *shape function*) to be nonlinear, which can improve predictive accuracy.\n", "Because the structure is additive, a trained dnamite model can describe its predictions directly through its shape functions and can summarize the importance of each feature through feature importance scores.\n", "Therefore, dnamite is suitable when both accuracy and interpretability are important.\n", "For more details, see the [Why dnamite User Guide](https://dnamite.readthedocs.io/en/latest/notebooks/user_guides/why_dnamite.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regression\n", "\n", "We'll start by loading the California Housing dataset, a standard regression dataset. The task is to predict the median house value for a given district in California.\n", "No data preprocessing is required, as dnamite can handle missing values and categorical features natively."
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import fetch_california_housing\n", "data = fetch_california_housing(as_frame=True)\n", "X, y = data[\"data\"], data[\"target\"]\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To fit a dnamite model, we use `DNAMiteRegressor`. Binary classification works the same way with `DNAMiteBinaryClassifier`. The only required input parameter is `n_features`, which should be set to the number of features in the training dataset. We also pass two optional parameters: 1) `device`, which enables GPU training if a GPU is available, and 2) `num_pairs`, which sets the number of pairwise interactions the model includes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
DNAMiteRegressor(random_state=672)
DNAMiteRegressor(random_state=344)