# Methodology

## Meta-Learner Algorithms

A meta-algorithm (or meta-learner) is a framework to estimate the Conditional Average Treatment Effect (CATE) using any machine learning estimators (called base learners) [kunzel2019metalearners].

A meta-algorithm uses either a single base learner that takes the treatment indicator as a feature (e.g. S-learner), or multiple base learners fitted separately for the treatment and control groups (e.g. T-learner, X-learner and R-learner).

Confidence intervals for average treatment effect estimates are calculated using the lower bound formula (7) from [imbens2009recent].

### S-Learner

S-learner estimates the treatment effect using a single machine learning model as follows:

**Stage 1**

**Stage 2**
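
As a rough illustration of the two stages, the sketch below assumes the usual S-learner formulation: Stage 1 fits one model \(\mu(x, w)\) on the covariates together with the treatment indicator, and Stage 2 takes \(\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)\). The base learner `fit_cell_means` is a hypothetical toy (a per-cell average over a single discrete feature) chosen only to keep the example dependency-free; it is not the package's API, and any regressor could take its place.

```python
from collections import defaultdict

def fit_cell_means(keys, targets):
    """Toy base learner (assumption): predict the mean target of each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, t in zip(keys, targets):
        sums[k] += t
        counts[k] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return lambda k: means[k]

def s_learner(x, w, y):
    # Stage 1: a single model on (feature, treatment indicator).
    mu = fit_cell_means(list(zip(x, w)), y)
    # Stage 2: CATE as the difference between the two counterfactual predictions.
    return lambda xi: mu((xi, 1)) - mu((xi, 0))

# Made-up data: one binary feature x, binary treatment w, outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1.0, 3.0, 4.0, 6.0, 2.0, 2.0, 7.0, 9.0]

tau = s_learner(x, w, y)
print(tau(0), tau(1))  # 3.0 6.0
```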

### T-Learner

T-learner [kunzel2019metalearners] consists of two stages as follows:

**Stage 1**

**Stage 2**
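
A minimal sketch of the two stages, assuming the standard T-learner formulation from [kunzel2019metalearners]: Stage 1 fits separate outcome models \(\mu_0(x)\) and \(\mu_1(x)\) on the control and treatment groups, and Stage 2 takes \(\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)\). The helper `fit_cell_means` is a hypothetical stand-in base learner (a per-cell average), not part of the package.

```python
from collections import defaultdict

def fit_cell_means(keys, targets):
    """Toy base learner (assumption): predict the mean target of each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, t in zip(keys, targets):
        sums[k] += t
        counts[k] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return lambda k: means[k]

def t_learner(x, w, y):
    # Stage 1: one outcome model per group.
    mu0 = fit_cell_means([xi for xi, wi in zip(x, w) if wi == 0],
                         [yi for yi, wi in zip(y, w) if wi == 0])
    mu1 = fit_cell_means([xi for xi, wi in zip(x, w) if wi == 1],
                         [yi for yi, wi in zip(y, w) if wi == 1])
    # Stage 2: CATE as the difference between the two group models.
    return lambda xi: mu1(xi) - mu0(xi)

# Made-up data: one binary feature x, binary treatment w, outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1.0, 3.0, 4.0, 6.0, 2.0, 2.0, 7.0, 9.0]

tau_t = t_learner(x, w, y)
print(tau_t(0), tau_t(1))  # 3.0 6.0
```

With a regularized or parametric base learner the two group models can diverge from a single pooled model, which is where T-learner and S-learner estimates start to differ.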

### X-Learner

X-learner [kunzel2019metalearners] is an extension of T-learner, and consists of three stages as follows:

**Stage 1**

**Stage 2**

**Stage 3**
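
The three stages can be sketched as follows, assuming the formulation in [kunzel2019metalearners]: Stage 1 fits \(\mu_0\) and \(\mu_1\) as in T-learner; Stage 2 imputes individual effects \(D^1_i = Y^1_i - \hat{\mu}_0(X^1_i)\) and \(D^0_i = \hat{\mu}_1(X^0_i) - Y^0_i\) and fits \(\tau_1\) and \(\tau_0\) to them; Stage 3 combines them as \(\hat{\tau}(x) = g(x)\hat{\tau}_0(x) + (1 - g(x))\hat{\tau}_1(x)\). Using the estimated propensity score as the weight \(g\) is one common choice and is an assumption here, as is the toy base learner `fit_cell_means`.

```python
from collections import defaultdict

def fit_cell_means(keys, targets):
    """Toy base learner (assumption): predict the mean target of each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, t in zip(keys, targets):
        sums[k] += t
        counts[k] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return lambda k: means[k]

def x_learner(x, w, y):
    treated = [i for i, wi in enumerate(w) if wi == 1]
    control = [i for i, wi in enumerate(w) if wi == 0]
    # Stage 1: outcome models per group.
    mu0 = fit_cell_means([x[i] for i in control], [y[i] for i in control])
    mu1 = fit_cell_means([x[i] for i in treated], [y[i] for i in treated])
    # Stage 2: imputed treatment effects, then one effect model per group.
    d1 = [y[i] - mu0(x[i]) for i in treated]
    d0 = [mu1(x[i]) - y[i] for i in control]
    tau1 = fit_cell_means([x[i] for i in treated], d1)
    tau0 = fit_cell_means([x[i] for i in control], d0)
    # Stage 3: weighted combination; g is the estimated propensity score.
    g = fit_cell_means(x, w)
    return lambda xi: g(xi) * tau0(xi) + (1 - g(xi)) * tau1(xi)

# Made-up data: one binary feature x, binary treatment w, outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1.0, 3.0, 4.0, 6.0, 2.0, 2.0, 7.0, 9.0]

tau_x = x_learner(x, w, y)
print(tau_x(0), tau_x(1))  # 3.0 6.0
```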

### R-Learner

R-learner [nie2017quasi] uses cross-validated out-of-fold estimates of the outcomes \(\hat{m}^{(-i)}(x_i)\) and the propensity scores \(\hat{e}^{(-i)}(x_i)\). It consists of two stages as follows:

**Stage 1**

**Stage 2**
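
A hedged sketch of the two stages, assuming the formulation in [nie2017quasi]: Stage 1 produces the out-of-fold estimates \(\hat{m}^{(-i)}(x_i)\) and \(\hat{e}^{(-i)}(x_i)\), and Stage 2 chooses \(\tau\) to minimize the residual-on-residual loss \(\sum_i \big( (y_i - \hat{m}^{(-i)}(x_i)) - (w_i - \hat{e}^{(-i)}(x_i))\,\tau(x_i) \big)^2\), here without a regularization term. To stay dependency-free, the example makes two simplifying assumptions: the nuisance models are per-cell averages, and \(\tau\) is a constant within each cell of a single discrete feature, so the minimizer has a closed form. The data are made up and noiseless, with true effects 3 and 6.

```python
from collections import defaultdict

def cell_means(keys, values):
    """Toy nuisance model (assumption): mean of values per distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def out_of_fold(x, target, n_folds):
    """Predict each row using a model fitted on the other folds only."""
    n = len(x)
    preds = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        means = cell_means([x[i] for i in train], [target[i] for i in train])
        for i in range(fold, n, n_folds):
            preds[i] = means[x[i]]
    return preds

def r_learner(x, w, y, n_folds=2):
    # Stage 1: out-of-fold outcome and propensity estimates.
    m_hat = out_of_fold(x, y, n_folds)
    e_hat = out_of_fold(x, w, n_folds)
    # Stage 2: minimize the R-loss; for a per-cell constant tau this is
    # sum(y_res * w_res) / sum(w_res ** 2) within each cell.
    num, den = defaultdict(float), defaultdict(float)
    for xi, wi, yi, mi, ei in zip(x, w, y, m_hat, e_hat):
        y_res, w_res = yi - mi, wi - ei
        num[xi] += y_res * w_res
        den[xi] += w_res ** 2
    return {k: num[k] / den[k] for k in num}

# y = baseline(x) + effect(x) * w with baselines (1, 2) and effects (3, 6).
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 1, 1, 0, 0, 1, 1, 0]
y = [1.0, 4.0, 4.0, 1.0, 2.0, 8.0, 8.0, 2.0]

tau_r = r_learner(x, w, y)
print(tau_r)
```

In practice \(\tau\) would be a flexible model fitted to the same loss, typically with a penalty on its complexity.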

## Tree-Based Algorithms

### Uplift Tree

The Uplift Tree approach consists of a set of methods that use a tree-based algorithm where the splitting criterion is based on differences in uplift. [Rzepakowski2012-br] proposed three different ways to quantify the gain in divergence as the result of splitting [Gutierrez2016-co]:

\(D_{\text{gain}} = D_{\text{after split}}(P^T, P^C) - D_{\text{before split}}(P^T, P^C)\)

where \(D\) measures the divergence and \(P^T\) and \(P^C\) refer to the probability distribution of the outcome of interest in the treatment and control groups, respectively. Three different ways to quantify the divergence, KL, ED and Chi, are implemented in the package.

### KL

The Kullback-Leibler (KL) divergence is given by:

\(KL(P : Q) = \sum_{k=\text{left}, \text{right}} p_k \log\frac{p_k}{q_k}\)

where \(p\) is the sample mean in the treatment group, \(q\) is the sample mean in the control group and \(k\) indicates the leaf in which \(p\) and \(q\) are computed [Gutierrez2016-co].
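
As a small worked example, the function below evaluates the sum above for made-up leaf means. The name `kl_divergence`, the dictionary layout, and the numbers are illustrative assumptions, not the package's API. Terms with \(p_k = 0\) are treated as zero by the usual convention, and \(q_k > 0\) is assumed.

```python
import math

def kl_divergence(p, q):
    """Sum of p_k * log(p_k / q_k) over the leaves; assumes q_k > 0."""
    total = 0.0
    for leaf in p:
        if p[leaf] > 0:  # a zero p_k contributes nothing
            total += p[leaf] * math.log(p[leaf] / q[leaf])
    return total

# Hypothetical sample means of the outcome per leaf.
p_treat = {"left": 0.6, "right": 0.3}
q_ctrl = {"left": 0.4, "right": 0.3}

kl = kl_divergence(p_treat, q_ctrl)
print(kl)  # equals 0.6 * log(1.5), about 0.243
```

Only the left leaf contributes here, because treatment and control have the same mean in the right leaf.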

### ED

The Euclidean Distance is given by:

\(ED(P : Q) = \sum_{k=\text{left}, \text{right}} (p_k - q_k)^2\)

where the notation is the same as above.

### Chi

Finally, the \(\chi^2\)-divergence is given by:

\(\chi^2(P : Q) = \sum_{k=\text{left}, \text{right}} \frac{(p_k - q_k)^2}{q_k}\)

where the notation is again the same as above.
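
To make the difference between the Euclidean and \(\chi^2\) measures concrete, the sketch below evaluates both on made-up leaf means. The function names and numbers are illustrative assumptions rather than the package's API, and \(q_k > 0\) is assumed for the \(\chi^2\) case.

```python
def euclidean_distance(p, q):
    """Sum of squared differences between treatment and control means."""
    return sum((p[leaf] - q[leaf]) ** 2 for leaf in p)

def chi_squared_divergence(p, q):
    """Squared differences scaled by the control mean; assumes q_k > 0."""
    return sum((p[leaf] - q[leaf]) ** 2 / q[leaf] for leaf in p)

# Hypothetical sample means of the outcome per leaf.
p_treat = {"left": 0.6, "right": 0.3}
q_ctrl = {"left": 0.4, "right": 0.3}

ed = euclidean_distance(p_treat, q_ctrl)        # about 0.04
chi2 = chi_squared_divergence(p_treat, q_ctrl)  # about 0.10
chi2_swapped = chi_squared_divergence(q_ctrl, p_treat)
print(ed, chi2, chi2_swapped)
```

The Euclidean distance is symmetric in its arguments, whereas the \(\chi^2\)-divergence is not: swapping treatment and control changes the denominator and therefore the value.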

### CTS

The final Uplift Tree algorithm that is implemented is the Contextual Treatment Selection (CTS) approach by [Zhao2017-kg], where the sample splitting criterion is defined as follows:

\(\hat{\Delta}_{\mu}(s) = \hat{p}(\phi_l \mid \phi) \times \max_{t=0, \ldots, K}\hat{y}_t(\phi_l) + \hat{p}(\phi_r \mid \phi) \times \max_{t=0, \ldots, K}\hat{y}_t(\phi_r) - \max_{t=0, \ldots, K}\hat{y}_t(\phi)\)

where \(\phi_l\) and \(\phi_r\) refer to the feature subspaces in the left and right leaves respectively, \(\hat{p}(\phi_j \mid \phi)\) denotes the estimated conditional probability of a subject being in \(\phi_j\) given \(\phi\), and \(\hat{y}_t(\phi_j)\) is the conditional expected response under treatment \(t\).
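
The criterion can be evaluated directly once the per-treatment response estimates are available. The sketch below is an illustration under assumed inputs: the function name `cts_split_gain` and the numbers are hypothetical, and the estimates \(\hat{y}_t\) are taken as given rather than computed from data. Each list is indexed by treatment \(t = 0, \ldots, K\), with \(t = 0\) as control.

```python
def cts_split_gain(y_parent, y_left, y_right, p_left, p_right):
    """Gain in the best achievable expected response from splitting a node."""
    return p_left * max(y_left) + p_right * max(y_right) - max(y_parent)

# Hypothetical expected responses for control and two treatments (K = 2).
y_left = [0.08, 0.25, 0.10]    # treatment 1 is best in the left leaf
y_right = [0.12, 0.05, 0.14]   # treatment 2 is best in the right leaf
y_parent = [0.10, 0.15, 0.12]  # equal-weight average of the two leaves

gain = cts_split_gain(y_parent, y_left, y_right, p_left=0.5, p_right=0.5)
print(gain)  # about 0.045
```

The gain is positive here because the split lets each leaf be assigned its own best treatment (0.25 and 0.14) instead of the single best treatment for the parent node (0.15).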