Source code for causalml.optimize.pns

[docs]def get_pns_bounds(data_exp, data_obs, T, Y, type="PNS"):
    """
    Args
    ----
    data_exp : DataFrame
        Data from an experiment.
    data_obs : DataFrame
        Data from an observational study
    T : str
        Name of the binary treatment indicator
    y : str
        Name of the binary outcome indicator
    type : str
        Type of probability of causation desired. Acceptable args are:
            - ``PNS``: Probability of necessary and sufficient causation
            - ``PS``: Probability of sufficient causation
            - ``PN``: Probability of necessary causation

    Notes
    -----
    Based on Equation (24) in `Tian and Pearl (2000) <https://ftp.cs.ucla.edu/pub/stat_ser/r271-A.pdf>`_.

    To capture the counterfactual notation, we use ``1`` and ``0`` to indicate the actual and
    counterfactual values of a variable, respectively, and we use ``do`` to indicate the effect
    of an intervention.

    The experimental and observational data are either assumed to come to the same population,
    or from random samples of the population. If the data are from a sample, the bounds may
    be incorrectly calculated because the relevant quantities in the Tian-Pearl equations are
    defined e.g. as :math:`P(Y|do(T))`, not :math:`P(Y|do(T), S)` where :math:`S` corresponds to sample selection.
    `Bareinboim and Pearl (2016) <https://www.pnas.org/doi/10.1073/pnas.1510507113>`_ discuss conditions
    under which :math:`P(Y|do(T))` can be recovered from :math:`P(Y|do(T), S)`.
    """

    # Probabilities calculated from observational data
    Y1 = data_obs[Y].mean()
    T1Y0 = (
        data_obs.loc[(data_obs[T] == 1) & (data_obs[Y] == 0)].shape[0]
        / data_obs.shape[0]
    )
    T1Y1 = (
        data_obs.loc[(data_obs[T] == 1) & (data_obs[Y] == 1)].shape[0]
        / data_obs.shape[0]
    )
    T0Y0 = (
        data_obs.loc[(data_obs[T] == 0) & (data_obs[Y] == 0)].shape[0]
        / data_obs.shape[0]
    )
    T0Y1 = (
        data_obs.loc[(data_obs[T] == 0) & (data_obs[Y] == 1)].shape[0]
        / data_obs.shape[0]
    )

    # Probabilities calculated from experimental data
    Y1doT1 = data_exp.loc[data_exp[T] == 1, Y].mean()
    Y1doT0 = data_exp.loc[data_exp[T] == 0, Y].mean()
    Y0doT0 = 1 - Y1doT0

    if type == "PNS":
        lb_args = [0, Y1doT1 - Y1doT0, Y1 - Y1doT0, Y1doT1 - Y1]

        ub_args = [Y1doT1, Y0doT0, T1Y1 + T0Y0, Y1doT1 - Y1doT0 + T1Y0 + T0Y1]

    if type == "PN":
        lb_args = [0, (Y1 - Y1doT0) / T1Y1]
        ub_args = [1, (Y0doT0 - T0Y0) / T1Y1]

    if type == "PS":
        lb_args = [0, (Y1doT1 - Y1) / T0Y0]
        ub_args = [1, (Y1doT1 - T1Y1) / T0Y0]

    lower_bound = max(lb_args)
    upper_bound = min(ub_args)

    return lower_bound, upper_bound