Skip to content

Binary Classification Toolkit

Artifact-ML Logo

The artifact-core binary classification toolkit offers a comprehensive suite of artifacts for the evaluation of binary classification experiments.

Usage Sketch

from typing import Dict

import pandas as pd

from artifact_core.binary_classification import (
    BinaryClassificationEngine,
    BinaryClassificationPlotType,
    BinaryFeatureSpec
)

true: Dict[str, str] = df_classification_results["true"].to_dict()

predicted: Dict[str, str] = df_classification_results["predicted"].to_dict()

probs_pos: Dict[str, float] = df_classification_results["predicted_prob"].to_dict()

class_spec = BinaryFeatureSpec(
    ls_categories=["0", "1"],
    positive_category="1",
    feature_name="class"
)

engine = BinaryClassificationEngine(resource_spec=class_spec)

score_pdf_plot = engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.SCORE_PDF,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

score_pdf_plot

Score Distribution Artifact

roc_plot = engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.ROC_CURVE,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

roc_plot

ROC Artifact

Supported Artifacts

Terminology Note

We refer to the model’s predicted positive probabilities — the continuous output values representing its confidence before thresholding — as scores.

These are not to be confused with metric scores, which are scalar evaluation metrics such as accuracy, precision, recall, or ROC_AUC.

Further, we use the term predicted ground-truth probabilities to refer to the probabilities the model assigns to the true (ground-truth) class for each sample.

Scores

  • ACCURACY — Overall classification accuracy.
  • BALANCED_ACCURACY — Mean of recall (TPR) and specificity (TNR).
  • PRECISION — Positive predictive value (PPV).
  • NPV — Negative predictive value.
  • RECALL — True positive rate (TPR, sensitivity).
  • TNR — True negative rate (specificity).
  • FPR — False positive rate.
  • FNR — False negative rate.
  • F1 — Harmonic mean of precision and recall.
  • MCC — Matthews correlation coefficient.
  • ROC_AUC — Area under the ROC curve.
  • PR_AUC — Area under the Precision–Recall curve.
  • GROUND_TRUTH_PROB_MEAN — Mean of predicted ground-truth probabilities.
  • GROUND_TRUTH_PROB_STD — Standard deviation of predicted ground-truth probabilities.
  • GROUND_TRUTH_PROB_VARIANCE — Variance of predicted ground-truth probabilities.
  • GROUND_TRUTH_PROB_MEDIAN — Median of predicted ground-truth probabilities.
  • GROUND_TRUTH_PROB_FIRST_QUARTILE — 25th percentile of predicted ground-truth probabilities.
  • GROUND_TRUTH_PROB_THIRD_QUARTILE — 75th percentile of predicted ground-truth probabilities.
  • GROUND_TRUTH_PROB_MIN — Minimum predicted ground-truth probability.
  • GROUND_TRUTH_PROB_MAX — Maximum predicted ground-truth probability.

Arrays

  • CONFUSION_MATRIX — Single confusion matrix array of TP, TN, FP, and FN counts.

Plots

  • CONFUSION_MATRIX_PLOT — Visual confusion matrix.
  • ROC_CURVE — Receiver Operating Characteristic curve.
  • PR_CURVE — Precision–Recall curve.
  • DET_CURVE — Detection Error Tradeoff curve.
  • RECALL_THRESHOLD_CURVE — Recall as a function of decision threshold.
  • PRECISION_THRESHOLD_CURVE — Precision as a function of decision threshold.
  • SCORE_PDF — Probability density function (PDF) of scores (predicted positive probabilities).
  • GROUND_TRUTH_PROB_PDF — PDF of predicted ground-truth probabilities.

Score Collections

  • NORMALIZED_CONFUSION_COUNTS — normalized TP/TN/FP/FN counts.
  • BINARY_PREDICTION_SCORES — Batched scalar metrics (e.g., precision, recall).
  • THRESHOLD_VARIATION_SCORES — Metrics evaluated across decision threshold sweeps (e.g., PR AUC, ROC AUC etc.).
  • SCORE_STATS — Score (predicted positive probability) summary statistics.
  • POSITIVE_CLASS_SCORE_STATS — Score (predicted positive probability) summary statistics restricted to the positive class.
  • NEGATIVE_CLASS_SCORE_STATS — Score (predicted positive probability) summary statistics restricted to the negative class.
  • SCORE_MEANS — Mean predicted score (positive probability) by split (all, positive, negative).
  • SCORE_STDS — Standard deviation of scores (predicted positive probabilities) by split (all, positive, negative).
  • SCORE_VARIANCES — Variance of scores (predicted positive probabilities) by split (all, positive, negative).
  • SCORE_MEDIANS — Median score (predicted positive probability) by split (all, positive, negative).
  • SCORE_FIRST_QUARTILES — 25th percentile of scores (predicted positive probabilities) by split (all, positive, negative).
  • SCORE_THIRD_QUARTILES — 75th percentile of scores (predicted positive probabilities) by split (all, positive, negative).
  • SCORE_MINIMA — Minimum score (predicted positive probability) by split (all, positive, negative).
  • SCORE_MAXIMA — Maximum score (predicted positive probability) by split (all, positive, negative).
  • GROUND_TRUTH_PROB_STATS — Summary statistics for predicted ground-truth probabilities.

Array Collections

  • CONFUSION_MATRICES — Collection of confusion matrices across normalization modes (none, true, predicted, all).

Plot Collections

  • CONFUSION_MATRIX_PLOTS — Set of confusion matrix visuals across normalization modes (none, true, predicted, all).
  • THRESHOLD_VARIATION_CURVES — Metric-vs-threshold curve set (e.g., ROC, PR, DET).
  • SCORE_PDF_PLOTS — Score (predicted positive probability) density plots across splits (all, positive, negative).