This repository contains scripts and data for generating synthetic datasets, calculating fairness metrics, and analyzing temporal trends. It also includes processing scripts for real-world data from the UNSW dataset.

Data Generators

- biased_data_generator.py: Generates a synthetic dataset with intentional biases (full bias: alert, confirmation, and score perturbations).
- balanced_data_generator.py: Generates a synthetic dataset with balanced, unbiased data.
- alert_only_bias_generator.py: (NEW) Generates a synthetic dataset with alert-generation bias only (platform-dependent alert rates), while confirmation rates and score assignments remain uniform. Used for metric isolation analysis (targets φ_ind; see the sketch after this list).
- calibration_only_bias_generator.py: (NEW) Generates a synthetic dataset with calibration bias only (platform-dependent confirmation rates and score drift), while alert rates remain uniform. Used for metric isolation analysis (targets δ_cal).
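For orientation, a minimal sketch of the alert-only bias idea is shown below: the alert probability depends on the platform, while the confirmation rate and score assignment stay uniform. The column names, platform labels, rate values, and output file name are illustrative assumptions, not the parameters used by alert_only_bias_generator.py.

```python
# Minimal sketch of the alert-only bias idea (illustrative only).
# Column names, platform labels, and rates are assumptions, not the
# parameters used by alert_only_bias_generator.py.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 10_000
platforms = ["tcp", "udp", "ospf", "iot"]

# Alert probability depends on the platform (the injected bias)...
alert_rate = {"tcp": 0.10, "udp": 0.12, "ospf": 0.15, "iot": 0.30}
# ...while the confirmation rate and the score assignment stay uniform.
confirm_rate = 0.50

platform = rng.choice(platforms, size=n)
p_alert = pd.Series(platform).map(alert_rate).to_numpy()
alert = rng.random(n) < p_alert
confirmed = alert & (rng.random(n) < confirm_rate)
score = rng.integers(1, 6, size=n)  # uniform severity score in 1..5

df = pd.DataFrame({"platform": platform, "alert": alert.astype(int),
                   "confirmed": confirmed.astype(int), "score": score})
df.to_csv("alert_only_sketch.csv", index=False)  # illustrative output name
```
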
Generated Data

- synthetic_fairness_data_biased.csv: Synthetic data with biases in platform visibility, alert generation, and scoring (full bias scenario).
- synthetic_fairness_data_neutral.csv: Synthetic data with balanced, unbiased characteristics (neutral baseline).
- synthetic_fairness_data_alert_only.csv: (NEW) Synthetic data with alert-generation bias only.
- synthetic_fairness_data_calibration_only.csv: (NEW) Synthetic data with calibration bias only.

Metrics Calculation and Analysis

- metrics_calculator.py: Calculates and compares fairness metrics for the biased and neutral datasets (2-scenario comparison).
- metrics_calculator_extended.py: (NEW) Calculates fairness metrics across all four scenarios (neutral, alert-only, calibration-only, full bias) and generates a comparison table and per-metric visualizations for the metric isolation analysis.
- temporal_analysis.py: Analyzes the temporal evolution of fairness metrics.

Real-World Data Processing

- realworld.py: Processes the UNSW dataset, applies robust preprocessing, and computes fairness metrics using bootstrapping.
- getN.py: (NEW) Reports the total number of records before and after preprocessing ($N_{\text{raw}}$ and $N_{\text{filtered}}$), enabling independent verification of the dataset size reported in the paper.

Statistical Significance Testing

- test_hip.py: Performs statistical significance testing on the fairness metrics using the Mann-Whitney U test, including rank-biserial correlation as effect size (UPDATED) to complement p-values under large sample sizes. Outputs results to hypothesis_test_results.csv.

Classical Fairness Metrics

- classic_metrics.py: Computes classical fairness metrics (SPD and EOD) using TCP as the reference group and outputs results in CSV and LaTeX formats.

Empirical Compatibility of New Operational Fairness Metrics

- compatibility.py: Analyzes the joint behavior of the fairness metrics across platforms, flags significant deviations, and computes metric correlations.

To generate the synthetic datasets, run the corresponding generator script:

- Biased Data (full bias): run python biased_data_generator.py. This will generate synthetic_fairness_data_biased.csv.
- Balanced Data (neutral baseline): run python balanced_data_generator.py. This will generate synthetic_fairness_data_neutral.csv.
- Alert-only Bias (NEW): run python alert_only_bias_generator.py. This will generate synthetic_fairness_data_alert_only.csv. Only alert-generation rates are platform-dependent; confirmation rates and scores remain uniform. This scenario isolates the effect on φ_ind.
- Calibration-only Bias (NEW): run python calibration_only_bias_generator.py. This will generate synthetic_fairness_data_calibration_only.csv. Alert rates are uniform; confirmation rates and score drift vary by platform. This scenario isolates the effect on δ_cal.

Run the metrics_calculator.py script to calculate and compare fairness metrics for the neutral and biased datasets (original 2-scenario comparison):

python metrics_calculator.py

Run the metrics_calculator_extended.py script to calculate fairness metrics across all four scenarios and generate comparison plots:

python metrics_calculator_extended.py

This script:
- Loads all four synthetic datasets (neutral, alert-only bias, calibration-only bias, full bias).
- Computes φ_ind, φ_sep, and δ_cal for each scenario and platform/score group.
- Generates a summary table: metric_table_all_scenarios.csv (see the sketch after this list).
- Produces three comparison plots:
  - phi_ind_all_scenarios.png: Operational Independence across all scenarios.
  - phi_sep_all_scenarios.png: Detection Separation across all scenarios.
  - delta_cal_all_scenarios.png: Calibration Sufficiency across all scenarios.
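As an illustration of the comparison-table step, the sketch below loads the four generated CSVs and assembles per-platform values of one metric, assuming the files expose platform and alert columns. The φ_ind formula used here (a platform's share of alerts divided by its share of records) is an assumption for illustration; metrics_calculator_extended.py defines the actual computation.

```python
# Sketch: assemble a per-scenario table for one metric (phi_ind).
# The phi_ind formula below (a platform's share of alerts divided by its
# share of records) is an illustrative assumption, not the repository's
# definition; metrics_calculator_extended.py performs the real computation.
import pandas as pd

scenarios = {
    "neutral": "synthetic_fairness_data_neutral.csv",
    "alert_only": "synthetic_fairness_data_alert_only.csv",
    "calibration_only": "synthetic_fairness_data_calibration_only.csv",
    "full_bias": "synthetic_fairness_data_biased.csv",
}

rows = []
for name, path in scenarios.items():
    df = pd.read_csv(path)
    alert_share = df.groupby("platform")["alert"].sum() / df["alert"].sum()
    traffic_share = df["platform"].value_counts(normalize=True)
    phi_ind = alert_share / traffic_share
    for platform, value in phi_ind.items():
        rows.append({"scenario": name, "platform": platform, "phi_ind": value})

pd.DataFrame(rows).to_csv("phi_ind_sketch_table.csv", index=False)  # illustrative output name
```
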
The metric isolation analysis verifies that each proposed metric responds selectively to its intended fairness dimension:
- φ_ind activates under alert-only bias (IoT reaches 2.37) but remains near baseline (~1.0) under calibration-only bias.
- φ_sep activates under calibration-only bias (values 0.18–0.22) but remains near baseline (~0.02) under alert-only bias.
- δ_cal shows expected cross-sensitivity under alert-only bias, consistent with the operational semantics of detection quality.
Run the temporal_analysis.py script to analyze the temporal evolution of fairness metrics:

python temporal_analysis.py

The realworld.py script processes network traffic data from the UNSW dataset and computes fairness metrics across different protocols. Ensure the dataset is available locally and update the data_dir variable in realworld.py to point to the directory containing the .csv files.
Then run:

python realworld.py

The script will:
- Load selected columns from all .csv files in the directory.
- Preprocess and clean the data.
- Generate fairness metrics (see the sketch after this list):
  - φ_ind: Alert disparity by protocol.
  - φ_sep: F1-score by protocol.
  - δ_cal: Calibration gap by score.
- Save results as .csv and .png files:
  - phi_ind_bootstrap.csv, phi_ind_raw.csv, phi_ind_unsw.png
  - phi_sep_bootstrap.csv, phi_sep_raw.csv, phi_sep_unsw.png
  - delta_cal.csv, delta_cal_unsw.png
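The actual column handling lives in realworld.py; the sketch below only illustrates the shape of the per-group computations, assuming a binary alert column (detector decision), a binary label column (ground truth), and proto/score grouping columns in a preprocessed frame. The δ_cal definition shown (deviation of each score level's confirmation rate from the overall rate) is likewise a simplifying assumption.

```python
# Sketch of the per-group computations (not the realworld.py implementation).
# Assumes a binary 'alert' column (detector decision), a binary 'label' column
# (ground truth), and 'proto'/'score' grouping columns in a preprocessed frame.
import pandas as pd
from sklearn.metrics import f1_score

df = pd.read_csv("unsw_preprocessed.csv")  # hypothetical intermediate file

# phi_sep: detection quality (F1-score) per protocol.
phi_sep = df.groupby("proto").apply(
    lambda g: f1_score(g["label"], g["alert"], zero_division=0))

# delta_cal: deviation of each score level's confirmation rate from the
# overall confirmation rate (an assumed, simplified definition).
delta_cal = (df.groupby("score")["label"].mean() - df["label"].mean()).abs()

print(phi_sep, delta_cal, sep="\n")
```
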
To ensure the robustness of the fairness metrics, the script uses bootstrapping, a statistical resampling technique. Each metric is computed multiple times (e.g., 1000 iterations) on randomly sampled subsets of the data (with replacement). This allows the estimation of:
- Mean: The average value of the metric across all resamples.
- Standard deviation (std): A measure of variability, indicating how much the metric fluctuates across different samples.

This approach provides confidence intervals and helps assess the stability and reliability of the fairness metrics across different groups (e.g., protocols).
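The sketch below shows the bootstrap idea in isolation: resample the records with replacement, recompute the metric on each resample, and report the mean and standard deviation. The helper name and the toy metric are illustrative, not the functions used in realworld.py.

```python
# Sketch of the bootstrap idea: resample records with replacement, recompute
# the metric on each resample, and report the mean and standard deviation.
# The helper name and the toy metric are illustrative, not realworld.py's code.
import numpy as np
import pandas as pd

def bootstrap_metric(df, metric_fn, n_iter=1000, seed=0):
    """Mean and std of metric_fn over n_iter bootstrap resamples of df."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_iter):
        sample = df.sample(n=len(df), replace=True,
                           random_state=int(rng.integers(0, 2**31 - 1)))
        values.append(metric_fn(sample))
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std()

# Example usage with a toy metric (alert rate of one protocol), assuming the
# frame has 'proto' and 'alert' columns:
# mean, std = bootstrap_metric(df, lambda d: d.loc[d["proto"] == "tcp", "alert"].mean())
```
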
To independently verify the number of records used in the analysis (as reported in the paper), run:

python getN.py

This script applies the same loading and preprocessing pipeline as realworld.py and reports:
- N raw: Total records loaded from all CSV partitions.
- N after filtering: Records retained after removing rows with missing values in dur, Spkts, proto, or Stime.

Expected output: N = 2,540,047 (no records lost during filtering).
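A rough sketch of that check is shown below. It assumes the CSV partitions carry a header row with the four columns named above; getN.py performs the actual loading, so treat the path and read options here as placeholders.

```python
# Sketch of the record-count check (getN.py performs the actual loading).
# Assumes the CSV partitions carry a header row with the four columns named
# above; adjust data_dir to your local copy of the dataset.
import glob
import pandas as pd

data_dir = "path/to/UNSW-NB15"                     # placeholder location
cols = ["dur", "Spkts", "proto", "Stime"]

frames = [pd.read_csv(path, usecols=cols, low_memory=False)
          for path in sorted(glob.glob(f"{data_dir}/*.csv"))]
df = pd.concat(frames, ignore_index=True)

print("N raw:", len(df))
print("N after filtering:", len(df.dropna(subset=cols)))
```
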
After generating the fairness metrics, you can run the following script to test whether the observed differences are statistically significant:

python test_hip.py

This script performs Mann-Whitney U tests on:
- φ_sep: Compares detection quality (F1-score) between ospf and tcp.
- φ_ind: Compares alert distribution between tcp and ospf.
- δ_cal: Compares calibration consistency between the lowest and highest score levels.

The output includes descriptive statistics, p-values, and rank-biserial correlation as effect size (UPDATED) to assess both the statistical significance and practical magnitude of the observed differences. Effect sizes are interpreted as: |r| < 0.1 negligible, < 0.3 small, < 0.5 medium, ≥ 0.5 large. Results are saved to hypothesis_test_results.csv.
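For reference, the sketch below shows the test for a single metric: a two-sided Mann-Whitney U test followed by the rank-biserial correlation r = 2U/(n1·n2) − 1 as effect size. The toy samples stand in for the per-group metric values; test_hip.py defines how the real samples are built.

```python
# Sketch of the test for one metric: two-sided Mann-Whitney U plus the
# rank-biserial correlation as effect size. The toy samples stand in for the
# per-group metric values; test_hip.py defines how the real samples are built.
import numpy as np
from scipy.stats import mannwhitneyu

def mann_whitney_with_effect(x, y):
    """Return (U, p, rank-biserial r) for samples x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    u, p = mannwhitneyu(x, y, alternative="two-sided")
    r = 2 * u / (len(x) * len(y)) - 1  # rank-biserial correlation in [-1, 1]
    return u, p, r

# Toy example: two groups of hypothetical F1 values standing in for tcp vs. ospf.
rng = np.random.default_rng(0)
u, p, r = mann_whitney_with_effect(rng.normal(0.80, 0.05, 500),
                                   rng.normal(0.70, 0.05, 500))
print(f"U={u:.1f}, p={p:.3g}, rank-biserial r={r:.2f}")
```
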
To compute classical fairness metrics such as Statistical Parity Difference (SPD) and Equal Opportunity Difference (EOD), run:

python classic_metrics.py

This script loads and preprocesses the UNSW dataset and then calculates:
- SPD: Difference in alert rates between each protocol and the reference group (tcp).
- EOD: Difference in true positive rates (recall) between each protocol and tcp (both metrics are illustrated in the sketch below).

The results are saved in:
- fairness_classic_metrics.csv: A CSV summary of SPD and EOD by platform.
- fairness_classic_metrics.tex: A LaTeX-formatted table for inclusion in reports or publications.
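A minimal sketch of both metrics is given below, assuming binary alert (decision) and label (ground truth) columns and a proto column; classic_metrics.py defines the actual preprocessing and output formatting.

```python
# Sketch of SPD and EOD with tcp as the reference group (not the script itself).
# Assumes binary 'alert' (decision) and 'label' (ground truth) columns plus a
# 'proto' column; classic_metrics.py defines the actual preprocessing.
import pandas as pd

def spd_eod(df, reference="tcp"):
    # SPD: alert-rate difference between each protocol and the reference.
    alert_rate = df.groupby("proto")["alert"].mean()
    spd = alert_rate - alert_rate[reference]
    # EOD: true-positive-rate (recall) difference, i.e. the alert rate among
    # records whose ground-truth label is positive.
    tpr = df[df["label"] == 1].groupby("proto")["alert"].mean()
    eod = tpr - tpr[reference]
    return pd.DataFrame({"SPD": spd, "EOD": eod})

# df = pd.read_csv(...)                          # preprocessed UNSW records
# table = spd_eod(df)
# table.to_csv("spd_eod_sketch.csv")             # illustrative output names
# table.to_latex("spd_eod_sketch.tex", float_format="%.3f")
```
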
To analyze how the new operational fairness metrics behave jointly across different platforms, run:

python compatibility.py

This script:
- Loads and preprocesses the UNSW dataset.
- Computes three fairness metrics per platform:
  - φ_ind: Alert rate relative to platform prevalence.
  - φ_sep: F1-score of alert vs. confirmation.
  - δ_cal: Calibration deviation across score levels.
- Flags platforms where metrics exceed empirical thresholds.
- Computes correlations between the metrics (see the sketch after this list).
- Saves the results in:
  - fairness_metric_compatibility.csv: CSV with metric values and activation flags.
  - fairness_metric_compatibility.tex: LaTeX-formatted table for reports.
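The sketch below illustrates the joint analysis on hypothetical per-platform metric values: flag platforms against placeholder thresholds and compute pairwise correlations between the metrics. The numbers and thresholds are stand-ins, not empirical results; compatibility.py derives the empirical thresholds from the data.

```python
# Sketch of the joint analysis on hypothetical per-platform metric values:
# flag platforms against placeholder thresholds and correlate the metrics.
# All numbers and thresholds below are stand-ins, not empirical results.
import pandas as pd

metrics = pd.DataFrame({
    "phi_ind": [1.02, 0.95, 1.40, 2.10],
    "phi_sep": [0.81, 0.78, 0.55, 0.60],
    "delta_cal": [0.03, 0.05, 0.18, 0.22],
}, index=["tcp", "udp", "ospf", "iot"])

thresholds = {"phi_ind": 1.5, "phi_sep": 0.65, "delta_cal": 0.10}  # placeholders
flags = pd.DataFrame({
    "phi_ind_flag": metrics["phi_ind"] > thresholds["phi_ind"],
    "phi_sep_flag": metrics["phi_sep"] < thresholds["phi_sep"],    # low F1 is the concern
    "delta_cal_flag": metrics["delta_cal"] > thresholds["delta_cal"],
})

print(metrics.join(flags))
print(metrics.corr())  # pairwise correlation between the three metrics
```
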
Make sure you have the following Python libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
- scipy
You can install them using pip:
pip install pandas numpy matplotlib seaborn scikit-learn scipy

The UNSW dataset was created by the Cyber Range Lab at UNSW Canberra and is provided for academic use by Nour Moustafa and Jill Slay.