How to Prepare Your Hiring Data for a Bias Audit

Step-by-step guide to collecting, formatting, and uploading hiring data for an AI bias audit.

OnHirely Team | November 1, 2025 | 12 min read

Introduction

The quality of your bias audit depends on the quality of your data. This guide walks you through preparing your hiring data for analysis with OnHirely.

What Data Do You Need?

Required Fields

  • Candidate identifier: Anonymized ID (never names or SSNs)
  • Decision outcome: Whether the candidate was selected/advanced at each stage
  • Demographic data: Race/ethnicity and sex/gender (required by NYC Local Law 144, abbreviated LL144 in this guide)
  • AI score or rating: If your tool produces numerical scores
  • Pipeline stage: Which stage the decision occurred at
  • Date: When the decision was made
  • Age group: For age discrimination analysis
  • Job category: To analyze bias by role type
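
As a rough illustration, the field list above can be turned into a header check that runs before upload. The column names below (candidate_id, selected, and so on) are hypothetical placeholders chosen for this sketch, not names OnHirely requires.

```python
import csv
import io

# Hypothetical column names, one per field in the list above.
# Demographic data is split into two columns for this sketch.
EXPECTED_COLUMNS = [
    "candidate_id", "selected", "race_ethnicity", "sex", "ai_score",
    "stage", "decision_date", "age_group", "job_category",
]

def missing_columns(header):
    """Return the expected columns that are absent from a CSV header row."""
    present = {name.strip().lower() for name in header}
    return [col for col in EXPECTED_COLUMNS if col not in present]

# A tiny in-memory export standing in for a real ATS file.
sample = io.StringIO(
    "candidate_id,stage,decision_date,selected,race_ethnicity,sex,ai_score\n"
    "c-0001,screening,2025-03-14,1,Asian,Female,0.82\n"
)
header = next(csv.reader(sample))
print(missing_columns(header))  # prints ['age_group', 'job_category']
```

If your tool does not produce numerical scores, drop ai_score from the expected list rather than uploading an empty column.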

Data Collection Methods

Method 1: Historical Data

Export hiring data from your ATS for the past 12 months. This is the most common approach and provides real-world results.
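
If your ATS export covers a longer period, you may want to trim it to the audit window first. The following is a minimal sketch that assumes a hypothetical decision_date column in ISO format (YYYY-MM-DD) and treats "12 months" as 365 days.

```python
import csv
import io
from datetime import date, timedelta

def within_window(rows, as_of, days=365, date_field="decision_date"):
    """Keep rows whose decision date falls in the `days` up to `as_of`, inclusive."""
    start = as_of - timedelta(days=days)
    kept = []
    for row in rows:
        decided = date.fromisoformat(row[date_field])
        if start <= decided <= as_of:
            kept.append(row)
    return kept

# In-memory stand-in for an ATS export.
export = io.StringIO(
    "candidate_id,decision_date,selected\n"
    "c-0001,2024-09-30,0\n"
    "c-0002,2025-03-14,1\n"
    "c-0003,2025-10-31,0\n"
)
rows = list(csv.DictReader(export))
recent = within_window(rows, as_of=date(2025, 11, 1))
print([r["candidate_id"] for r in recent])  # prints ['c-0002', 'c-0003']
```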

Method 2: Test Data

If historical data isn't available, some auditors accept test data generated to simulate a representative applicant pool.

Method 3: Combined Approach

Use historical data supplemented with test data for demographic categories with small sample sizes.

Formatting Requirements

File Format

OnHirely accepts CSV and Excel (.xlsx) files. We recommend CSV for simplicity.

Column Headers

Use clear, descriptive column headers. OnHirely's AI auto-maps common column names, but unambiguous headers reduce the chance of a wrong mapping and make the review step faster.
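
To see why header wording matters, here is a simplified sketch of how alias-based column mapping can work. The alias table and function names are invented for illustration; this is not OnHirely's actual mapping logic.

```python
import re

# Hypothetical alias table: canonical name -> accepted normalized headers.
ALIASES = {
    "candidate_id": {"candidate_id", "applicant_id", "id"},
    "race_ethnicity": {"race_ethnicity", "race", "ethnicity"},
    "sex": {"sex", "gender", "sex_gender"},
    "selected": {"selected", "advanced", "outcome", "decision"},
}

def normalize(header):
    """Lowercase a header and collapse runs of non-alphanumerics to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.strip().lower()).strip("_")

def map_headers(headers):
    """Map each raw header to a canonical name, or None when unrecognized."""
    mapping = {}
    for raw in headers:
        key = normalize(raw)
        mapping[raw] = next(
            (canon for canon, names in ALIASES.items() if key in names), None
        )
    return mapping

result = map_headers(["Applicant ID", "Race/Ethnicity", "Gender", "Col7"])
print(result)
# prints {'Applicant ID': 'candidate_id', 'Race/Ethnicity': 'race_ethnicity', 'Gender': 'sex', 'Col7': None}
```

A header like "Col7" cannot be matched by any scheme of this kind, which is why descriptive names save time at the review step.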

Demographic Categories

Race/Ethnicity (LL144 categories)

  • Hispanic or Latino
  • White (not Hispanic or Latino)
  • Black or African American
  • Native Hawaiian or Other Pacific Islander
  • Asian
  • American Indian or Alaska Native
  • Two or More Races

Sex/Gender

  • Male
  • Female
  • Non-binary (recommended to include)
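
Free-text or inconsistently labeled demographic values are a common source of mapping problems. The sketch below flags any value that does not match the category labels listed above; the function name and the case-insensitive comparison are assumptions of this example.

```python
RACE_ETHNICITY = [
    "Hispanic or Latino",
    "White (not Hispanic or Latino)",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "Asian",
    "American Indian or Alaska Native",
    "Two or More Races",
]
SEX = ["Male", "Female", "Non-binary"]

def unexpected_values(values, allowed):
    """Return sorted distinct non-empty values that match no allowed label.

    Comparison ignores case and surrounding whitespace.
    """
    allowed_lower = {label.lower() for label in allowed}
    return sorted(
        {v.strip() for v in values
         if v.strip() and v.strip().lower() not in allowed_lower}
    )

flagged = unexpected_values(["Asian", "asian ", "Latino", "", "White"], RACE_ETHNICITY)
print(flagged)  # prints ['Latino', 'White']
```

Flagged values such as "White" or "Latino" need to be recoded to the full category label before upload; empty values are skipped here and count toward missing demographics instead.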

Data Quality Checklist

  1. No personally identifiable information (names, SSNs, email addresses)
  2. Consistent formatting within each column
  3. No empty rows or columns in the middle of data
  4. Demographic fields populated for at least 70% of candidates
  5. Minimum 30 candidates per demographic group for reliable analysis
  6. At least 6 months of data (12 months preferred)
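
Checklist items 4 and 5 are easy to verify in code before you upload. This sketch assumes one row per candidate and a hypothetical sex column; the 70% and 30-candidate thresholds come from the checklist above.

```python
from collections import Counter

def demographic_coverage(rows, field):
    """Fraction of rows with a non-empty value for `field`."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if (r.get(field) or "").strip())
    return filled / len(rows)

def small_groups(rows, field, minimum=30):
    """Groups with fewer than `minimum` candidates, mapped to their counts."""
    counts = Counter(
        r[field].strip() for r in rows if (r.get(field) or "").strip()
    )
    return {group: n for group, n in counts.items() if n < minimum}

# Synthetic rows for illustration only.
rows = (
    [{"sex": "Female"}] * 45
    + [{"sex": "Male"}] * 40
    + [{"sex": "Non-binary"}] * 5
    + [{"sex": ""}] * 10
)
print(demographic_coverage(rows, "sex") >= 0.70)  # item 4: prints True
print(small_groups(rows, "sex"))                  # item 5: prints {'Non-binary': 5}
```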

Common Data Issues and Solutions

Missing Demographics

If demographic data is incomplete, consider supplementing with self-reported survey data or BISG (Bayesian Improved Surname Geocoding) estimation.

Small Sample Sizes

For groups with fewer than 30 candidates, OnHirely automatically switches to Fisher's exact test. Consider expanding the time window to increase sample sizes.
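
For readers unfamiliar with Fisher's exact test, here is a minimal standard-library sketch of the two-sided version on a 2x2 table of selected versus not-selected counts. It illustrates the general technique only and is not OnHirely's implementation; the two-sided rule used (summing all tables no more probable than the observed one) is one common convention, and in practice a library routine such as scipy.stats.fisher_exact would normally be used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: selected / not selected in the group of interest
    c, d: selected / not selected in the comparison group
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x selections in row 1, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)

# Illustrative counts: 3 of 4 selected in one group, 1 of 4 in the other.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # prints 0.4857
```

With groups this small, even a large gap in selection rates is not statistically distinguishable from chance, which is why expanding the time window helps.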

Multiple Decision Points

If your pipeline has multiple AI-influenced stages (screening, assessment, interview recommendation), include outcome data for each stage separately.
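
One way to keep stages separate is a "long" layout with one row per candidate per stage. The sketch below assumes that layout, with hypothetical stage, sex, and selected columns, and computes the selection rate for each stage and group.

```python
from collections import defaultdict

def selection_rates(rows, group_field="sex"):
    """Selection rate per (stage, group): selected count divided by total count."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for r in rows:
        key = (r["stage"], r[group_field])
        totals[key] += 1
        selected[key] += int(r["selected"])
    return {key: selected[key] / totals[key] for key in totals}

# Synthetic rows for illustration: one row per candidate per stage reached.
rows = [
    {"candidate_id": "c-0001", "stage": "screening", "sex": "Female", "selected": "1"},
    {"candidate_id": "c-0001", "stage": "assessment", "sex": "Female", "selected": "0"},
    {"candidate_id": "c-0002", "stage": "screening", "sex": "Male", "selected": "1"},
    {"candidate_id": "c-0002", "stage": "assessment", "sex": "Male", "selected": "1"},
    {"candidate_id": "c-0003", "stage": "screening", "sex": "Female", "selected": "0"},
]
rates = selection_rates(rows)
print(rates[("screening", "Female")])   # prints 0.5
print(rates[("assessment", "Female")])  # prints 0.0
```

Candidates who did not reach a stage simply have no row for it, so each stage's rate is computed only over the people who were actually evaluated there.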

Uploading to OnHirely

  1. Log in to your OnHirely dashboard
  2. Click "New Audit"
  3. Upload your CSV or Excel file
  4. Review the auto-mapped column assignments
  5. Confirm demographic categories
  6. Start the audit

OnHirely processes your data locally and generates results in minutes. Your data is encrypted in transit and at rest.

Last updated: February 15, 2026
