Introduction
The quality of your bias audit depends on the quality of your data. This guide walks you through preparing your hiring data for analysis with OnHirely.
What Data Do You Need?
Required Fields
- Candidate identifier: Anonymized ID (never names or SSNs)
- Decision outcome: Whether the candidate was selected/advanced at each stage
- Demographic data: Race/ethnicity and sex/gender (required by New York City Local Law 144, abbreviated LL144 in this guide)
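One way to produce an anonymized candidate identifier before export is a keyed hash of the ATS record ID. The sketch below is only an illustration: the function name `anonymize_id`, the `C-` prefix, the key, and the record IDs are all hypothetical, and OnHirely does not require this particular scheme.

```python
import hashlib
import hmac


def anonymize_id(raw_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonymous ID for one candidate.

    HMAC-SHA256 is used so the mapping cannot be rebuilt without the
    secret key, which should be stored separately from the export.
    """
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256)
    # Keep the first 16 hex characters; the "C-" prefix is an arbitrary choice.
    return "C-" + digest.hexdigest()[:16]


# Hypothetical key and record IDs, for illustration only.
key = b"replace-with-a-long-random-secret"
print(anonymize_id("ats-record-10482", key))
print(anonymize_id("ats-record-10483", key))
```

The same input and key always yield the same ID, so a candidate can still be followed across pipeline stages without exposing the original identifier.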
Recommended Fields
- AI score or rating: If your tool produces numerical scores
- Pipeline stage: Which stage the decision occurred at
- Date: When the decision was made
- Age group: For age discrimination analysis
- Job category: To analyze bias by role type
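As a rough illustration of how the required and recommended fields might sit together in one file, the sketch below writes a two-row CSV using Python's standard `csv` module. The column names and all values are made up for the example; they are not a schema that OnHirely prescribes.

```python
import csv
import io

# Hypothetical column names covering the required and recommended fields.
FIELDS = [
    "candidate_id", "stage", "decision_date", "selected",
    "race_ethnicity", "sex", "ai_score", "age_group", "job_category",
]

# Made-up example rows: one row per candidate per pipeline stage.
rows = [
    {"candidate_id": "C-0001", "stage": "screening",
     "decision_date": "2024-03-04", "selected": "1",
     "race_ethnicity": "Asian", "sex": "Female",
     "ai_score": "0.82", "age_group": "40+", "job_category": "Engineering"},
    {"candidate_id": "C-0002", "stage": "screening",
     "decision_date": "2024-03-05", "selected": "0",
     "race_ethnicity": "Hispanic or Latino", "sex": "Male",
     "ai_score": "0.41", "age_group": "Under 40", "job_category": "Sales"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```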
Data Collection Methods
Method 1: Historical Data
Export the past 12 months of hiring data from your applicant tracking system (ATS). This is the most common approach, and it reflects real-world outcomes.
Method 2: Test Data
If historical data isn't available, some auditors accept test data generated to simulate a representative applicant pool.
Method 3: Combined Approach
Use historical data supplemented with test data for demographic categories with small sample sizes.
Formatting Requirements
File Format
OnHirely accepts CSV and Excel (.xlsx) files. We recommend CSV for simplicity.
Column Headers
Use clear, descriptive column headers. OnHirely's AI auto-maps common column names, but unambiguous headers make the mapping more reliable and easier to review.
Demographic Categories
Race/Ethnicity (LL144 categories)
- Hispanic or Latino
- White (not Hispanic or Latino)
- Black or African American
- Native Hawaiian or Other Pacific Islander
- Asian
- American Indian or Alaska Native
- Two or More Races
Sex/Gender
- Male
- Female
- Non-binary (recommended to include)
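Exports often contain several spellings of the same category. A minimal sketch of normalizing race/ethnicity labels to the categories listed above follows; the alias table is hypothetical and should be adapted to whatever values your ATS actually emits. Values that cannot be mapped return `None` so they can be reviewed by hand rather than guessed.

```python
CANONICAL_RACE = [
    "Hispanic or Latino",
    "White (not Hispanic or Latino)",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "Asian",
    "American Indian or Alaska Native",
    "Two or More Races",
]

# Hypothetical aliases; extend to match the labels in your own export.
ALIASES = {
    "hispanic": "Hispanic or Latino",
    "latino": "Hispanic or Latino",
    "white": "White (not Hispanic or Latino)",
    "black": "Black or African American",
    "african american": "Black or African American",
    "two or more": "Two or More Races",
}


def normalize_race(value: str):
    """Map a raw label to a canonical category, or None if unrecognized."""
    cleaned = " ".join(value.strip().lower().split())
    for name in CANONICAL_RACE:
        if cleaned == name.lower():
            return name
    return ALIASES.get(cleaned)


print(normalize_race("  hispanic or latino "))
print(normalize_race("Black"))
print(normalize_race("prefer not to say"))
```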
Data Quality Checklist
- No personally identifiable information (names, SSNs, email addresses)
- Consistent formatting within each column
- No empty rows or columns in the middle of data
- Demographic fields populated for at least 70% of candidates
- Minimum 30 candidates per demographic group for reliable analysis
- At least 6 months of data (12 months preferred)
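Several items on this checklist can be checked with a short script before upload. The sketch below covers three of them (PII-looking headers, the 70% completeness threshold, and the 30-candidate minimum) and assumes one row per candidate; the function name `check_quality` and the header hints are illustrative choices, not part of OnHirely.

```python
from collections import Counter

PII_HINTS = ("name", "ssn", "email")  # substrings that suggest a PII column
MIN_COMPLETENESS = 0.70               # share of rows with a demographic value
MIN_GROUP_SIZE = 30                   # candidates per demographic group


def check_quality(rows, demographic_cols):
    """Return a list of warnings for a list of dict rows (one per candidate)."""
    if not rows:
        return ["file has no data rows"]
    warnings = []
    # Flag headers that look like they hold personally identifiable data.
    for header in rows[0]:
        if any(hint in header.lower() for hint in PII_HINTS):
            warnings.append(f"column '{header}' may contain PII")
    for col in demographic_cols:
        values = [(r.get(col) or "").strip() for r in rows]
        filled = [v for v in values if v]
        share = len(filled) / len(rows)
        if share < MIN_COMPLETENESS:
            warnings.append(f"{col}: only {share:.0%} populated")
        for group, n in Counter(filled).items():
            if n < MIN_GROUP_SIZE:
                warnings.append(f"{col}: group '{group}' has {n} candidates")
    return warnings


# Made-up sample: 10 candidates, half with a missing sex value.
sample = [
    {"candidate_id": f"C-{i}", "sex": "Female" if i % 2 else "", "email": ""}
    for i in range(10)
]
for warning in check_quality(sample, ["sex"]):
    print(warning)
```

An empty result only means these three checks passed; consistent formatting and date coverage still need a separate look.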
Common Data Issues and Solutions
Missing Demographics
If demographic data is incomplete, consider supplementing it with self-reported survey data or with BISG (Bayesian Improved Surname Geocoding) estimation, which assigns probable race/ethnicity from a candidate's surname and location.
Small Sample Sizes
For groups with fewer than 30 candidates, OnHirely automatically switches to Fisher's exact test. Consider expanding the time window to increase sample sizes.
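For readers who want to see what Fisher's exact test computes on a small group, here is a minimal standard-library sketch of the two-sided test on a 2x2 table of selected and not-selected counts for two groups. It illustrates the standard test only; how OnHirely implements it internally is not described in this guide, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a, b: selected / not selected in group 1
    c, d: selected / not selected in group 2
    """
    row1, row2 = a + b, c + d
    col1 = a + c                      # total selected across both groups
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x selected candidates in group 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one
    # (small relative tolerance to absorb floating-point ties).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# 3 of 4 selected in one group versus 1 of 4 in the other.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With groups this small, even a large gap in selection rates gives a high p-value, which is one reason expanding the time window is worth considering.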
Multiple Decision Points
If your pipeline has multiple AI-influenced stages (screening, assessment, interview recommendation), include outcome data for each stage separately.
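Keeping one row per candidate per stage makes it straightforward to look at each decision point on its own. The sketch below computes selection rates by stage and group from such rows; the keys `stage`, `group`, and `selected` are assumed names for the example.

```python
from collections import defaultdict


def selection_rates(rows):
    """Return {(stage, group): selection rate} from one-row-per-stage data."""
    counts = defaultdict(lambda: [0, 0])  # [selected, total] per key
    for r in rows:
        key = (r["stage"], r["group"])
        counts[key][0] += int(r["selected"])
        counts[key][1] += 1
    return {key: sel / tot for key, (sel, tot) in counts.items()}


# Made-up rows: the same candidates can appear once per stage.
rows = [
    {"stage": "screening", "group": "Female", "selected": 1},
    {"stage": "screening", "group": "Female", "selected": 0},
    {"stage": "screening", "group": "Male", "selected": 1},
    {"stage": "interview", "group": "Female", "selected": 1},
    {"stage": "interview", "group": "Male", "selected": 0},
]
for (stage, group), rate in sorted(selection_rates(rows).items()):
    print(f"{stage:10s} {group:7s} {rate:.2f}")
```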
Uploading to OnHirely
1. Log in to your OnHirely dashboard
2. Click "New Audit"
3. Upload your CSV or Excel file
4. Review the auto-mapped column assignments
5. Confirm demographic categories
6. Start the audit
OnHirely processes your data locally and generates results in minutes. Your data is encrypted in transit and at rest.