Research · 2026 · UKDA-8741 · Phase 2 Complete

Predictability as a Social Fact

A Multichannel Framework for Daily Life Forecasting

We train predictive models across four channels of daily life on 914,000 time-use windows from the UK. The accuracy gap between demographic groups quantifies structural constraint.

914K windows
300+ experiments
13 models
4 channels
2 countries
18 US findings
Example profile (24-hour timeline, 00h to 24h): Employed · Low income · Weekday
914K training windows
91–98.6% test accuracy range
5.26 pp income accuracy gap
14–20 pp sequence order effect
300+ bootstrap experiments
Abstract

Can ML quantify social structure?

We present a multichannel daily life prediction framework trained on the UK Time Use Survey (UKDA-8741), comprising 6,896 diary-days across four channels: Activity (37 categories), Enjoyment (7 levels), Location (3 categories), and With Whom (7 categories). Each diary day is divided into 144 ten-minute slots, yielding ~914,000 sliding-window samples.

Using 13 models from Majority Vote to Transformer Encoder, we achieve 91.0–98.6% test accuracy. The accuracy gap between demographic groups—up to 5.26 pp by income—is a social science finding: predictability measures structural constraint. Temporal sequence order contributes 14–20 pp; static demographics contribute ≈0.

Deep models (LSTM, GRU, Transformer) provide +3.16 pp for high-income groups only—revealing behavioural complexity invisible to linear models. All group differences are confirmed by Bootstrap 95% CIs across 300+ experiments.

Time-use surveys · Sequence prediction · Social stratification · Transformer · UKDA-8741
Key Findings

Six results that reframe the question

F1
📐

Predictability ceiling, not model floor

All 13 models converge within 0.2 pp on every channel. The bottleneck is the intrinsic regularity of behaviour, not model power.

E1 — Model comparison
F2
📊

Income is the strongest structural axis

Low-income: 91.1% vs High-income: 85.9% activity accuracy. A 5.26 pp gap driven by constrained life routines.

B1 — Income stratification
F3
🧠

Deep models expose high-income complexity

The Transformer outperforms SGD by +3.16 pp for the high-income group only. For the low-income group the lift is below 0.03 pp.

Deep models — B-class
F4

Transition moments: universal blind spots

Error peaks at 16% at 08:00 and at 14% at 18:00, the work and domestic transition points. Night-hour error stays below 1%.

Error analysis — hourly
F5
🔢

Sequence order dominates all features

Order-preserved vs bag-of-words: 14–20 pp gap. Removing order collapses performance more than removing all static demographics combined.

E6 — Ablation
F6
🌐

Channels are self-contained worlds

Cross-channel prediction performs up to 71 pp worse than self-regression. Enjoyment is least predictable from behaviour.

E2 — Cross-channel
Results — Channel Performance

Four channels, one framework

Test accuracy across 13 models. Each 10-minute slot is predicted from the preceding window (W=1, the optimal length).

All models × all channels: test accuracy (%)
Columns: Model · Activity · Enjoyment · Location · With Whom
Error Analysis

When does the model fail?

Errors concentrate at behavioural transitions — morning routines (08:00), lunch (12:00), and the work-to-home shift (17–18:00). Night hours approach 0% error.
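The hourly error profile described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name, the `start_hour` parameter, and the toy labels are assumptions, and slot 0 is assumed to begin at 00:00 unless `start_hour` says otherwise.

```python
from collections import defaultdict
from typing import Dict, Sequence

SLOTS_PER_HOUR = 6  # six ten-minute slots per hour, 144 per day


def hourly_error_rate(y_true: Sequence[str], y_pred: Sequence[str],
                      start_hour: int = 0) -> Dict[int, float]:
    """Return the share of mispredicted slots for each clock hour.

    start_hour is an assumption: the clock hour at which slot 0 begins.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for slot, (truth, guess) in enumerate(zip(y_true, y_pred)):
        hour = (start_hour + slot // SLOTS_PER_HOUR) % 24
        counts[hour] += 1
        errors[hour] += int(truth != guess)
    return {hour: errors[hour] / counts[hour] for hour in sorted(counts)}


# Toy day (hypothetical labels): sleep until 08:00, then work.
truth = ["sleep"] * 48 + ["work"] * 96
# A predictor that misses the first two slots after the 08:00 transition.
guess = ["sleep"] * 50 + ["work"] * 94
rates = hourly_error_rate(truth, guess)
print(rates[8])  # error rate for the 08:00 hour
```

In this toy day all errors fall in the hour containing the transition, which is the pattern the hourly analysis reports at a larger scale.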

24h Activity Patterns

How does a day unfold?

Each column is one hour. Each row is a demographic profile. Colour shows the dominant activity at that hour — revealing how structured constraint shapes daily time.

Core Mechanism

How much is true prediction, and how much is inertia?

Across UK and US runs, a simple persistence rule (“next slot = previous slot”) is extremely strong. This reframes the headline: high accuracy often reflects behavioural inertia rather than rich anticipatory forecasting.
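The persistence rule above needs no training, which is why it makes a demanding baseline. A minimal sketch, assuming a diary day is a sequence of activity labels; the function name and the toy labels are illustrative, not taken from the project code.

```python
from typing import Sequence


def persistence_accuracy(day: Sequence[str]) -> float:
    """Share of slots whose label equals the previous slot's label."""
    if len(day) < 2:
        raise ValueError("need at least two slots")
    hits = sum(1 for prev, cur in zip(day, day[1:]) if prev == cur)
    return hits / (len(day) - 1)


# Toy six-slot diary (hypothetical labels, not survey data).
toy_day = ["sleep", "sleep", "sleep", "eat", "work", "work"]
print(f"{persistence_accuracy(toy_day):.2f}")  # 3 of 5 transitions repeat
```

Every miss of this rule is an activity change, so its accuracy is one minus the transition rate of the diary.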

88.76%: Persistence (UK & US activity), identical in both countries
88.56%: Transformer (US activity), Δ = −0.20 pp vs persistence
+1.57 pp: Model beats persistence, US location channel
14–20 pp: Sequence order contribution, UK & US ablation
US 2024 (Quick A1)

Activity: SGD under persistence

86.48% SGD vs 88.76% Persistence

SGD trails a one-step copy baseline by 2.28 pp on US activity forecasting.

US 2024 (Quick A1)

Location: model slightly helps

94.40% SGD vs 92.84% Persistence

For coarse and sticky channels like location, model features can still add a small gain.

Interpretation

Predictability as social constraint

If a group is highly predictable mainly because each slot repeats the one before it, this reflects institutionalized time: fixed work schedules, constrained options, and role-bound routines.

Inertia · Institutional time · Structure & agency
Bootstrap 95% Confidence Intervals

Income group accuracy: non-overlapping CIs confirm significance

Cross-National Comparison — UK & US

The income gap replicates across two countries

UKDA-8741 (UK, 2014–2015) and ATUS 2024 (US) show structurally identical patterns. Income stratification is a durable, cross-national social fact.

🇬🇧 Activity accuracy by income — UK
🇺🇸 Activity accuracy by income — US
F7 Income gap replicates: UK 5.26 pp gap, US 6.58 pp gap (Transformer, single-year); the pooled gap narrows to 2.17 pp as sample size equalizes group fit. Same direction in both countries.
F8 Persistence baseline identical: UK 88.76% = US 88.76%. Short-term inertia is a universal human feature.
F9 Pooled data (2003–2024): model beats persistence (+0.2–0.5 pp). Small-sample “unbeatable persistence” is a methodological artefact, not a true ceiling.
F10 Gender effects depend on sample and setup (UK 0.46 pp; US pooled activity gap 3.30 pp) and collapse at full scale: on the full ATUS data, male = female = 87.51%, so the gap vanishes. Economic status, not gender, is the structural determinant of behavioural regularity.
Phase 2 — Complete

US expansion: all experiments complete

Phase 2 report finalized: 389 lines, 11 chapters, 18 key findings, 127 data table rows. Covers ATUS 2024 and pooled 2003–2024 cross-year analysis. Report is advisor-ready.

Completed · WP1

US loader stabilization

Pooled + 2024 schema compatibility validated. Pipeline is production-ready for iterative batches.

Completed · WP2/WP4

US baseline + grouped activity

Activity forecasting by income / employment / sex complete. Transformer consistently beats SGD across all US subgroups.

Completed · C1

Fine vs coarse activity

Coarse grouping lifts model accuracy from 87.72% to 90.11% (+2.4 pp) — but fine grain captures genuine behavioural complexity.

Completed · D1

10-min vs 30-min timescale

30-min resampling loses 14 pp accuracy vs 10-min. Finer resolution is decisively better; behavioural signal degrades rapidly with temporal aggregation.

Completed · E1

Weekday-only proxy test

Weekday-only sampling loses only 0.8 pp vs full-week. Model generalises robustly — structural patterns dominate over day-type variation.

Next · WP4+

Cross-national feasibility

MTUS onboarding and country comparability diagnostics are the next steps.

Completed · A1-full

Full-scale A1 runs (SGD + Transformer)

Full-sample SGD and Transformer complete. Transformer reaches persistence ceiling on activity. Both models validated across all channels.

Completed · B1-pool

Pooled 2003–2024 ATUS analysis

The pooled 2003–2024 analysis confirms temporal stability. Behavioural predictability is a durable social fact, not an artefact of recent data.

New Finding · B1-pool

Pooled data: model beats persistence

With 140,000+ training windows, Transformer exceeds the persistence baseline by +0.2–0.5 pp across income, age, and weekday groups. The apparent “unbeatable persistence” in small samples is a methodological artefact of underfitting.

New Finding · Cross-national

Persistence ceiling identical across UK & US

Both UK (UKDA-8741) and US (ATUS) yield 88.76% persistence accuracy on activity. Short-term temporal inertia is a universal human feature, independent of institutional context.

US Phase 2 — Nine Core Findings

What the US data confirms — and what replicates across UK & US

Group Differences

Predictability maps social structure

Accuracy gaps between demographic groups are social science findings. Bootstrap 95% CIs confirm all differences are significant.

Interactive Demo

Predict a life

Choose a demographic profile. The model forecasts a complete 24-hour day — slot by slot, the same way the Transformer does it.

Profile controls: Income · Employment · Gender · Day type
Outputs: Predicted accuracy · Slots predicted · Distinct activities
Population Simulation

2,000 simulated lives — a city in motion

Each dot is one simulated life. Coloured zones are city districts. People flow between districts as the day progresses — driven by the same structural patterns our model learns to predict.

Methods

Framework design

01

Data pipeline

UKDA-8741 Stata .dta → wide-to-long → sliding windows. Person-level 70/15/15 split by mainid. No cross-person leakage.
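A person-level split of the kind described here could look like the sketch below. It assumes each diary row carries a person identifier (the `mainid` named above); the function name, the seed, and the toy IDs are hypothetical. Splitting the unique IDs, rather than the rows, is what keeps one person's diaries out of more than one partition.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Tuple


def split_by_person(person_ids: Iterable[str], seed: int = 0,
                    fractions: Tuple[float, float, float] = (0.70, 0.15, 0.15)
                    ) -> Dict[str, str]:
    """Assign every unique person ID to exactly one of train/val/test."""
    unique = sorted(set(person_ids))   # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    assignment: Dict[str, str] = {}
    for i, pid in enumerate(unique):
        if i < n_train:
            assignment[pid] = "train"
        elif i < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"
    return assignment


# Toy data: 20 hypothetical people with two diary rows each.
row_ids = [f"p{i}" for i in range(20)] * 2
assignment = split_by_person(row_ids)
print(Counter(assignment.values()))
```

Each row is then routed by looking up its person ID in `assignment`, so all windows from one person land in the same partition.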

02

Sliding windows

W consecutive slots → predict slot W+1. W=1 (10 min) optimal. ~914K samples total across four channels.
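The windowing step can be illustrated with a short sketch. It assumes a diary day is a plain sequence of labels and ignores details the text does not specify, such as how missing slots are handled; the function name and toy labels are illustrative.

```python
from typing import List, Sequence, Tuple


def sliding_windows(day: Sequence[str], w: int = 1
                    ) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (window, target) pairs: w consecutive slots, then the next slot."""
    if w < 1:
        raise ValueError("window length must be at least 1")
    return [(tuple(day[i:i + w]), day[i + w]) for i in range(len(day) - w)]


# Toy four-slot day (hypothetical labels, not survey data).
toy_day = ["sleep", "eat", "work", "work"]
for window, target in sliding_windows(toy_day, w=2):
    print(window, "->", target)
```

Under this sketch a day of n slots yields n − W samples, so longer windows trade a little data for more context.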

03

13 models

Majority, Naive Bayes, Markov (N-gram), Logistic, Ridge, SGD, Random Forest, GBDT, XGBoost, LightGBM, LSTM, GRU, Transformer Encoder.
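To make the simpler end of this list concrete, here is a minimal first-order Markov predictor. It is a sketch under assumptions: the project's Markov (N-gram) model may use a higher order or smoothing, and the function name and toy labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence


def fit_first_order_markov(days: Iterable[Sequence[str]]) -> Dict[str, str]:
    """Map each activity to its most frequent successor in the training days."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for day in days:
        for prev, cur in zip(day, day[1:]):
            transitions[prev][cur] += 1
    return {prev: counts.most_common(1)[0][0]
            for prev, counts in transitions.items()}


# Toy training day (hypothetical labels, not survey data).
model = fit_first_order_markov(
    [["sleep", "sleep", "sleep", "eat", "work", "work"]]
)
print(model["eat"])  # the learned successor of "eat"
```

Unlike a pure copy-the-previous-slot rule, this model can learn that a short activity is usually followed by a different one.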

04

Evaluation

Accuracy, Macro-F1, MAE, QWK, Within-one. Bootstrap 95% CI (n=1000 resamples). Brier Score 0.0849, LCS 0.9921.
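A percentile bootstrap for accuracy, matching the 1000 resamples mentioned above, can be sketched as follows. Two assumptions are worth flagging: the text does not say whether resampling is done over predictions or over persons (this sketch resamples individual predictions), and the function name, seed, and toy data are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_accuracy_ci(correct: Sequence[int], n_resamples: int = 1000,
                          alpha: float = 0.05, seed: int = 0
                          ) -> Tuple[float, float]:
    """Percentile bootstrap CI for accuracy from 0/1 correctness flags."""
    rng = random.Random(seed)
    pool: List[int] = list(correct)
    n = len(pool)
    stats = sorted(
        sum(rng.choices(pool, k=n)) / n   # resample with replacement
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy flags: 90 correct and 10 wrong predictions (hypothetical).
flags = [1] * 90 + [0] * 10
low, high = bootstrap_accuracy_ci(flags)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Two groups can be compared by computing one interval per group and checking whether the intervals overlap. Because slots from the same person are correlated, resampling whole persons would give wider and more conservative intervals.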

05

Group experiments

Per-group train/eval for income, econstat, sex, age, region, survey period. 198 B-class experiments. Controlled for group size.
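The evaluation half of a group experiment reduces to per-group accuracy and a gap in percentage points. The sketch below covers only that step, not per-group training; the function name, group labels, and toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Accuracy in percent per group, from (group, was_correct) pairs."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for group, was_correct in records:
        totals[group] += 1
        hits[group] += int(was_correct)
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


# Toy predictions (hypothetical, not survey results).
records = ([("low", True)] * 9 + [("low", False)] * 1
           + [("high", True)] * 8 + [("high", False)] * 2)
acc = accuracy_by_group(records)
gap_pp = acc["low"] - acc["high"]  # gap in percentage points
print(f"{gap_pp:.1f} pp")
```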

06

Ablation design

Order vs bag encoding, window length (W=1–30), static features individually, cross-channel inputs. Each factor isolated independently.
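The order-versus-bag contrast can be shown with two toy encoders. This is a sketch of the idea only; the project's actual feature encoding is not described here, and the function names and labels are assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple


def ordered_encoding(window: Sequence[str]) -> Tuple[str, ...]:
    """Keep the slots in their original temporal order."""
    return tuple(window)


def bag_encoding(window: Sequence[str]) -> Tuple[Tuple[str, int], ...]:
    """Keep only how often each activity occurs, discarding order."""
    return tuple(sorted(Counter(window).items()))


# Two hypothetical windows with the same activities in opposite order.
morning = ["sleep", "eat", "work"]
evening = ["work", "eat", "sleep"]
print(ordered_encoding(morning) == ordered_encoding(evening))
print(bag_encoding(morning) == bag_encoding(evening))
```

The bag encoding cannot tell the two windows apart, which is the information the ablation removes. With W=1 both encodings carry the same information, so an order ablation is only informative at W ≥ 2.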

References & Theoretical Anchors

From predictive metrics to social theory

This project interprets forecasting performance through sociology of time, institutional constraint, and stratified agency.

R1

Pred, A. (1981/2005 reprint). Time-geography and the social anchoring of everyday practices. Used here for the concept of daily routines as space-time constraints.

R2

Weber, M. (1905; 1922). Rationalization and the “iron cage”. Used to interpret fixed work-time systems as institutionalized temporal discipline.

R3

Working Time Mismatch and Employee Subjective Well-being across Institutional Contexts (job-quality perspective). Supports the institutional-context lens for cross-national expansion.

R4

Hochschild, A. (1989). The second shift. Supports gendered role-load interpretation for fragmented daily schedules.

R5

Peterson, R. (1992). Cultural omnivore thesis. Inspires the hypothesis of “omnivore schedules” in higher-income groups.

R6

Method note. At this phase, missing-value handling follows advisor guidance: drop-first strategy for core variables before imputation extensions.