Can ML quantify social structure?
We present a multichannel framework for predicting daily life, trained on the UK Time Use Survey (UKDA-8741), which comprises 6,896 diary days across four channels: Activity (37 categories), Enjoyment (7 levels), Location (3 categories), and With Whom (7 categories). Each diary day is divided into 144 ten-minute slots, yielding ~914,000 sliding-window samples.
Across 13 models, from Majority Vote to Transformer Encoder, we achieve 91.0–98.6% test accuracy. The accuracy gap between demographic groups, up to 5.26 pp by income, is itself a social science finding: predictability measures structural constraint. Temporal sequence order contributes 14–20 pp of accuracy; static demographics contribute ≈0 pp.
Deep models (LSTM, GRU, Transformer) add accuracy for high-income groups only (the Transformer gains +3.16 pp over SGD), revealing behavioural complexity invisible to linear models. All group differences are confirmed by bootstrap 95% confidence intervals (CIs) across more than 300 experiments.
Six results that reframe the question
Predictability ceiling, not model floor
All 13 models converge to within 0.2 pp of one another on every channel. The bottleneck is the intrinsic regularity of behaviour, not model capacity.
Income is the strongest structural axis
Activity accuracy is 91.1% for low-income vs 85.9% for high-income respondents. We attribute this 5.26 pp gap to more constrained daily routines.
Deep models expose high-income complexity
For high-income respondents, the Transformer outperforms SGD by 3.16 pp. For low-income respondents, the lift is below 0.03 pp.
Transition moments: universal blind spots
Error peaks at 16% at 08:00 and 14% at 18:00, exactly at the work and domestic transitions. Night hours stay below 1% error.
Sequence order dominates all features
Order-preserving vs bag-of-words encoding produces a 14–20 pp gap. Removing order degrades performance more than removing all static demographics combined.
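The toy illustration below shows why order matters. The activity labels and both encoders are hypothetical; they only mimic the order-vs-bag ablation. The two 3-slot windows contain the same activities in opposite order, one resembling a morning transition and one an evening transition. They stay distinct under an order-preserving encoding but collapse to the same bag.

```python
from collections import Counter

def encode_ordered(window):
    """Order-preserving encoding: (position, category) pairs."""
    return tuple(enumerate(window))

def encode_bag(window):
    """Bag-of-words encoding: category counts only; slot order is discarded."""
    return frozenset(Counter(window).items())

# Hypothetical windows: same activities, opposite order.
w1 = ["sleep", "eat", "commute"]   # waking, breakfast, leaving for work
w2 = ["commute", "eat", "sleep"]   # coming home, dinner, going to bed

print(encode_ordered(w1) == encode_ordered(w2))  # False: order separates them
print(encode_bag(w1) == encode_bag(w2))          # True: the bag cannot
```

Any model fed only the bag must give both windows the same prediction, even though the plausible next slot differs.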
Channels are self-contained worlds
Predicting one channel from the other channels performs up to 71 pp worse than predicting it from its own history (self-regression). Enjoyment is the channel least predictable from behaviour.
Four channels, one framework
Test accuracy across 13 models. Each 10-minute slot is predicted from the preceding window (W=1, the best-performing window length).
| Model | Activity | Enjoyment | Location | With Whom |
|---|---|---|---|---|
When does the model fail?
Errors concentrate at behavioural transitions: morning routines (08:00), lunch (12:00), and the work-to-home shift (17:00–18:00). Night hours approach 0% error.
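A per-hour error profile like this can be computed by bucketing test errors by the clock hour of each target slot. This is a minimal sketch under two assumptions: each test sample carries its slot index (0–143), and the hypothetical parameter `day_start_hour` gives the clock hour at which slot 0 begins.

```python
from collections import defaultdict

SLOTS_PER_HOUR = 6  # six 10-minute slots per hour

def error_by_hour(slots, y_true, y_pred, day_start_hour=0):
    """Error rate per clock hour, given each test sample's slot index (0-143).

    day_start_hour is an assumption: the clock hour at which slot 0 begins."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for slot, t, p in zip(slots, y_true, y_pred):
        hour = (day_start_hour + slot // SLOTS_PER_HOUR) % 24
        totals[hour] += 1
        errors[hour] += int(t != p)
    return {h: errors[h] / totals[h] for h in sorted(totals)}

# Toy data: three samples in slots 48-50 (hour 8), two in slots 0-1 (hour 0).
rates = error_by_hour([48, 49, 50, 0, 1], [1, 1, 1, 0, 0], [1, 2, 1, 0, 0])
print(rates)  # {0: 0.0, 8: 0.3333333333333333}
```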
How does a day unfold?
Each column is one hour and each row is a demographic profile. Colour shows the dominant activity at that hour, revealing how structural constraint shapes daily time.
How much is true prediction, and how much is inertia?
Across UK and US runs, a simple persistence rule (“next slot = previous slot”) is an extremely strong baseline. This reframes the headline: high accuracy often reflects behavioural inertia rather than rich anticipatory forecasting.
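Below is a minimal sketch of the persistence baseline and of a "lift over persistence" comparison. It assumes each diary day is a list of category codes. The function names are illustrative, not the project's code. For a fair comparison, the baseline must be scored on exactly the same target slots as the model.

```python
def persistence_accuracy(days):
    """Accuracy of the rule 'next slot = previous slot' over every
    consecutive pair of slots within each diary day."""
    hits = total = 0
    for day in days:
        for prev, nxt in zip(day, day[1:]):
            hits += prev == nxt
            total += 1
    return hits / total

def lift_pp(model_accuracy, baseline_accuracy):
    """Model gain over the baseline in percentage points (negative = trails it)."""
    return 100 * (model_accuracy - baseline_accuracy)

days = [[0, 0, 0, 1], [2, 2, 3, 3]]
base = persistence_accuracy(days)
print(base)                 # 4 of the 6 consecutive pairs repeat
print(lift_pp(0.60, base))  # negative: a 60% model trails this baseline
```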
Activity: SGD falls below persistence
SGD trails a one-step copy (persistence) baseline by 2.28 pp on US activity forecasting.
Location: model slightly helps
For coarse and sticky channels like location, model features can still add a small gain.
Predictability as social constraint
If a group is highly predictable mainly because each slot repeats the one before it, we read this as institutionalized time: fixed work schedules, constrained options, and role-bound routines.
Income group accuracy: non-overlapping CIs confirm significance
The income gap replicates across two countries
UKDA-8741 (UK, 2014–2015) and ATUS 2024 (US) show structurally identical patterns. Income stratification is a durable, cross-national social fact.
US expansion: all experiments complete
Phase 2 report finalized: 389 lines, 11 chapters, 18 key findings, 127 data table rows. Covers ATUS 2024 and pooled 2003–2024 cross-year analysis. Report is advisor-ready.
US loader stabilization
Schema compatibility between the pooled and 2024 ATUS files is validated. The pipeline is production-ready for iterative batches.
US baseline + grouped activity
Activity forecasting by income, employment, and sex is complete. The Transformer consistently beats SGD across all US subgroups.
Fine vs coarse activity
Coarse grouping lifts model accuracy from 87.72% to 90.11% (+2.39 pp), but the fine-grained scheme captures genuine behavioural complexity.
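One mechanism behind the coarse-grouping lift is that confusions inside a group stop counting as errors. The sketch below rescores fixed predictions after mapping them to coarse groups. The mapping and labels are hypothetical, and the project may instead retrain on coarse labels.

```python
# Hypothetical fine-to-coarse mapping (not the project's actual grouping).
COARSE = {
    "sleep": "personal_care",
    "wash_dress": "personal_care",
    "eat_home": "eating",
    "eat_out": "eating",
    "paid_work": "work",
    "work_break": "work",
    "tv": "leisure",
    "reading": "leisure",
}

def coarsen(labels, mapping=COARSE, default="other"):
    """Map each fine label to its coarse group; unknown labels go to `default`."""
    return [mapping.get(label, default) for label in labels]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["tv", "eat_home", "paid_work", "sleep"]
y_pred = ["reading", "eat_home", "work_break", "sleep"]
print(accuracy(y_true, y_pred))                    # 0.5 at fine grain
print(accuracy(coarsen(y_true), coarsen(y_pred)))  # 1.0 after grouping
```

The same predictions score higher once within-group confusions (tv vs reading, paid_work vs work_break) are merged. That behavioural distinction is exactly what the fine scheme preserves.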
10-min vs 30-min timescale
Resampling to 30-minute slots loses 14 pp of accuracy relative to 10-minute slots. Finer resolution is decisively better: behavioural signal degrades rapidly under temporal aggregation.
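Below is a sketch of 10-to-30-minute resampling. It assumes each 30-minute slot takes the most frequent of its three 10-minute labels; the aggregation rule the project actually uses is not stated here. Under this rule a 144-slot day becomes 48 slots.

```python
from collections import Counter

def resample_mode(day, factor=3):
    """Aggregate consecutive 10-minute slots into blocks of `factor` slots
    (factor=3 gives 30-minute slots), keeping the most frequent label.
    Ties go to the label encountered first in the block."""
    if len(day) % factor:
        raise ValueError("day length must be a multiple of factor")
    return [
        Counter(day[i:i + factor]).most_common(1)[0][0]
        for i in range(0, len(day), factor)
    ]

day_10min = [0, 0, 1, 2, 2, 2, 3, 4, 4]
print(resample_mode(day_10min))  # [0, 2, 4]
```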
Weekday-only proxy test
Weekday-only sampling loses only 0.8 pp relative to the full week. The model generalises robustly: structural patterns dominate day-type variation.
Cross-national feasibility
The next steps are onboarding the Multinational Time Use Study (MTUS) and running cross-country comparability diagnostics.
Full-scale A1 runs (SGD + Transformer)
Full-sample SGD and Transformer runs are complete. The Transformer reaches the persistence ceiling on activity, and both models are validated across all channels.
Pooled 2003–2024 ATUS analysis
The pooled 2003–2024 analysis confirms temporal stability: behavioural predictability is a durable social fact, not an artefact of recent data.
Pooled data: model beats persistence
With 140,000+ training windows, the Transformer exceeds the persistence baseline by 0.2–0.5 pp across income, age, and weekday groups. This indicates that the apparent “unbeatable persistence” in small samples is a methodological artefact of underfitting.
Persistence ceiling identical across UK & US
Both UK (UKDA-8741) and US (ATUS) yield 88.76% persistence accuracy on activity. Short-term temporal inertia is a universal human feature, independent of institutional context.
What the US data confirms, and what replicates across the UK and US
Predictability maps social structure
Accuracy gaps between demographic groups are social science findings. Bootstrap 95% CIs confirm all differences are significant.
Predict a life
Choose a demographic profile. The model forecasts a complete 24-hour day slot by slot, feeding each prediction back as the next input, the same way the Transformer does it.
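The sketch below shows slot-by-slot generation. A first-order Markov transition table stands in for the Transformer, as an assumption for illustration only. Each sampled slot is appended to the day and becomes the input for the next step. Conditioning on a demographic profile could be approximated by fitting one table per group.

```python
import random
from collections import Counter, defaultdict

def fit_transitions(days):
    """Count first-order transitions prev -> next across training days."""
    counts = defaultdict(Counter)
    for day in days:
        for prev, nxt in zip(day, day[1:]):
            counts[prev][nxt] += 1
    return counts

def rollout(counts, start, n_slots=144, seed=0):
    """Generate a day slot by slot, feeding each sampled slot back as input."""
    rng = random.Random(seed)
    day = [start]
    while len(day) < n_slots:
        options = counts.get(day[-1])
        if not options:          # unseen state: fall back to persistence
            day.append(day[-1])
            continue
        labels, weights = zip(*options.items())
        day.append(rng.choices(labels, weights=weights, k=1)[0])
    return day

train_days = [["sleep"] * 5 + ["eat"] * 2 + ["work"] * 3]
simulated = rollout(fit_transitions(train_days), start="sleep")
print(len(simulated))  # 144
```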
2,000 simulated lives: a city in motion
Each dot is one simulated life. Coloured zones are city districts. People flow between districts as the day progresses — driven by the same structural patterns our model learns to predict.
Framework design
Data pipeline
UKDA-8741 Stata .dta → wide-to-long → sliding windows. A person-level 70/15/15 train/validation/test split by mainid prevents cross-person leakage.
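Below is a minimal sketch of the person-level split, assuming integer `mainid` values. Loading the `.dta` file and reshaping it could be done with `pandas.read_stata` and `pandas.wide_to_long` (or `melt`). The sketch covers only the split, which assigns whole persons rather than rows, so no individual contributes to more than one split.

```python
import random

def split_by_person(person_ids, fracs=(0.70, 0.15, 0.15), seed=42):
    """Assign whole persons (not rows) to train/validation/test, so that
    no individual's diary data appears in more than one split."""
    people = sorted(set(person_ids))
    random.Random(seed).shuffle(people)
    n_train = round(fracs[0] * len(people))
    n_val = round(fracs[1] * len(people))
    train = set(people[:n_train])
    val = set(people[n_train:n_train + n_val])
    test = set(people[n_train + n_val:])  # test receives the remainder
    return train, val, test

# Toy data: 100 hypothetical persons, three rows (e.g. slots) each.
rows = [{"mainid": pid, "slot": s} for pid in range(100) for s in range(3)]
train, val, test = split_by_person([r["mainid"] for r in rows])
train_rows = [r for r in rows if r["mainid"] in train]
print(len(train), len(val), len(test))  # 70 15 15
```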
Sliding windows
W consecutive slots → predict slot W+1. W=1 (10 min) optimal. ~914K samples total across four channels.
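Below is a sketch of window construction. It assumes windows are built within each diary day, never across days or persons. With W=1, a 144-slot day yields 143 (window, target) pairs.

```python
def sliding_windows(day, w=1):
    """Yield (window, target) pairs: w consecutive slots predict the next one.
    Windows are built within one diary day, so they never span days or persons."""
    for t in range(len(day) - w):
        yield tuple(day[t:t + w]), day[t + w]

pairs = list(sliding_windows(["sleep", "sleep", "eat", "work"], w=2))
print(pairs)
# [(('sleep', 'sleep'), 'eat'), (('sleep', 'eat'), 'work')]

print(sum(1 for _ in sliding_windows(list(range(144)), w=1)))  # 143
```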
13 models
Majority, Naive Bayes, Markov (N-gram), Logistic, Ridge, SGD, Random Forest, GBDT, XGBoost, LightGBM, LSTM, GRU, Transformer Encoder.
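The two simplest baselines in the list can be sketched as below. Inputs are assumed to be windows of category codes stored as tuples. The `fit`/`predict` method names follow scikit-learn's estimator convention, but these classes are hypothetical and are not scikit-learn objects. The Markov model here is first-order (it conditions only on the last slot); a higher-order N-gram would key on longer suffixes of the window.

```python
from collections import Counter, defaultdict

class MajorityVote:
    """Always predict the most frequent training target."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

class MarkovBigram:
    """First-order Markov baseline: predict the most frequent successor of the
    window's last slot; fall back to the global majority for unseen states."""
    def fit(self, X, y):
        self.table_ = defaultdict(Counter)
        for window, target in zip(X, y):
            self.table_[window[-1]][target] += 1
        self.fallback_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        preds = []
        for window in X:
            successors = self.table_.get(window[-1])
            preds.append(successors.most_common(1)[0][0] if successors else self.fallback_)
        return preds

X = [(0,), (0,), (0,), (1,), (1,)]
y = [0, 0, 1, 1, 1]
print(MajorityVote().fit(X, y).predict([(0,), (2,)]))        # [1, 1]
print(MarkovBigram().fit(X, y).predict([(0,), (1,), (2,)]))  # [0, 1, 1]
```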
Evaluation
Accuracy, Macro-F1, MAE, quadratic weighted kappa (QWK), and within-one accuracy, with bootstrap 95% CIs (n=1000 resamples). Brier Score 0.0849, LCS 0.9921.
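Below is a sketch of a percentile bootstrap for accuracy with n=1000 resamples. It resamples individual test samples. The document does not state the resampling unit, and resampling whole persons would better respect dependence between slots of the same day. The function name is illustrative.

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy: resample test samples with
    replacement and read off the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    n = len(correct)
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    lo = stats[round(alpha / 2 * (n_resamples - 1))]
    hi = stats[round((1 - alpha / 2) * (n_resamples - 1))]
    return sum(correct) / n, lo, hi

y_true = [1] * 100
y_pred = [1] * 85 + [0] * 15
acc, lo, hi = bootstrap_accuracy_ci(y_true, y_pred)
print(acc)  # 0.85
print(lo, hi)
```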
Group experiments
Per-group training and evaluation for income, economic status (econstat), sex, age, region, and survey period: 198 B-class experiments, controlled for group size.
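One way to control for group size is to downsample every group to the number of persons in the smallest group before per-group training. This is an assumption about how the control is done, and the sketch is illustrative only.

```python
import random
from collections import defaultdict

def equalize_group_sizes(person_ids, groups, seed=0):
    """Downsample every group to the number of persons in the smallest group,
    so that between-group accuracy gaps are not driven by training-set size."""
    members = defaultdict(set)
    for pid, group in zip(person_ids, groups):
        members[group].add(pid)
    k = min(len(m) for m in members.values())
    rng = random.Random(seed)
    return {group: set(rng.sample(sorted(m), k)) for group, m in members.items()}

ids = list(range(10))
income = ["low"] * 6 + ["high"] * 4
kept = equalize_group_sizes(ids, income)
print({g: len(p) for g, p in kept.items()})  # {'low': 4, 'high': 4}
```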
Ablation design
Order vs bag encoding, window length (W=1–30), static features individually, cross-channel inputs. Each factor isolated independently.
From predictive metrics to social theory
This project interprets forecasting performance through sociology of time, institutional constraint, and stratified agency.
Pred, A. (1981/2005 reprint). Time-geography and the social anchoring of everyday practices. Used here for the concept of daily routines as space-time constraints.
Weber, M. (1905; 1922). Rationalization and the “iron cage”. Used to interpret fixed work-time systems as institutionalized temporal discipline.
Working Time Mismatch and Employee Subjective Well-being across Institutional Contexts (job-quality perspective). Supports the institutional-context lens for cross-national expansion.
Hochschild, A. (1989). The second shift. Supports gendered role-load interpretation for fragmented daily schedules.
Peterson, R. (1992). Cultural omnivore thesis. Inspires the hypothesis of “omnivore schedules” in higher-income groups.
Method note. At this phase, missing-value handling follows advisor guidance: records with missing core variables are dropped first, and imputation is deferred to later extensions.