Research · 2026 · UKDA-8741 + ATUS + MTUS · Phase 4+ Delivery Live

Predictability as a Social Fact

A four-layer evidence and delivery program on daily life predictability

From UK framework construction to US replication, MTUS external validity, and a current boundary-aware delivery package, LifeCast now exposes both the experimental arc and the paper-ready reporting layer.

914K training windows
921 result JSONs
68 surfaced figures
9 national settings
31M test windows
9 report sets
5.26 pp UK income gap
+0.2–0.5 pp US pooled delta
+0.44 pp MTUS TF delta
[Figure: example 24-hour activity timeline for the profile Employed · Low income · Weekday; time axis 00h to 24h]
Abstract

Can ML quantify social structure?

LifeCast begins as a multichannel daily life prediction framework trained on the UK Time Use Survey (UKDA-8741), covering 6,896 diary-days and four channels: Activity, Enjoyment, Location, and With Whom. Phase 1 establishes the theoretical core: predictability is not just a machine-learning score, but a way to quantify social constraint.

Phase 2 carries the design into the US ATUS ecosystem, where the main question shifts from “can the framework travel?” to “what happens once persistence is treated seriously?” The answer is more mature than a simple win/loss story: quick runs can sit below persistence, but pooled 2003–2024 data recovers stable positive deltas once sample size is large enough.

Phase 3 closes the loop through MTUS external-validity testing across seven countries. In the refreshed full summary, Transformer remains positive in 7/7 countries with a weighted +0.44 pp gain over persistence, while fine activity remains more informative than coarse coding. The site now reflects that full project arc rather than a Phase 1 snapshot.

Phase 4+ is the delivery layer built on top of those results. It packages the current evidence surface into a live report bundle, image atlas, and website sync while keeping the key boundary sentence explicit: default delivery is ready, but heavier experiments stay closed unless a stronger claim is actually needed.

UKDA-8741 · ATUS · MTUS · Phase 4+ · Evidence gate
Research Program

Four layers, one cumulative argument

The site now follows the full arc of the project: framework construction in the UK, replication in the US, external-validity testing across MTUS countries, and a current Phase 4+ delivery layer that packages the live paper-ready surface.

Why this matters. The project is no longer just a promising framework; it is now a multi-layer evidence stack with figure chains, delivery logic, and current handoff materials that match the live summaries.
Phase 4+

Current delivery layer, package surface, and evidence gate

This section is the current status surface rather than a historical archive. It shows what is actually ready now, what is intentionally boundary-written, and which current package files should be read first.

Project Showcase

Representative figures and phase-scale evidence

The site now exposes the project at two levels at once: how much evidence each layer carries, and which figures best represent both the historical argument and the current delivery package without forcing the reader to download a report first.

Why this section exists. The project used to look lighter than it really was because the public surface only showed the final report cards. These panels make the evidence scale and representative figures visible directly on the site.
Key Findings

Six results that reframe the question

F1

Predictability ceiling, not model floor

All 13 models converge within 0.2 pp on every channel. The bottleneck is the intrinsic regularity of behaviour, not model power.

E1 — Model comparison
F2

Income is the strongest structural axis

Low-income: 91.1% vs High-income: 85.9% activity accuracy. A 5.26 pp gap driven by constrained life routines.

B1 — Income stratification
F3

Deep models expose high-income complexity

Transformer outperforms SGD by +3.16 pp for high-income only. Low-income lift <0.03 pp.

Deep models — B-class
F4

Transition moments: universal blind spots

Error peaks at 16% at 08:00 and 14% at 18:00—exactly at work and domestic transitions. Night hours: <1% error.

Error analysis — hourly
F5

Sequence order dominates all features

Order-preserved vs bag-of-words: 14–20 pp gap. Removing order collapses performance more than removing all static demographics combined.

E6 — Ablation
F6

External validity survives heterogeneity

In MTUS full runs, Transformer stays above persistence in 7/7 countries with a weighted +0.44 pp gain. Fine activity carries more signal than coarse coding.

Phase 3 — External validity
Results — Channel Performance

Four channels, one framework

Test accuracy across 13 models. Each 10-minute slot is predicted from the preceding window (W=1 is optimal).

[Table: test accuracy (%) for all models across the Activity, Enjoyment, Location, and With Whom channels]
Error Analysis

When does the model fail?

Errors concentrate at behavioural transitions — morning routines (08:00), lunch (12:00), and the work-to-home shift (17–18:00). Night hours approach 0% error.

24h Activity Patterns

How does a day unfold?

Each column is one hour. Each row is a demographic profile. Colour shows the dominant activity at that hour — revealing how structured constraint shapes daily time.

Core Mechanism

How much is true prediction, and how much is inertia?

Across UK and US runs, a simple persistence rule (“next slot = previous slot”) is extremely strong. This reframes the headline: high accuracy often reflects behavioural inertia rather than rich anticipatory forecasting.

88.76%: persistence accuracy on activity (UK and US), identical in both countries
88.56%: Transformer on US activity, Δ = −0.20 pp vs persistence
+1.57 pp: model beats persistence on the US location channel
14–20 pp: sequence-order contribution in the UK and US ablations
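
The persistence rule and the "delta vs persistence" figures above can be illustrated with a minimal sketch. The function names and the toy diary below are hypothetical and are not taken from the LifeCast code base; only the two accuracy values in the last line come from this page.

```python
from typing import Sequence


def persistence_accuracy(day: Sequence[str]) -> float:
    """Accuracy of the rule 'next slot = previous slot' on one diary day."""
    if len(day) < 2:
        raise ValueError("need at least two slots")
    # A hit is any slot whose label equals the label of the slot before it.
    hits = sum(prev == cur for prev, cur in zip(day, day[1:]))
    return hits / (len(day) - 1)


def delta_vs_persistence_pp(model_acc: float, persistence_acc: float) -> float:
    """Model accuracy minus persistence accuracy, in percentage points."""
    return round((model_acc - persistence_acc) * 100, 2)


# Toy diary (made-up labels): 12 slots with 3 activity changes.
toy_day = ["sleep"] * 6 + ["eat"] + ["work"] * 4 + ["eat"]
print(persistence_accuracy(toy_day))            # 8 hits out of 11 transitions
print(delta_vs_persistence_pp(0.8856, 0.8876))  # Transformer vs persistence, US activity
```

Because persistence only misses at activity changes, its accuracy is one minus the transition rate, which is why long, stable episodes such as sleep push the baseline so high.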
US 2024 (Quick A1)

Activity: SGD under persistence

86.48% SGD vs 88.76% Persistence

SGD trails a one-step copy baseline by 2.28 pp on US activity forecasting.

US 2024 (Quick A1)

Location: model slightly helps

94.40% SGD vs 92.84% Persistence

For coarse and sticky channels like location, model features can still add a small gain.

Interpretation

Predictability as social constraint

If a group is highly predictable mainly because today repeats yesterday, this reflects institutionalized time: fixed work schedules, constrained options, and role-bound routines.

Inertia · Institutional time · Structure & agency
Bootstrap 95% Confidence Intervals

Income group accuracy: non-overlapping CIs confirm significance

Cross-National Comparison — UK & US

The income gap replicates across two countries

UKDA-8741 (UK, 2014–2015) and ATUS 2024 (US) show structurally identical patterns. Income stratification is a durable, cross-national social fact.

🇬🇧 Activity accuracy by income — UK
🇺🇸 Activity accuracy by income — US
F7 · Income gap replicates: UK 5.26 pp gap, US 6.58 pp gap (Transformer, single-year); the pooled gap narrows to 2.17 pp as sample size equalizes group fit. Same direction in both countries.
F8 · Persistence baseline identical: UK 88.76% = US 88.76%. Short-term inertia is a universal human feature.
F9 · Pooled data (2003–2024): model beats persistence (+0.2–0.5 pp). Small-sample “unbeatable persistence” is a methodological artefact, not a true ceiling.
F10 · Gender effects depend on sample and setup (UK 0.46 pp; US pooled activity gap 3.30 pp), but they collapse at full scale. On the full ATUS data, male = female = 87.51% and the gap vanishes entirely. Economic status, not gender, is the true structural determinant of behavioral regularity.
Phase 2 — Complete

US expansion: all experiments complete

Phase 2 report finalized: 389 lines, 11 chapters, 18 key findings, 127 data table rows, and an 8-figure visualization appendix. It covers ATUS 2024 and pooled 2003–2024 cross-year analysis and is now fully advisor-ready.

Completed · WP1

US loader stabilization

Pooled + 2024 schema compatibility validated. Pipeline is production-ready for iterative batches.

Completed · WP2/WP4

US baseline + grouped activity

Activity forecasting by income / employment / sex complete. Transformer consistently beats SGD across all US subgroups.

Completed · C1

Fine vs coarse activity

Coarse grouping lifts model accuracy from 87.72% to 90.11% (+2.4 pp) — but fine grain captures genuine behavioural complexity.

Completed · D1

10-min vs 30-min timescale

30-min resampling loses 14 pp accuracy vs 10-min. Finer resolution is decisively better; behavioural signal degrades rapidly with temporal aggregation.
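
One way to see why temporal aggregation discards signal is to resample a 10-minute diary to 30-minute slots. The sketch below assumes a majority-label rule per block of three slots; the page does not say which aggregation rule the D1 experiment used, so both the rule and the function name are illustrative.

```python
from collections import Counter
from typing import List, Sequence


def resample_majority(slots: Sequence[str], factor: int = 3) -> List[str]:
    """Collapse every `factor` consecutive slots into their most common label.

    Ties are resolved in favour of the label seen first in the block,
    which is how Counter.most_common orders equal counts.
    """
    if factor < 1 or len(slots) % factor != 0:
        raise ValueError("slot count must be a positive multiple of factor")
    coarse = []
    for start in range(0, len(slots), factor):
        block = slots[start:start + factor]
        coarse.append(Counter(block).most_common(1)[0][0])
    return coarse


# Toy example (made-up labels): a 20-minute meal disappears after resampling.
ten_min = ["sleep", "sleep", "eat", "eat", "work", "work"]
print(resample_majority(ten_min))
```

In this toy case the short "eat" episode straddles two blocks and is outvoted in both, so the coarse series never records it.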

Completed · E1

Weekday-only proxy test

Weekday-only sampling loses only 0.8 pp vs full-week. Model generalises robustly — structural patterns dominate over day-type variation.

Completed · WP4+

Cross-national feasibility

MTUS onboarding and country comparability diagnostics were completed here, then carried forward into the full 7-country validation delivered in Phase 3.

Completed · A1-full

Full-scale A1 runs (SGD + Transformer)

Full-sample SGD and Transformer complete. Transformer reaches persistence ceiling on activity. Both models validated across all channels.

Completed · B1-pool

Pooled 2003–2024 ATUS analysis

The pooled 2003–2024 analysis confirms temporal stability. Behavioural predictability is a durable social fact, not an artefact of recent data.

New Finding · B1-pool

Pooled data: model beats persistence

With 140,000+ training windows, Transformer exceeds the persistence baseline by +0.2–0.5 pp across income, age, and weekday groups. The apparent “unbeatable persistence” in small samples is a methodological artefact of underfitting.

New Finding · Cross-national

Persistence ceiling identical across UK & US

Both UK (UKDA-8741) and US (ATUS) yield 88.76% persistence accuracy on activity. Short-term temporal inertia is a universal human feature, independent of institutional context.

US Phase 2 — Nine Core Findings

What the US data confirms — and what replicates across UK & US

Phase 3 — External Validity

MTUS full runs turn LifeCast into a cross-national result

Phase 3 is no longer a quick-status page. The refreshed summaries now include all 42 A1 fine records and 105 B1 grouped full runs across CA, ES, FR, IT, KR, NL, and ZA, supported by a dedicated 10-figure chain and a complete technical report.

Last updated: full summary + figures + advisor package sync · 2026
7 MTUS countries
42 A1 fine records
105 B1 full records
31.4M test windows
+0.44 pp TF vs persistence
10 Phase 3 figures

Mainline decision

Keep: fine activity + Transformer as the formal default. In the refreshed full summary, Transformer is positive in all 7 countries.

Interpretation: persistence remains strong, but it is not the final ceiling once harmonization and sample size are handled carefully.

Grouped robustness

Keep: age_bin and sex as the stable Phase 3 grouped lanes. Both remain above zero in the full runs.

Why it matters: model lift is not confined to pooled totals; it survives social slices as well.

Fine vs coarse

Fine: weighted delta +0.44 pp. Coarse: +0.16 pp.

Coarse remains a robustness lane, but fine activity carries the richer signal and therefore stays on the main narrative path.
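
A weighted delta of this kind can be computed as a sample-size-weighted mean of per-country gains. The sketch below assumes the weights are test-window counts, which the page does not state explicitly; the function name and the two records are made up for illustration and are not MTUS results.

```python
from typing import Sequence, Tuple

# (n_test_windows, model_accuracy, persistence_accuracy) for one country
Record = Tuple[int, float, float]


def weighted_delta_pp(records: Sequence[Record]) -> float:
    """Window-weighted mean of (model - persistence), in percentage points."""
    total = sum(n for n, _, _ in records)
    if total == 0:
        raise ValueError("no test windows")
    return sum(n * (model - base) for n, model, base in records) / total * 100


# Hypothetical numbers: a small country with a 1.0 pp gain and a
# large country with a 0.2 pp gain.
toy = [(100, 0.900, 0.890), (300, 0.880, 0.878)]
print(weighted_delta_pp(toy))
```

With these toy inputs the larger country dominates, so the pooled gain sits closer to 0.2 pp than to 1.0 pp.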

Country deltas

Transformer stays positive in all seven countries

Quick → full correction

Formal runs amplify the weak quick signals

Grouped heatmap

age_bin and sex stay positive country by country

Phase 3 is now a full external-validity result, not a quick note: the seven-country full runs, all 42 A1 fine records, and all 105 B1 grouped full records are included, and Transformer remains positive in all seven MTUS countries.

Group Differences

Predictability maps social structure

Accuracy gaps between demographic groups are social science findings. Bootstrap 95% CIs confirm all differences are significant.

Interactive Demo

Predict a life

Choose a demographic profile. The model forecasts a complete 24-hour day — slot by slot, the same way the Transformer does it.

[Interactive demo: profile selectors for income, employment, gender, and day type; outputs are a 24-hour timeline (00h to 24h), predicted accuracy, slots predicted, and distinct activities]
Population Simulation

2,000 simulated lives — a city in motion

Each dot is one simulated life. Coloured zones are city districts. People flow between districts as the day progresses — driven by the same structural patterns our model learns to predict.

Methods

Framework, harmonization, and handoff

01

Data pipeline

UKDA-8741 Stata .dta → wide-to-long → sliding windows. Person-level 70/15/15 split by mainid. No cross-person leakage.
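
A person-level split can be sketched as follows: person identifiers are shuffled and partitioned, so all diary rows of one person land in the same partition. This is an illustrative sketch, not the project's loader; the function name, the seed handling, and the integer-percentage rounding are assumptions, and only the 70/15/15 proportions and the `mainid` key come from the text above.

```python
import random
from typing import Dict, Iterable, List


def split_person_ids(person_ids: Iterable[int], seed: int = 0,
                     percents=(70, 15, 15)) -> Dict[str, List[int]]:
    """Partition unique person ids into train/val/test by the given percentages."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    unique = sorted(set(person_ids))      # one entry per person, stable order
    random.Random(seed).shuffle(unique)   # reproducible shuffle
    n_train = len(unique) * percents[0] // 100
    n_val = len(unique) * percents[1] // 100
    return {
        "train": unique[:n_train],
        "val": unique[n_train:n_train + n_val],
        "test": unique[n_train + n_val:],  # remainder goes to test
    }


# Toy input: 100 people (stand-ins for mainid) with 3 diary rows each.
row_ids = [i // 3 for i in range(300)]
parts = split_person_ids(row_ids, seed=1)
print({name: len(ids) for name, ids in parts.items()})
```

Rows are then assigned to a partition by looking up their person id, which is what rules out cross-person leakage between train and test.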

02

Sliding windows

W consecutive slots → predict slot W+1. W=1 (10 min) optimal. ~914K samples total across four channels.
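
The windowing step can be sketched in a few lines. This assumes windows are built within a single diary day and never across day boundaries, which the text does not spell out; `make_windows` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def make_windows(slots: Sequence[str], w: int = 1) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (input window of w slots, target slot w+1) pairs for one day."""
    if w < 1:
        raise ValueError("w must be at least 1")
    return [(tuple(slots[i:i + w]), slots[i + w]) for i in range(len(slots) - w)]


# Toy day with made-up labels.
print(make_windows(["sleep", "sleep", "eat", "work"], w=2))
```

A day of 144 ten-minute slots yields 144 − W samples, so W=1 gives 143 windows per diary day under this assumption.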

03

13 models, then focused formal tracks

Phase 1 surveys the full model family. Later phases narrow the formal comparison to SGD, Transformer, and persistence-style baselines where the methodological question becomes most interpretable.

04

Evaluation

Accuracy, Macro-F1, delta vs persistence, weighted aggregation, MAE, QWK, and Bootstrap 95% CI. The project increasingly treats persistence as the benchmark that results must explain, not ignore.
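
As an illustration of the bootstrap interval, the sketch below computes a percentile bootstrap 95% CI for accuracy from per-window correctness flags. It resamples individual windows; the page does not state whether the project resamples windows, diary-days, or persons, so treat the resampling unit, the function name, and the defaults as assumptions.

```python
import random
from typing import Sequence, Tuple


def bootstrap_accuracy_ci(correct: Sequence[int], n_boot: int = 2000,
                          alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean accuracy over 0/1 correctness flags."""
    n = len(correct)
    if n == 0:
        raise ValueError("no predictions")
    rng = random.Random(seed)
    # Resample with replacement n_boot times and record each resample's accuracy.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy input: 1,000 predictions with 90% correct (made-up data).
flags = [1] * 900 + [0] * 100
print(bootstrap_accuracy_ci(flags, n_boot=500, seed=42))
```

Two groups can then be compared by checking whether their intervals overlap. Resampling at the person level would be the more conservative choice if windows from the same person are correlated.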

05

Group experiments

Per-group train/eval spans income, econstat, sex, age, weekday structure, and later MTUS grouped slices. The grouped question stays constant across phases: who is more predictable, and why?

06

Ablation design

Order vs bag encoding, window length (W=1–30), fine vs coarse activity, weekday/full-week contrasts, and cross-channel inputs isolate where the predictive signal actually lives.
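
The order-versus-bag contrast can be made concrete with two toy encoders. These are illustrative stand-ins, not the project's feature code: the ordered encoding keeps slot positions, while the bag encoding keeps only label counts.

```python
from collections import Counter
from typing import Sequence, Tuple


def order_encoding(window: Sequence[str]) -> Tuple[str, ...]:
    """Keep the labels in their original slot order."""
    return tuple(window)


def bag_encoding(window: Sequence[str]) -> Tuple[Tuple[str, int], ...]:
    """Keep only how often each label occurs, discarding order."""
    return tuple(sorted(Counter(window).items()))


# Two made-up windows with the same labels in opposite order.
a = ["work", "work", "eat"]
b = ["eat", "work", "work"]
print(order_encoding(a) == order_encoding(b))  # order distinguishes them
print(bag_encoding(a) == bag_encoding(b))      # the bag does not
```

The bag encoding cannot tell which activity came last, and the most recent slot is exactly what a persistence-like predictor relies on. For W=1 the two encodings carry the same information, so the contrast only matters for longer windows.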

07

Cross-national harmonization

Phase 2 and Phase 3 align UKDA-8741, ATUS, and MTUS to a shared activity forecasting task. Fine activity stays on the mainline; coarse coding remains a robustness lane rather than the headline result.

08

Report packaging

Each mature layer is delivered as Markdown, self-contained HTML, and print-ready PDF. The site now carries both a current Phase 4+ package and a legacy Phase 1–3 archive, so the reading surface stays aligned with live results while older packages remain recoverable.

Research Package

Current Phase 4+ package, archive reports, and raw figures are assembled

The project now has two public report surfaces: a current Phase 4+ delivery package with its own overview, complete report, and visualization atlas, plus the earlier Phase 1–3 archive with the older overview and legacy phase reports. This is the curated reporting layer built on top of 921 result JSON artifacts rather than the full raw tree dumped verbatim.

1. Phase 4+ project overview: current status ladder, package purpose, reading order, and recommended figures
2. Phase 4+ complete report: current bridge rows, Group B boundaries, pooled language, MTUS sync, and evidence-gate logic
3. Phase 4+ visualization atlas: all current package figures, their roles, and the raw source files used
4. Current package indexes: package index, figure index, and manifest for later checking
5. Phase 1–3 archive overview: older three-phase reading layer preserved as an archive entry point
6. Archive phase reports and raw bundles: legacy phase reports remain downloadable, but they now sit behind the current package rather than in front of it
Packaging status. Markdown remains the editable source, HTML provides offline review, and PDF gives a stable handoff. The current Phase 4+ package is the default entry point; the earlier three-phase package is preserved as an archive. The visible report cards are intentionally the final evidence layer, while the underlying project still spans 921 result JSON artifacts.
References & Theoretical Anchors

From predictive metrics to social theory

This project interprets forecasting performance through sociology of time, institutional constraint, and stratified agency.

R1

Pred, A. (1981/2005 reprint). Time-geography and the social anchoring of everyday practices. Used here for the concept of daily routines as space-time constraints.

R2

Weber, M. (1905; 1922). Rationalization and the “iron cage”. Used to interpret fixed work-time systems as institutionalized temporal discipline.

R3

Working Time Mismatch and Employee Subjective Well-being across Institutional Contexts (job-quality perspective). Supports the institutional-context lens for cross-national expansion.

R4

Hochschild, A. (1989). The second shift. Supports gendered role-load interpretation for fragmented daily schedules.

R5

Peterson, R. (1992). Cultural omnivore thesis. Inspires the hypothesis of “omnivore schedules” in higher-income groups.

R6

Method note. At this phase, missing-value handling follows advisor guidance: drop-first strategy for core variables before imputation extensions.