Can ML quantify social structure?
LifeCast begins as a multichannel daily-life prediction framework trained on the UK Time Use Survey (UKDA-8741), covering 6,896 diary-days and four channels: Activity, Enjoyment, Location, and With Whom. Phase 1 establishes the theoretical core: predictability is not just a machine-learning score but a way to quantify social constraint.
Phase 2 carries the design into the US ATUS ecosystem, where the main question shifts from “can the framework travel?” to “what happens once persistence is treated seriously?” The answer is more nuanced than a simple win/loss story: quick runs can fall below persistence, but pooled 2003–2024 data recover stable positive deltas over persistence once the sample is large enough.
Phase 3 closes the loop through MTUS external-validity testing across seven countries. In the refreshed full summary, Transformer remains positive in 7/7 countries with a weighted +0.44 pp gain over persistence, while fine activity remains more informative than coarse coding. The site now reflects that full project arc rather than a Phase 1 snapshot.
Phase 4+ is the delivery layer built on top of those results. It packages the current evidence into a live report bundle, an image atlas, and a synced website, while stating the key boundary explicitly: the default delivery is ready, but heavier experiments stay closed unless a stronger claim actually requires them.
Four layers, one cumulative argument
The site now follows the full arc of the project: framework construction in the UK, replication in the US, external-validity testing across MTUS countries, and a current Phase 4+ delivery layer that packages the live paper-ready surface.
Current delivery layer, package surface, and evidence gate
This section is the current status surface rather than a historical archive. It shows what is ready now, which claims are deliberately scoped with explicit boundaries, and which package files to read first.
Representative figures and phase-scale evidence
The site now presents the project at two levels at once: how much evidence each layer carries, and which figures best represent both the historical argument and the current delivery package, without requiring the reader to download a report first.
Six results that reframe the question
Predictability ceiling, not model floor
All 13 models converge within 0.2 pp on every channel. The bottleneck is the intrinsic regularity of behaviour, not model power.
Income is the strongest structural axis
Low-income: 91.1% vs. high-income: 85.9% activity accuracy, a 5.26 pp gap driven by more constrained daily routines.
Deep models expose high-income complexity
Transformer outperforms SGD by 3.16 pp for high-income respondents only; the low-income lift is below 0.03 pp.
Transition moments: universal blind spots
Error peaks at 16% at 08:00 and 14% at 18:00—exactly at work and domestic transitions. Night hours: <1% error.
Sequence order dominates all features
Order-preserved vs bag-of-words: 14–20 pp gap. Removing order collapses performance more than removing all static demographics combined.
External validity survives heterogeneity
In MTUS full runs, Transformer stays above persistence in 7/7 countries with a weighted +0.44 pp gain. Fine activity carries more signal than coarse coding.
Four channels, one framework
Test accuracy across 13 models. Each 10-minute slot is predicted from the preceding window (W=1, the best-performing window length).
| Model | Activity | Enjoyment | Location | With Whom |
|---|---|---|---|---|
When does the model fail?
Errors concentrate at behavioural transitions — morning routines (08:00), lunch (12:00), and the work-to-home shift (17–18:00). Night hours approach 0% error.
How does a day unfold?
Each column is one hour. Each row is a demographic profile. Colour shows the dominant activity at that hour — revealing how structured constraint shapes daily time.
How much is true prediction, and how much is inertia?
Across UK and US runs, a simple persistence rule (“next slot = previous slot”) is extremely strong. This reframes the headline: high accuracy often reflects behavioural inertia rather than rich anticipatory forecasting.
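The persistence benchmark is simple enough to state as code. The sketch below is a minimal illustration, assuming each sample is a window of previous slots plus a target slot; the function names are hypothetical, not the project's own code. The delta is reported in percentage points, matching how the site quotes results against persistence.

```python
from typing import Sequence


def persistence_accuracy(windows: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Accuracy of the rule "next slot = previous slot" (last slot of each window)."""
    hits = sum(1 for window, target in zip(windows, targets) if window[-1] == target)
    return hits / len(targets)


def model_accuracy(preds: Sequence[str], targets: Sequence[str]) -> float:
    """Plain accuracy of model predictions against the true next slots."""
    return sum(1 for p, t in zip(preds, targets) if p == t) / len(targets)


def delta_vs_persistence_pp(preds: Sequence[str], windows: Sequence[Sequence[str]],
                            targets: Sequence[str]) -> float:
    """Model accuracy minus persistence accuracy, in percentage points."""
    return 100.0 * (model_accuracy(preds, targets) - persistence_accuracy(windows, targets))


# Toy example: four W=1 windows and their true next slots.
windows = [["sleep"], ["sleep"], ["work"], ["work"]]
targets = ["sleep", "eat", "work", "travel"]
preds = ["sleep", "eat", "work", "work"]
print(persistence_accuracy(windows, targets))            # 0.5
print(delta_vs_persistence_pp(preds, windows, targets))  # 25.0
```

A positive delta means the model predicts something that copying the last slot cannot; a high raw accuracy with a near-zero delta is the inertia case described above.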
Activity: SGD under persistence
SGD trails a one-step copy baseline by 2.28 pp on US activity forecasting.
Location: model slightly helps
For coarse and sticky channels like location, model features can still add a small gain.
Predictability as social constraint
If a group is highly predictable mainly because today repeats yesterday, this reflects institutionalized time: fixed work schedules, constrained options, and role-bound routines.
Income group accuracy: non-overlapping CIs confirm significance
The income gap replicates across two countries
UKDA-8741 (UK, 2014–2015) and ATUS 2024 (US) show structurally identical patterns. Income stratification is a durable, cross-national social fact.
US expansion: all experiments complete
Phase 2 report finalized: 389 lines, 11 chapters, 18 key findings, 127 data table rows, and an 8-figure visualization appendix. It covers ATUS 2024 and pooled 2003–2024 cross-year analysis and is now fully advisor-ready.
US loader stabilization
Pooled + 2024 schema compatibility validated. Pipeline is production-ready for iterative batches.
US baseline + grouped activity
Activity forecasting by income / employment / sex complete. Transformer consistently beats SGD across all US subgroups.
Fine vs coarse activity
Coarse grouping lifts model accuracy from 87.72% to 90.11% (+2.4 pp), but fine-grained coding captures genuine behavioural complexity.
10-min vs 30-min timescale
30-min resampling loses 14 pp accuracy vs 10-min. Finer resolution is decisively better; behavioural signal degrades rapidly with temporal aggregation.
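The section does not say how 10-minute slots were collapsed into 30-minute ones. One common choice, assumed here purely for illustration, is a modal (majority-vote) rule over each block of three slots; the helper name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def resample_modal(slots: Sequence[str], factor: int = 3) -> List[str]:
    """Collapse consecutive 10-min slots into coarser blocks (factor=3 -> 30 min).

    Each block keeps its most frequent activity; ties go to the activity that
    appears first in the block (Counter keeps insertion order, sorting is stable).
    """
    if len(slots) % factor != 0:
        raise ValueError("number of slots must be divisible by factor")
    out = []
    for i in range(0, len(slots), factor):
        block = slots[i:i + factor]
        out.append(Counter(block).most_common()[0][0])
    return out


day = ["sleep", "sleep", "wash", "wash", "eat", "travel"]
print(resample_modal(day))  # ['sleep', 'wash']
```

In the toy day, the short `eat` and `travel` episodes disappear after resampling: brief transitions are exactly the content a coarser timescale can erase.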
Weekday-only proxy test
Weekday-only sampling loses only 0.8 pp vs full-week. Model generalises robustly — structural patterns dominate over day-type variation.
Cross-national feasibility
MTUS onboarding and country comparability diagnostics were completed here, then carried forward into the full 7-country validation delivered in Phase 3.
Full-scale A1 runs (SGD + Transformer)
Full-sample SGD and Transformer runs are complete. Transformer reaches the persistence ceiling on activity. Both models are validated across all channels.
Pooled 2003–2024 ATUS analysis
20-year pooled analysis confirms temporal stability. Behavioural predictability is a durable social fact — not an artefact of recent data.
Pooled data: model beats persistence
With 140,000+ training windows, Transformer exceeds the persistence baseline by 0.2–0.5 pp across income, age, and weekday groups. The apparent “unbeatable persistence” in small samples is a methodological artefact of underfitting.
Persistence ceiling identical across UK & US
Both UK (UKDA-8741) and US (ATUS) yield 88.76% persistence accuracy on activity. Short-term temporal inertia is a universal human feature, independent of institutional context.
What the US data confirms — and what replicates across UK & US
MTUS full runs turn LifeCast into a cross-national result
Phase 3 is no longer a quick-status page. The refreshed summaries now include all 42 A1 fine records and 105 B1 grouped full runs across CA, ES, FR, IT, KR, NL, and ZA, supported by a dedicated 10-figure chain and a complete technical report.
Mainline decision
Keep: fine activity + Transformer as the formal default. In the refreshed full summary, Transformer is positive in all 7 countries.
Interpretation: persistence remains strong, but it is not the final ceiling once harmonization and sample size are handled carefully.
Grouped robustness
Keep: age_bin and sex as the stable Phase 3 grouped lanes. Both remain above zero in the full runs.
Why it matters: model lift is not confined to pooled totals; it survives social slices as well.
Fine vs coarse
Fine: weighted delta +0.44 pp. Coarse: +0.16 pp.
Coarse remains a robustness lane, but fine activity carries the richer signal and therefore stays on the main narrative path.
Transformer stays positive in all seven countries
Formal runs amplify the weak quick signals
age_bin and sex stay positive country by country
Phase 3 is now a full cross-national external-validity result, not a quick status note: full runs in all seven MTUS countries, with all 42 A1 fine records and 105 B1 grouped full records tabulated, and Transformer remains positive in every country.
Predictability maps social structure
Accuracy gaps between demographic groups are findings about social structure. Bootstrap 95% CIs confirm that all reported group differences are significant.
Predict a life
Choose a demographic profile. The model forecasts a complete 24-hour day — slot by slot, the same way the Transformer does it.
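The demo's internals are not described here, so the following is a toy stand-in rather than the Transformer: a count-based next-slot table keyed by (slot index, previous activity), rolled out greedily so that each predicted slot becomes the next input. Keying on slot index is an assumption added so the rollout follows clock time instead of settling into one repeated activity; fitting one table per demographic profile would play the role of profile selection. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

SLOTS_PER_DAY = 24 * 6  # 144 ten-minute slots


def fit_transitions(days: Sequence[Sequence[str]]) -> Dict[Tuple[int, str], Counter]:
    """Count the activity in slot t given slot index t and the activity in slot t-1."""
    table: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    for day in days:
        for t in range(1, len(day)):
            table[(t, day[t - 1])][day[t]] += 1
    return table


def roll_out_day(table: Dict[Tuple[int, str], Counter], start: str,
                 n_slots: int = SLOTS_PER_DAY) -> List[str]:
    """Greedy slot-by-slot rollout: each prediction is fed back as the next input."""
    day = [start]
    for t in range(1, n_slots):
        counts = table.get((t, day[-1]))  # .get does not create empty entries
        # Fall back to persistence when this (slot, activity) pair was never seen.
        day.append(counts.most_common(1)[0][0] if counts else day[-1])
    return day


# Toy diaries for one profile, four slots each.
days = [["sleep", "sleep", "eat", "work"],
        ["sleep", "sleep", "eat", "work"],
        ["sleep", "eat", "eat", "travel"]]
table = fit_transitions(days)
print(roll_out_day(table, "sleep", n_slots=4))  # ['sleep', 'sleep', 'eat', 'work']
```

Greedy decoding always picks the most frequent continuation, so a rollout like this shows the typical day for a profile rather than the variety between individuals.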
2,000 simulated lives — a city in motion
Each dot is one simulated life. Coloured zones are city districts. People flow between districts as the day progresses — driven by the same structural patterns our model learns to predict.
Framework, harmonization, and handoff
Data pipeline
UKDA-8741 Stata .dta → wide-to-long → sliding windows. Person-level 70/15/15 split by mainid. No cross-person leakage.
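A person-level split assigns whole respondents, not individual rows, to train, validation, and test, so no diary contributes windows to two splits. The sketch below assumes integer `mainid` values and a seeded shuffle; the function name and the row layout are hypothetical.

```python
import random
from typing import Dict, Iterable, Set


def split_by_person(mainids: Iterable[int], seed: int = 0,
                    fractions=(0.70, 0.15, 0.15)) -> Dict[str, Set[int]]:
    """Assign whole persons to train/val/test so no person spans two splits."""
    ids = sorted(set(mainids))        # deduplicate and fix order before shuffling
    random.Random(seed).shuffle(ids)  # seeded shuffle gives a reproducible split
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),  # the remainder, roughly 15%
    }


# Hypothetical rows: each respondent contributes two diary-days.
rows = [{"mainid": pid, "day": d} for pid in range(1, 101) for d in (1, 2)]
splits = split_by_person(row["mainid"] for row in rows)
split_of = {pid: name for name, ids in splits.items() for pid in ids}
train_rows = [row for row in rows if split_of[row["mainid"]] == "train"]
```

Because both diary-days of a respondent land in the same split, the test set measures generalisation to unseen people rather than to unseen days of already-seen people.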
Sliding windows
W consecutive slots → predict slot W+1. W=1 (10 min) optimal. ~914K samples total across four channels.
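The windowing step can be sketched as follows, assuming windows are built within a single diary-day and never cross day boundaries (the section does not say). With W=1, a 144-slot day yields 143 (input, target) pairs per channel. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_windows(day: Sequence[str], w: int = 1) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn one diary-day into (window of W consecutive slots, next slot) pairs."""
    if w < 1:
        raise ValueError("w must be >= 1")
    return [(tuple(day[i:i + w]), day[i + w]) for i in range(len(day) - w)]


print(sliding_windows(["sleep", "wash", "eat"], w=2))  # [(('sleep', 'wash'), 'eat')]
```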
13 models, then focused formal tracks
Phase 1 surveys the full model family. Later phases narrow the formal comparison to SGD, Transformer, and persistence-style baselines where the methodological question becomes most interpretable.
Evaluation
Accuracy, Macro-F1, delta vs persistence, weighted aggregation, MAE, QWK, and Bootstrap 95% CI. The project increasingly treats persistence as the benchmark that results must explain, not ignore.
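Three of the listed metrics are less familiar than plain accuracy, so here is a dependency-free sketch. Two points are assumptions, not statements from the project: that QWK (and MAE) are applied to an ordinal channel such as Enjoyment, and that "weighted aggregation" weights each run's delta by its test-set size. Function names are hypothetical.

```python
from typing import Sequence, Tuple


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def quadratic_weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                             k_min: int, k_max: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for ratings in [k_min, k_max]."""
    k = k_max - k_min + 1
    if k < 2:
        raise ValueError("need at least two rating levels")
    n = len(y_true)
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t - k_min][p - k_min] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]
            den += w * true_hist[i] * pred_hist[j] / n  # expected count under independence
    if den == 0:
        raise ValueError("kappa undefined: no expected disagreement")
    return 1.0 - num / den


def weighted_delta_pp(runs: Sequence[Tuple[float, float, int]]) -> float:
    """Test-size-weighted mean of (model acc - persistence acc), in pp.

    Each run is (model_accuracy, persistence_accuracy, n_test)."""
    total = sum(n for _, _, n in runs)
    return 100.0 * sum((m - p) * n for m, p, n in runs) / total
```

Macro-F1 keeps rare activities from being drowned out by sleep and work, while QWK penalises a prediction two enjoyment levels off more heavily than one level off.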
Group experiments
Per-group train/eval spans income, econstat, sex, age, weekday structure, and later MTUS grouped slices. The grouped question stays constant across phases: who is more predictable, and why?
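Group comparisons pair a per-group accuracy with a bootstrap interval. The section does not say what the bootstrap resamples; this sketch assumes respondents (each contributing a count of correct slots and a count of all slots) are resampled with replacement, which keeps all slots of one diary together. The percentile method and all names are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def group_accuracy(correct: Sequence[bool], groups: Sequence[str]) -> Dict[str, float]:
    """Accuracy per group label (e.g. income band) from per-slot hit/miss flags."""
    by_group: Dict[str, List[bool]] = {}
    for hit, g in zip(correct, groups):
        by_group.setdefault(g, []).append(hit)
    return {g: sum(hits) / len(hits) for g, hits in by_group.items()}


def person_bootstrap_ci(per_person: Sequence[Tuple[int, int]], n_boot: int = 2000,
                        alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for pooled accuracy, resampling persons with replacement.

    per_person holds (n_correct, n_slots) for each respondent."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(per_person, k=len(per_person))
        stats.append(sum(c for c, _ in sample) / sum(s for _, s in sample))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi
```

Running `person_bootstrap_ci` separately for two groups and checking that the intervals do not overlap matches the significance check the site reports for income groups.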
Ablation design
Order vs bag encoding, window length (W=1–30), fine vs coarse activity, weekday/full-week contrasts, and cross-channel inputs isolate where the predictive signal actually lives.
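The order-vs-bag contrast only matters for W > 1. The sketch shows why a bag encoding can lose information: two windows that contain the same activities in a different order look identical once positions are discarded. The vocabulary and function names are hypothetical.

```python
from typing import Dict, List, Sequence


def ordered_onehot(window: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Concatenate one one-hot block per slot, so slot position is preserved."""
    vec = [0] * (len(window) * len(vocab))
    for pos, act in enumerate(window):
        vec[pos * len(vocab) + vocab[act]] = 1
    return vec


def bag_of_activities(window: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Count activities in the window, discarding slot position."""
    vec = [0] * len(vocab)
    for act in window:
        vec[vocab[act]] += 1
    return vec


vocab = {"sleep": 0, "work": 1, "travel": 2}
a = ["work", "travel"]  # work, then travel
b = ["travel", "work"]  # travel, then work
assert bag_of_activities(a, vocab) == bag_of_activities(b, vocab)  # indistinguishable
assert ordered_onehot(a, vocab) != ordered_onehot(b, vocab)        # distinguishable
```

The two windows call for very different next slots, yet only the ordered encoding can tell them apart.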
Cross-national harmonization
Phase 2 and Phase 3 align UKDA-8741, ATUS, and MTUS to a shared activity forecasting task. Fine activity stays on the mainline; coarse coding remains a robustness lane rather than the headline result.
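Fine-to-coarse harmonization amounts to a lookup from detailed activity codes to broad groups. The mapping below is hypothetical (the project's codebook is not reproduced here). It also shows one way coarse coding can raise accuracy: transitions inside a coarse group become free persistence hits.

```python
from typing import Dict, List, Sequence

# Hypothetical fine-to-coarse lookup, for illustration only.
FINE_TO_COARSE: Dict[str, str] = {
    "sleep": "personal_care",
    "washing_dressing": "personal_care",
    "paid_work_main_job": "paid_work",
    "paid_work_second_job": "paid_work",
    "cooking": "domestic",
    "cleaning": "domestic",
    "commute": "travel",
}


def to_coarse(day: Sequence[str], mapping: Dict[str, str] = FINE_TO_COARSE,
              unknown: str = "other") -> List[str]:
    """Map each fine activity code to its coarse group; unmapped codes become `unknown`."""
    return [mapping.get(act, unknown) for act in day]


def persistence_hits(day: Sequence[str]) -> int:
    """Number of slots where the activity equals the previous slot's activity."""
    return sum(1 for prev, nxt in zip(day, day[1:]) if prev == nxt)


fine_day = ["sleep", "washing_dressing", "commute", "paid_work_main_job", "paid_work_second_job"]
print(persistence_hits(fine_day), persistence_hits(to_coarse(fine_day)))  # 0 2
```

The coarse sequence is easier to predict, but the two transitions it absorbs are real behavioural changes, which is why fine coding stays on the mainline.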
Report packaging
Each mature layer is delivered as Markdown, self-contained HTML, and print-ready PDF. The site now carries both a current Phase 4+ package and a legacy Phase 1–3 archive, so the reading surface stays aligned with live results while older packages remain recoverable.
Current Phase 4+ package, archive reports, and raw figures are assembled
The project now has two public report surfaces: a current Phase 4+ delivery package with its own overview, complete report, and visualization atlas, plus the earlier Phase 1–3 archive with the older overview and legacy phase reports. This is the curated reporting layer built on top of 921 result JSON artifacts rather than the full raw tree dumped verbatim.
From predictive metrics to social theory
This project interprets forecasting performance through sociology of time, institutional constraint, and stratified agency.
Pred, A. (1981/2005 reprint). Time-geography and the social anchoring of everyday practices. Used here for the concept of daily routines as space-time constraints.
Weber, M. (1905; 1922). Rationalization and the “iron cage”. Used to interpret fixed work-time systems as institutionalized temporal discipline.
Working Time Mismatch and Employee Subjective Well-being across Institutional Contexts (job-quality perspective). Supports the institutional-context lens for cross-national expansion.
Hochschild, A. (1989). The second shift. Supports gendered role-load interpretation for fragmented daily schedules.
Peterson, R. (1992). Cultural omnivore thesis. Inspires the hypothesis of “omnivore schedules” in higher-income groups.
Method note. At this phase, missing-value handling follows advisor guidance: cases with missing core variables are dropped first, and imputation is left for later extensions.