No. 02 — Personal Data Essay

A life, listened to.

Five thousand tracks, fifteen months of streaming data, and the surprisingly steady rhythm of an ordinary week.

Data Science · May 2025 · Solo

Spotify knows more about my days than most of my friends do. Every commute, every late-night coding session, every dull afternoon at a desk leaves a small record — a track, a timestamp, a device. One weekend I asked for all of it back. The GDPR export arrived as a ZIP file a few days later: plain JSON, surprisingly raw, and much larger than I expected.

What follows is a quiet experiment in self-surveillance — a look at whether fifteen months of playback data could tell me anything about myself that I didn’t already know.

5,000+ Tracks Analyzed
1,000+ Sessions Identified
15 Months of Data (May 2024 – May 2025)
6 Platforms Detected

I. The Question

The impulse was simple. I know what I listen to — I have preferences, I make playlists, I can recite my top artists. But the more interesting question was whether my behavior had a shape. Not what I played, but when, and how, and with what kind of consistency.

Do I actually have a “morning playlist,” or is that a story I tell myself? Is late-night listening different in kind from afternoon listening, or just in hour? When I come back to an artist, is it curiosity or comfort?

II. The Method

I pulled my full export — every track, every skip, every platform tag — and enriched each row with the hour, weekday, and session index it belonged to. A session was defined as any sequence of plays without a gap longer than thirty minutes. Simple, but sturdy enough to surface patterns.
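The enrichment and session rule described above can be sketched in Pandas roughly as follows. This is an illustration under stated assumptions, not the original notebook: the column names `ts`, `track`, and `platform` and the sample rows are hypothetical, and the gap is measured between consecutive timestamps rather than from the end of one track to the start of the next.

```python
import pandas as pd

# Hypothetical sample rows. The column names are assumptions for this
# sketch, not the export's actual schema.
plays = pd.DataFrame({
    "ts": pd.to_datetime([
        "2025-03-03 08:05", "2025-03-03 08:09", "2025-03-03 08:13",
        "2025-03-03 09:30", "2025-03-03 09:34",
        "2025-03-03 22:10",
    ]),
    "track": ["A", "B", "C", "A", "D", "E"],
    "platform": ["ios", "ios", "ios", "osx", "osx", "ios"],
})

SESSION_GAP = pd.Timedelta(minutes=30)


def enrich(df: pd.DataFrame, gap: pd.Timedelta = SESSION_GAP) -> pd.DataFrame:
    """Add hour, weekday, and a zero-based session index to each play."""
    out = df.sort_values("ts").reset_index(drop=True)
    out["hour"] = out["ts"].dt.hour
    out["weekday"] = out["ts"].dt.day_name()
    # A new session starts when the gap since the previous play exceeds
    # the threshold. The first row has no predecessor, so it compares as
    # False and stays in session 0.
    starts_new_session = out["ts"].diff() > gap
    out["session"] = starts_new_session.cumsum()
    return out


enriched = enrich(plays)
print(enriched[["ts", "hour", "weekday", "session"]])
```

The cumulative sum over the "gap exceeded" flags is what turns a row-by-row comparison into a session index, so no explicit loop is needed.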

The plan had been to layer Spotify’s audio features on top: valence, tempo, danceability. Those endpoints were deprecated a few weeks before I started. What remained was temporal and contextual: when I played something, on what, and how long I stayed.

That turned out to be more than enough.

III. What I Found

Three patterns held up under every way I tried to break them.

The week has a signature. Weekday mornings and late weekday evenings were the two reliable peaks. Mid-afternoon, between three and five, listening collapsed — the hours I’m apparently too busy to press play. Saturdays looked almost nothing like weekdays: later start, longer sessions, more variance.
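One way to look for a weekly signature like this is a weekday-by-hour grid of play counts. The sketch below assumes a hypothetical `ts` column of play timestamps and uses made-up rows; it shows the shape of the computation, not the figures behind the essay.

```python
import pandas as pd

# Hypothetical timestamps: three plays on a Monday, three on a Saturday.
plays = pd.DataFrame({
    "ts": pd.to_datetime([
        "2025-03-03 08:05", "2025-03-03 08:40", "2025-03-03 22:10",
        "2025-03-08 11:00", "2025-03-08 11:30", "2025-03-08 12:15",
    ]),
})

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekly_signature(df: pd.DataFrame) -> pd.DataFrame:
    """Return a 7 x 24 grid of play counts (rows: weekday, columns: hour)."""
    labelled = df.assign(
        weekday=df["ts"].dt.day_name(),
        hour=df["ts"].dt.hour,
    )
    grid = labelled.groupby(["weekday", "hour"]).size().unstack(fill_value=0)
    # Reindex so that days and hours with no plays still appear as zeros,
    # and so that the rows follow calendar order rather than alphabetical.
    return grid.reindex(index=WEEKDAYS, columns=range(24), fill_value=0)


grid = weekly_signature(plays)
print(grid.loc[["Monday", "Saturday"], [8, 11, 12, 15, 22]])
```

Keeping the empty cells matters here: a mid-afternoon collapse only shows up if the quiet hours are present in the grid as zeros.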

Devices tell a second story. Mobile listening dominated commutes and weekends; desktop took over during work hours. Platform share was a cleaner proxy for what I was doing than the music itself.

Repeat behavior is bimodal. Most tracks I played once or twice. A small core — maybe fifty songs — I played dozens of times, almost exclusively during focused work. Comfort listening and discovery listening barely overlap.
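A bimodal repeat pattern of this kind can be checked by counting plays per track and then counting how many tracks share each play count. The sketch below uses a hypothetical `track` column, invented rows, and an arbitrary cutoff (`CORE_MIN_PLAYS`) for what counts as the heavily repeated core; none of these come from the original analysis.

```python
import pandas as pd

# Hypothetical listening log: two heavily repeated tracks and four rare ones.
plays = pd.DataFrame({
    "track": ["A"] * 12 + ["B"] * 10 + ["C", "C", "D", "E", "F"],
})

CORE_MIN_PLAYS = 10  # assumed cutoff for the "core", chosen for illustration


def repeat_profile(df: pd.DataFrame, core_min: int = CORE_MIN_PLAYS):
    """Return plays per track, a histogram of those counts, and the core tracks."""
    per_track = df["track"].value_counts()
    # How many tracks were played exactly n times, ordered by n.
    histogram = per_track.value_counts().sort_index()
    core = sorted(per_track[per_track >= core_min].index)
    return per_track, histogram, core


per_track, histogram, core = repeat_profile(plays)
print(histogram)
print("core tracks:", core)
```

If the histogram has mass at one or two plays and again far out in the tail, with little in between, that is the two-mode shape described above.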

The most consistent finding wasn’t a preference. It was a rhythm — a schedule more reliable than any calendar I actually keep.

IV. Limitations

Every finding here describes one person — me. The sample size is fifteen months of a single listener, which is enough to see habits but not enough to generalize. With audio features deprecated, mood and energy remain inferred rather than measured; all the claims I make about “focus listening” or “late-night listening” are behavioral proxies, not acoustic ones.

The session heuristic is also a choice. A thirty-minute gap is defensible but not definitive — a shorter threshold would have multiplied the session count and likely muted the weekend pattern.
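The sensitivity to that threshold is easy to probe by recounting sessions under several gap values. This is a minimal sketch with invented timestamps; `count_sessions` is a hypothetical helper, and the thresholds tried are arbitrary.

```python
import pandas as pd

# Hypothetical play timestamps with gaps of 4, 16, 4, 46, 4, and 106 minutes.
ts = pd.Series(pd.to_datetime([
    "2025-03-03 08:00", "2025-03-03 08:04", "2025-03-03 08:20",
    "2025-03-03 08:24", "2025-03-03 09:10", "2025-03-03 09:14",
    "2025-03-03 11:00",
]))


def count_sessions(timestamps: pd.Series, gap_minutes: int) -> int:
    """Count sessions, where a gap longer than gap_minutes starts a new one."""
    if timestamps.empty:
        return 0
    ordered = timestamps.sort_values()
    breaks = ordered.diff() > pd.Timedelta(minutes=gap_minutes)
    # Each break opens one additional session beyond the first.
    return int(breaks.sum()) + 1


for minutes in (10, 30, 60):
    print(f"{minutes:>2}-minute gap: {count_sessions(ts, minutes)} sessions")
```

On this toy data the count falls from four sessions to three to two as the threshold grows, which is the direction of the effect described above: shorter gaps split the same listening into more sessions.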

V. What I’d Do Differently

The analysis ends where a better version would begin. Fifteen months is a snapshot; I wanted continuous data. GDPR exports are one-off and slow — by the time you look at them, they’re already stale.

So I built the thing that should have existed first: a pipeline that pulls my listens automatically, stores them in Postgres, and keeps the table fresh. That project lives next door.

Colophon

Written and analyzed in Python with Pandas for wrangling, Plotly for the figures, and a light touch of Scikit-learn for session clustering. Raw data drawn from Spotify’s GDPR personal export.