Cory Trast — Music Pipeline Showcase

// What this pipeline does

Four pipelines. One framework.

📱💻

Stream Pipeline — Phone & PC

A single reusable Lambda function handles both phone and PC streaming events. Each source has its own Kinesis Data Stream. The Lambda reads its input and output schemas from AWS Parameter Store at runtime — no hardcoded field names, no code changes to add a new stream source.
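As a rough illustration of the first step in such a function, the sketch below unpacks a Kinesis-triggered Lambda event. The function name `decode_kinesis_records` is hypothetical, and it assumes every payload is a single JSON object; the showcase's actual ingest code is not shown on this page.

```python
import base64
import json


def decode_kinesis_records(event):
    """Return the JSON payloads carried by a Kinesis-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers write one JSON object per Kinesis record.
        payloads.append(json.loads(raw))
    return payloads
```

Because the function only depends on the event shape, the same code can serve both the phone and the PC stream.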

Raw JSON lands in S3 raw/, where each new object triggers a second Lambda that updates DynamoDB in real time. Glue then enriches the data: it resolves lat/long to city and state, and translates song, artist, album, and playlist IDs to human-readable names via RDS lookups and DynamoDB user profile lookups.
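The enrichment step can be pictured with a small, in-memory sketch. Everything here is an assumption for illustration: plain dictionaries stand in for the RDS and DynamoDB lookups, `resolve_city` is a hypothetical callable that maps coordinates to a city and state, `record_id` is assumed to identify the song, and the user profile is assumed to hold an `age` field. The real transform runs in Glue and is not reproduced here.

```python
def enrich_event(event, lookups, users, resolve_city):
    """Build a curated record from a raw streaming event using lookup tables."""
    lat, lon = event.get("lat"), event.get("long")
    if lat is None or lon is None:
        # PC events carry null coordinates, so there is no location to resolve.
        city, state = None, None
    else:
        city, state = resolve_city(lat, lon)
    return {
        "user_id": event["user_id"],
        "city": city,
        "state": state,
        # Assumption: record_id is the song identifier.
        "song_name": lookups["songs"].get(event["record_id"]),
        "artist_name": lookups["artists"].get(event["artist_id"]),
        "album_name": lookups["albums"].get(event["album_id"]),
        "playlist_name": lookups["playlists"].get(event["playlist_id"]),
        # Assumption: the user profile stores an "age" attribute.
        "user_age": users.get(event["user_id"], {}).get("age"),
        "datetime": event["datetime"],
        "device_type": event.get("device_type"),
    }
```

Unknown IDs resolve to `None` rather than raising, which keeps one bad lookup from failing a whole batch; a stricter pipeline might route such records to a quarantine location instead.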

👔👥

Batch Pipeline — Employees & Customers

Employee and customer records live in RDS as fully normalized relational tables. A config-driven Lambda batch extractor reads the schemas from Parameter Store, extracts records, and writes them to S3 raw/.

Glue then flattens the relational structure — joining across address, phone, position, salary, and bonus tables — producing analytics-ready flat records in S3 curated/, which are then loaded into DynamoDB. Sensitive fields (SSN, CC numbers) are KMS-encrypted and excluded from all pipeline output schemas.
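The flattening idea can be sketched in plain Python. This is a simplified stand-in for the Glue job, not the job itself: the function name `flatten_employees` is hypothetical, each related table is assumed to be a dictionary keyed by `emp_id` with one row per employee, and the column names in the usage example are illustrative.

```python
def flatten_employees(employees, related_tables, output_schema):
    """Join each employee to its related rows by emp_id, then keep schema fields only."""
    flat_records = []
    for employee in employees:
        merged = dict(employee)
        for table in related_tables:
            # Assumption: each related table maps emp_id -> a single row dict.
            merged.update(table.get(employee["emp_id"], {}))
        # Projecting onto the output schema drops anything not listed, such as ssn.
        flat_records.append({field: merged.get(field) for field in output_schema})
    return flat_records
```

A short usage example under the same assumptions:

```python
employees = [{"emp_id": "e1", "fname": "Ada", "lname": "Lee", "ssn": "000-00-0000"}]
addresses = {"e1": {"city": "Austin", "state": "TX"}}
salaries = {"e1": {"salary": 100000.0}}
schema = {"emp_id": "string", "fname": "string", "city": "string", "salary": "float"}
rows = flatten_employees(employees, [addresses, salaries], schema)
```

Driving the projection from the output schema means a sensitive column never reaches the curated layer unless someone deliberately adds it to the schema.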

🔧

What's being simulated

This showcase uses synthetic data to demonstrate the pipeline. The Lambda producers generate realistic records — phone events with lat/long coordinates, PC events with browser types, employee records with relational lookups, and customer records with enrollment history.

The pipeline architecture, Parameter Store config pattern, Glue enrichment logic, and DynamoDB state management are all production-grade — the data source is simulated so the framework can be demonstrated without live RDS infrastructure cost.

⚙️

Config-Driven Architecture

Input and output schemas are stored as JSON in AWS Parameter Store — versioned, audited, and encrypted independently of code. The Lambda reads its schema path from an environment variable, loads the config at runtime, validates incoming records, and routes output — all without a single hardcoded field name.

Adding a new data source = add a Parameter Store entry + a new Kinesis trigger. Zero Lambda code changes.
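To make the validation idea concrete, here is a minimal sketch of schema-driven checking. It uses the type names that appear in the schemas on this page (`string`, `int`, `float`, `timestamp`, and `null`), but the checking rules are assumptions: `timestamp` is taken to be an ISO 8601 string, and a `null` schema entry is taken to mean the field must be present and null. `validate_record` and `TYPE_CHECKS` are hypothetical names.

```python
from datetime import datetime


def _is_timestamp(value):
    """Assumption: timestamps arrive as ISO 8601 strings."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so it is rejected explicitly.
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "timestamp": _is_timestamp,
}


def validate_record(record, schema):
    """Return a list of problems; an empty list means the record is valid."""
    errors = []
    for field, type_name in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif type_name is None:
            # Assumption: a null schema entry means the value must be null.
            if record[field] is not None:
                errors.append(f"{field}: expected null")
        elif type_name not in TYPE_CHECKS:
            errors.append(f"{field}: unknown schema type {type_name!r}")
        elif not TYPE_CHECKS[type_name](record[field]):
            errors.append(f"{field}: expected {type_name}")
    return errors
```

Since the field names come entirely from the schema dictionary, supporting a new source needs a new schema entry rather than a code change.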

📊

Data Science Use Case

The curated S3 output answers a real business question: "What is the #1 song by region, age, and city?" — by enriching each raw streaming event with resolved song name, artist, city, state, and user age at Glue transform time. The curated dataset is analytics-ready for ML models, BI tools, or direct Athena queries without any further joins.
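Because each curated record already carries the resolved names, the business question reduces to a grouped count. The sketch below shows that over a list of curated records in memory; `top_song_by` is a hypothetical helper, and in practice the same aggregation would more likely be an Athena query or a BI tool measure.

```python
from collections import Counter, defaultdict


def top_song_by(records, group_field):
    """Return the most-played song_name for each value of group_field."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record[group_field]][record["song_name"]] += 1
    # most_common(1) yields the single highest-count (song, plays) pair per group.
    return {group: counter.most_common(1)[0][0] for group, counter in counts.items()}
```

Calling it with `"city"`, `"state"`, or `"user_age"` as the group field covers the three cuts named in the question, with no joins needed at query time.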

// AWS Parameter Store

Config as infrastructure

Input and output schemas live in Parameter Store — not in code. The Lambda resolves its schema path from an environment variable, caches the config on warm invocations, and validates every record against it at runtime.

// /music-pipeline/phone/input-schema

"user_id": "string", "lat": "float", "long": "float", "album_id": "string", "record_id": "string", "artist_id": "string", "datetime": "timestamp", "playlist_id": "string", "device_type": "string"

// /music-pipeline/phone/output-schema

"user_id": "string", "city": "string", "state": "string", "song_name": "string", "artist_name": "string", "album_name": "string", "playlist_name": "string", "user_age": "int", "datetime": "timestamp", "device_type": "string"

// /music-pipeline/pc/input-schema

"user_id": "string", "lat": null, "long": null, "album_id": "string", "record_id": "string", "artist_id": "string", "datetime": "timestamp", "playlist_id": "string", "browser_type": "string"

// /music-pipeline/batch/employees/output-schema

"emp_id": "string", "fname": "string", "lname": "string", "title": "string", "city": "string", "state": "string", "salary": "float", "bonus": "float", "cost_center": "string" // ssn excluded — KMS encrypted

// Interactive Pipeline

Run the pipelines

Trigger individual pipelines or start all four simultaneously. Watch records flow through each stage in real time.

// LIVE DATA FLOW

PHONE STREAM: Producer (Lambda) → Kinesis (Stream) → Ingestor (SSM config) → S3 raw/ (JSON) → DynamoDB (State) → Glue (Enrich) → S3 curated/ (Parquet)

PC STREAM: Producer (Lambda) → Kinesis (Stream) → Ingestor (SSM config) → S3 raw/ (JSON) → DynamoDB (State) → Glue (Enrich) → S3 curated/ (Parquet)

BATCH EMP: RDS (Employees) → Extractor (SSM config) → S3 raw/ (JSON) → Glue (Flatten) → S3 curated/ (Parquet) → DynamoDB (Store)

BATCH CUST: RDS (Customers) → Extractor (SSM config) → S3 raw/ (JSON) → Glue (Flatten) → S3 curated/ (Parquet) → DynamoDB (Store)
