Paul Urquhart

Projects

Dow Jones Industrial Average - End-to-End Data Pipeline

Batch data pipeline that ingests market data from Yahoo Finance into Databricks, transforms it with Spark through bronze, silver, and gold layers, and serves the results via a public Streamlit dashboard.

Databricks Pipeline (GitHub)
VPS serving layer (GitHub)
Live Dashboard
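
For readers unfamiliar with the bronze/silver/gold layering mentioned in the summary, the idea can be sketched in plain Python. This is only an illustration, not the project's Spark code: the rows, the `ingested_at` field, the function names `to_silver` and `to_gold`, and the choice of "average close" as the gold metric are all assumptions made for the example, and the prices are made-up sample values.

```python
from collections import defaultdict

# Hypothetical bronze rows: raw records as ingested, duplicates included.
# All values are invented sample data, not real market prices.
bronze = [
    {"symbol": "AAPL", "date": "2024-01-02", "close": 185.0, "ingested_at": 1},
    {"symbol": "AAPL", "date": "2024-01-02", "close": 185.5, "ingested_at": 2},  # re-ingested
    {"symbol": "AAPL", "date": "2024-01-03", "close": 184.0, "ingested_at": 2},
    {"symbol": "MSFT", "date": "2024-01-02", "close": 370.0, "ingested_at": 1},
]


def to_silver(rows):
    """De-duplicate: keep the most recently ingested row per (symbol, date)."""
    latest = {}
    for row in rows:
        key = (row["symbol"], row["date"])
        if key not in latest or row["ingested_at"] > latest[key]["ingested_at"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: (r["symbol"], r["date"]))


def to_gold(rows):
    """Aggregate: average close per symbol (an assumed example metric)."""
    closes = defaultdict(list)
    for row in rows:
        closes[row["symbol"]].append(row["close"])
    return {symbol: sum(values) / len(values) for symbol, values in closes.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(silver), gold)
```

In Spark the same two steps would typically be a window or `dropDuplicates`-style de-duplication followed by a `groupBy` aggregation; the plain-Python version only shows the shape of the data at each layer.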

Tech Stack

Python - data ingestion, transformations, and application logic
SQL (Spark SQL) - de-duplication, incremental merges, and analytical queries
Apache Spark (PySpark) - bronze/silver/gold transformations in Databricks
Databricks (Free Edition) - notebook-based data processing and table storage
API Ingestion (Yahoo Finance) - source of the market data loaded into the pipeline
Apache Airflow - daily orchestration of data transfer tasks
Bash - automated data retrieval via the Databricks REST API
Streamlit - lightweight data dashboard
Plotly - interactive OHLC and time-series visualisation
Linux (VPS) - application hosting and scheduling
Git & GitHub - version control and project separation
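
The "incremental merges" listed under SQL can be pictured as an upsert keyed on symbol and date: rows that already exist are updated, new rows are inserted. The sketch below shows that behaviour in plain Python under stated assumptions; the `(symbol, date)` key, the function name `merge_incremental`, and the sample rows are hypothetical and not taken from the project, where the equivalent step would more likely be a Spark SQL `MERGE INTO` statement.

```python
def merge_incremental(target, batch):
    """Upsert batch rows into target, keyed by (symbol, date).

    Returns a tuple (inserted, updated) with the number of rows in each case.
    """
    inserted = updated = 0
    for row in batch:
        key = (row["symbol"], row["date"])
        if key in target:
            updated += 1   # existing key: overwrite with the newer row
        else:
            inserted += 1  # unseen key: add a new row
        target[key] = row
    return inserted, updated


# Hypothetical existing table and new daily batch (made-up values).
table = {
    ("AAPL", "2024-01-02"): {"symbol": "AAPL", "date": "2024-01-02", "close": 185.0},
}
batch = [
    {"symbol": "AAPL", "date": "2024-01-02", "close": 185.5},  # correction to an existing day
    {"symbol": "AAPL", "date": "2024-01-03", "close": 184.0},  # new day
]

counts = merge_incremental(table, batch)
print(counts, len(table))
```

Merging on a key rather than appending means a daily run can be repeated without creating duplicate rows.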