Paul Urquhart
Projects
Dow Jones Industrial Average - End-to-End Data Pipeline
Batch data pipeline that ingests market data from Yahoo Finance into Databricks,
transforms it with Spark through bronze, silver, and gold layers, and serves the
results via a public Streamlit dashboard.
Databricks Pipeline (GitHub)
VPS Serving Layer (GitHub)
Live Dashboard
Tech Stack
- Python - data ingestion, transformations, and application logic
- SQL (Spark SQL) - de-duplication, incremental merges, and analytical queries
- Apache Spark (PySpark) - bronze/silver/gold transformations in Databricks
- Databricks (Free Edition) - notebook-based data processing and table storage
- API Ingestion (Yahoo Finance) - retrieval of market data from Yahoo Finance
- Apache Airflow - daily orchestration of data transfer tasks
- Bash - automated data retrieval via Databricks REST API
- Streamlit - lightweight data dashboard
- Plotly - interactive OHLC and time-series visualisation
- Linux (VPS) - application hosting and scheduling
- Git & GitHub - version control and project separation
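
The de-duplication and incremental merge steps listed under SQL (Spark SQL) can be illustrated with a minimal sketch. This is not the project's actual code: it swaps Spark SQL for plain Python so it runs anywhere, and the column names (`ticker`, `trade_date`, `close`, `ingested_at`) and both function names are hypothetical, chosen only for illustration.

```python
from datetime import date, datetime


def _key(row):
    # Hypothetical business key: one row per ticker per trading day.
    return (row["ticker"], row["trade_date"])


def deduplicate(rows):
    """Keep one row per key, preferring the most recently ingested copy."""
    latest = {}
    for row in rows:
        key = _key(row)
        if key not in latest or row["ingested_at"] > latest[key]["ingested_at"]:
            latest[key] = row
    return sorted(latest.values(), key=_key)


def merge_incremental(target, updates):
    """Upsert: rows in `updates` overwrite matching keys, new keys are added."""
    merged = {_key(row): row for row in target}
    for row in deduplicate(updates):
        merged[_key(row)] = row
    return sorted(merged.values(), key=_key)


# Illustrative values only, not real market data.
raw_rows = [
    {"ticker": "^DJI", "trade_date": date(2024, 1, 2), "close": 100.0,
     "ingested_at": datetime(2024, 1, 2, 22, 0)},
    {"ticker": "^DJI", "trade_date": date(2024, 1, 2), "close": 101.0,
     "ingested_at": datetime(2024, 1, 3, 22, 0)},  # re-ingested duplicate
]
clean_rows = deduplicate(raw_rows)  # one row for 2024-01-02, the later copy
```

In Spark SQL the same idea is typically expressed as a `ROW_NUMBER()` window over the business key followed by a `MERGE INTO` against the target table; the sketch above only mirrors that logic on in-memory rows.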