Flight Delay Prediction Lab Guide (Microsoft Fabric)

This guide provides step-by-step instructions for using Copilot and other AI tools within Microsoft Fabric to build an end-to-end flight delay prediction solution. Follow these instructions to learn how to use the AI capabilities in Microsoft Fabric effectively.

Repository: fabric-flight-ai-demo


Step 0: Create Workspace

First, you need to create a Microsoft Fabric workspace:

  1. Navigate to workspaces and click “New workspace”:

Workspaces List

  2. Fill in workspace details:

Create Workspace Dialog

  3. Create Lakehouse:

Step 1: Select Lakehouse from New Item

Select Lakehouse

Step 2: Name the Lakehouse

Name Lakehouse


Step 1: Upload CSV to Lakehouse

Step 1: Navigate to Files and Upload

Upload Files Menu

Step 2: Upload the CSV File

Upload Files Dialog


Step 2: Dataflow Gen 2 – Clean & Transform

Note: This step can be skipped. You can now convert CSV files directly to tables using the Load to Tables functionality. Simply right-click on the uploaded file in your Lakehouse and select Load to Tables > New table.

Create Dataflow:

Step 1: Create New Dataflow Gen2

Create Dataflow Gen2

Step 2: Rename the Dataflow

Rename Dataflow

Step-by-step using Copilot:

  1. Get data from Lakehouse file:

Step 1: Get Data from Lakehouse

Get Data Lakehouse

Step 2: Create Connection to Lakehouse

Create Connection

Step 3: Choose the CSV File

Choose File

Load the CSV file from Lakehouse named `flights_sample_3m.csv`

  2. Remove null ARR_DELAY rows:

This step demonstrates how Copilot can clean data before analysis. It is one example of what is possible when preparing datasets for ML workflows in Fabric.

Remove all rows where the value in the column `ARR_DELAY` is null.

  3. Create column IS_DELAYED:

Create a new column `IS_DELAYED`.
Set its value to 1 if `ARR_DELAY` > 15, otherwise 0.

  4. Create column DEP_HOUR:

Create a column `DEP_HOUR` by extracting hour from `CRS_DEP_TIME` (e.g. 1530 → 15).

  5. Create column FL_DAYOFWEEK:

Extract day of the week from `FL_DATE` and store in `FL_DAYOFWEEK`. Monday = 1.

  6. Create column FL_MONTH:

Extract month from `FL_DATE` and store in `FL_MONTH`. Use Date.Month([FL_DATE]).
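
For readers who want to check what the Copilot prompts above should produce, the same transformations can be sketched in plain Python. This is only an illustration, not the Power Query code that Copilot generates. It assumes that `FL_DATE` is an ISO `YYYY-MM-DD` string and that `CRS_DEP_TIME` is an integer in HHMM form; the helper name `add_features` and the sample rows are made up for the example.

```python
from datetime import date

def add_features(rows):
    """Apply the cleaning and feature steps to a list of dict rows."""
    out = []
    for row in rows:
        # Drop rows whose ARR_DELAY is null.
        if row["ARR_DELAY"] is None:
            continue
        fl_date = date.fromisoformat(row["FL_DATE"])  # assumes YYYY-MM-DD
        new_row = dict(row)
        # IS_DELAYED: 1 if the arrival delay exceeds 15 minutes, else 0.
        new_row["IS_DELAYED"] = 1 if row["ARR_DELAY"] > 15 else 0
        # DEP_HOUR: CRS_DEP_TIME is HHMM, so integer division by 100 keeps the hour.
        new_row["DEP_HOUR"] = row["CRS_DEP_TIME"] // 100
        # FL_DAYOFWEEK: isoweekday() numbers Monday as 1 and Sunday as 7.
        new_row["FL_DAYOFWEEK"] = fl_date.isoweekday()
        # FL_MONTH: calendar month, 1 to 12.
        new_row["FL_MONTH"] = fl_date.month
        out.append(new_row)
    return out

# Made-up sample rows, for illustration only.
sample = [
    {"FL_DATE": "2023-12-15", "CRS_DEP_TIME": 1530, "ARR_DELAY": 42.0},
    {"FL_DATE": "2023-12-18", "CRS_DEP_TIME": 905, "ARR_DELAY": None},
    {"FL_DATE": "2023-12-18", "CRS_DEP_TIME": 45, "ARR_DELAY": -3.0},
]
features = add_features(sample)
for row in features:
    print(row)
```

Comparing a few rows of the Dataflow output against a sketch like this is a quick way to confirm that Copilot interpreted each prompt as intended, in particular the Monday = 1 convention.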

Save Output:

The processed data is stored in the Warehouse because it supports efficient data retrieval and integrates with Power BI for analytics and reporting.


Step 3: Data Agent

Create Data Agent:

Step 1: Create New Data Agent

Create Data Agent

Step 2: Define the Name for Data Agent

Define Data Agent Name

Step 3: Connect Data Agent to Lakehouse

Connect to Lakehouse

Step 4: Enable the Table for Data Agent

Enable Table

Example Queries to Try:

Simple Queries:

More Complex Queries:

Example Response from Data Agent:

Data Agent Response

The Data Agent can answer natural language questions about your flight data and provides:

This demonstrates how the Data Agent translates business questions into SQL queries and returns actionable insights from your flight delay dataset.
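
To make the question-to-SQL idea concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for the Lakehouse SQL endpoint. The question, the table name `flights`, and the rows are hypothetical, and the SQL is only one plausible translation; the query the Data Agent actually generates may look different.

```python
import sqlite3

# In-memory stand-in for the Lakehouse table (hypothetical name and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (AIRLINE_CODE TEXT, DEP_HOUR INTEGER, IS_DELAYED INTEGER)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("UA", 18, 1), ("UA", 9, 0), ("UA", 20, 1),
     ("DL", 18, 1), ("DL", 7, 0), ("DL", 8, 0)],
)

# Hypothetical question: "Which airline has the highest share of delayed flights?"
query = """
    SELECT AIRLINE_CODE,
           COUNT(*) AS flights,
           ROUND(100.0 * SUM(IS_DELAYED) / COUNT(*), 1) AS delayed_pct
    FROM flights
    GROUP BY AIRLINE_CODE
    ORDER BY delayed_pct DESC
"""
results = conn.execute(query).fetchall()
for airline, flights, delayed_pct in results:
    print(f"{airline}: {delayed_pct}% of {flights} flights delayed")
```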


Step 4: Notebooks – AI Modeling

Create Notebook:

Prompt 1 – Load and Prepare Data

Load the `flightdelay-features` table into a Spark DataFrame.
Prepare the data for binary classification on `IS_DELAYED`.
Apply cleaning and encoding automatically.
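
As a rough picture of what "encoding" means in this prompt, the sketch below turns the text columns into integer indices and assembles a feature matrix in plain Python. It is an assumption about the kind of preparation Copilot performs, not its actual output, which would normally use Spark. The choice of feature columns follows the fields used in Prompt 4; the helper names and sample rows are made up.

```python
CATEGORICAL = ["AIRLINE_CODE", "ORIGIN", "DEST"]
NUMERIC = ["DEP_HOUR", "FL_DAYOFWEEK", "FL_MONTH", "DISTANCE"]

def fit_index_encoder(values):
    """Map each distinct category to an integer index, most frequent first."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {category: index for index, category in enumerate(ordered)}

def prepare(rows):
    """Return (X, y, encoders) for binary classification on IS_DELAYED."""
    encoders = {c: fit_index_encoder([r[c] for r in rows]) for c in CATEGORICAL}
    X = [
        [encoders[c][r[c]] for c in CATEGORICAL] + [r[c] for c in NUMERIC]
        for r in rows
    ]
    y = [r["IS_DELAYED"] for r in rows]
    return X, y, encoders

# Made-up rows, for illustration only.
rows = [
    {"AIRLINE_CODE": "UA", "ORIGIN": "ORD", "DEST": "LGA", "DEP_HOUR": 18,
     "FL_DAYOFWEEK": 5, "FL_MONTH": 12, "DISTANCE": 733, "IS_DELAYED": 1},
    {"AIRLINE_CODE": "DL", "ORIGIN": "ATL", "DEST": "LGA", "DEP_HOUR": 7,
     "FL_DAYOFWEEK": 1, "FL_MONTH": 6, "DISTANCE": 762, "IS_DELAYED": 0},
    {"AIRLINE_CODE": "UA", "ORIGIN": "ORD", "DEST": "DEN", "DEP_HOUR": 9,
     "FL_DAYOFWEEK": 3, "FL_MONTH": 6, "DISTANCE": 888, "IS_DELAYED": 0},
]
X, y, encoders = prepare(rows)
print(X)
print(y)
```

The encoders are returned alongside the data so that the same mapping can be reused later when a new flight is scored.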

Prompt 2 – Train Model

Train a binary classification model to predict `IS_DELAYED`.
Split into train/test sets, fit the model, and evaluate its performance.
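
The split, fit, and evaluate cycle requested by this prompt can be illustrated without any ML library. The sketch below swaps in a deliberately simple baseline, which predicts a delay when the historical delay rate for the departure hour is at least 50%, in place of whatever classifier Copilot chooses. All function names and the synthetic data are assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_hourly_baseline(train_rows):
    """Learn the delay rate per departure hour plus the overall rate."""
    totals, delayed = {}, {}
    for r in train_rows:
        hour = r["DEP_HOUR"]
        totals[hour] = totals.get(hour, 0) + 1
        delayed[hour] = delayed.get(hour, 0) + r["IS_DELAYED"]
    overall = sum(delayed.values()) / sum(totals.values())
    rates = {hour: delayed[hour] / totals[hour] for hour in totals}
    return rates, overall

def predict(model, row):
    """Predict 1 when the learned delay rate for the hour is at least 0.5."""
    rates, overall = model
    # Hours never seen in training fall back to the overall delay rate.
    return 1 if rates.get(row["DEP_HOUR"], overall) >= 0.5 else 0

def accuracy(model, rows):
    return sum(predict(model, r) == r["IS_DELAYED"] for r in rows) / len(rows)

# Synthetic data: about 80% of flights from 16:00 onward are delayed,
# about 20% of earlier flights are. Not taken from the real dataset.
rows = [
    {"DEP_HOUR": hour, "IS_DELAYED": int((hour >= 16) != (i % 5 == 0))}
    for hour in range(6, 22)
    for i in range(10)
]
train, test = train_test_split(rows)
model = fit_hourly_baseline(train)
acc = accuracy(model, test)
print(f"train={len(train)} test={len(test)} accuracy={acc:.2f}")
```

Evaluating on the held-out rows rather than the training rows is the point of the split: it estimates how the model behaves on flights it has not seen.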

Prompt 3 – Visualize Results

In this step you will see which features most influence flight delays. In the test scenario, for example, the departure hour (`DEP_HOUR`) had the strongest predictive power: flights later in the day are generally more prone to delays. Features such as distance or the specific origin and destination airports had much less influence.

This helps demonstrate how machine learning can uncover non-obvious patterns in flight data and guide operational improvements or forecasting strategies.

Show feature importance (bar chart), a confusion matrix (heatmap), and delay rate by `DEP_HOUR` (line chart).
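
Two of the requested visuals are built from simple aggregates, which the following sketch computes in plain Python: the 2×2 confusion matrix behind the heatmap and the delay rate per departure hour behind the line chart. The labels, predictions, and rows are made up, and the text bars are only a stand-in for the charts that the notebook renders.

```python
from collections import defaultdict

def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def delay_rate_by_hour(rows):
    """Return {DEP_HOUR: share of delayed flights}, ordered by hour."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for r in rows:
        totals[r["DEP_HOUR"]] += 1
        delayed[r["DEP_HOUR"]] += r["IS_DELAYED"]
    return {hour: delayed[hour] / totals[hour] for hour in sorted(totals)}

# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
cm = confusion_matrix(y_true, y_pred)
print("confusion matrix [[TN, FP], [FN, TP]]:", cm)

rows = [
    {"DEP_HOUR": 7, "IS_DELAYED": 0}, {"DEP_HOUR": 7, "IS_DELAYED": 0},
    {"DEP_HOUR": 18, "IS_DELAYED": 1}, {"DEP_HOUR": 18, "IS_DELAYED": 0},
    {"DEP_HOUR": 18, "IS_DELAYED": 1}, {"DEP_HOUR": 18, "IS_DELAYED": 1},
]
rates = delay_rate_by_hour(rows)
for hour, rate in rates.items():
    # Text bar as a stand-in for the line chart.
    print(f"{hour:02d}:00 {'#' * round(rate * 20)} {rate:.0%}")
```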

Prompt 4 – Predict New Flight

Create a DataFrame with a flight:
- DEP_HOUR = 18
- FL_DAYOFWEEK = 5
- FL_MONTH = 12
- AIRLINE_CODE = "UA"
- ORIGIN = "ORD"
- DEST = "LGA"
- DISTANCE = 733

Apply preprocessing, predict `IS_DELAYED`, and print:
- Class (0 or 1)
- Probability
- Message ("likely to be delayed" or not)
- Confidence %
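
The four output items can be derived from a single predicted probability. The sketch below shows one way to do that, assuming a 0.5 decision threshold and defining confidence as the probability of the predicted class. The helper `describe_prediction` is hypothetical, and the probability value is a placeholder rather than a real model output.

```python
def describe_prediction(p_delayed, threshold=0.5):
    """Turn P(IS_DELAYED = 1) into class, probability, message, confidence."""
    predicted_class = 1 if p_delayed >= threshold else 0
    # Confidence is the probability assigned to the predicted class.
    confidence = p_delayed if predicted_class == 1 else 1.0 - p_delayed
    message = (
        "likely to be delayed" if predicted_class == 1
        else "not likely to be delayed"
    )
    return {
        "class": predicted_class,
        "probability": p_delayed,
        "message": message,
        "confidence_pct": round(confidence * 100, 1),
    }

# The flight from the prompt above.
flight = {
    "DEP_HOUR": 18, "FL_DAYOFWEEK": 5, "FL_MONTH": 12,
    "AIRLINE_CODE": "UA", "ORIGIN": "ORD", "DEST": "LGA", "DISTANCE": 733,
}

# Placeholder probability; in the notebook this comes from the trained model.
p_delayed = 0.68
report = describe_prediction(p_delayed)
print(f"Flight {flight['ORIGIN']} to {flight['DEST']}: class={report['class']}, "
      f"probability={report['probability']:.2f}, {report['message']}, "
      f"confidence={report['confidence_pct']}%")
```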

Step 5: Power BI – Dashboard

Create Semantic Model:

The semantic model provides a structured layer over your data to simplify building Power BI reports. It helps streamline data access, organize fields for analysis, and enable self-service reporting experiences.

Auto-Create Report:

Step 1: Access Auto-Create Report Option

Create Auto Report

Step 2: Review Generated Report

Auto Report Example


What you can learn from this lab

All of these steps can be completed without writing a single line of code. The lab shows how you can use Copilot to streamline your analytical work and boost productivity.

This lab showcases how Microsoft Fabric Copilot helps reduce friction across the full data-to-insight workflow.


🛠️ To-do

Notebook & Automation:

Semantic Model:

Power BI Copilot:

Power BI MCP (Model Context Protocol):


I will be happy to hear your feedback or answer any questions. You can contact me via LinkedIn: aka.ms/taras.