Formula 1 Driver Performance Analysis

Situation

The goal of this project was to analyze Formula 1 using data analysis tools. It centers on the 2021 season but also includes an overall analysis of the sport and an analysis of a live race.

Task

The project aims to analyze how drivers perform on different circuits, compare teammates’ performance across seasons, and track constructors’ points over time. We also explore team and driver lap times, the circuits and their corners, drivers’ corner speeds at each circuit, and the impact of tires. Beyond that, we look at the success of both teams and individual drivers, and at how different nationalities are represented among the sport’s winners. Throughout, we use matplotlib, seaborn, plotly, and Altair to build visualizations and interactive dashboards that surface the patterns and trends that define Formula 1.

Action

I worked in a team of two on this data visualization project, and we chose F1 because it lends itself well to visualization and storytelling. We began by collecting driver performance data for the season, including lap times, qualifying positions, and race results, using a Python package called fastf1. We then used Python for exploratory data analysis (EDA) and to clean and process the data, calculating metrics such as average lap time and race position.
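
As a rough illustration of the metric step described above, the sketch below computes an average lap time and a last recorded position per driver with pandas. The DataFrame is a hypothetical, hand-written sample shaped like a small slice of fastf1's `session.laps`; the values and the output column names (`avg_lap_s`, `timed_laps`, `last_position`) are assumptions made for this example, not project data.

```python
import pandas as pd

# Hypothetical sample shaped like a slice of fastf1's session.laps
# (made-up values; NaN stands in for a lap without a recorded time).
laps = pd.DataFrame({
    "Driver": ["VER", "VER", "VER", "HAM", "HAM", "HAM"],
    "LapNumber": [1, 2, 3, 1, 2, 3],
    "LapTime": pd.to_timedelta([92.5, 90.0, float("nan"), 93.0, 90.5, 90.5], unit="s"),
    "Position": [1, 1, 1, 2, 2, 2],
})

# Drop laps without a time, then convert timedeltas to seconds for averaging
clean = laps.dropna(subset=["LapTime"]).sort_values("LapNumber")
clean = clean.assign(LapSeconds=clean["LapTime"].dt.total_seconds())

# One row per driver: mean lap time, number of timed laps, last recorded position
summary = (
    clean.groupby("Driver")
    .agg(
        avg_lap_s=("LapSeconds", "mean"),
        timed_laps=("LapNumber", "count"),
        last_position=("Position", "last"),
    )
    .sort_values("avg_lap_s")
)
print(summary)
```

Converting to seconds before aggregating keeps the result a plain float column, which is easier to plot and compare than raw timedeltas.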

Here is an example of the Python code to show qualifying results in every GP:

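import fastf1
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from fastf1.core import Laps
from ipywidgets import Dropdown, interact
from timple.timedelta import strftimedelta

# Collect each driver's fastest qualifying lap, keyed by season and event name
quali_df = {}
for i in range(2018, 2024):
    quali_df[i] = {}
    for t in fastf1.get_event_schedule(i).EventName:
        if t.endswith('Prix'):  # keep Grand Prix events only
            quali_df[i][t] = {}
            session = fastf1.get_session(i, t, 'Q')
            session.load(telemetry=True, weather=False)
            drivers = pd.unique(session.laps['Driver'])
            list_fastest_laps = list()
            for drv in drivers:
                drvs_fastest_lap = session.laps.pick_driver(drv).pick_fastest()
                list_fastest_laps.append(drvs_fastest_lap)
            # Order laps from fastest to slowest and compute each gap to pole
            fastest_laps = Laps(list_fastest_laps).sort_values(by='LapTime').reset_index(drop=True)
            pole_lap = fastest_laps.pick_fastest()
            fastest_laps['LapTimeDelta'] = fastest_laps['LapTime'] - pole_lap['LapTime']
            quali_df[i][t]['pole_lap'] = pole_lap
            quali_df[i][t]['fastest_laps'] = fastest_laps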

def quali(year, name_track):
    """Plot each driver's gap to the pole lap for one qualifying session."""
    try:
        laps = quali_df[year][name_track]['fastest_laps']
        pole_lap = quali_df[year][name_track]['pole_lap']
    except KeyError:
        print('Data not available for this year in this Grand Prix')
        return

    # Assign one viridis color per driver
    unique_drivers = laps['Driver'].unique()
    driver_colors = plt.cm.viridis(np.linspace(0, 1, len(unique_drivers)))
    driver_color_dict = dict(zip(unique_drivers, driver_colors))
    colors = [driver_color_dict[driver] for driver in laps['Driver']]

    fig, ax = plt.subplots()
    # Plot the gaps in seconds so the axis matches its label
    ax.barh(laps.index, laps['LapTimeDelta'].dt.total_seconds(),
            color=colors, edgecolor='grey')
    ax.set_yticks(laps.index)
    ax.set_yticklabels(laps['Driver'])
    ax.invert_yaxis()  # fastest driver at the top
    ax.set_axisbelow(True)
    ax.set_xlabel('Time to Leader (seconds)')
    ax.set_ylabel('Drivers')
    ax.xaxis.grid(True, which='major', linestyle='--', color='black', zorder=-1000)
    lap_time_string = strftimedelta(pole_lap['LapTime'], '%m:%s.%ms')
    plt.suptitle(f"{name_track} {year} Qualifying\n"
                 f"Fastest Lap: {lap_time_string} ({pole_lap['Driver']})")

    plt.show()

# Every event name seen in any season, sorted for a stable dropdown order
unique_track_name_quali = sorted({track for season in quali_df.values() for track in season})

unique_years_quali = sorted(quali_df)

track_name_dropdown_quali = Dropdown(options=unique_track_name_quali, description='Circuit:', value='Abu Dhabi Grand Prix')
year_dropdown_quali = Dropdown(options=unique_years_quali, description='Season:', value=2021)
interact(quali, year=year_dropdown_quali, name_track=track_name_dropdown_quali)
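
The gap-to-pole calculation at the heart of the qualifying chart can be tried without downloading any session data. The following is a minimal sketch using plain pandas on hypothetical lap times; `format_lap` is a helper invented for this example and only approximates what `strftimedelta` does in the code above.

```python
import pandas as pd

# Hypothetical fastest qualifying laps (made-up values for illustration)
fastest = pd.DataFrame({
    "Driver": ["HAM", "VER", "NOR"],
    "LapTime": pd.to_timedelta(["00:01:22.480", "00:01:22.109", "00:01:22.931"]),
})

# Sort so the pole lap is the first row, then measure every lap against it
fastest = fastest.sort_values("LapTime").reset_index(drop=True)
pole = fastest.iloc[0]
fastest["GapSeconds"] = (fastest["LapTime"] - pole["LapTime"]).dt.total_seconds()


def format_lap(td):
    """Format a Timedelta as M:SS.mmm (hypothetical helper for this sketch)."""
    total_ms = round(td.total_seconds() * 1000)
    minutes, rem = divmod(total_ms, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{minutes}:{seconds:02d}.{ms:03d}"


print(fastest[["Driver", "GapSeconds"]])
print("Pole:", format_lap(pole["LapTime"]), pole["Driver"])
```

Working in seconds rather than raw timedeltas means the bar lengths can be passed to matplotlib without any extra timedelta support.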

We also built an interactive Altair chart that shows each driver’s position on every lap. Here is the code snippet, which covers every GP of every season from 2018 to 2023.

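import altair as alt
import fastf1
import pandas as pd

# Collect lap-by-lap positions, keyed by season and then by event name
sessions_df = {}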
for s in range(2018,2024):
    lap_df={}
    for t in fastf1.get_event_schedule(s).EventName:
        if t.endswith('Prix'):
            session = fastf1.get_session(s, t, 'R')
            session.load(telemetry=True, weather=False)
            lap_df[t] = session.laps[['Driver','Position','LapNumber']]
    sessions_df[s]=lap_df
tidy_data = pd.concat(
    [df.assign(Year=year, Track=track) for year, tracks in sessions_df.items() for track, df in tracks.items()]
)
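# Laps without a recorded position cannot be plotted
tidy_data.dropna(subset=['Position'], inplace=True)
# Requires the optional vegafusion package; it lets Altair handle
# datasets larger than its default 5,000-row limit
alt.data_transformers.enable("vegafusion")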

base = alt.Chart(tidy_data).mark_line().encode(
    x='LapNumber:O',
    y='Position:O',
    color='Driver:N',
    tooltip=['Driver:N', 'Position:Q', 'LapNumber:O']
).properties(
    width=800,
    height=500
).interactive()

year_dropdown = alt.binding_select(options=sorted(list(tidy_data.Year.unique())))
year_selector = alt.selection_point(fields=['Year'], bind=year_dropdown, name='Select_',value=2021)

track_dropdown = alt.binding_select(options=list(tidy_data.Track.unique()))
track_selector = alt.selection_point(fields=['Track'], bind=track_dropdown, name='Select',value='Abu Dhabi Grand Prix')

chart = base.add_params(year_selector,track_selector).transform_filter(
    year_selector).transform_filter(track_selector
).properties(
    title='Formula 1 Position Over Laps'
)

chart
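
The same tidy layout (one row per year, track, driver, and lap) also supports simple summaries. As a sketch, the block below derives how many places each driver gained between their first and last recorded lap. The sample rows are hypothetical, and the `PositionsGained` column is a name introduced here; note that the first recorded lap is not the same as the starting grid position.

```python
import pandas as pd

# Hypothetical rows in the same shape as tidy_data above (made-up values)
tidy = pd.DataFrame({
    "Year": [2021] * 6,
    "Track": ["Abu Dhabi Grand Prix"] * 6,
    "Driver": ["VER", "VER", "VER", "HAM", "HAM", "HAM"],
    "LapNumber": [1, 2, 3, 1, 2, 3],
    "Position": [2.0, 2.0, 1.0, 1.0, 1.0, 2.0],
})

# Sort by lap so first() and last() pick the earliest and latest lap per driver
ordered = tidy.sort_values("LapNumber")
grouped = ordered.groupby(["Year", "Track", "Driver"])["Position"]

# Positive values mean the driver moved up the order during the race
change = (grouped.first() - grouped.last()).rename("PositionsGained").reset_index()
print(change)
```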

Result

Our analysis revealed several insights into driver and constructor performance, including which drivers were consistently fast over the years and which were prone to errors.

Some results and outcomes of this analysis:

  1. Analyzed drivers’ performance on different circuits

  2. Analyzed drivers’ performance over a season against their teammates

  3. Analyzed constructors’ points over time

  4. Analyzed every race lap by lap

  5. Provided an overview of qualifying
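
For the constructors’ points analysis listed above, a running total per team is one way to prepare the data for a line chart. This is a minimal sketch on hypothetical results, not the project's actual code: the team names, point values, and column names are all invented for illustration.

```python
import pandas as pd

# Hypothetical per-driver race results (made-up teams and points)
results = pd.DataFrame({
    "Round": [1, 1, 1, 1, 2, 2, 2, 2],
    "Team": ["Team A", "Team A", "Team B", "Team B"] * 2,
    "Points": [25, 12, 18, 15, 18, 10, 25, 15],
})

# Sum both drivers' points per team and round, with one column per team
per_round = results.groupby(["Round", "Team"])["Points"].sum().unstack("Team")

# Running total across rounds, ready to plot as one line per team
cumulative = per_round.cumsum()
print(cumulative)
```

The wide layout (rounds as rows, teams as columns) can be passed straight to `DataFrame.plot`, or melted back into long form for Altair.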