Slidesmart

summarizes large videos and texts into slides

Demo Video: here

Inspiration💡

The fast-paced education system today burdens educators with hours of work. Researching, teaching, designing courses, and marking papers take up most of their time. Providing them with a useful tool to assist with their presentations is crucial.

We therefore built Slidesmart, which lets educators condense long videos and texts, such as blog posts and research papers, into crisp, accurate, and visually appealing summaries in the form of slides.

What it does🔎

Slidesmart is a web app that turns long videos and texts into slides for educators.

  • For hour-long videos or audio recordings: Slidesmart audio to slides uses machine learning algorithms and APIs to deliver creative and concise slides based on audio recordings for your presentation, group project, or lecture. Slidesmart takes a YouTube video link, runs it through the Machine Learning model we created to identify the important concepts being discussed, and presents a brief summary to the user.

  • For lengthy notes: Slidesmart text to slides uses state-of-the-art machine learning algorithms and APIs to create customized slides from your presentation’s script. Slidesmart takes in .tex files of notes, essays, or articles, parses them through our Machine Learning model, and instantly displays slides summarizing the entire content.
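The two input paths described above could be told apart with a small dispatcher. This is a minimal sketch, not the project's actual code: the function name `detect_input_kind`, the returned labels, and the list of YouTube hostnames are hypothetical, and it assumes that only YouTube links and `.tex` files are accepted.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hostnames treated as YouTube links (an assumption for this sketch).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def detect_input_kind(source: str) -> str:
    """Return "audio" for a YouTube link and "text" for a .tex file name.

    Hypothetical helper: raises ValueError for any other input.
    """
    text = source.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        # Web links are only accepted when they point at YouTube.
        if parsed.hostname in YOUTUBE_HOSTS:
            return "audio"
        raise ValueError(f"Unsupported link: {source!r}")
    # Anything else is treated as a file name and checked by extension.
    if PurePosixPath(text).suffix.lower() == ".tex":
        return "text"
    raise ValueError(f"Unsupported input: {source!r}")
```

A caller could then send "audio" inputs to the audio-to-slides pipeline and "text" inputs to the text-to-slides pipeline.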

How we built it🔨

We divided this project into two main sub-projects:

Slidesmart Machine Learning Model

Throughout the course of this project we made use of 2 different Machine Learning models for different use cases:

  1. Audio to slides

We use the AssemblyAI APIs to first summarize the audio and extract key phrases from an audio clip. We then take the key phrases extracted from the audio clip and create images from them, which could help in better explaining the content on the slides.

  2. Text to slides

We use the Azure Text Analytics APIs to first summarize the text and extract key phrases from it. We then take the extracted key phrases and create images from them, which could help in better explaining the content on the slides.
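Both pipelines end with the same kind of step: a summary and a list of key phrases are turned into slides, and the key phrases also drive image generation. A minimal sketch of that step follows. The `Slide` dataclass, `build_slides`, the rule that a sentence belongs to a slide when it mentions the phrase, and the prompt wording are all assumptions made for illustration. The calls to AssemblyAI and Azure Text Analytics are left out, and their outputs are represented by a plain string and a list of strings.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Slide:
    title: str
    bullets: list[str] = field(default_factory=list)
    image_prompt: str = ""


def build_slides(summary: str, key_phrases: list[str]) -> list[Slide]:
    """Group summary sentences under the key phrases they mention.

    Assumption: a sentence belongs to a phrase if the phrase appears in it
    (case-insensitive). Phrases that match no sentence are skipped.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", summary) if s.strip()]
    slides = []
    for phrase in key_phrases:
        bullets = [s for s in sentences if phrase.lower() in s.lower()]
        if not bullets:
            continue
        slides.append(
            Slide(
                title=phrase.title(),
                bullets=bullets,
                # Hypothetical prompt format for the text-to-image model.
                image_prompt=f"An illustration of {phrase}",
            )
        )
    return slides


summary = (
    "Photosynthesis converts light into chemical energy. "
    "Chlorophyll absorbs light in the leaves."
)
slides = build_slides(summary, ["photosynthesis", "chlorophyll", "mitosis"])
for slide in slides:
    print(slide.title, "|", slide.bullets, "|", slide.image_prompt)
```

In this sketch "mitosis" produces no slide because the summary never mentions it. A real implementation would likely need a smarter grouping rule than substring matching.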

The text-to-image machine learning model is also an aspect where we put in a lot of work. We first tried Stable Diffusion, which, as we expected, proved to be very time-consuming per image even with a GPU backend. We tried some other vanilla transformer and diffusion models as well. Finally, we ended up modifying a stable diffusion model, taking inspiration from HuggingFace’s Diffusers and Transformers libraries, so that it runs quickly. This cost us some of the generative power of diffusion models, but we were able to generate images very quickly. Because we ended up with quite a large model, which we spent time training and whose quirks we spent time understanding, optimizing it was essential, and we were able to introduce some optimizations to the model.

We made heavy use of the Google Cloud AI platform to experiment with and train the models.

Finally, we also decided to perform the inference for the model using WebAssembly (Wasm) and serialized the WebAssembly module, which, paired with some other optimizations we made, greatly improved the model’s performance. Overall, we ended up using multiple Machine Learning models and APIs, all of which were well optimized for the task at hand.

Slidesmart Web App

The Slidesmart Web App was built using the Django framework, which simplified the labyrinth of managing a data-heavy application like ours, state management for a large model, and more. Writing the web app in Python allowed us to easily introduce optimizations for the data- and memory-intensive tasks this application needs to run. HTML and CSS helped us make our web pages more visually attractive and interactive. We used the Neumorphism design style to create a very user-friendly and minimalistic UI for the students.

Challenges we ran into⚠️

  • Our first pass at integrating the model into the web application was quite bulky and slow at inference. Optimizing the model and finding or modifying optimization techniques that work for our use cases was one of the big challenges.
  • Converting summaries to creative slides was a very time-consuming task, which we completed by modifying a stable diffusion model, taking inspiration from HuggingFace’s Diffusers and Transformers libraries.
  • We used multiple Machine Learning models that tie together well but can be quite bulky at runtime, so we spent quite some time optimizing the inference process and also used WebAssembly to run the text-to-image model.

Accomplishments that we’re proud of🥇

  • Our first pass at integrating the model into the web application was slow at inference. Optimizing the model and finding or modifying optimization techniques that work for our use cases was one of the big challenges. Integrating Wasm in this environment proved quite difficult, but it was well worth the performance gains.
  • Our minimalistic neumorphic UI design complements our work with AI models
  • We are quite happy that we were able to integrate the models with the UI and do so efficiently
  • We were also able to teach each other during this hackathon: some of us had no experience with Machine Learning, while others had little experience making great UIs

What we learned🧠

  • Most of our project members had no prior experience with Machine Learning techniques or with modern machine learning models like the ones we used.
  • All our project members are first-year students at the University of Toronto, but our combined experience helped us complete this project in the allotted time. Our belief in teamwork also increased during the course of this hackathon.
  • We learned more about optimizing Machine Learning models. Generating a WebAssembly module to run a mini Stable Diffusion might just become a full-blown project of its own!

What’s next for Slidesmart💭

  • Our slides currently follow one particular elegant style; giving the user more personalized and customizable slide templates will improve the visual appeal of our project.
  • Allowing educators to personally handpick important topics for more emphasis is one of the goals we hope to achieve in the near future.

Contributors:

  • Shivesh Prakash
  • Rishit Dagli
  • Sai Manish
  • Alex Rosen

Check it out here: GitHub Repo