Into the ReviewVerse

Summarises thousands of reviews into useful insights

Demo Video: here

Inspiration💡

Into the ReviewVerse grew out of a frustration we share with many online shoppers: the overwhelming number of products and the unreliability of many reviews make it hard to reach an informed decision. This led us to envision a centralized platform that gives consumers like ourselves comprehensive and unbiased information. Through research into consumer preferences, we realized the importance of transparency, credibility, and user-generated content. We also drew inspiration from successful review websites and recognized the impact of social media on consumer opinions. Our goal was a review website that helps people make better decisions by drawing on all the reviews written for a product. With a dedicated team, we set out to build a reliable and valuable resource for shoppers worldwide, one that fosters informed decision-making and meaningful interactions.

What it does🔎

The website has three pages. The home page embeds a YouTube demo video that shows how to use the site. The about page gives information about the creators. The task page takes a website's URL as input, with a search button beside the input box; once the button is pressed, the page shows the product's pros and cons.

How we built it🔨

We divided the project into two sub-projects:

Into the ReviewVerse Machine Learning Model

We quickly found that the goals for our model were very different from those of a traditional summarization or text-generation model, so we decided to fine-tune an existing large language model. We settled on instruction fine-tuning LLaMA using LoRA. We faced quite a few implementation challenges, both in curating the right dataset for few-shot fine-tuning so that the model does not overfit and in fine-tuning a 7-billion-parameter model in a short time. We fine-tuned LLaMA to identify pros and cons from reviews, and we constructed representative examples specifically so that it could identify fake-sounding reviews as well. The model then produces a bullet-point list of very specific pros and cons from its analysis of the reviews.
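To make the dataset-curation step more concrete, here is a minimal sketch of how one instruction-tuning record could be assembled. The prompt template, the field names, and the example reviews are all hypothetical; they illustrate the general shape of a prompt/response pair and are not our actual dataset format.

```python
# Hypothetical instruction-tuning record builder. The template and field
# names are illustrative assumptions, not the project's actual format.

INSTRUCTION = (
    "Read the product reviews below and list the specific pros and cons. "
    "Ignore reviews that sound fake."
)


def bullet_list(items):
    """Render a list of strings as a '- item' bullet list."""
    return "\n".join(f"- {item}" for item in items)


def build_record(reviews, pros, cons):
    """Build one prompt/response pair for supervised fine-tuning."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(reviews, start=1))
    prompt = (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Reviews:\n{numbered}\n\n"
        "### Response:\n"
    )
    response = f"Pros:\n{bullet_list(pros)}\n\nCons:\n{bullet_list(cons)}"
    return {"prompt": prompt, "response": response}


example = build_record(
    reviews=[
        "Lovely glaze, but the handle chipped within a week.",
        "Arrived two days early and was packed carefully.",
    ],
    pros=["Attractive glaze", "Fast, careful shipping"],
    cons=["Handle chips easily"],
)
print(example["prompt"] + example["response"])
```

Keeping the response in a fixed "Pros / Cons" bullet layout is one way to nudge a fine-tuned model toward output that is easy to display on a results page.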

The resulting model was quite large, so we adopted 4-bit quantization, mainly implementing GPTQ ("GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers").
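For readers unfamiliar with 4-bit quantization, the sketch below shows the basic storage idea using plain round-to-nearest quantization on a small group of weights. This is a deliberately simplified stand-in and is not GPTQ: GPTQ additionally adjusts the not-yet-quantized weights to compensate for each rounding error. The weight values are made up for illustration.

```python
# Simplified round-to-nearest 4-bit quantization of one weight group.
# NOT GPTQ itself: this only illustrates storing weights as 4-bit codes
# plus a shared scale and offset. All values are made-up examples.


def quantize_4bit(weights):
    """Map floats to integer codes in 0..15 plus a scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 15 steps between 16 levels; avoid a zero scale
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


weights = [-0.42, 0.10, 0.03, 0.37, -0.15, 0.29]
codes, scale, lo = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(max_error, 4))
```

Each code fits in 4 bits, so compared with 16-bit floats the weights take roughly a quarter of the space, at the cost of a rounding error of at most half a quantization step per weight.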

Even then the model remained quite large, and since we had spent time training it and understanding its quirks, optimizing it was essential. We were able to introduce some optimizations to the model itself. Finally, we decided to run inference using WebAssembly (Wasm) and serialized the WebAssembly module; paired with the other optimizations we made, this greatly improved the model's performance.

Into the ReviewVerse Web App

The Into the ReviewVerse web app is powered by the Flask framework and uses HTML, JavaScript, Bootstrap 4.0 and CSS. Our homepage features an embedded demo video that guides first-time users through the web app. Flask let us harness the versatility of Python while integrating CSS for styling and JavaScript for interactivity, resulting in an intuitive and engaging user experience.
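As a rough illustration of how the three pages could map onto Flask routes, here is a minimal sketch. The route paths, the inline template, and the `summarize_reviews` helper are hypothetical placeholders rather than the code in our repository; in particular, `summarize_reviews` stands in for the scraping and model-inference steps.

```python
# Minimal Flask sketch of the three pages. Route names, the template, and
# summarize_reviews are hypothetical placeholders, not the project's code.
from flask import Flask, request, render_template_string

app = Flask(__name__)

RESULT_TEMPLATE = """
<h2>Pros</h2><ul>{% for p in pros %}<li>{{ p }}</li>{% endfor %}</ul>
<h2>Cons</h2><ul>{% for c in cons %}<li>{{ c }}</li>{% endfor %}</ul>
"""


def summarize_reviews(url):
    """Placeholder for scraping the URL and running the fine-tuned model."""
    return {"pros": ["(model output)"], "cons": ["(model output)"]}


@app.route("/")
def home():
    return "Home page with the embedded demo video"


@app.route("/about")
def about():
    return "About the creators"


@app.route("/task", methods=["GET", "POST"])
def task():
    if request.method == "POST":
        url = request.form.get("url", "").strip()
        if not url:
            # Reject empty submissions instead of calling the model.
            return "Please enter a product URL", 400
        return render_template_string(RESULT_TEMPLATE, **summarize_reviews(url))
    # GET: show the input box with the search button beside it.
    return '<form method="post"><input name="url"><button>Search</button></form>'
```

Saved as, say, `app.py`, a sketch like this can be started locally with `flask --app app run`.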

Challenges we ran into⚠️

  • Hardware constraints made fine-tuning such a huge model in a short time difficult. Even though LoRA and PEFT make the process rather easy, figuring out the right hyperparameters was a bit of a challenge.
  • Running the model in a usable way was also quite a big challenge for us and required implementing quantization strategies.
  • Familiarizing ourselves with the Flask framework proved to be a significant challenge as all project members were new to it. We invested substantial time in learning about the framework and often relied on resources like Stack Overflow to troubleshoot any encountered errors.
  • Web scraping reviews presented its own set of challenges. Extracting data from thousands of lines of HTML code demanded careful navigation and understanding. Furthermore, identifying websites that permitted web scraping added an additional layer of complexity to the process.
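
To give a feel for the web-scraping challenge, here is a small sketch that pulls review text out of raw HTML using only Python's standard library. The `review-text` class name and the sample markup are invented for illustration; real listing pages use different and far messier markup, and a site's terms and robots.txt should be checked before scraping it.

```python
# Sketch of extracting review text from raw HTML with the standard library.
# The "review-text" class name and the sample markup are made-up examples.
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no end tag


class ReviewExtractor(HTMLParser):
    def __init__(self, target_class="review-text"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # greater than 0 while inside a matching element
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.reviews[-1] += " "  # treat e.g. <br> as a word break
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1  # nested tag inside a review
        elif self.target_class in classes:
            self.depth = 1
            self.reviews.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.reviews[-1] += data


def extract_reviews(html):
    """Return the whitespace-normalized text of every matching element."""
    parser = ReviewExtractor()
    parser.feed(html)
    return [" ".join(r.split()) for r in parser.reviews if r.strip()]


sample = """
<div class="listing">
  <p class="review-text">Great mug, <b>sturdy</b> handle.</p>
  <p class="shipping">Ships in 3 days</p>
  <p class="review-text">Colour faded<br>after one wash.</p>
</div>
"""
print(extract_reviews(sample))
```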

Accomplishments that we’re proud of🥇

  • One of our significant accomplishments is successfully integrating the machine learning models with the user interface (UI) in an efficient manner. This achievement highlights our ability to combine the technical aspects of machine learning with the practicality of a user-friendly interface.
  • We take pride in our collaborative learning environment during this hackathon. Despite varying levels of experience in machine learning and UI development, we effectively shared knowledge and expertise among project members. This mutual teaching and learning process contributed to our collective growth and success.

What we learned🧠

  • Despite being first-year students at the University of Toronto, we leveraged our collective experience and skills to successfully complete the project within the given timeframe. Our belief in the power of teamwork was reinforced throughout the hackathon.
  • This hackathon also provided us with the opportunity to delve into the Flask framework, expanding our knowledge and practical understanding of its capabilities.

What’s next for Into the ReviewVerse💭

  • In this project we only supported the shopping website Etsy. In the future we plan to support all shopping websites, which should make choosing a product much easier for consumers.
  • We still need to improve the UI before releasing it to the public, and we also need to obtain certain API permissions from shopping websites such as Amazon, Walmart, and Shein.

Contributors:

  • Shivesh Prakash
  • Rishit Dagli
  • Sai Manish
  • Bhavya Bhatt

Check it out here: GitHub Repo