10 Exciting Projects on Large Language Models (LLMs)

Aayush Tyagi 27 Jun, 2023 • 7 min read

Hey job seekers! Want to get noticed? Share your work with potential employers, especially if you’re in software development or data science. A portfolio of your projects, blog posts, and open-source contributions can set you apart from other candidates. You can demonstrate your skills by creating smaller projects from start to finish. With advanced large language models (LLMs), even developers with limited experience can create impressive projects. So go ahead, build cool things, and show off your skills in new and exciting ways!

This article shares side project ideas that use LLMs for downstream tasks.

So what are you waiting for? Start building that portfolio and let your skills and passion shine!


List of Top 10 Projects on Large Language Models (LLMs)

Here is the list of LLM projects covered in this article:

  • Cover Letter Generator
  • Customized ChatBot
  • YouTube or Podcast Summarizer
  • Information Extraction
  • Web Scraper
  • Question Answering over Documents
  • Clustering and Classification of Documents into Topics
  • Plagiarism Checker
  • Fake News Detector
  • Personalized News Aggregator
  • Speech Recognition

The steps for each of these projects are outlined below.

Cover Letter Generator

Large language models (LLMs) can generate coherent text, which is useful for a variety of purposes, such as copywriting, programming, and writing cover letters. While some people express concern that LLMs could facilitate the creation of fake news or enable cheating on schoolwork, others are actively leveraging LLMs to enhance productivity and foster creativity.

If you are looking for a new job, you might want to consider creating a cover letter generator using an LLM. You could technically write each letter by manually engineering the perfect prompt and filling it with the relevant information about each job, but this would be time-consuming and repetitive.

An LLM-powered cover letter generator could save you a lot of time and effort, and it could help you to create more effective cover letters.
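
The repetitive part, filling a prompt with the details of each job, is easy to automate. Here is a minimal sketch of that idea. The template wording, the function names, and the `llm` parameter (any callable that takes a prompt string and returns text) are assumptions made for illustration, not a specific library's API.

```python
from string import Template

# Hypothetical prompt template; the wording is an illustration, not a tuned prompt.
COVER_LETTER_PROMPT = Template(
    "Write a concise cover letter for the position of $job_title at $company.\n"
    "Job description:\n$job_description\n\n"
    "Applicant background:\n$background\n"
)


def build_cover_letter_prompt(job_title, company, job_description, background):
    """Fill the template with the details of one job posting."""
    return COVER_LETTER_PROMPT.substitute(
        job_title=job_title,
        company=company,
        job_description=job_description.strip(),
        background=background.strip(),
    )


def generate_cover_letter(llm, **details):
    """Build the prompt and pass it to `llm`, any callable mapping str -> str."""
    return llm(build_cover_letter_prompt(**details))


if __name__ == "__main__":
    # Stand-in for a real model call so the sketch runs offline.
    def fake_llm(prompt):
        return f"[model output for a {len(prompt)}-character prompt]"

    print(generate_cover_letter(
        fake_llm,
        job_title="Data Scientist",
        company="Example Corp",
        job_description="Build and evaluate NLP models.",
        background="Three years of Python and machine learning experience.",
    ))
```

To use a real model, replace `fake_llm` with a function that sends the prompt to the LLM of your choice and returns its reply.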

Customized ChatBot

You’ve heard of ChatGPT. I don’t need to go into detail here. Its conversational capabilities are pretty impressive. But it lacks personality and has limited information. What if you could give it access to specific knowledge or even a full personality?

One example is not only a cute and whimsical idea but also serves a therapeutic purpose: Michelle Huang built a chatbot based on her diaries to chat with her childhood self.

In a “Black Mirror” episode called “Be Right Back” from 2013, the grieving protagonist reconnects with her late boyfriend after learning about a service that lets people stay in touch with the deceased.

Ten years later, you could technically build this on your laptop as a weekend project…

Although this example is a bit morbid, who’s to say we won’t see this technology help us grieve in the future?

Here are the rough steps you would follow to realize a project like this:

  1. Collect data from your old diaries or chat history and load it into documents
  2. Feed an LLM the contextual information in the prompt
  3. Add conversational memory
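
Steps 2 and 3 can be sketched as a small class that puts the loaded context into every prompt and remembers recent turns. This is only one way to do it: the class name, the prompt wording, and the `llm` callable (prompt string in, reply string out) are hypothetical.

```python
class ChatSession:
    """Keeps persona context and a rolling window of past turns for the prompt."""

    def __init__(self, llm, context, max_turns=5):
        self.llm = llm              # any callable mapping a prompt string to a reply string
        self.context = context      # e.g. excerpts loaded from diaries or chat logs
        self.max_turns = max_turns  # how many past exchanges to include in the prompt
        self.history = []           # list of (user_message, bot_reply) pairs

    def build_prompt(self, user_message):
        """Combine the context, the most recent turns, and the new message."""
        recent = self.history[-self.max_turns:] if self.max_turns > 0 else []
        lines = ["Answer in the voice of the author of these notes:", self.context, ""]
        for user, bot in recent:
            lines.append(f"User: {user}")
            lines.append(f"Bot: {bot}")
        lines.append(f"User: {user_message}")
        lines.append("Bot:")
        return "\n".join(lines)

    def ask(self, user_message):
        """Send the prompt to the model and store the exchange as memory."""
        reply = self.llm(self.build_prompt(user_message))
        self.history.append((user_message, reply))
        return reply
```

Capping the window with `max_turns` keeps the prompt from growing without bound as the conversation gets longer.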

YouTube or Podcast Summarizer

LLMs are useful for summarizing the vast amount of content available today across different mediums such as text, audio (e.g., podcasts), and video.

Podcasts often reference older episodes that listeners may have missed, so it is convenient to be able to search for the relevant episodes and get their key points.

YouTube videos can be summarized in the same way, and content creators could make their back catalog searchable so that it answers questions about specific topics. To achieve this, one would need to download the transcript, split it into manageable chunks, summarize the text using an LLM, and optionally create a user-friendly interface.

Here are the rough steps you would follow to realize this project:

  1. Download the video or podcast transcript and load it into documents
  2. Split long documents into chunks
  3. Summarize the transcript with an LLM
  4. Optional: Wrap it all in a user-friendly command line interface or even a web application
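
Steps 2 and 3 might look like the sketch below: split the transcript into overlapping chunks, summarize each chunk, then summarize the summaries. The chunk sizes, the prompt wording, and the `llm` callable are assumptions for illustration. Counting words is a rough stand-in for the token limits a real model imposes.

```python
def split_into_chunks(text, max_words=200, overlap=20):
    """Split text into word-based chunks that overlap slightly at the edges."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarize_transcript(llm, transcript, max_words=200):
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = split_into_chunks(transcript, max_words=max_words)
    if not chunks:
        return ""  # nothing to summarize
    partial = [llm(f"Summarize this transcript excerpt:\n{chunk}") for chunk in chunks]
    if len(partial) == 1:
        return partial[0]
    return llm("Combine these partial summaries into one summary:\n" + "\n".join(partial))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.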

Information Extraction

LLMs can be used for information extraction by providing them with examples that pair a text with the information to extract from it. For instance, the cover letter generator can be enhanced with a component that extracts the relevant information directly from job postings.

To achieve this, one would load the job description into a document and write a prompt that includes examples, so that the LLM extracts the relevant information.

Here are the rough steps you would follow to realize this project:

  1. Load job description from job posting into a document
  2. Extract the relevant information with the LLM using a prompt that includes examples
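
A prompt built from examples could be assembled as in the sketch below. The example posting, the field names (`company`, `title`, `skills`), and the `llm` callable are made up for illustration. Asking for JSON is one design choice among several, and the reply is parsed defensively because a model may not always return valid JSON.

```python
import json

# Hypothetical few-shot examples: (job posting text, fields to extract).
EXAMPLES = [
    (
        "Acme is hiring a Backend Engineer in Berlin. Requires Go and PostgreSQL.",
        {"company": "Acme", "title": "Backend Engineer", "skills": ["Go", "PostgreSQL"]},
    ),
]


def build_extraction_prompt(job_posting, examples=EXAMPLES):
    """Assemble a prompt with worked examples that asks for the answer as JSON."""
    parts = ["Extract company, title, and skills from the job posting as JSON.", ""]
    for text, fields in examples:
        parts += [f"Posting: {text}", f"JSON: {json.dumps(fields)}", ""]
    parts += [f"Posting: {job_posting}", "JSON:"]
    return "\n".join(parts)


def extract_job_info(llm, job_posting):
    """Call the model and parse its reply; return None if it is not valid JSON."""
    reply = llm(build_extraction_prompt(job_posting))
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        return None
```

The extracted fields can then be passed straight into the cover letter prompt.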

Web Scraper

LLMs are highly proficient in transforming texts to suit various needs such as changing the writing style to match that of a particular publication like “The Economist” or “New Yorker.” 

They can also adjust the reading level for easy comprehension, reformat information across different formats, correct spelling and grammar, and translate text from one language to another. It is common practice to use LLMs for converting text from one form to another.

An innovative way to utilize the rewriting potential of LLMs is through web scraping. Writing a web scraper can be tedious, but with LLMs, you could develop a more versatile solution for extracting data from unstructured websites.

Here are the rough steps you would follow to realize this project:

  1. Scrape the website’s source code and load it into a document
  2. Split long documents into chunks
  3. Extract the relevant data from the source code using the LLM (see Information Extraction)
  4. Reformat the extracted data into the desired format with the LLM using a prompt that includes examples
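
Before the page source goes to the LLM, it usually helps to strip markup so that less text has to fit in the prompt. The sketch below covers step 1 with the standard library only. The class and function names are hypothetical, and dropping `<script>` and `<style>` content is an assumption about what counts as noise.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce page source to plain text, one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_html(url):
    """Download a page and decode it as UTF-8, replacing undecodable bytes."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

The resulting text can be chunked and passed to an extraction prompt, as in the other steps.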

Question Answering over Documents

The process of question-answering can be seen as a fusion of search and summarization techniques. It has the potential to facilitate a more user-friendly approach to dealing with any type of document.

If you wish to undertake a similar project, then consider following these basic steps:

  1. Load the source material into documents.
  2. Divide lengthy documents into smaller segments.
  3. Create embeddings using an embedding model and save them for each document.
  4. Query the index to retrieve the relevant context, then prompt the LLM to generate an answer based on it.
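
The retrieval part of these steps can be sketched as follows. Note the big assumption: `embed` here is a toy bag-of-words counter that stands in for a real embedding model, so the sketch runs without any external service. The function names and the `llm` callable are likewise hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Embed every chunk once and keep the vectors next to the text."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def answer(llm, index, question, k=2):
    """Put the retrieved chunks into the prompt and let the model answer."""
    context = "\n".join(retrieve(index, question, k))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

With a real embedding model, only `embed` and `cosine` would change; the index, retrieval, and prompt steps keep the same shape.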

Clustering and Classification of Documents into Topics

In addition to retrieving information from documents, embeddings can be employed for categorizing documents by utilizing clustering techniques through unsupervised learning.

If you are interested in undertaking a similar project, here’s a basic outline of the steps involved:

  1. Transform content into documents.
  2. Segment lengthy documents into smaller parts.
  3. Use an embeddings model to create embeddings from the documents and save them.
  4. Apply a clustering algorithm that takes embeddings as input to cluster those documents.
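
For step 4, here is a minimal k-means sketch in plain Python that takes embedding vectors (lists of floats) as input. K-means is just one possible clustering algorithm, and the farthest-point initialization is a choice made here to keep the example deterministic. In practice a library implementation would be the usual route.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pick_initial_centroids(vectors, k):
    """Farthest-point initialization: deterministic and spreads the centroids out."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain k-means: returns one cluster label per input vector."""
    centroids = pick_initial_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean(members)
    return labels
```

Documents that share a label can then be inspected together, for example by asking an LLM to name the topic of each cluster.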

Classifying Inquiries

Like clustering, classification sorts documents into categories. Unlike clustering, it is supervised: a classifier learns the categories from labeled examples.

If you want to create a similar project, here’s a brief guide on the key steps:

  1. Transform emails into documents.
  2. Create embeddings using an embedding model and save them for each document.
  3. Utilize the embeddings to train a classifier that can categorize the documents based on certain criteria.
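
Step 3 can be illustrated with one of the simplest classifiers that works on embeddings: a nearest-centroid classifier. The choice of classifier and the labels used in the usage note are assumptions for illustration. Any supervised model that accepts vectors would fit the same slot.

```python
from collections import defaultdict


class NearestCentroidClassifier:
    """Classifies a vector by the closest class centroid (mean of training vectors)."""

    def __init__(self):
        self.centroids = {}

    def fit(self, vectors, labels):
        """Compute one centroid per label from the labeled training vectors."""
        grouped = defaultdict(list)
        for vector, label in zip(vectors, labels):
            grouped[label].append(vector)
        self.centroids = {
            label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, vector):
        """Return the label whose centroid is nearest to the vector."""
        def distance(label):
            return sum((x - y) ** 2 for x, y in zip(vector, self.centroids[label]))
        return min(self.centroids, key=distance)
```

For example, after `clf = NearestCentroidClassifier().fit(email_vectors, email_labels)`, where both arguments are hypothetical lists you prepared in steps 1 and 2, `clf.predict(new_vector)` returns the predicted category of a new email.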

Plagiarism Checker

The prevalence of plagiarism is high both online and in academic settings, making it difficult to identify instances of copied content. Various individuals such as bloggers, educators, and news organizations may need to check for plagiarism in written works.
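
The article does not prescribe a method here, so the following is only a simple baseline and not an LLM-based approach: it measures how many three-word sequences (shingles) two texts share. The function names and the shingle size are assumptions. Paraphrased copies would need something stronger, such as comparing embeddings.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences (shingles) in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(text_a, text_b, n=3):
    """Shared shingles divided by all distinct shingles of both texts (0.0 to 1.0)."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a or not b:
        return 0.0  # texts shorter than n words cannot be compared this way
    return len(a & b) / len(a | b)
```

A score close to 1.0 suggests near-verbatim copying, while unrelated texts score close to 0.0. Where to set the threshold is a judgment call that depends on the texts being compared.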

News Projects

Fake News Detector

  1. With the rise of fake news online, there is a growing need for tools to detect false information. LLMs can be used to identify inconsistencies and inaccuracies in news articles.
  2. To undertake this project, you would need to train the model on a dataset of real and fake news articles, test the accuracy of the model on new articles, and present the results in a user-friendly manner.

Personalized News Aggregator

  1. News aggregators can personalize content for users by using LLMs to analyze their reading history and present articles that align with their interests.
  2. To undertake this project, you would need to collect data on the user’s reading habits, use an LLM to analyze the text of news articles and present the results in a user-friendly manner. This could involve creating a mobile app or browser extension.

Speech Recognition

  1. LLMs can also be used for speech recognition, which involves transcribing spoken words into text. This technology has practical applications in areas such as virtual assistants and transcription services.
  2. To undertake this project, you need to train the model on a dataset of audio files and their corresponding transcripts, test its accuracy on new audio files, and create a user interface for users to input audio files to be transcribed.
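
For the "test its accuracy" step, a common metric for transcription quality is the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the length of the reference. The sketch below assumes simple whitespace tokenization and lowercasing, which is a simplification.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # At the start of each outer iteration, previous[j] is the edit distance
    # between the reference words processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

A WER of 0.0 means the transcript matches the reference word for word. The value can exceed 1.0 when the hypothesis contains many extra words.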

Conclusion

In conclusion, creating a portfolio of your projects, blog posts, and open-source contributions is an excellent way to showcase your skills and set yourself apart from other job candidates, especially in software development or data science. With the help of advanced large language models (LLMs), even developers with limited experience can create impressive projects. This article has shared side project ideas that use LLMs for downstream tasks such as cover letter generation, web scraping, speech recognition, question answering over documents, and more. By creating smaller projects from start to finish and utilizing LLMs, you can demonstrate your creativity, productivity, and problem-solving skills. So, don’t wait any longer: start building your portfolio today and let your skills and passion shine!


