How to Build a Strong Data Science Portfolio as a Beginner

After learning the basics of data science, you can start to work on real-world problems. But how do you showcase your work? In this article, we are going to learn a unique way to create a data science portfolio.



Image by Author | Elements by Free Vector | Statistics concept illustration

 

As a beginner, I had many questions: How do I start? How do I learn? Where do I get ideas for projects? After a long search, I found a data analysis project. It took me three days just to write the code, and I was happy with my first try, but then came the big question: how do I share it with the world? I did not have the coding or documentation skills to showcase my work, so I stored it in the cloud and forgot about it. A month later, I was browsing GitHub for more projects and found an amazing profile that motivated me to create my own portfolio. That was the best decision I made, as it put me on the map of the developer community, and soon after, I started to get emails from recruiters and beginners about my projects.

Getting a job is usually the main reason for building a portfolio. Sometimes, it’s necessary if we don’t have the relevant education or experience (eugeneyan.com). Employers today are often skeptical about hiring new graduates, so how do you convince them that you are the best person for the job? You display your skills by showing the work you have done in previous projects. The stronger your online portfolio, the better your chances of getting hired for your dream job.

 

"The portfolios are extremely critical to have because when you’re in the interview, it shows your real-world experience, so you can explain to an employer from A to Z the entire data science workflow."

— David Yakobovitch.

 

Another motivation is to create a personal project that satisfies your curiosity about learning new things. When we learn a new skill, we want to experiment and eventually build a working product that can be used in the real world.

In this article, we will look at the ways you can showcase your work as a data science beginner. You will learn about some new platforms that make your life easier, along with tips for building a strong portfolio.

 

GitHub

 
Let me clear up a misconception among data scientists. Yes, GitHub is necessary, and we should all learn Git. As a data scientist, I use GitHub daily to look for interesting datasets and projects. It is the most popular platform among developers, and to be honest, recruiters do check your GitHub profile before calling you for an interview.

 

Image By Author | github

 

GitHub is a global platform where people share and collaborate on projects. As you can see in my profile below, I have contributed to other people’s projects and also worked on my own.

 

Image by Author | kingabzpro

 

Tips for creating a solid profile:

  1. Create your profile page, and for a complete tutorial, check out Sarah Hart’s blog.
  2. Document every project with links, cover images, and detailed descriptions.
  3. Fork the project that you like the most and send your first pull request (freecodecamp.org).
  4. Be active on this platform by contributing, reporting bugs, and pushing your current projects.

 

Deepnote

 
Deepnote is much simpler than GitHub, and it’s beginner-friendly too. If you are familiar with Jupyter Notebook, publishing your first project will be a piece of cake. My experience with Deepnote has been absolutely amazing, as the platform provides all the qualities of GitHub but is much simpler and focused on the data science community.

 

Image by Author | Pakistan Vaccination Progress

 

Recently, they introduced a Deepnote profile that will showcase all the notebooks you publish with your information and profile picture.

 

Image by Author | Deepnote

 

Just like GitHub Gist, you can share a snippet of your code with your team or the public in general. I have used Deepnote cells in all my Medium publications and on social media platforms. You can check my previous article to understand how to embed a Deepnote cell. Using snippets of code with output gives you the ability to share your projects on multiple platforms.

I prefer Deepnote embedded cells over GitHub Gist because they come with output, and not just static output but interactive features.

For example, you can use Plotly and display your interactive chart in a Medium article.

 

 

Tips for creating a solid profile:

  1. Update your bio, profile photo, and contact information.
  2. Always add detailed descriptions of your project using Markdown cells.
  3. Use the cover photo to make your project stand out.
  4. Use the app features in Deepnote to create an interactive web app.
  5. Keep posting your old projects, or even repost notebooks from GitHub.

 

DAGsHub

 
DAGsHub is a newcomer, and it’s quickly making a name for itself by providing a one-stop solution for machine learning practitioners and data engineers. DAGsHub comes with a DVC server, MLflow, pipeline visualization, and GitHub synchronization. We won’t go deep into every feature but will focus on the ones that make it stand out.

DAGsHub allows you to share your GitHub repository and create your data science project with the ability to visualize machine learning and data pipelines. It also has a hidden feature: you can use README.ipynb as your project description file, which is ideal for beginners who are not used to Markdown and for data scientists who love working in Jupyter Notebook. It works much like GitHub, which means you need to learn both Git and DVC to use the platform properly.

 

"What I’ve seen other users enjoy is the ability to visualize their project structure via the pipeline, as well as the ability to see their data and models as an integral part of the project. Also, the fact that we are based on open-source tools instead of reinventing existing solutions is something people like."

— Dean

 

Image by Dean | dagshub

 

My profile is quite new, but I love this platform, as it provides me with a complete machine learning ecosystem. I think I prefer it to GitHub in terms of features and UI simplicity.

 

Image by Author | DAGsHub

 

Tips for creating a solid profile:

  1. Learn DVC, Git, and MLflow to take full advantage.
  2. Add a project description to your notebook and README.
  3. Update your profile by adding a bio, avatar, and contact information.
  4. Try to add dvc.yaml and dvc.lock to your project to display data pipelines. For more information, check out Defining the Pipeline.
  5. Keep an active profile by contributing to open-source projects and by pushing your personal projects. You can use the fds CLI to make your life easier and avoid mistakes.
  6. Make full use of DVC by uploading your data and models to a remote server. Recruiters are interested in candidates who know the complete data science cycle, from data ingestion to dashboards.
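
To make tip 4 a little more concrete, here is a minimal sketch of what a two-stage dvc.yaml could look like. It is written in Python only so that the file text can be generated and checked; the stage names, scripts, and data paths are placeholders I made up for illustration, not part of any real project.

```python
def render_dvc_yaml(stages):
    """Render a minimal dvc.yaml from {stage_name: {"cmd", "deps", "outs"}}."""
    lines = ["stages:"]
    for name, spec in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {spec['cmd']}")
        # deps are the inputs a stage reads; outs are the files it produces.
        for key in ("deps", "outs"):
            paths = spec.get(key, [])
            if paths:
                lines.append(f"    {key}:")
                lines.extend(f"      - {path}" for path in paths)
    return "\n".join(lines) + "\n"


# Hypothetical pipeline: every script and data path below is a placeholder.
PIPELINE = {
    "prepare": {
        "cmd": "python src/prepare.py",
        "deps": ["src/prepare.py", "data/raw.csv"],
        "outs": ["data/clean.csv"],
    },
    "train": {
        "cmd": "python src/train.py",
        "deps": ["src/train.py", "data/clean.csv"],
        "outs": ["models/model.pkl"],
    },
}

dvc_yaml_text = render_dvc_yaml(PIPELINE)
print(dvc_yaml_text)
```

If you save the printed text as dvc.yaml in a DVC-initialized repository, running `dvc repro` should execute the stages and write the matching dvc.lock. Because the output of the first stage is a dependency of the second, DVC can work out the order of the stages on its own.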

 

Kaggle

 
If you want to get noticed faster in the world of data science, you should create a Kaggle account and start contributing to competitions, datasets, notebooks, and discussions. When you become a Grandmaster, people respect you and offer you better career opportunities. If you ask me, I suggest you create a Kaggle profile while you are still learning the basics. Learn from experts and discover your niche. I am a huge fan of this platform, as it gives beginners the support they need to compete and develop innovative solutions for various industries. In my view, it is a backbone of the AI community.

 

Image by Author | Kaggle

 

You can check out my profile below. From the start, I have been contributing in various categories to gain ranks. Currently, I am an Expert, and with one gold and one silver medal in competitions, I will become a Master. That is not easy, and honestly, I respect Grandmasters, as they have proven that they are among the best data practitioners.

 

Image by Author | Kaggle

 

Tips for creating a solid profile:

  1. Be active on the platform by using new datasets and creating data analyses or machine learning models.
  2. Participate in discussions, learn from experts, and ask for help.
  3. Use web scraping to publish a new dataset.
  4. Participate in many competitions to learn several types of machine learning problems and to earn badges.
  5. Focus on publishing your best work with detailed descriptions and high-quality code.
  6. Write about yourself in your bio and add contact details.
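
Tip 3 can be as simple as turning an HTML table into a CSV file. The sketch below uses only the Python standard library. The sample page and its values are made up for illustration, and a real scraper would download the page first and respect the site’s terms of use.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample page standing in for a downloaded one.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td><td>2</td></tr>
</table>
"""

csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Once the CSV text is written to a file, it can be uploaded as a new dataset together with a description of where the data came from and how it was collected.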

 

Blog

 
Writing blogs is the next step after creating your projects on the above platforms. If you want to expand your audience, I highly suggest you start with Medium. Writing a blog is not necessary, but it gets you more traction from various fields. The Medium platform allows you to create your profile and lets you publish your articles under various publications such as Towards Data Science and Towards AI. You can also develop your own blogging site or use a similar platform such as Analytics Vidhya.

 

Image by Author | Medium

 

Tips for creating a solid profile:

  1. Write blogs about the projects you personally worked on.
  2. Create blogs on an emerging technology or on new data science applications.
  3. Do proper research while writing blogs and add citations to avoid violating platform rules.
  4. Use attractive cover photos for every blog.
  5. Always write about what you learned from your experience while developing data science projects.
  6. Don’t just follow trends; focus on the things you are good at.

 

Portfolio Website

 
You can also display your projects on a personal website, and if you are not a web developer, there are some simple tools that make the process quite easy. You can check out How to Build a Data Science Portfolio Website with Hugo & GitHub Pages, and Hugo for various templates.

My portfolio website has projects from all the platforms, with short descriptions and subcategories. It took me three days to create the entire website and deploy it on GitHub Pages.

 

Image by Author | Portfolio

 

Tips for creating a solid portfolio website:

  1. Add your skills, bio, and CV.
  2. Display your experience and a
  3. Showcase your projects with links to your GitHub or Deepnote projects.
  4. Make your website minimal and interactive so that the recruiter has an easy time scrolling through your entire portfolio.
  5. Keep your portfolio website up to date with the latest project you are working on.

 

Weights & Biases

 
I usually use Weights & Biases for machine learning experimentation and for logging the performance metrics of my models, but that changed with the introduction of the W&B profile. You can write a blog about your current project by using embedded links and graph integration. It is quite similar to the other portfolio platforms I mentioned, but it comes with the perk of direct integration with Python libraries.

Ayush’s profile has impressed me the most, as he has been contributing to other organizations while writing blogs about machine learning.

 

Image by Ayush | Weights & Biases

 

The W&B project has model performance metrics, as shown below.

 

Image by Author | kaggle-seti

 

Tips for creating a solid profile:

  1. Join other data science organizations and participate in group projects.
  2. Use the W&B API to display your machine learning project results.
  3. Write a blog using W&B metrics integration.
  4. Add a bio, profile picture, and contact information.
  5. Try to engage in community discussion and always look for a new interesting project.

 

Conclusion

 
W&B is a wildcard, as it is famous for logging experiments and not for portfolios, but the introduction of interactive blogs has given us a unique way to display projects and create a strong portfolio.

If you are a beginner, I suggest you start with Deepnote, as it’s free for teams and gives you beginner-friendly tools to get started. If you are looking to get noticed by the data science community, try creating your profile on GitHub and Kaggle. If you are into creating your brand, then start with blogging sites or create your own website.

In the end, I want you all to create your profile on all the platforms I mentioned above, as they all come with unique advantages in impressing your potential employer. I know it’s quite overwhelming at the start, but once you get used to documenting and showcasing your projects, it will get easy.

 
 
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.