Data Science Web Nugget Roundup, Jan 14: Kaggle Datasets & Python Debugging

In our first weekly roundup of data science nuggets from around the web, check out a list of curated articles on Kaggle datasets, Python debugging tools, what data scientists do, an overview of YOLO, 2-dimensional PyTorch tensors, and the secrets of machine learning deployment.



 

Welcome to our first weekly roundup of data science nuggets from around the web. Our initial list has articles on topics from Kaggle datasets to YOLO to deploying machine learning to production and beyond.

Enjoy some of our favorite articles of the week that didn't appear on KDnuggets.

 
10 Datasets from Kaggle You Should Practice On to Improve Your Data Science Skills

Let's start this off right with a collection of Kaggle datasets to get your hands on. Andrew Lombarti has compiled a list of 10 datasets to cut your teeth on, from the famous Titanic set, great for beginners, to computer vision mainstays like MNIST and CIFAR. You might already be familiar with these datasets, but I'd bet there are one or two in the mix that you either have never played with or, at the very least, haven't touched in a while. Head on over to Towards Data Science for the article and have a look for yourself.

 
Python debugging tools

In this article from Machine Learning Mastery, Adrian Tam explores pdb, the built-in Python debugger, which he argues can give developers a lot of help if we just know how to use it. Read this tutorial to find out what a debugger can help with and how one is used, then learn more about pdb and look at some of the alternatives.
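To give a taste of what working with pdb looks like, here is a minimal sketch that is not taken from the tutorial itself. The `average` function and the `scripted_session` helper are hypothetical names made up for illustration. Normally you would type commands at an interactive `(Pdb)` prompt; this sketch instead feeds a short list of real pdb commands (`p`, `next`, `continue`) through a text stream so that it runs without waiting for input.

```python
import io
import pdb


def average(values):
    """Toy function to step through (hypothetical example)."""
    total = sum(values)
    count = len(values)
    return total / count


def scripted_session(commands, func, *args):
    """Run func under pdb, feeding it commands instead of keyboard input."""
    fake_stdin = io.StringIO("\n".join(commands) + "\n")
    captured = io.StringIO()
    # readrc=False keeps any personal .pdbrc file from changing the session.
    debugger = pdb.Pdb(stdin=fake_stdin, stdout=captured, readrc=False)
    # runcall() stops as soon as the function is entered.
    result = debugger.runcall(func, *args)
    return result, captured.getvalue()


# p values  -> print the argument
# next      -> execute the current line and stop at the following one
# p total   -> print a local variable once it has been assigned
# continue  -> resume normal execution
result, transcript = scripted_session(
    ["p values", "next", "next", "p total", "continue"],
    average,
    [1, 2, 3],
)
print(transcript)
print("result:", result)
```

In day-to-day use, the more common route is to drop a `breakpoint()` call into the code, or to run a script with `python -m pdb script.py`, and then type the same commands by hand.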

 
What do data scientists do?

On Towards Data Science, Anaconda Senior Data Scientist Sophia Yang discusses the fundamental question, "What do data scientists do?" She argues that, first and foremost, they bring value. That value is business-specific and takes the form of "north star metrics", such as growth in monthly users for a company whose business model depends on it. Sophia goes on to share the five areas in which she argues data scientists are most highly valued in business.

 
An Overview of YOLOv4 and YOLOv5

Anyone interested in understanding computer vision will want to read Adrien Payong's overview of YOLOv4 and YOLOv5. Read a bit of the history of the YOLO algorithms, understand what changed between the most recent versions, and learn about real-time fruit detection using YOLO. A good technical read for computer vision enthusiasts.

 
Two-Dimensional Tensors in Pytorch

On Machine Learning Mastery, Muhammad Asad Iqbal Khan writes a comprehensive tutorial on 2-dimensional PyTorch tensors. Khan follows up his previous article on 1-dimensional PyTorch tensors by applying many of the same operations to 2-dimensional tensors. Specific topics covered include creating tensors, exploring tensor shapes, slicing and indexing tensors, and applying a range of mathematical operations to them.
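For a quick preview of the kinds of operations the tutorial covers, here is a short sketch of our own, not code from Khan's article. It assumes PyTorch is installed, and the tensor values and variable names are purely illustrative.

```python
import torch

# A 2 x 3 tensor built from a nested Python list (illustrative values).
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])

# Exploring the shape: (rows, columns) and the number of dimensions.
shape = tuple(matrix.shape)
dims = matrix.ndim

# Indexing and slicing.
first_row = matrix[0]           # the whole first row
second_column = matrix[:, 1]    # every row, column index 1
element = matrix[1, 2].item()   # a single value as a Python float

# A few mathematical operations.
doubled = matrix * 2            # scalar multiplication
summed = matrix + matrix        # element-wise addition
product = matrix @ matrix.T     # matrix multiplication: (2 x 3) @ (3 x 2)

print(shape, dims)
print(first_row, second_column, element)
print(product)
```

Note the difference between `*`, which works element by element, and `@`, which performs matrix multiplication and therefore needs compatible inner dimensions.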

 
The Secret of Delivering Machine Learning to Production

Inbal Budowski-Tal analyzes why projects that start out with such promise fall so flat in production (that is, if they even make it to production). These projects don't just fail for no reason, but is there some common denominator? Budowski-Tal outlines the pitfalls on the way to successful deployment, and provides some suggestions on how to avoid them.