Decision Tree vs. Random Forest – Which Algorithm Should you Use?

Analytics Vidhya

4 Boosting Algorithms You Should Know – GBM, XGBM, XGBoost & CatBoost

Analytics Vidhya

How many boosting algorithms do you know? Can you name at least two boosting algorithms in machine learning?

Build a Decision Tree in Minutes using Weka (No Coding Required!)

Analytics Vidhya

Learn how to build a decision tree model using Weka. This tutorial is perfect for newcomers to machine learning and decision trees, and those.

What is Multicollinearity? Here’s Everything You Need to Know

Analytics Vidhya

10+ Simple Yet Powerful Excel Tricks for Data Analysis

Analytics Vidhya

Overview: Microsoft Excel is one of the most widely used tools for data analysis. Learn the essential Excel functions used to analyze data for.

All you Should Know About Datetime Variables in Python and Pandas

Analytics Vidhya

The Complex yet Powerful World of DateTime in Data Science. I still remember coming across my first DateTime variable when I was learning Python.

Support Vector Regression Tutorial for Machine Learning

Analytics Vidhya

Introduction to Polynomial Regression (with Python Implementation)

Analytics Vidhya

Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization

Analytics Vidhya

Learn How to use the Transform Function in Pandas (with Python code)

Analytics Vidhya

Honestly, most data.

What are Lambda Functions? A Quick Guide to Lambda Functions in Python

Analytics Vidhya

Introduction: For loops are the antithesis of efficient programming. They’re still necessary and are the first conditional loops taught to Python beginners but in.

Build Your First Text Classification model using PyTorch

Analytics Vidhya

Overview: Learn how to perform text classification using PyTorch. Understand the key points involved while solving text classification. Learn to use the Pack Padding feature.

A Beginner’s Guide to matplotlib for Data Visualization and Exploration in Python

Analytics Vidhya

matplotlib – The Most Popular Python Library for Data Visualization and Exploration. I love working with matplotlib in Python.

One-Hot Encoding vs. Label Encoding using Scikit-Learn

Analytics Vidhya

These are typical data science interview questions every aspiring data scientist. What is One-Hot Encoding?

GroupBy in Pandas: Your Guide to Summarizing and Aggregating Data in Python

Analytics Vidhya

Demystifying the Mathematics Behind Convolutional Neural Networks (CNNs)

Analytics Vidhya

The Art of Storytelling in Analytics and Data Science | How to Create Data Stories?

Analytics Vidhya

Introduction: The idea of storytelling is fascinating: to take an idea or an incident and turn it into a story.

Build Better and Accurate Clusters with Gaussian Mixture Models

Analytics Vidhya

Joins in Pandas: Master the Different Types of Joins in Python

Analytics Vidhya

Introduction to Joins in Pandas. “I have two different tables in Python but I’m not sure how to join them.

6 Powerful Feature Engineering Techniques For Time Series Data (using Python)

Analytics Vidhya

Overview: Feature engineering is a skill every data scientist should know how to perform, especially in the case of time series. We’ll discuss 6.

Very Meta … Unlocking Data’s Potential with Metadata Management Solutions


Untapped data, if mined, represents tremendous potential for your organization. While there has been a lot of talk about big data over the years, the real hero in unlocking the value of enterprise data is metadata, or the data about the data.

Understanding Structured and Unstructured Data


We live in a world of data: there’s more of it than ever before, in a ceaselessly expanding array of forms and locations. Dealing with Data is your window into the ways Data Teams are tackling the challenges of this new world to help their companies and their customers thrive.

Everything you Need to Know About Scikit-Learn’s Latest Update (with Python Implementation)

Analytics Vidhya

Introduction: Scikit-learn is one Python library we all inevitably turn to when we’re building machine learning models. I’ve built countless models using this wonderful.

Creating a Big Data Platform Roadmap

Perficient Data & Analytics

One of the most frequently asked questions by our customers is the roadmap to deploying a Big Data Platform and becoming a truly data-driven enterprise. Just as you can’t build a house without a foundation, you can’t start down a big data path without first establishing groundwork for success. There are several key steps to prepare the organization to realize the benefits of a big data solution with both structured and unstructured data.

How to Gain Valuable Insights from Untapped Data Using AI

Perficient Data & Analytics

You probably know your organization needs to invest in artificial intelligence (AI) solutions to take advantage of the deluge of data that mobile and digital users are creating, but do you know why or how? LEGACY ANALYTICS METHODS AREN’T EQUIPPED TO PROCESS ALL DATA TYPES. The majority of data is unstructured (around 80%) which means it isn’t clearly defined or easily searchable the way that structured data is. LEVERAGE YOUR DATA WITH AI.

A Layman’s Guide to Data Science: How to Become a (Good) Data Scientist


How simple is Data Science? Sometimes, when you hear data scientists rattle off a dozen algorithms while discussing their experiments or go into the details of TensorFlow usage, you might think that there is no way a layman can master Data Science.

Webinar: Natural Language Processing for Digital Transformation of Unstructured Text


Learn how pharma and healthcare organizations are using the power of Natural Language Processing (NLP) to transform unstructured text into actionable structured data.

A Comprehensive Guide to Natural Language Generation


In its essence, it automatically generates narratives that describe, summarize or explain input structured data in a human-like manner at the speed of thousands of pages per second.

Building tools for enterprise data science

O'Reilly on Data

The O’Reilly Data Show Podcast: Vitaly Gordon on the rise of automation tools in data science. In this episode of the Data Show, I spoke with Vitaly Gordon, VP of data science and engineering at Salesforce.

How Artificial Intelligence Will Disrupt the Financial Sector


Artificial intelligence thrives with data. The more data you have, the better your algorithms will be. However, just having a lot of data is not sufficient anymore. "More data beats clever algorithms, but better data beats more data." - Peter Norvig, Director of Research, Google.

NLP vs. NLU: from Understanding a Language to Its Processing


They both attempt to make sense of unstructured data, like language, as opposed to structured data like statistics, actions, etc. However, NLP and NLU are opposites of a lot of other data mining techniques.

Evolving Insurance with Data and Analytics


Data as a Launchpad. This evolution of insurance industry offerings is largely a result of the underlying data available to insurers. Fast forward and that data now provides the foundation, the launchpad, that enables new innovations and is propelling the insurance industry forward.

Analytics and Artificial Intelligence – A Blue Ocean of Opportunities


However, the data that it generates, both structured and unstructured, has a kind of a digital footprint, which can be thoughtfully analyzed and utilized. Businesses typically have their own enterprise data management platform to manage, store, and retrieve their multi-structured data. They can use Analytics and AI on top of it to correlate their data sets and come up with evolving business trends and patterns.

Text Analytics – Understanding the Voice of Consumers


Text analytics helps to draw insights from unstructured data ... into structured data to develop actionable managerial insights to enhance their operations.

Mastering Data Variety


Data variety, the middle child of the three Vs of Big Data, is in big trouble. It’s in the critical path of enterprise data becoming an asset. Meanwhile, most enterprises have unconsciously built up extreme data variety over the last 50+ years.

Operational Database in CDP


Cloudera’s operational database (OpDB) in CDP delivers a real-time, always available, scalable OpDB that serves traditional structured data alongside new unstructured data within a unified Operational and Warehousing platform.

Research quality data and research quality databases

Simply Statistics

When you are doing data science, you are doing research. You want to use data to answer a question, identify a new pattern, improve a current product, or come up with a new product. That is why the key word in data science is not data, it is science.

Big Data Ingestion: Parameters, Challenges, and Best Practices


Businesses are going through a major change in which operations are becoming predominantly data-intensive. Quintillions of bytes of data are being created each day. This pace suggests that 90% of the data in the world was generated over the past two years alone.

What is XSLT and Why is it so Important?


One of the most valuable concepts in the world of data and analytics is making data "readable" by both machines and humans. One of the best ways to make information in data sets transparent to humans and machines is with XSLT (eXtensible Stylesheet Language Transformations).

Don’t get left behind the modern data warehouse train!


Why are most organizations replatforming and moving to a modern data warehouse? Instead, they are guided by data serving up answers to questions, perhaps asked by experts who are in those boardrooms. This requires direct and fast access to data and lots of it.

Ontotext Knowledge Graph Platform: The Modern Way of Building Smart Enterprise Applications


According to an article in Harvard Business Review, cross-industry studies show that, on average, big enterprises actively use less than half of their structured data and sometimes about 1% of their unstructured data.

Investment Company Reporting Modernization Goals & Expectations

Perficient Data & Analytics

Ease of access, aggregation, and analysis of the reported data by the Commission and the public. New forms you must submit: Form N-PORT: Requires investment companies to report portfolio information monthly in a structured data format. Form N-CEN: Requires investment companies to report census-type information annually in a structured data format.