Introduction to Conditional Random Fields

I built a visualization to explore embeddings a few years ago, but never posted it more broadly. So here it is! http://blog.echen.me/embedding-explorer/

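These are GloVe embeddings projected into 2D, with each point colored by its k-means cluster, where the clustering is computed in the original high-dimensional space rather than on the 2D projection.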
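To make the "cluster in the original space, plot in 2D" idea concrete, here is a minimal sketch. It rests on several assumptions: the post does not say which projection the explorer uses, so PCA is swapped in as a simple stand-in; the data is a toy pair of random blobs rather than real GloVe vectors; and the helper names `project_2d` and `kmeans` are hypothetical.

```python
import numpy as np

def project_2d(X):
    """Project rows of X onto their top two principal components (PCA via SVD).
    PCA is an assumption here; the explorer's actual projection is not stated."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's k-means; returns one integer cluster label per row of X."""
    # Farthest-point initialization: deterministic, spreads initial centers out.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

# Toy stand-in for GloVe vectors: two well-separated blobs in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 50)), rng.normal(5.0, 0.1, (20, 50))])

labels = kmeans(X, k=2)   # cluster labels come from the full 50-d space
coords = project_2d(X)    # 2D coordinates are used only for plotting
```

Clustering before projecting means the colors reflect structure in the full embedding space, so two points that look close in the 2D picture can still carry different colors.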

You can see, for example, that the cluster in red on the right is a cluster of geographical terms.

[Figure: Geography Cluster]

We can click on one of these points ("iceland") to see its nearest neighbors in the high-dimensional space (mostly other countries!) as well as other points that belong to the same cluster (Cluster 18 is this red cluster).

[Figure: Iceland]
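A nearest-neighbor lookup like the one behind the "iceland" click can be sketched as follows. This assumes cosine similarity as the distance measure (the post does not say which metric the explorer uses), and the 3-dimensional vectors are made up purely for illustration; `nearest_neighbors` is a hypothetical helper name.

```python
import numpy as np

def nearest_neighbors(word, vocab, vectors, top_n=3):
    """Return the top_n words most cosine-similar to `word`, excluding itself."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(word)]   # cosine similarity to every word
    order = np.argsort(-sims)               # most similar first
    return [vocab[i] for i in order if vocab[i] != word][:top_n]

# Made-up 3-d vectors for illustration only; real GloVe vectors are much larger.
vocab = ["iceland", "norway", "finland", "soccer", "tennis"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
])

neighbors = nearest_neighbors("iceland", vocab, vectors, top_n=2)
print(neighbors)  # ['norway', 'finland']
```

The search runs over the full vectors, not the 2D coordinates, which is why the neighbors can differ from the points that merely sit nearby on the plot.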

We can also inspect each individual embedding dimension, to understand what it's picking up. Embedding Dimension 1, for example, seems to capture sportiness.

[Figure: Embedding Dimension 1]

We can also slide through the points based on their embedding dimension to get a better sense:

[Figure: Embedding Dimension 1 Slide]
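Inspecting a single dimension amounts to sorting the vocabulary by its value along that coordinate. Here is a small sketch of that idea, with made-up 2-dimensional vectors and a hypothetical helper `rank_by_dimension`; the 0-based `dim` index is an assumption of this sketch and may not match how the explorer numbers its dimensions.

```python
import numpy as np

def rank_by_dimension(vocab, vectors, dim):
    """Return (word, value) pairs sorted from highest to lowest value along `dim`."""
    order = np.argsort(-vectors[:, dim])
    return [(vocab[i], float(vectors[i, dim])) for i in order]

# Made-up vectors for illustration; column 0 plays the role of a "sportiness" axis.
vocab = ["soccer", "iceland", "tennis", "banana"]
vectors = np.array([
    [0.9, 0.1],
    [0.1, 0.8],
    [0.7, 0.2],
    [-0.3, 0.0],
])

ranked = rank_by_dimension(vocab, vectors, dim=0)
top_words = [word for word, _ in ranked]
print(top_words)  # ['soccer', 'tennis', 'iceland', 'banana']
```

Reading the words at both ends of the ranking is a quick way to guess what a dimension encodes, and sliding through the sorted list shows how gradually that meaning fades.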

Anyway, play around with the explorer at http://blog.echen.me/embedding-explorer/, and feedback is always welcome!

Edwin Chen

