Reinforcement learning for the real world

Edward Jezierski on the science of bringing creativity and curiosity together in a learning system.

By Jenn Webb
January 15, 2020
Edward Jezierski interview

Roger Magoulas recently sat down with Edward Jezierski, reinforcement learning AI principal program manager at Microsoft, to talk about reinforcement learning (RL). They discuss why RL’s role in AI is so important, the challenges of applying RL in a business environment, and how to approach questions of ethical and responsible use.

Here are some highlights from their conversation:


Reinforcement learning is different from simply trying to detect something in an image or extract something from a data set, Jezierski explains: it’s about making decisions. “That entails a whole set of concepts that are about exploring the unknown,” he says. “You have the notion of exploring versus exploiting, which is do the tried and true versus trying something new. You bring in high-level concepts like the notion of curiosity—how much should you buy as you try new things? The notion of creativity—how crazy are the things you’re willing to try out? Reinforcement learning is a science that studies how these things come together in a learning system.” (0:18)
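To make the explore-versus-exploit trade-off concrete, here is a minimal epsilon-greedy sketch in Python. It is not taken from the interview or from any Microsoft service; the function names and the running-average update are illustrative assumptions.

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        # Explore: try something new, regardless of past estimates.
        return random.randrange(len(action_values))
    # Exploit: do the tried and true, i.e., the action with the best estimate so far.
    return max(range(len(action_values)), key=lambda a: action_values[a])

def update_estimate(action_values, action_counts, action, reward):
    """Incrementally update the running average reward for the chosen action."""
    action_counts[action] += 1
    action_values[action] += (reward - action_values[action]) / action_counts[action]

# Toy usage: three actions, pick one, observe a reward, update the estimate.
values, counts = [0.0, 0.0, 0.0], [0, 0, 0]
a = epsilon_greedy(values)
update_estimate(values, counts, a, reward=1.0)
```

The epsilon parameter is the dial Jezierski alludes to: higher values mean more curiosity about untried options, lower values mean sticking with what has worked so far.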

The biggest challenge for businesses, Jezierski says, is correctly identifying and defining goals, and deciding how to measure success. For example, is it the click you’re after or something a bit deeper? This honest, clarifying conversation is key, he says. “This is why we’re focused first on the applied use of services because it can be very abstract otherwise. It’s like, ‘Oh, I’ve got to make decisions. I get rewards, and I’m going to explore—how do I look at my own business problem through that light?’ A lot of people get tripped up in that. So we’ll try to say, ‘Look, we’re going to draw a smaller box. We’re going to say we want to define personalization using RL as “choose the right thing” for my menu in a context and tell us how well it went.’ That’s not the universe of possibility, but 90% of people can frame a part of their problem that way. If we can design a small box where people in it can have guaranteed results and we can tell you whether you fit in the box or not, that’s a great way to get people started with RL.” (3:24)
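As a concrete illustration of that “small box” framing (choose one item from a menu given a context, then report how well it went), here is a hypothetical Python sketch. It is not the API of any actual Microsoft service; the menu, the weight update, and the names choose_item and report_reward are assumptions made for the example.

```python
import random

menu = ["espresso", "latte", "iced tea"]
weights = {item: 0.0 for item in menu}   # toy per-item preference estimates

def choose_item(context, epsilon=0.2):
    """Choose one menu item for this context, occasionally exploring at random."""
    if random.random() < epsilon:
        return random.choice(menu)
    return max(menu, key=lambda item: weights[item] + context.get(item, 0.0))

def report_reward(item, reward, lr=0.1):
    """Tell the learner how well the choice went (e.g., 1.0 for a click, 0.0 otherwise)."""
    weights[item] += lr * (reward - weights[item])

# One interaction: a context arrives, an item is chosen, a reward is reported.
context = {"latte": 0.3}   # hypothetical features hinting at a latte preference
picked = choose_item(context)
report_reward(picked, 1.0 if picked == "latte" else 0.0)
```

The point of the framing is the shape of the loop, not this toy learner: the business defines the menu, the context, and what “how well it went” means, and the RL system handles the choosing and the learning.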

Ethics and responsible use are essential facets of reinforcement learning, Jezierski notes. Guidelines in this area aren’t necessarily aimed at bad actors; they aim to make people who are unaware of the consequences of what they’re doing more aware, and to give more backing to those who are aware of the consequences and have good intentions. Asking the right questions, Jezierski explains, is the difficult part. “In reinforcement learning, you get very specific questions about ethics and personalization—like, where is it reasonable to apply reinforcement learning? Where is it consequential to explore or exploit? Should insurance policies be personalized in a webpage using reinforcement learning, and what are the attributes that should drive that? Or is an algorithm trying to find out better ways that are not goaled toward the purpose of insurance, which is a long-term financial pool of risk and social safety net? Is it even ethical to apply it to that sort of scenario?” It’s important, Jezierski says, to make these types of conversations non-taboo in team environments, to empower anyone on the team to hit the brakes to address a potential issue. “If you have an ethical or responsible use concern, you can stop the process and it’s up to everybody else to justify why it should restart. It’s not up to you to justify why you stopped it. We take it very seriously because in the real world, these decisions will have consequences.” (9:40)
