(Disclaimer: This is a rewrite of a previous blog on a different platform)
I have recently started learning more about AI: how it works and what it can do. To do that, I dusted off my coding skills and built an AI agent, or bot, that uses Reinforcement Learning to find stocks to trade for short-term profits in my IRA account.
Reinforcement Learning is a technique in AI and Machine Learning where you give an agent (the bot) an environment to observe and interact with, along with a set of possible actions. The bot receives positive rewards or negative ones (i.e., punishments) based on the actions it takes, and its goal is to maximize the total reward from its interactions. As it runs through the environment thousands or even millions of times, it identifies patterns that maximize the total reward and then uses those learned patterns to make decisions. Hence, Reinforcement Learning reinforces the actions that produce the best (or least bad) outcomes.
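To make that loop concrete, here is a minimal sketch using OpenAI's Gym library (assuming the classic pre-0.26 Gym API; newer Gymnasium versions return slightly different tuples). The CartPole environment and the random action choice are stand-ins: my bot used a custom stock-trading environment, and a trained agent replaces the random choice with a learned policy.

```python
import gym

# Stand-in environment; my bot used a custom stock-trading environment instead.
env = gym.make("CartPole-v1")

observation = env.reset()  # the agent's first view of the environment
total_reward = 0.0

for _ in range(1000):
    action = env.action_space.sample()  # an untrained agent just picks at random
    # The environment responds with a new observation, a reward (positive or
    # negative), and a flag saying whether this episode is over.
    observation, reward, done, info = env.step(action)
    total_reward += reward  # the quantity the agent learns to maximize
    if done:
        observation = env.reset()  # start a new episode and keep exploring

env.close()
```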
I used OpenAI's Gym framework to build my environment. For my reward logic, I rewarded the bot for every profitable trade and punished it for every losing trade. I also included a time discount, so the punishment grew the longer the bot held a losing trade, and I reduced the reward once a profitable trade's gains stopped growing, to encourage the bot to close the position and put the money back to work.
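The reward shaping looked roughly like the sketch below. This is not my exact code; the names (unrealized_pnl, holding_days, pnl_plateaued) and the constants are purely illustrative.

```python
def compute_reward(unrealized_pnl: float, holding_days: int,
                   pnl_plateaued: bool, time_penalty: float = 0.01,
                   plateau_discount: float = 0.5) -> float:
    """Shape the reward around profit, loss, and how long a position is held."""
    if unrealized_pnl < 0:
        # Losing trade: the punishment grows the longer the loss is held.
        return unrealized_pnl * (1 + time_penalty * holding_days)

    reward = unrealized_pnl
    if pnl_plateaued:
        # A profitable trade that has stopped gaining earns less, nudging the
        # bot to close the position and free up the capital.
        reward *= plateau_discount
    return reward
```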
After training for several days on 20 years of historical data covering every S&P 500 stock, I was eager to test the results. Much to my surprise, the bot didn't make any trades! At first, I thought it was a bug, but after extensive debugging, I realized the issue wasn't my code; it was my reward logic. Once I adjusted the rewards, the bot finally had a reason to start trading.
Key Lessons
As managers and leaders, we spend a lot of time figuring out which actions to encourage and which to discourage. We create programs to incentivize desired behaviors and disincentivize unwanted ones. Yet we don't always consider how people will react. Part of the issue is that we assume others have the same desires and expectations we do. Without those common desires, or if those desires are weak, our programs can lead to unintended consequences.
For example, I had decided that trading was how to make money, so I inherently wanted to trade even if it could only eke out a tiny profit. But I never translated that desire into something the bot could share, such as a small reward just for making a trade, regardless of the outcome. Because of this, the bot quickly determined that profitable trading was too hard compared to the "pain" of losses, and it logically concluded that not trading was the best option.
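In reward-shaping terms, the eventual fix amounted to something like the snippet below; the bonus value is illustrative, not the number I actually used.

```python
TRADE_BONUS = 0.1  # a small, hypothetical reward just for acting at all

def adjusted_reward(base_reward: float, opened_trade: bool) -> float:
    # Add a modest bonus whenever the bot opens a trade, regardless of outcome,
    # so "do nothing" is no longer the safest way to maximize total reward.
    return base_reward + (TRADE_BONUS if opened_trade else 0.0)
```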
As managers and leaders, we sometimes assume our teams share our goals and motivations, like climbing the corporate ladder or building their professional profile, and that they value these things as much as we do. However, as the AI bot shows, we can't assume others want or value the same things; they may value the exact opposite.
While I can change my AI bot's motivation by altering its code, we can't do that with people. So, before creating incentives, we should understand what truly motivates our teams so that the programs are tailored to them, and we should set aside our own biases when evaluating those programs to avoid unintended consequences.
Let me know if you have any questions! I'm always happy to discuss aligning incentives and motivation.


