Reinforcement Learning Sandbox
All goal seeking can be well thought of as the maximization of a single, scalar, externally received signal… As part of achieving that we pose many sub-problems … that are really useful for us to solve as part of achieving the overall goal.
The amazing thing about the reward hypothesis is that there’s one, tiny, little scalar juice that you’re trying to maximize… Out of that low level thing… comes, “I want to raise a family”, “I want to have a career where I’m a successful research scientist…”