
r/PhilosophyofMath


Notes from a Bayesian Existence

Every day feels deterministic in hindsight and probabilistic in advance. When I look back at my life, everything appears inevitable. Every friendship, every mistake, every departure, every moment of joy seems connected by an invisible thread. The present turns the chaos of the past into a story and stories always make events appear as though they were destined to happen. But that is not how life feels while living it. Standing in the present is like standing before a branching tree of possibilities. Every choice carries uncertainty. Every conversation could become a lifelong bond or a forgotten memory. Every risk could end in success or failure. We move forward without knowing which path reality will select. We never possess complete information. We begin with assumptions, gather experiences, update our beliefs and continue onward. Every disappointment alters our expectations. Every act of kindness changes our model of the world. Every loss and every triumph become new data points from which we attempt to predict an unknowable future. The strange part is that certainty only arrives after the fact. Once an outcome occurs, the countless alternatives disappear from view. We forget the uncertainty that once surrounded the moment and convince ourselves that things could never have unfolded differently. Perhaps wisdom is remembering that they could have.
The future is not a solved equation waiting to be revealed. It is a probability distribution continually collapsing into reality, one moment at a time. And all any of us can do is keep updating our understanding with the incomplete data we are given.
We are, in the end, imperfect statisticians trying to make sense of an unfinished universe.



LLMs are just giant probability machines pretending to think

It’s fascinating that simple mathematics between tokens can eventually become a machine that writes essays, code, poetry, and even reasoning.

We usually think probability means uncertainty.

But LLMs show something strange:

If probability + context + mathematical matching are scaled up enough, uncertainty itself starts producing intelligent-looking outputs.

To understand this better, I tried breaking down an LLM from first principles using only 4 tiny training sentences.

Example:

The boat floated down to the bank.

The investor walked into the bank to open a new account.

The fisherman walked along the bank to cast his net.

The bank has a vault.

Then I asked:

“The investor walked to the bank to lock his money in …”

Why does the model predict “vault” instead of river-related words?

That single question reveals almost the entire architecture of modern LLMs.

The most underrated concept here is the LM Head.

Most explanations jump straight into transformers and attention, but almost nobody explains that the LM Head is essentially a final layer that scores every token in the model's vocabulary as a candidate for the next token.

So internally the model is basically solving:

“Out of all known tokens, which one best matches this context mathematically?”
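A minimal sketch of that matching step, under loud assumptions: the 2-dimensional vectors, the four-token vocabulary, and the function name `lm_head` are all made up for illustration and are not taken from any real model. It scores each token with a dot product against the context vector and turns the scores into probabilities with a softmax.

```python
import math

# Hypothetical 2-d context vector for the prompt (hand-picked, not learned):
# axis 0 = "finance-like", axis 1 = "river-like".
hidden = [2.0, 0.1]

# Hypothetical output vectors, one per vocabulary token.
vocab = {
    "vault": [1.5, 0.0],
    "account": [1.0, 0.0],
    "net": [0.0, 1.5],
    "river": [0.0, 1.0],
}

def lm_head(hidden, vocab):
    """Score every token by dot product, then softmax into probabilities."""
    logits = {
        tok: sum(h * w for h, w in zip(hidden, vec))
        for tok, vec in vocab.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = lm_head(hidden, vocab)
best = max(probs, key=probs.get)
print(best, probs)
```

With these hand-picked numbers the finance-leaning context vector lines up best with "vault". In a real model both the context vector and the token vectors are learned, and the vocabulary has tens of thousands of entries.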

Then different layers help solve that problem:

Embeddings: convert words into mathematical vectors

Positional encoding: preserves word order

Attention layer: figures out which words are related to each other in context

(“investor”, “money”, “bank” become strongly connected)

Feed forward neural networks: act somewhat like massive learned if/else decision systems refining patterns internally

And finally the LM Head converts all of that into probabilities for the next token.
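The attention step in the list above can be sketched in a few lines. This is a toy version, assuming hand-picked 2-dimensional embeddings and no learned query/key/value projections (real transformers learn those); positional encoding is left out, and the helper name `attend` is hypothetical. It shows how the ambiguous word "bank" gets pulled toward its finance meaning by "investor" and "money".

```python
import math

# Hand-picked 2-d embeddings (assumption): axis 0 = finance, axis 1 = river.
embeddings = {
    "investor": [1.0, 0.0],
    "walked": [0.1, 0.1],
    "bank": [0.5, 0.5],   # ambiguous on its own
    "money": [1.0, 0.0],
}
tokens = ["investor", "walked", "bank", "money"]

def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query_token, tokens, embeddings):
    """Scaled dot-product attention for one query, with identity projections."""
    q = embeddings[query_token]
    scale = math.sqrt(len(q))
    scores = [
        sum(a * b for a, b in zip(q, embeddings[t])) / scale for t in tokens
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the token vectors.
    context = [
        sum(w * embeddings[t][i] for w, t in zip(weights, tokens))
        for i in range(len(q))
    ]
    return dict(zip(tokens, weights)), context

weights, context = attend("bank", tokens, embeddings)
print(weights)
print(context)
```

In this toy setup "bank" attends more to "investor" and "money" than to "walked", and its context vector ends up larger on the finance axis than on the river axis.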

What surprised me most is:

There is no hidden magic moment where the AI “becomes conscious”.

It’s an enormous probability engine continuously finding the best contextual token match from its vocabulary.

I made a beginner-friendly walkthrough explaining this visually without unnecessary jargon.

https://www.youtube.com/watch?v=YTV5qUCpu2c

Would genuinely love feedback from people learning transformers/LLMs from scratch.


The difference between improbable and spectacular paths

You find a scratch-off lottery ticket on your way to work and win €100,000.

You immediately want to cash it in … but lightning strikes the magnetic paper and pulverizes the ticket!

Then nothing extraordinary has actually happened.
You went to work and didn't win on a scratch-off ticket, just like any other day.

My questions:

  1. How is such a stochastic process modeled?
    Is such an event simply an improbable path to a probable result? If so, does this path differ from others only by a discontinuous expectation of future wealth?

  2. If wealth is expressed as a scalar, then losing €100,000 and winning €100,000 commute: the order does not change the final amount. But subjectively it clearly does matter! Is our satisfaction with the expected future economic state a kind of system with inertia, like a mass reacting to an external force?
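To make the contrast in question 2 concrete, here is a small sketch. The scalar bookkeeping is just addition. The subjective tally is purely an illustrative assumption: the `LOSS_WEIGHT` of 2.0 and the function `felt_value` are hypothetical, chosen only to show one way a gain and an equal loss can fail to cancel.

```python
# Wealth changes in euros for the two orderings from the question.
win_then_lose = [+100_000, -100_000]
lose_then_win = [-100_000, +100_000]

def terminal_wealth(start, changes):
    """Plain scalar bookkeeping: the order of the changes does not matter."""
    return start + sum(changes)

LOSS_WEIGHT = 2.0  # hypothetical: a loss counts twice as much as an equal gain

def felt_value(changes, loss_weight=LOSS_WEIGHT):
    """Illustrative subjective tally: each change is judged as a gain or a loss."""
    return sum(c if c >= 0 else loss_weight * c for c in changes)

same_wealth = terminal_wealth(0, win_then_lose) == terminal_wealth(0, lose_then_win)
net_feeling = felt_value(win_then_lose)
print(same_wealth, net_feeling)
```

Both orderings end at the same wealth, yet the weighted tally comes out negative. Note that this simple tally is still order-independent; capturing the inertia idea would need something extra, such as a reference point that adapts over time.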