13.08.2022

Last edited 16.08.2022

Today I'd like to share with you the problem which has a reputation of being the hardest ever set on the International Mathematical Olympiad.

That's sounds like some serious business! And the goal here is not simply for me to present *a* solution, but to walk you through 3 increasingly-brilliant insights that aid the proof, and in the end hopefully make you a better problem-solver, armed with some new super tools of sorts. So are you up for the challenge?

#### The Story

It's... about problem 6 from the 1988 IMO. I got obsessed over it a few days ago when Numberphile uploaded a video, featuring Zvezdelina Stankova, the only girl to solve the problem at the competition, who shares Emanouil Atanassov's solution, for which he won a special brilliancy award.

By the way, both Zvezdelina and Emanouil are Bulgarians, so this resonates with me on even one more dimension.Watching the original video first is not mandatory, since here I'll focus more on the problem-solving strategies and the non-trivial steps taken in the proof. I will also give an overview of the solution Zvezda talks about, although the focus in this article is still going to be more on the stuff that's applicable outside this specific problem!

#### The Statement

Let $a$ and $b$ be natural numbers, such that $1 + ab$ divides $a^2 + b^2$.

Prove that

$$\frac{a^2 +b^2}{1 + ab}$$

is a perfect square [*].

Note: a perfect square is a number in the form $1^2, 2^2, 3^2, ..., n^2$.

It's genuinely a hard number theory problem. Just to be clear, it asks about specific numbers $a$ and $b$ which, when plugged in the equation, equal a whole number. Now it's easy to see that's not always the case. In fact, most of time the result is going to be a fraction. The problem says that we don't care about those cases ("Throw them to the dogs!", as a Math teacher used to tell us).

$$\frac{6^2 +3^2}{1 + 6*3} = \frac{693}{19}$$

No good, we don't bother further with this.The ratio seems so important that we naturally might give it a letter, for example $r$. What is not obvious at all is why does $r$ (when it happens to be a whole number) necessarily has to be a perfect square?

Now, you are not at the IMO, there's no time pressure. You are at home maybe. Perhaps, grab out a pen and a piece of paper to follow along.

...

Where to start?

To reassure yourself the problem even makes sense, imagine trying to find a triplet of $a$, $b$ and $r$ which satisfies the condition.

You might even write a quick program so the computer does the hard work for you.

```
import math
for a in range(1, 200):
for b in range(1, 200):
r = (a**2 + b**2) / (1 + a * b)
if r.is_integer():
r = int(r)
#print(f"Found a = {a}, b = {b}, r = {r}")
```

Try to remove the '#' from the print statement in the code box above. It's interactive!Eventually you might stumble upon a solution like $(a, b) = (112, 30),\ r = 4$.

When doing Math sometimes numbers are completely left-out! So it's always nice to have a concrete example on the side when working on abstract problems such as this one.

Now back to the expression. If you look closely, it might *look back at you* with a sort of beautiful symmetry: it doesn't matter if $a$ is $a$ and $b$ is $b$. They can switch places!

$$r = \frac{a^2 + b^2}{1 + ab} = \frac{b^2 + a^2}{1 + ba}$$

Before and after - it's the same statement. So that implies that any solution we find also has a twin. So from the earlier example $(a, b) = (30, 112),\ r = 4$ is also a solution.

#### Constraints

There are **3** unknowns here, so you might get stumbled and think - there's nothing much to do with rearranging **one** equation without having more information beforehand - without having something to constrain it.

And that's a very good intuition! A very good rule of thumb in Math is that if you have 3 variables, to solve for them, you need 3 equations. You can think of it as if one expression "locks" in a relationship between one of the variables and the rest.

I remember in high-school when there was a geometry problem which also required algebra I would sometimes get stuck rearranging equations, finding the same ratios over and over again, because I was trying to solve for e.g. 3 variables, by having only 2 equations (or two geometrical constraints).

Here is a system of equations to illustrate the point:

$$ \left\{\begin{array}{ll}3x^2+y+z = 10\\\frac{1}{1+e^{-x}}=3y\\\frac{z}{x+y}=2\end{array}\right.$$

... Don't worry, we are not going to solve this monster. That's not important here. But take a look at the third line, we can instantly get $z = 2(x+y)$. If we substitute our newfound information about $z$ in the first line, we'll get something like: $3x^2 + y + 2(x+y) = 10$. We've eliminated one variable! Bingo!

Now we can solve for either $x$ or $y$ (eliminating one more variable), and then use the second line to finally arrive at a solution. After that we back-track to all the relationships till we get a triplet $(x, y, z)$, which is a solution to the whole system. We might get 0 solutions, infinitely many, or a couple, but the method is the same.

Not every equation can have one of variables isolated on one side-though, so this can sometimes fail, just be careful!#### ...constants, parameters and variables?

This however, is not the route we're going to take with the IMO problem.

You might've learned about letters in Math - how they can represent constants (like $\pi$, $\phi$, ...), parameters ($a$, $b$, $c$, ...) or variables ($x$, $y$, $z$, ...) but these are just conventions, which can be shifted. Nothing stops us from using $x$ as a constant and $\phi$ as a variable, for example.

Moreover, how "variable" something is - is entirely context dependent! We are used to thinking variables change "more often" than parameters which by themselves change more often than constants. You can think of it in layers.

There is a section in "Gödel, Escher, Bach" by Douglas Hofstadter where the author talks about the layers of stability, which inspired me to even touch on this topic here:

"We build up our mental representation of a situation layer by layer. The lowest layer establishes the deepest aspect of the context - sometimes being so low that it cannot vary at all. For instance, the three-dimensionality of our world is so ingrained that most of us never would imagine letting it slip mentally. It is a constant constant." [*]

To illustrate, let's see the following graph:

This is a plot of $y=ax^2+x-1$. The horizontal axis is the $x$ axis and the vertical one is $y$. The graph is the result as we let $x$ "vary" over the real number line from $-\infty$ to $+\infty$. So it kinda makes sense that the parameter $a$ is fixed (imagine the mess if for each different number on the $x$ axis, a different $a$ was chosen!). But if you play with it (tune the slider!), $a$ is also "variable" in the sense that, for different values of it, a different problem all together is graphed. $a$ is "variable" on another level, a higher-level, and it sets the context in which $x$ has meaning. The constant $-1$ in the equation is even more fixed than $a$ - almost nobody would even think about changing the value of "$-1$". (This shouldn't stop you from trying, though!)I hope you've seen how "wavy" and almost artificial the definition of parameters and variables is. Most people out off from high-school thinking that just because something is called $x$, it must be a variable, $a$ must be a parameter (whatever that means).

In the context of Machine Learning we can even talk about hyper-parameters, which are another level "up" the hierarchy of "variability". But that's just a note that there can be more than three levels (there can be even hyper-hyper-parameters).

#### First Insight: Promote $r$ to a parameter

As Zvezda says in the video, doing this is something non-trivial. You don't just casually go around in problems, fixing one of the variables to a parameter. But in this case it just so happens this is a really productive way to crack the problem.

For example's sake, let's fix it as $4$.

$$\frac{a^2 +b^2}{1 + ab} = r\ \ (r = 4)$$

We'll still use $r$ as a letter, in the sense that it's not really fixed, but in the back of your head you can just say"$r$ is $4$" for when you want to jump down to the variable level.What we've done is essentially redefine the problem in terms of $r$. It now makes sense to solve for $(a, b)$ only when $r$ is already fixed to something. It's a subtle difference, but now we've split the original problem into infinitely many sub-problems (each value of $r$ may spur up a family of solutions $(a, b)$).

Now we have an equation with 2 unknowns (better than 3!). If you think about for a bit, this means that we can find a relation between $a$ and $b$. And that's exactly how the proof goes on!

$$\frac{a^2 +b^2}{1 + ab} = r $$

$$a^2 +b^2 = r(1 + ab) $$

$$a^2 +b^2 = r + rab $$

Since $a$ and $b$ are symmetric, we can choose either:

$$a^2 - (rb)a + b^2 - r = 0 $$

... and finally we get a quadratic equation (in terms of $a$). To hopefully bring it back to more familiar land, Zvezda substitutes $x$ for $a$:

$$x^2 - (rb)x + b^2 - r = 0 $$

You can jump in now and use the quadratic formula to solve for $x_1$ and $x_2$ (we know there are exactly 2 solutions), but there is a shortcut: Vieta's formulas.

$$x_1 + x_2 = -\frac{b}{a},\ x_1x_2 = \frac{c}{a}$$

Here $a$, $b$ and $c$ are the coefficients of the quadratic equation. You can prove these formulas yourself (using the quadratic formula), but since they are pretty well-known (and usually taught in Math class) the proof uses them directly as given.In our example:

$$x_1 + x_2 = rb$$

$$x_2 = rb - x_1$$

But what this says is that if we substitute $rb - x_1$ (remember $x$ = $a$) in the original:

$$\frac{x^2 + b^2}{1 + xb} = r $$

We'd also get a solution!

$$\frac{(x_2)^2 + b^2}{1 + x_2b} = \frac{(rb - x_1)^2 + b^2}{1 + (rb - x_1)b} = r $$

The top side of the fraction is always positive, $r$ is positive, so the bottom must be as well. What this leaves is with is that $1 + x_2b$ must be positive as well. Or that $x_2$ is at most 0.

Hold on. What did we just do? We used the original fraction, transformed it into a quadratic equation, found a twin solution $x_2$ for the original $a$, and we've shown it must be nonnegative. It must also be an integer! Why? (Can you think of a reasoning for homework? It's again - by using the original fraction.)

Overall, if we have a pair $(a, b)$ for which we know is a solution. Then we've shown that $(rb - a, b)$ must also be a solution, where $rb - a$ can potentially be zero (and that's what we'll tackle now!).

#### Second Insight: Can 0 be a solution?

If you were especially thoughtful in the beginning when searching for a pair $(a, b)$ for which $r$ is a perfect square, you might have wondered, what if one of the numbers is zero?

$$\frac{a^2 + 0^2}{1 + a*0} = r $$

Then we get directly:

$$\frac{a^2}{1} = a^2 = r $$

So $r$ indeed is a perfect square for any pair $(a, 0)$ (or $(0, b)$ as we saw - they are symmetric).

But the question statement is very stubborn in that regard - it specifically says $a$ and $b$ must be natural numbers.

Sometimes natural numbers are defined to include 0. In Bulgaria that's not usually the case. Actually, in the Olympiad statement the term "positive integers" is used to avoid confusion.So $0$ doesn't seem to work. But... do we care? If we prove that it works for all non-negative numbers ($0$, $1$, $2$, $3$...), then logically it follows that it's also proven for just positives ($1$, $2$, $3$...). You might think "isn't it more difficult to prove a more general statement, though?" And more often than not - that's the case! However, this is one of the rare cases when it turns out to be easier to prove more, than to prove less.

Perhaps the authors of the question have anticipated this and it turn deliberately made the question more difficult by stating it in a more specific manner.How? Let's see.

The analysis in the previous section about the twin roots: $a_2 = rb - a_1$, can be applied to $b$ as well! (Remember that the original equation is symmetric in terms of $a$ and $b$). This means that $b_2 = ra - b_1$.

What if we start with a pair like $(30, 8)$. This is a solution and gives $r = 4$. By applying the formula we find: $a_2 = 4 * 8 - 30 = 2$. So our new pair (which is also a solution) is $(2, 8)$. Look at that! It's simpler. But is that always the case?

What if we started with $b$: $b_2 = 4 * 30 - 8 = 112$. So new pair is $(30, 112)$. Hmm.. can you see the pattern? Play with it more if you want you.

Seems like if we start with the higher number, we get a more complicated result. So... easy, just always pick the lower number!

By this logic $(2, 8)$ then becomes $b_2 = 4 * 2 - 8 = 0$... or $(2, 0)$. AH! A zero. This is forbidden, or is it? Do you remember what we said about proving something more general in order to prove something more specific?

In this case we got a whole family of solutions: $...$, $(30, 112)$, $(30, 8)$, $(2, 8)$, and $(2, 0)$. And you can continue this to infinity by going up - choosing the larger number! AND you can flip $a$ and $b$ to get.. double more infinitely many solutions.

The green plot is*all*of the solutions to the IMO statement (even non-integer) for a fixed $r = 4$. It's symmetric because $a$ and $b$ are interchangable (where the axes here represent $a$ and $b$). With blue points I've marked the first two integer solutions. You can see how all solution pairs sort of "jump" between each other (remember how we transformed the expression earlier into a quadratic equation with two solutions). In general it appears that you can continue jumping by choosing either $a$ and $b$ - there are always two ways to go - except when you are at $0$.

#### Releasing $r$

But... the implications here are even larger. Let's go in the other direction - start by not knowing $r$. We can start with ANY pair $(a, b)$, and by choosing the lower number *it seems like* we will eventually reach $0$. When we do, the equation becomes in the form:

$$\frac{a_n^2 + 0^2}{1 + a_n*0} = a_n^2 = r $$

or

$$\frac{b_n^2 + 0^2}{1 + b_n*0} = b_n^2 = r $$

Where $n$ is an arbitrary however-many-it-takes number of steps. And here, at the final step - we can know $r$. If you want later you can go back up the stairs (remember $r$ stays fixed for one family of ladder solutions) and see the original randomly chosen pair $(a, b)$ must also give a ratio which is a perfect square!

And voila! It's a pretty convincing reasoning which finally proves problem 6. Not really technical (which we'll fix shortly). But intuitive, I hope.

You might wonder what will happen if after taking a step, one of the numbers goes into the negative realm - then everything will break and we'll never reach zero. But worry not - it's actually impossible for a number to go negative after "flip-flopping". It will eventually at most go to $0$. Why? We answered that earlier in the section where we found the formula for $a_2$ in the first place. It turned out that the twin root must be nonnegative - in order to conform with the original expression. If it's necessary, go up and think through it again to reassure yourself that's the case.

Also go back to appreciate what we did with $r$. We fixed it as a parameter in order to be able to look at concrete fixed "sub-problem", but still let ourselves a backdoor of sorts - now, when the time came, we reasoned about $r$ as if we didn't know it's value at all!

#### Third Insight: The Power of Induction ✨

The elephant in the room of the first paragraph of the previous section is: "...*it seems like* you will eventually reach $0$". Well, when doing technical proofs "it seems like" is not really enough. We have to formalize it. Here we'll take time to think about how we use *induction* to wrap the solution into a more rigorous package.

First of all, what even is induction? It's logical in the sense that - it's *difficult* to disagree with it, but when applied to math it almost feels like magic. Let's look at the second case first to illustrate.

Take a look at this classic proof for the sum of the first $n$ integers. The axiom of induction has 3 parts - a base step, an inductive hypothesis, and an inductive step which triggers like a chain to prove (potentially infinitely) many statements.

Firstly, it's important to note that induction won't give us the answer. We must already know the answer before-hand. For the sum it *looks like*:

$$1 + 2 + 3 + ... + n = \frac{1}{2}n(n+1)$$

Shall we try?

```
N = 0
sum = 0
# We must use N + 1 here,
# because "range" is not inclusive!
for i in range(N + 1):
sum += i
formula = N * (N + 1) // 2
print(f"Sum is: {sum}")
print(f"The formula gives us: {formula}")
```

Again, it's interactive! Try to change "N" from the first line to different values to see what happens.But you can go up and up and up... to arbitrarily large $N$... how do you know it doesn't break for $N+1$. It could suddenly stop working. Lot's of formulas in Math seem to be true until some point after which they diverge from the actual answer. In general eye-balling - as in it works for big enough $N$, is not enough.

The proof starts like this. The base step is:

$$(1)\ It\ works\ for\ n=1.$$

$$1 = \frac{1}{2}1(1 + 1)$$

Sure enough. We'd see why we need this. The second step is to *assume* it works up until an integer $k$:

$$(2)\ It\ works\ for\ some\ k.$$

$$1 + 2 + 3 + ... + k = \frac{1}{2}k(k + 1)$$

Now, for the final step, show that it's true for $k + 1$.

$$(3)\ 1 + 2 + 3 + ... + k + (k + 1) =$$

$$= \frac{1}{2}k(k + 1) + (k + 1) = $$

$$= (k+1)(\frac{1}{2}k + 1) =$$

$$= (k+1)(\frac{k + 2}{2}) = $$

$$= \frac{1}{2}(k+1)(k+2)$$

The final term is exactly the original formula we wanted to prove $\frac{1}{2}n(n+1)$ if $n= k + 1$. So by the principle of mathematical induction the statement is true for all natural numbers $1, 2, 3, ...$

Aren't we assuming something more here, though? Take a look step 2. It seems like we're just saying - assume the formula is true therefore the formula is true. But it's a bit subtle. We said *assume* it works **up to** an integer $k$. This $k$ can be $1$ - or any $N$ we found using the Python program. That's why we need the base step - to show it's true for just one integer. Now that we wrapped up our empirical evidence in $k$, in the final step we manipulate the expression in order to show that it must work for $k+1$. This triggers a chain of proofs and eventually catches all natural numbers. Therefore the formula always works.

A good analogy for induction is by using natural language:

An evil wizard has cursed a kingdom with a spell that makes it rain if it has rained on the previous day. Today it rains. Will it rain forever? The logical answer seems to be "yes". There will always a previous day on which it has rained. Today it rains and this is the "base step" which triggers like a chain for all days in the future.

It's interesting to see why induction must be included as an axiom in Math. It turns out it cannot be proven from other axioms, or something more: some statements are *unprovable* without induction.

Let's see another example from "Gödel, Escher, Bach" [*]. It turns out the following statement is undecidable in the regular number theory (without induction):

$$For\ all\ n:\ (0 + n) = n$$

You can individually prove each statement by itself:

$$(0 + 0) = 0$$

$$(0 + 1) = 1$$

$$(0 + 2) = 2$$

$$(0 + 3) = 3$$

$$ ... $$

Each of these has a rigorous, step-by-step proof by the base axioms of Mathematics (for brevity we'll not discuss them here in detail, you can look up Peano's axioms to get a head-start!), but the whole family is undecidable in the sense that we *simply don't know* it's not going to break for example at $0 + 1298592589 = 1298592589$. The key here is that we can find a pattern in the proof from one statement to the next (like we did with the inductive step).

Of course you can choose not to believe in induction. Then you can simply add:

$$For\ all\ n:\ (0 + n) = n$$

as an axiom to your theory. However, there is something to be said about the operation of addition - shouldn't it be captured fully in the theory? This feels like artificially plugging in a statement to fix the holes. Can you guarantee there are no other holes? Induction seems like a framework that's more powerful (while still being weirdly easy to grasp). (Of course, even with induction, there are still unprovable statements in Math, take a look at Gödel's incompleteness theorems).

... But let's stay you are stubborn and simply refuse to agree with induction. There's nobody to stop you! And that's good, many fields in math are found by simply breaking the rules. Addition and multiplication are commutative ($a + b = b + a$ and $ab = ba$) in most number systems, but not always... Yes, it seems strange! But take a look at matrix multiplication. One geometrical interpration of matrices is that they represent transformations, and multiplying them is like chaining transformations one after the other (the result of the multiplication is a combined matrix of both operations). Try imagining two rotations on Rubick's by 90 degrees on two axis, e.g. $x$ and $y$ (e.g. with matrices $R_x$ and $R_y$). You'll get a different result depending on which one you decide to do first! In a sense $R_x * R_y \neq R_y * R_x$.

There are thinkable worlds in which more and more basic logic rules are broken. For example, a popular logical tautology is that either $p$ or $\sim p$ (not $p$) is true. There is no middle ground - either a logical statement is true or false. Makes sense... But you can decide to stop "believing" in that - who knows what you'll find?! (One interpretation of Quantum mechanics is that statements about the future can be neither true nor false, so there's that.)

We can continue this discussion until infinity. The point here to take is that what you "believe" as true directly puts you in a frame, which tells you what you can and what you cannot deduce in your system. Don't be afraid to break the system from time to time. But for now...

... let's just go back to our land of familiar Math. One that is logical and seems to makes sense for most of our daily lives. Where addition and multiplication is commutative, induction is intuitive. We still have an Olympiad problem to wrap up!

#### Wrapping up our proof

There is another variation of induction - finite induction. In our case we have to show that eventually in a pair $(a, b)$ one of the terms will go down to zero (without loss of generality it doesn't matter which, it's symmetric).

The only missing link in our proof is that $(a, b)$ with it's corresponding formulas to "flip-flop": $a_2 = rb - a_1$, $b_2 = ra - b_1$, doesn't always guarantee that we'll go down. What if both $a$ and $b$ go up? Can that happen? We don't know - it didn't happen in our example, but how do we know it doesn't break at a specifically chosen $(a, b)$.

In general the only thing missing is the inductive step. Show that for $(a, b)$ either $(a, b_2)$ or $(a_2, b)$ "went down". Can you prove this for homework? (Hint: use proof by contradiction and Vieta's formula about the product of the two roots... or you can also look at how Zvezda proves it in the Numberphile video).

In any case, once this is settled, we're essentially done. For any whole numbered pair $(a, b)$, we've shown a method to bring down one of the terms to zero (while at every step we have a valid solution to the equation!) and in the end, the other term squared must be equal to $r$. Which implies that $r$ is, indeed, a perfect square, always.

Phew.

#### Conclusion

I hope I've shown you how 3 clever insights can work together to create a beautiful solution to the question which still holds reputation of being one of the hardest ever set on the IMO. It's one of few times competitive Math has sparked research into a topic - now known as proof by Vieta jumping", if you want to explore it more.

The key takeaway here is not the solution to the problem itself. What I found most interesting is how counter-intuitive are some of the steps taken. I wanted to share in details something of their nature, and who knows - maybe these will be incredibly useful to a problem you and I try to solve the future.

By thinking about it conceptually I'd bet you can apply them to real life as well. (Who says Math is all amount scribbles on paper!)

Anyways, hope you enjoyed, bye for now <3.

#### Some references which made this possible

[1] Original Numberphile video which inspired this post (also found at the beginning). Zvezdelina Stankova talks about the proof presented here.

[2] Another Numberphile video - an extra video about induction with Zvezda.

[3] "Gödel, Escher, Bach" by Douglas Hofstadter, Typographical Number Theory, p. 231, example about the family of undecidable theorems, discussion of induction, Peano's axioms.

[4] "Gödel, Escher, Bach" by Douglas Hofstadter, Artificial Intelligence: Prospects, p. 640, part about Layers of Stability and discussion about constants, parameters and variables.