Love Data Week 2023

A Long and Lasting Love File Format

Mathematicians absolutely love data! A common misconception among people who are not involved with the natural sciences is that mathematics is already "finished": that we already know everything we need and that no more active research is being done. It may therefore be hard to imagine that research data plays a fundamental role in today's mathematical research. Yet just as physicists perform experiments to analyze properties, mathematicians gather data about the objects they study. What does mathematical research data look like? What is it used for, and why is it important? And what kind of challenges does one encounter when handling mathematical data?

The reason mathematics was originally developed was to try and understand our universe better. People observed real-life phenomena and developed abstract concepts like fractions, angles, coordinates, etc. in order to be able to formulate problems and think about them.

But as mathematics has grown, so has the complexity of the objects being studied. It is therefore quite common for mathematicians to classify the objects they work with and list their properties. Think of something like the periodic table in chemistry, but for mathematical objects!

In the following, we're going to discuss a few mathematical objects (some simple, some more complicated) and discuss how these objects are stored as data.

Platonic Solids

Credits: Polyhedra interactive app by Jürgen Richter-Gebert (Science to Touch / TU Munich), part of the Math to Touch collection.

The platonic solids are some of the oldest objects studied in mathematics, dating back to the ancient Greeks. They are extremely symmetrical objects: if we rotate a platonic solid in such a way that one corner lands on another one, the end result looks completely identical. Surprisingly, there exist only five of these. We can study them in the above applet. First move the "Shrink facets" slider completely to the left in order to get the actual platonic solids. They are called (in order): tetrahedron, hexahedron (or cube), octahedron, dodecahedron and icosahedron.

(Translated from ancient Greek, the names literally mean: four-face, six-face, eight-face, twelve-face and twenty-face.)

What to do?
You can view the platonic solids from any angle, change their colors, and use the sliders to see how platonic solids can slowly be transformed into other shapes (not necessarily platonic solids).

Find out why there are only five platonic solids

The symmetric properties of a platonic solid strongly determine what possibilities we have. In fact, one can show that a platonic solid is essentially completely determined if we know what a corner of the solid and all the faces connected to that vertex look like.

If we look at the corner of a cube for example, we see that three squares meet each other at that corner.

We will tackle all of the possibilities:

The smallest possible face is an equilateral triangle. We can take three of those and glue them together at a single point. The platonic solid that corresponds to this is the tetrahedron (the red object in the picture).
But what if we took four triangles instead and glued those together at a single point? We would get a pyramid-like shape, and the platonic solid that corresponds to this is the octahedron (the green object in the picture).
Five triangles? Also possible! This would give us the icosahedron (the light blue object in the picture).

But what about six triangles? Each of the angles of an equilateral triangle is 60 degrees. This means that if we put six of them together we would get 360 degrees. So gluing six triangles together would give us a flat 2D plane, which makes it impossible to get a 3D shape. (This is illustrated in the picture below. Imagine that you're trying to craft the platonic solids out of cardboard.)

We need to move on to the next shape:

We now take a square as our face. Putting three squares together gives us a cube (the orange object in the picture).

Four squares give us the same problem as before: 4 x 90 degrees = 360 degrees, so we can't build a platonic solid in this way.

We now take a pentagon as our face. Putting three pentagons together gives us our final platonic solid: The dodecahedron (the dark blue object in the picture).

Why do we not have any more? Four squares already gave us issues, so it was to be expected that four pentagons also wouldn't work: each angle of a regular pentagon is 108 degrees, and 4 x 108 = 432 degrees is more than 360 degrees. But what about three hexagons? A regular hexagon has angles of 120 degrees and 3 x 120 = 360 degrees, making this impossible as well. Faces with more corners will face the same issue, as the size of the angles only increases.
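The case analysis above boils down to a single check: a number of identical regular faces can only meet at a corner of a 3D shape if their angles add up to strictly less than 360 degrees. Here is a minimal Python sketch of that check. The function names are made up for this illustration, and the search is cut off at faces with 12 corners, which is an arbitrary choice.

```python
from fractions import Fraction


def interior_angle(corners):
    """Angle in degrees at each corner of a regular polygon."""
    return Fraction(180 * (corners - 2), corners)


def possible_corners(max_corners=12):
    """List all (corners per face, faces per vertex) pairs that can fold up.

    A pair qualifies when at least three faces meet and their angles
    add up to strictly less than 360 degrees.
    """
    found = []
    for corners in range(3, max_corners + 1):
        faces = 3
        while faces * interior_angle(corners) < 360:
            found.append((corners, faces))
            faces += 1
    return found


print(possible_corners())
# prints [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```

The check only rules candidates out. That each of the five surviving pairs really closes up into a solid is what the cardboard argument above shows.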

So a platonic solid is essentially determined by the shape of its faces (how many corners does a face have?) and how many faces come together at a vertex of the platonic solid.

We have thus classified all possible platonic solids by a pair of integers:

p: The number of corners of each face
q: The number of faces coming together at a vertex of the 3D shape.

This can be used as a way to store the data of the platonic solid!

This pair of integers {p, q} is commonly called the Schläfli symbol. If we allow other polyhedra that are not platonic solids, other values for p and q are also possible. See e.g. https://en.wikipedia.org/wiki/Regular_polyhedron or https://en.wikipedia.org/wiki/Schläfli_symbol
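To illustrate how much information the pair {p, q} carries, here is a small Python sketch that recovers the number of vertices, edges and faces from it. It relies on two standard facts that are not derived in this article: every edge belongs to two faces and two vertices (so p x faces = 2 x edges = q x vertices), and Euler's formula vertices - edges + faces = 2. The record layout and field names are hypothetical, chosen only for this example.

```python
import json


def counts_from_schlafli(p, q):
    """Vertices, edges and faces of the platonic solid {p, q}.

    Combines p*F = 2*E = q*V with Euler's formula V - E + F = 2,
    which gives E = 2*p*q / (2*p + 2*q - p*q).
    """
    denominator = 2 * p + 2 * q - p * q
    if p < 3 or q < 3 or denominator <= 0:
        raise ValueError(f"{{{p}, {q}}} is not a platonic solid")
    edges = 2 * p * q // denominator
    return {"vertices": 2 * edges // q, "edges": edges, "faces": 2 * edges // p}


solids = {
    "tetrahedron": (3, 3),
    "cube": (4, 3),
    "octahedron": (3, 4),
    "dodecahedron": (5, 3),
    "icosahedron": (3, 5),
}

# Hypothetical storage record: the two integers plus what follows from them.
records = {
    name: {"schlafli": [p, q], **counts_from_schlafli(p, q)}
    for name, (p, q) in solids.items()
}
print(json.dumps(records, indent=2))
```

Storing only the two integers is enough here, because everything else in the record can be recomputed from them.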

Elliptic Curves

Credits: Simple interactive prototype by IMAGINARY.

An example of a more complicated mathematical object that is being studied extensively is that of an elliptic curve. Elliptic curves are curves that are given by equations of the form:

y^2 = x^3 + a*x + b

Curves of this form have a remarkable property: one can define a kind of addition on the points of the curve. This addition law makes the curve very attractive to cryptographers. Nowadays elliptic curve cryptography is used very widely. Chances are high that the browser you're currently using to view this website is using an elliptic curve in the background to ensure your connection is secure.
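The article does not spell out the addition law, so the following Python sketch uses the standard chord-and-tangent construction: draw the line through two points (or the tangent, if the points coincide), find the third intersection with the curve, and reflect it in the x-axis. The function names and the example curve y^2 = x^3 - x + 1 are chosen for illustration only. The sketch works with exact rational numbers, whereas cryptographic applications use curves over finite fields.

```python
from fractions import Fraction


def is_on_curve(point, a, b):
    """Check y^2 = x^3 + a*x + b; None stands for the point at infinity."""
    if point is None:
        return True
    x, y = point
    return y * y == x ** 3 + a * x + b


def add_points(P, Q, a, b):
    """Add two points on y^2 = x^3 + a*x + b (chord-and-tangent rule)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and y1 == -y2:
        return None  # vertical line: the sum is the point at infinity
    if P == Q:
        slope = Fraction(3 * x1 * x1 + a) / (2 * y1)  # tangent line
    else:
        slope = Fraction(y2 - y1) / (x2 - x1)  # line through P and Q
    x3 = slope * slope - x1 - x2
    y3 = slope * (x1 - x3) - y1
    return (x3, y3)


a, b = -1, 1  # example curve y^2 = x^3 - x + 1
P, Q = (0, 1), (1, 1)
total = add_points(P, Q, a, b)
double = add_points(P, P, a, b)
print(total, is_on_curve(total, a, b))
print(double, is_on_curve(double, a, b))
```

For this example curve, P + Q comes out as the point (-1, -1) and P + P as (1/4, -7/8); both lie on the curve again, which is what makes the operation usable as an addition.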

What to do?
You can play around with the parameters a and b and see how the graph of the elliptic curve changes for different values.

Learn how to see if your browser is using elliptic curve cryptography

Mozilla Firefox:
Pick your favorite website and click on the padlock in the top left corner. Something similar to the following should show up:

Click on Connection secure to get:

And now click on "More information". You should get the following screen:

Clicking on View Certificate should lead you to a screen telling you everything about the security of the website you are on:

Google Chrome:
Pick your favorite website and click on the padlock in the top left corner. Something similar to the following should show up:

Click on "Connection is secure" to get:

Click on "Certificate is valid" to see the following screen on which you can find the information in the Details tab:

Read more about file formats and a database of elliptic curves

A database containing elliptic curves and file formats
As elliptic curves play an essential role in number theory (they were used in the proof of Fermat's Last Theorem, for example) and in cryptography, mathematicians have created a database storing as much information about elliptic curves and related objects as possible.

This database is called the LMFDB:

https://www.lmfdb.org/

A more general equation of an elliptic curve is one of the form:

y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6

which is the one that is also being used in the database. Take a look at the curve

https://www.lmfdb.org/EllipticCurve/Q/30/a/6

for example. One sees that a lot of additional data is being stored that keeps track of properties of the elliptic curve. Having this database allows researchers to find elliptic curves with very specific properties or to study their behavior by looking at families of them. This can then be used, for example, to find weaknesses in cryptographic protocols and patch those weaknesses to further increase the security of web-based interactions!

The following link shows how the data is being stored in the database:

https://www.lmfdb.org/EllipticCurve/Q/data/30.a6

It's a lot more complicated than a simple list of numbers!
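To give a feel for why such a record holds more than the coefficients, here is a minimal Python sketch that stores a curve in the general form y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6 together with one derived quantity, the discriminant, computed with the standard formulas for b2, b4, b6 and b8. The field names are hypothetical and are not the LMFDB schema, and the example curve y^2 = x^3 - x + 1 is an arbitrary choice, not the curve linked above.

```python
import json


def discriminant(a1, a2, a3, a4, a6):
    """Discriminant of y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6."""
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    return -b2 * b2 * b8 - 8 * b4 ** 3 - 27 * b6 * b6 + 9 * b2 * b4 * b6


def on_curve(x, y, a1, a2, a3, a4, a6):
    """Check whether (x, y) satisfies the general equation."""
    return y * y + a1 * x * y + a3 * y == x ** 3 + a2 * x * x + a4 * x + a6


# The short form y^2 = x^3 + a*x + b is the special case a1 = a2 = a3 = 0.
coefficients = [0, 0, 0, -1, 1]  # y^2 = x^3 - x + 1

# Hypothetical record: the defining data plus a property derived from it.
record = {
    "a_invariants": coefficients,
    "discriminant": discriminant(*coefficients),
    "is_elliptic_curve": discriminant(*coefficients) != 0,
}
print(json.dumps(record))
```

A nonzero discriminant means the equation really defines an elliptic curve. A real database entry keeps many more derived properties of this kind, which is where the complexity comes from.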

Algebraic Surfaces

Link to open on an external website: love.imaginary.org
Credits: Raytracer by Aaron Montag, created using CindyJS by Jürgen Richter-Gebert.

What to do?
You can play with a polynomial equation in 3 variables and see, in real time, the (real) algebraic surface defined by the zero set of the polynomial! The bars on the right set the zoom level and the parameters a and b, which take values between 0 and 1 and can be used in the equation. Whenever you change the equation, click on "Redraw" to see the new surface.

The created images are used in research papers and also for math outreach to show the beauty of and the love for maths. The heart is given by its equation. You can change the last cube of the equation into a square and see how the "heart falls into your trousers". This refers to the German saying "Das Herz rutscht in die Hose"; in our case it is an "Unterhose" (underpants).

Find more examples of equations and insights into the file format issues for these images/surfaces.

More equations to play with
Here are some examples of other equations you can try:

x^6+y^6+z^6-1
y*z*(x^2+y-z)
x^2+y^2-z^2+a-0.5
x^2+y^2+z^2+2*x*y*z-1
x^2-x^3+y^2+y^4+z^3-z^4
1.2*x^2+1.2*z^2-5*(y+0.5)^3*(0.5-y)^3
(x^2+y^2+z^2-1)*((x-3*a)^2+y^2+z^2-1)
4*((a*(1+sqrt(5))/2)^2*x^2-y^2)*((a*(1+sqrt(5))/2)^2*y^2-z^2)*((a*(1+sqrt(5))/2)^2*z^2-x^2)-1*(1+2*(a*(1+sqrt(5))/2))*(x^2+y^2+z^2-1)^2
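To connect these equations with the pictures: a raytracer has to find, for each ray of sight, the first point where the polynomial becomes zero. The Python sketch below does this for the first equation in the list by sampling along the ray and refining a sign change with bisection. This is only one simple way to find such a root and says nothing about the method used by the widget; the function names, step count and tolerance are assumptions made for this example.

```python
def surface(x, y, z):
    """First example equation from the list: x^6 + y^6 + z^6 - 1."""
    return x ** 6 + y ** 6 + z ** 6 - 1


def first_hit(f, origin, direction, t_max=10.0, steps=1000, tol=1e-9):
    """Smallest ray parameter t in [0, t_max] where f changes sign.

    Returns None when no sign change is seen between neighbouring samples.
    """
    def along_ray(t):
        return f(*(o + t * d for o, d in zip(origin, direction)))

    dt = t_max / steps
    t_prev, f_prev = 0.0, along_ray(0.0)
    if f_prev == 0.0:
        return 0.0
    for i in range(1, steps + 1):
        t = i * dt
        f_t = along_ray(t)
        if f_t == 0.0:
            return t
        if f_prev * f_t < 0:
            # The surface lies between t_prev and t: refine by bisection.
            lo, hi, f_lo = t_prev, t, f_prev
            while hi - lo > tol:
                mid = (lo + hi) / 2
                f_mid = along_ray(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            return (lo + hi) / 2
        t_prev, f_prev = t, f_t
    return None


# A ray starting at (0, 0, -3) and looking along the z-axis meets the
# surface where z = -1, that is at t = 2 (up to rounding).
print(first_hit(surface, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0)))
# A ray that passes far away from the surface never hits it.
print(first_hit(surface, (2.0, 2.0, -3.0), (0.0, 0.0, 1.0)))
```

Sampling can miss the surface where the ray only touches it or where two crossings fall between neighbouring samples, which is one reason the choice of root finder matters for the resulting image.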

Images and file formats
How would you save, archive or store the images? You could store a 2D image, or (better, smaller and more flexible) the equation that creates the image. But the equation alone does not define the whole image: you would also need properties such as the rotation in space, the light model (colors?), the zoom level, etc. And maybe also the root finder and information on the parser of the equation?
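As a thought experiment, such a description could be written down as a small structured record. The following Python sketch stores one in JSON and reads it back. Every field name and default value here is hypothetical: this is not an existing file format, only an illustration of the kind of information listed above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SurfaceScene:
    """Hypothetical description of one rendered surface image."""
    equation: str
    parameters: dict = field(default_factory=dict)
    rotation_degrees: tuple = (0.0, 0.0, 0.0)
    zoom: float = 1.0
    surface_color: str = "#cc0000"
    root_finder: str = "unspecified"


def save_scene(scene):
    """Serialize a scene to JSON text."""
    return json.dumps(asdict(scene), indent=2, sort_keys=True)


def load_scene(text):
    """Rebuild a scene from JSON text (JSON lists become tuples again)."""
    data = json.loads(text)
    data["rotation_degrees"] = tuple(data["rotation_degrees"])
    return SurfaceScene(**data)


scene = SurfaceScene(
    equation="x^2+y^2-z^2+a-0.5",
    parameters={"a": 0.25},
    rotation_degrees=(30.0, 0.0, 45.0),
    zoom=1.5,
)
text = save_scene(scene)
print(text)
print(load_scene(text) == scene)
```

Note that the equation is stored as plain text, so the record is only useful together with a statement of how that text is to be parsed.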

File formats and software
To summarize, we would need a good file format that works in the given software. Then we can recreate the image in any resolution we like and even change it and re-use it in a flexible way! However, the software used here in the widget is an openly licensed raytracer written in CindyJS, for which no file format for the equations exists so far. There is an older, also openly licensed, software called SURFER, which has become very difficult to update further, since one of its main user interface libraries is no longer maintained. It is written in Java and has a .jsurf file format for its "surfaces", see here for an example. It uses the standard Java properties format, but has no proper documentation. Then there is an even older software called Surf, written in C++, which is no longer maintained at all. It has a non-standardized .pic file format, see here for an example. This format is documented and also records more details, such as the root finder used.

Problems with data infrastructure and file formats

With the rise of computers, we have seen a large increase in the size and the complexity of the data mathematicians use. In the past, we only had books like the Atlas of Finite Groups in which some data was written down, but nowadays a database like the LMFDB, for example, contains multiple terabytes of data.

And more data gets produced every day! Papers with new results get published at a very high speed and we currently simply do not have the infrastructure available for supporting the huge influx of data that comes with it. There are several aspects to consider here:

If we make a database of objects of a certain type available, other researchers will use this data to produce new results. We therefore want to make sure that the classification is complete, i.e. that we have all of the objects in our database (like we hopefully convinced you there are only 5 platonic solids), and that the data is correct. But how can we ensure this? Sometimes computations can take months or years to complete. And it's very easy to make a small mistake in the code written to perform the computations, especially if this code is very complicated.
And what if the data was produced using closed-source software? If I can look at the code, I can convince myself that the way the data was produced makes sense mathematically. But if I can't... anything could have happened behind the scenes. The authors could've written the code in such a way that it outputs exactly the results they want to see in the cases they describe in their paper and simply spits out nonsense any other time.
We want other researchers to use (or reuse) the data that has been computed. But will they actually be able to find it? With the huge influx of papers, it is sometimes difficult to get an overview of everything that is out there. Additionally, authors of papers are not always aware of best practices concerning the publication of data. A common scenario is the following: An author puts the data on their own website. They move to a different university after a few years. The website gets deleted. The data is gone forever and other people will need to redo the computations.
Who is going to pay for storing computed data? If the data is just a couple of MB it's probably not too hard to find a place to put it, but what if it is several gigabytes?
In other areas of science, most data will consist of lists of numbers. Mathematical objects, however, can get very complicated very quickly, and one therefore has to think carefully about how to store them, especially if you want other people to be able to load the objects again in their favorite software package (which might not be the one you used for your computations). Currently, there is no universally agreed-upon standard for this. Think about polynomials, for example. A polynomial in one variable x with rational coefficients is simple enough. But what if the polynomial was defined over a field extension L of QQ? Then we would also need to store the field L. But there is no canonical way to define the field L. We need to choose a specific polynomial defining the field extension L over QQ and store this with the data. But maybe L was defined as a field extension over a field that was not QQ, and then you need to deal with it differently again. And how do you store a polynomial with multiple variables? Do you consider x^2 + x*y to be a polynomial in y over the ring k[x]? Or is it an element of k[x,y]? There are many more questions like this that could be asked, and it is not easy to find a file format that is accepted by everyone.
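The last question can be made concrete with a short Python sketch. It stores x^2 + x*y in two ways: as an element of k[x,y], and as a polynomial in y whose coefficients are polynomials in x. Both layouts are invented for this illustration and are not taken from any existing software; the coefficients are exact rational numbers, so k is QQ here.

```python
from fractions import Fraction

# View 1: element of k[x, y].
# Keys are exponent pairs (i, j), values are the coefficients of x^i * y^j.
flat = {(2, 0): Fraction(1), (1, 1): Fraction(1)}

# View 2: polynomial in y over k[x].
# Keys are powers of y, values are polynomials in x (power of x -> coefficient).
nested = {0: {2: Fraction(1)}, 1: {1: Fraction(1)}}


def nested_to_flat(poly):
    """Convert the 'polynomial in y over k[x]' view to the k[x, y] view."""
    return {
        (i, j): coefficient
        for j, inner in poly.items()
        for i, coefficient in inner.items()
    }


def flat_to_nested(poly):
    """Convert the k[x, y] view to the 'polynomial in y over k[x]' view."""
    result = {}
    for (i, j), coefficient in poly.items():
        result.setdefault(j, {})[i] = coefficient
    return result


def evaluate_flat(poly, x, y):
    """Evaluate a polynomial stored in the k[x, y] view."""
    return sum(c * x ** i * y ** j for (i, j), c in poly.items())


print(nested_to_flat(nested) == flat)
print(flat_to_nested(flat) == nested)
print(evaluate_flat(flat, 2, 3))  # 2^2 + 2*3 = 10
```

Both records describe the same polynomial, yet a program expecting one layout cannot read the other without a conversion step like the ones above, and it still needs to be told separately what the field k is.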

Within MaRDI we work on possible solutions for these problems with a special focus on mathematical research data. Please subscribe to our newsletter to keep up to date on our findings!