Zmij: Faster floating point double-to-string conversion (vitaut.net)

131 points by fanf2 4 days ago | 19 comments

floitsch 14 hours ago

Pretty impressive.

When I published Grisu (Google double-conversion), it was multiple times faster than the existing algorithms. I knew that there was still room for improvement, but I was at most expecting a factor 2 or so. Six times faster is really impressive.

vitaut 13 hours ago

Thank you! It means a lot coming from you, Grisu was the first algorithm that I implemented =). (I am the author of the blog post.)

amluto 6 hours ago

I read the post and the companion post:

https://vitaut.net/posts/2025/smallest-dtoa/

And there’s one detail I found confusing. Suppose I go through the steps to find the rounding interval and determine that k=-3, so there is at most one integer multiple of 10^-3 in the interval (and at least one multiple of 10^-4). For the sake of argument, let’s say that -3 worked: m·10^-3 is in the interval.
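For reference, here is a slow brute-force sketch in Python (3.9+) of the search being described. It builds the rounding interval of a positive double with exact fractions, then walks the decimal exponent k downwards until some integer multiple m·10^k falls inside. This is not the post's method (which picks k from the interval width); because this search stops at the largest k that works, the m it returns can never end in 0. The name `shortest_decimal` and the closest-candidate tie-break are choices made for this illustration.

```python
import math
import struct
from fractions import Fraction


def shortest_decimal(x: float) -> tuple[int, int]:
    """Brute-force reference: return (m, k) such that m * 10**k is the shortest
    decimal that reads back as x, preferring the candidate closest to x.

    Assumes x is positive, finite and smaller than the largest double.
    Exact rational arithmetic keeps this simple but slow; it is not how
    fast dtoa implementations work.
    """
    v = Fraction(x)
    lower = Fraction(math.nextafter(x, 0.0))       # neighbouring doubles
    upper = Fraction(math.nextafter(x, math.inf))
    lo, hi = (lower + v) / 2, (v + upper) / 2      # the rounding interval
    # Round-half-to-even: the midpoints themselves read back as x only when
    # x has an even significand (lowest bit of its IEEE 754 encoding is 0).
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    closed = bits & 1 == 0

    def reads_back_as_x(y: Fraction) -> bool:
        return lo <= y <= hi if closed else lo < y < hi

    # Start from a k with 10**k > hi and walk downwards. The first k that works
    # is the largest one, so its m cannot end in 0: if it did, the same value
    # would be a multiple of 10**(k + 1) and the previous step would have hit.
    k = len(str(int(hi))) if hi >= 1 else 0
    while True:
        k -= 1
        unit = Fraction(10) ** k
        q = v / unit
        # Only the multiples of 10**k just below and just above x can lie in
        # the interval; try the nearer one first (even m on an exact tie).
        for m in sorted({math.floor(q), math.ceil(q)}, key=lambda c: (abs(c - q), c % 2)):
            if m > 0 and reads_back_as_x(m * unit):
                return m, k


print(shortest_decimal(7.46))  # (746, -2)
```

A search like this is useful mainly as a test oracle for a fast implementation, since it follows the definition directly.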

Then, if m is not a multiple of 10, I believe that m·10^-3 is the right answer. But what if m is a multiple of 10? Then the result will be exactly equal, numerically, to the correct answer, but it will have trailing zeros. So maybe I get 7.460 instead of 7.46 (I made up this number and have no idea whether any double exists that gives this output). Even though that 6 is definitely necessary (there is no numerically different value with decimal exponent greater than -3 that rounds correctly), I still want my formatter library to give me the shortest decimal representation of the result.

Is this impossible for some reason? Is there logic hiding in the write function to simplify the answer? Am I missing something?

vitaut 6 hours ago

This is possible, and the trailing zeros are indeed removed (with the exponent adjusted accordingly) in the write function. The post mentions removing trailing zeros without going into details, but it's a pretty interesting topic, and the implementation was recently changed to use lzcnt/bsr instead of a lookup table.
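A minimal sketch of that normalization step in Python, assuming a plain divide-by-10 loop (not the lzcnt/bsr technique mentioned above); the name `remove_trailing_zeros` is hypothetical:

```python
def remove_trailing_zeros(significand: int, exponent: int) -> tuple[int, int]:
    """Normalize significand * 10**exponent so the significand does not end in 0.

    Every division by 10 is matched by exponent + 1, so the value is unchanged.
    A zero significand is returned as (0, 0).
    """
    if significand == 0:
        return 0, 0
    while significand % 10 == 0:
        significand //= 10
        exponent += 1
    return significand, exponent


print(remove_trailing_zeros(7460, -3))  # (746, -2), i.e. 7.46 rather than 7.460
```

Each loop iteration costs a division, which is why fast implementations look for ways to strip several zeros at once.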

HexDecOctBin 10 hours ago

It seems that most research effort goes into better dtoa, and not enough into a better atod. There are probably a dozen dtoa algorithms now, and (I think?) two for atod. Anyone know why?

vitaut 8 hours ago

Good question. I am not familiar with string-to-double algorithms, but maybe it's an easier problem? Double-to-string is relatively complex, with people even doing PhDs in this area. There is also some inherent asymmetry: formatting is more common than parsing.

dtolnay 3 hours ago

In implementing Rust's serde_json library, I have dealt with both string-to-double and double-to-string. Of the two, I found string-to-double was more complex.

Unlike formatting, correct parsing involves high precision arithmetic.

Example: the IEEE 754 double closest to the exact value "0.1" is 7205759403792794*2^-56, which has an exact value of A (see below). The next higher IEEE 754 double has an exact value of C (see below). Exactly halfway between these values is B=(A+C)/2.

  A=0.1000000000000000055511151231257827021181583404541015625
  B=0.100000000000000012490009027033011079765856266021728515625
  C=0.10000000000000001942890293094023945741355419158935546875

So for correctness the algorithm needs the ability to distinguish the following extremely close values, because the first is closer to A (must parse to A) whereas the second is closer to C:

  0.1000000000000000124900090270330110797658562660217285156249
  0.1000000000000000124900090270330110797658562660217285156251
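These values can be checked with exact arithmetic. The sketch below assumes CPython 3.9+, where `decimal.Decimal(float)` converts the binary value exactly and `float()` on a string is correctly rounded; the variable names are illustrative.

```python
import math
from decimal import Decimal, localcontext

a = 0.1                          # the double nearest to 0.1
c = math.nextafter(a, math.inf)  # the next double up
assert a == 7205759403792794 * 2.0 ** -56

with localcontext() as ctx:
    ctx.prec = 100               # enough digits for every operation below to be exact
    A = Decimal(a)               # Decimal(float) converts the binary value exactly
    C = Decimal(c)
    B = (A + C) / 2              # the halfway point between the two doubles

print("A =", A)
print("B =", B)
print("C =", C)

# One unit in the 58th decimal place below and above the halfway point B.
below = "0.1000000000000000124900090270330110797658562660217285156249"
above = "0.1000000000000000124900090270330110797658562660217285156251"
print(float(below) == a)  # True: parses to A
print(float(above) == c)  # True: parses to C
```

Getting the last two lines right is exactly what requires the parser to look at all 58 digits.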

The problem of "string-to-double for the special case of strings produced by a good double-to-string algorithm" might be relatively easy compared to double-to-string, but correct string-to-double for arbitrarily big inputs is harder.

nly 2 hours ago

I guess one aspect of it is that in really high performance fields where you're taking in lots of stringy real inputs (FIX messages coming from trading venues, for example, containing prices and quantities), you would simply parse directly to a fixed point decimal format, and only accept fixed (not scientific) notation inputs. Except for trailing or leading zeros, there is no normalisation to be done.

Parsing a decimal ASCII string to a decimal value already optimizes well, because you can scale each digit by its power of 10 in parallel and just add up the result.
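A scalar sketch of that idea in Python, assuming a fixed scale of 8 fractional digits and rejecting scientific notation; `parse_fixed` is a hypothetical name, and a fast implementation would compute the per-digit terms with SIMD rather than a generator expression:

```python
def parse_fixed(text: str, scale: int = 8) -> int:
    """Parse a fixed-notation decimal such as "-101.25" into an integer number
    of 10**-scale units. Exponents ("1e5") and more than `scale` significant
    fractional digits are rejected instead of rounded."""
    sign = 1
    s = text
    if s and s[0] in "+-":
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    whole, _, frac = s.partition(".")
    if not (whole + frac).isascii() or not (whole + frac).isdigit():
        raise ValueError(f"not a fixed-notation decimal: {text!r}")
    frac = frac.rstrip("0")             # trailing zeros carry no value
    if len(frac) > scale:
        raise ValueError(f"more than {scale} fractional digits: {text!r}")
    digits = whole + frac.ljust(scale, "0")
    # Each digit's contribution d * 10**position is independent of the
    # others, which is what lets a vectorized parser compute them in parallel.
    n = len(digits)
    return sign * sum(int(d) * 10 ** (n - 1 - i) for i, d in enumerate(digits))


print(parse_fixed("101.2500", scale=4))  # 1012500
print(parse_fixed("-0.00012345"))        # -12345
```

The result is exact: no binary rounding happens, because the value never passes through a double.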

mdf 4 hours ago

> formatting is more common than parsing.

Is it, though? It's genuinely hard for me to tell.

There's both serialization and deserialization of data sets with, e.g., JSON including floating point numbers, implying formatting and parsing, respectively.

Source code (including unit tests etc.) with hard-coded floating point values is compiled, linted, automatically formatted again and again, implying lots of parsing.

Code I usually work with ingests a lot of floating point numbers, but whatever is calculated is seldom displayed as formatted strings and more often gets plotted on graphs.

NovemberWhiskey 10 hours ago

When I saw the title here, my first thought was “wow, these RISC-V ISA extensions are getting out of hand”

mulle_nat 11 hours ago

Thank you for the code. I could port this easily to C and it solved a lot of portability issues for me.

vitaut 6 hours ago

I've added a C implementation in https://github.com/vitaut/zmij/blob/main/zmij.c in case you are interested.

mulle_nat 7 minutes ago

Nice, but it's too late: I needed a different API for future use in my custom sprintf, so I made mulle-dtostr (https://github.com/mulle-core/mulle-dtostr). In a quick try on my machine (AMD), it even benchmarked quite a bit faster, but I was just checking that it didn't regress too badly and didn't look at it more closely.

andrepd 13 hours ago

Very interesting!

I wonder how Teju Jaguá compares. I don't see it in the C++ benchmark repo you linked and whose graph you included.

I have contributed an implementation in Rust: https://crates.io/crates/teju :) It includes benchmarks that compare it against Ryu and Rust's stdlib, and the readme shows a graph with some test cases. It's quite easy to run if you're interested!

vitaut 13 hours ago

I am not sure how it compares but I did use one idea from Cassio's talk on Teju:

> A more interesting improvement comes from a talk by Cassio Neri, "Fast Conversion From Floating Point Numbers". In Schubfach, we look at four candidate numbers. The first two, of which at most one is in the rounding interval, correspond to a larger decimal exponent. The other two, of which at least one is in the rounding interval, correspond to the smaller exponent. Cassio’s insight is that we can directly construct a single candidate from the upper bound in the first case.

andrepd 12 hours ago

Indeed! I saw that you linked to Neri's work, so you were aware of Teju Jaguá. I might make a pull request to add it to the benchmark repo when I have some time :)

Another nice thing about your post is mentioning the "shell" of the algorithm, that is, actually translating the decimal significand and exponent into a string (as opposed to the "core", turning f * 2^e into f' * 10^e'). A decent chunk of the overall time is spent there, so it's worth optimising it as well.
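As a rough illustration of the "shell", the sketch below turns an already-shortened significand and exponent into text. Here `str()` stands in for the digit-writing tricks a fast implementation would use, the fixed-versus-scientific threshold and output style are assumptions for this example (not the format of any particular library), and `format_decimal` is a hypothetical name:

```python
def format_decimal(significand: int, exponent: int) -> str:
    """Render significand * 10**exponent as text.

    Assumes significand > 0 with trailing zeros already removed. Uses fixed
    notation when the decimal exponent is in [-5, 16) and scientific otherwise
    (an arbitrary threshold for this sketch).
    """
    digits = str(significand)
    n = len(digits)
    exp10 = exponent + n - 1  # exponent once the point follows the first digit
    if -5 <= exp10 < 16:
        if exponent >= 0:
            return digits + "0" * exponent                # 12, 2   -> "1200"
        point = n + exponent                              # where the "." goes
        if point > 0:
            return digits[:point] + "." + digits[point:]  # 746, -2 -> "7.46"
        return "0." + "0" * -point + digits               # 5, -4   -> "0.0005"
    mantissa = digits[0] + ("." + digits[1:] if n > 1 else "")
    return f"{mantissa}e{exp10}"                          # 12345, -11 -> "1.2345e-7"
```

Even this small function has several branches per call, which gives some sense of why the shell can take a noticeable share of the total time.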

vlovich123 6 hours ago

Any ideas why Rust’s stdlib hasn’t adopted faster implementations?

pitaj 5 hours ago

I think code size is one notable reason.

Cold_Miserable 12 hours ago

Already done ages ago. Nothing more of interest.

The bottlenecks are the 3 conditionals:
- positive or negative
- positive or negative exponent, x > 10.0
- correction for 1.xxxxx * 2^Y => fract(log10(2^Y)) 1.xxxxxxxx > 10.0
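The last item, estimating the decimal exponent from the binary one and then correcting by one, can be sketched with a fixed-point approximation of log10(2). The multiplier 315653 / 2^20 is an assumption here (it slightly overestimates log10(2)); the assertion inside the block checks it against exact big-integer arithmetic over the double exponent range. Only positive inputs are handled, so the sign branch is left out, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def floor_log10_pow2(e: int) -> int:
    """Approximate floor(e * log10(2)) with an integer multiply and shift.
    Python's >> floors for negative numbers, like an arithmetic shift."""
    return (e * 315653) >> 20


def exact_floor_log10_pow2(e: int) -> int:
    """Exact floor(log10(2**e)) from the digit count of 2**|e|."""
    if e >= 0:
        return len(str(2 ** e)) - 1
    # log10(2**e) is not an integer for e != 0, so floor = -(digits of 2**-e)
    return -len(str(2 ** -e))


# Check the approximation over (a bit more than) the double exponent range.
assert all(floor_log10_pow2(e) == exact_floor_log10_pow2(e) for e in range(-1100, 1101))


def decimal_exponent(x: float) -> int:
    """floor(log10(x)) for a positive finite double x."""
    _, e2 = math.frexp(x)          # x = f * 2**e2 with 0.5 <= f < 1
    k = floor_log10_pow2(e2 - 1)   # exponent for the power of two 2**(e2 - 1) <= x
    # Since x < 2**e2 = 2 * 2**(e2 - 1), floor(log10(x)) is k or k + 1:
    # the correction step compares x exactly against 10**(k + 1).
    if Fraction(x) >= Fraction(10) ** (k + 1):
        k += 1
    return k


print(decimal_exponent(1e23))  # 22: the double nearest 1e23 is slightly below it
```

The multiply-shift replaces a floating-point log10 call, so the only remaining data-dependent branch is the final correction.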
