fweimer 4 days ago

GCC does this in match.pd. Here's the add_zero example, I think:

    /* Simplifications of operations with one constant operand and
       simplifications to constants or single values.  */
    
    (for op (plus pointer_plus minus bit_ior bit_xor)
      (simplify
        (op @0 integer_zerop)
        (non_lvalue @0)))
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/match.pd

The floating point case can be more complicated because of signed zeros and signaling NaNs, see fold_real_zero_addition_p.
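
For instance, in plain Python (nothing GCC-specific), the signed-zero hazard looks like this: folding `x + 0.0` to `x` is unsound, while folding `x + (-0.0)` to `x` is fine:

```python
import math

# Why "x + 0.0 -> x" is unsound under IEEE 754 signed zeros:
x = -0.0
assert math.copysign(1.0, x) == -1.0           # x is negative zero
assert math.copysign(1.0, x + 0.0) == 1.0      # x + 0.0 is *positive* zero
# But "x + (-0.0) -> x" preserves the sign for every non-NaN x:
assert math.copysign(1.0, x + (-0.0)) == -1.0
```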

westurner 4 days ago

math.fma (fused multiply-add) is in Python 3.13. Are there already rules to transform expressions into math.fma calls?

And does Z3 verification indicate differences in output due to minimizing float-rounding error?

https://docs.python.org/3/library/math.html#math.fma :

> math.fma(x, y, z)

> Fused multiply-add operation. Return (x * y) + z, computed as though with infinite precision and range followed by a single round to the float format. This operation often provides better accuracy than the direct expression (x * y) + z.

> This function follows the specification of the fusedMultiplyAdd operation described in the IEEE 754 standard.
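
A small sketch (with made-up numbers, and guarded since math.fma only exists on Python >= 3.13) of why fma and the direct expression can differ bitwise:

```python
import math
from fractions import Fraction

a = 1.0 + 2.0 ** -52          # 1 + machine epsilon, exactly representable
# Direct evaluation rounds a*a first, so the low-order bits are gone
# and the subtraction cancels to exactly zero:
direct = a * a - a * a        # 0.0
# The exact remainder, computed with rationals, is 2**-104:
exact = Fraction(a) * Fraction(a) - Fraction(a * a)
# fma performs the multiply-add with a single final rounding and
# recovers that remainder (requires Python >= 3.13):
if hasattr(math, "fma"):
    residue = math.fma(a, a, -(a * a))
    assert residue == 2.0 ** -104
```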

  • cfbolztereick 4 days ago

    I only support integer operations in the DSL so far.

    (But yes, turning the Python expression x*y+z into an fma call would not be a legal optimization for the JIT anyway, and Z3 would rightfully complain: the results must be bitwise identical before and after optimization.)
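
    As a hypothetical illustration (not the post's actual DSL or verifier setup), the kind of bitwise-equivalence check Z3 can do for an integer rule looks like:

```python
try:
    import z3                      # pip install z3-solver
except ImportError:
    z3 = None

if z3 is not None:
    x = z3.BitVec("x", 64)         # a 64-bit machine integer
    s = z3.Solver()
    # "x + 0 -> x" is a sound rewrite iff no x makes the sides differ:
    s.add(x + 0 != x)
    assert s.check() == z3.unsat   # no counterexample exists
    # for floats, (x*y)+z vs. fma(x, y, z) would yield a model here
```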

    • westurner 2 days ago

      Does Z3 distinguish between x*y+z and z+y*x, in terms of floating point output?

      math.fma() would be a different function call with the same or similar output.

      (deal-solver is a tool for verifying formal implementations in Python with Z3.)

hoten 4 days ago

The before/after delta is very satisfying and clearly easier to work with. Very nice.

How does the DSL handle side effects? I'm assuming any term here could be a compound expression, so what part of the system prevents "0*callfn()" from reducing to "0"?

  • bastawhiz 4 days ago

    I believe this operates on values in something akin to registers, not expression trees. By the time you get to a place where one of these rules is relevant, the side effect has already taken place. The rule just prevents the return value of such a function call from being operated on further, as in your example.

    • cfbolztereick 4 days ago

      Yeah, that's exactly right. It's operating on an SSA-based intermediate representation. So if an operation gets removed, the arguments are still being computed. The earlier operations that produce those arguments can also be removed by dead code elimination (which runs later), but indeed only if they have no side effects.

      I suppose the notation of the rule heads makes it look like there actually are expression trees, which is maybe sort of confusing.
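
      A toy sketch (not PyPy's real IR) of that separation between rewriting a use and later removing a producer:

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    args: tuple = ()
    has_side_effects: bool = False

call = Op("call_fn", has_side_effects=True)  # v0 = callfn()
mul = Op("int_mul", (0, call))               # v1 = 0 * v0
trace = [call, mul]

def simplify(op):
    # 0 * x -> 0 rewrites the *use*; the producing ops stay in the trace
    if isinstance(op, Op) and op.name == "int_mul" and 0 in op.args:
        return 0
    return op

optimized = [simplify(op) for op in trace]
assert optimized[0] is call     # the side-effecting call survives
assert optimized[1] == 0        # only the multiply was folded away

def dce(ops):
    # later pass: drop unused pure ops, keep anything with side effects
    # (sketch: a real pass also keeps ops whose results are still used)
    return [op for op in ops if isinstance(op, Op) and op.has_side_effects]
```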

JonChesterfield 4 days ago

A cost model would give a proof of termination: require the output of each transform to be strictly cheaper than the input. Confluence is less obvious.

  • mattalex 3 days ago

    Once you have strong normalization, you can check local confluence and use Newman's lemma to get full confluence. That should be pretty easy: build all n^2 critical pairs of rules and run them to termination (which you have already proven). If all those pairs are confluent, so is the full rewriting system.
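
    A toy illustration of the recipe (the terms and rules here are made up, binary terms only): every rule strictly shrinks the term, so rewriting terminates, and then each pair of one-step rewrites is rejoined by normalizing:

```python
def step(t):
    """All one-step rewrites of term t (root position only, for brevity)."""
    out = []
    if isinstance(t, tuple) and len(t) == 3:
        op, a, b = t
        if op == "add" and b == 0: out.append(a)       # x + 0 -> x
        if op == "add" and a == 0: out.append(b)       # 0 + x -> x
        if op == "mul" and 0 in (a, b): out.append(0)  # x * 0 -> 0
    return out

def normalize(t):
    while (nxt := step(t)):
        t = nxt[0]     # terminates: every rule output is a smaller term
    return t

def locally_confluent(t):
    # all one-step successors must reach the same normal form
    return len({normalize(u) for u in step(t)}) <= 1

assert locally_confluent(("add", 0, 0))            # both add-rules apply
assert normalize(("mul", ("add", "x", 0), 0)) == 0
```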

mjcohen 4 days ago

Interesting to see which optimizations are never used.