Discussion:
Performance of non-static method handles
Charles Oliver Nutter
2018-02-02 12:26:30 UTC
Hey folks!

I'm running some simple benchmarks for my FOSDEM handles talk and wanted to
reopen discussion about the performance of non-static-final method handles.

In my test, I just call a method that adds the given argument to a
static long. The numbers for reflection and the static final handle are what
I'd expect, with the latter basically being equivalent to a direct call:

Direct: 0.05ns/call
Reflected: 3ns/call
static final Handle: 0.05ns/call

If the handle is coming from an instance field or local variable, however,
performance is only slightly faster than reflection. I assume the only real
improvement in this case is that it doesn't box the long value I pass in.

local var Handle: 2.7ns/call
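
For readers following along, here is a minimal sketch of the call shapes being compared. It is not the benchmark from this post: the class name HandleShapes and the method names are made up for illustration, and it only shows where each handle comes from, not how to time it.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HandleShapes {
    static long total;

    // The target method: adds the given argument to a static long.
    static void add(long value) {
        total += value;
    }

    static MethodHandle lookupAdd() {
        try {
            return MethodHandles.lookup().findStatic(
                    HandleShapes.class, "add", MethodType.methodType(void.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Constant case: a static final handle, the shape reported above as
    // roughly equivalent to a direct call.
    static final MethodHandle CONSTANT_HANDLE = lookupAdd();

    // Non-constant case: the handle is read from an instance field.
    MethodHandle fieldHandle = lookupAdd();

    public static long demo() {
        HandleShapes shapes = new HandleShapes();
        // Non-constant case: the handle lives in a local variable.
        MethodHandle localHandle = shapes.fieldHandle;
        total = 0;
        try {
            CONSTANT_HANDLE.invokeExact(1L);     // static final handle
            shapes.fieldHandle.invokeExact(2L);  // instance field handle
            localHandle.invokeExact(3L);         // local variable handle
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

All three calls use invokeExact with the same (long)void call-site type; only the origin of the handle differs. Real measurements would need a proper harness rather than a loop around demo().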

What can we do to improve the performance of non-static method handle
invocation?

- Charlie
John Rose
2018-02-02 12:33:49 UTC
Vladimir Ivanov did some work a few years ago on MH customization for hot MH instances. It’s in the system. That should get better results than what you show. I wonder why it isn’t kicking in. You are using invokeExact right?
_______________________________________________
mlvm-dev mailing list
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
Vladimir Ivanov
2018-02-02 13:02:15 UTC
MH customization doesn't help here. The benchmark measures the cost of
MH type check + MH.invokeBasic() call.

For MH.invokeExact(), the type check is a pointer comparison of MH.type against
the MethodType associated with the call site.
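
The effect of that exact-type check is visible from plain Java. The sketch below (class and method names are illustrative only) calls a (long)void handle once with a matching call-site type and once with an int argument, which makes the call-site type (int)void and is rejected with WrongMethodTypeException.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

public class ExactTypeCheck {
    static long total;

    static void add(long value) {
        total += value;
    }

    static MethodHandle addHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                    ExactTypeCheck.class, "add", MethodType.methodType(void.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    // Call-site type (long)void matches the handle's type, so the check passes.
    public static boolean exactCallSucceeds() {
        MethodHandle mh = addHandle();
        try {
            mh.invokeExact(1L);
            return true;
        } catch (Throwable t) {
            return false;
        }
    }

    // Call-site type (int)void differs from the handle's (long)void type,
    // so invokeExact refuses the call instead of widening the argument.
    public static boolean mismatchedCallIsRejected() {
        MethodHandle mh = addHandle();
        try {
            mh.invokeExact(1);
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

MethodHandle.invoke would accept the int argument by adapting it, which is why the question about invokeExact matters for this benchmark.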

MH.invokeBasic() involves the following steps:
MethodHandle --form-->
LambdaForm --vmentry-->
MemberName --method-->
(ResolvedMemberName --vmtarget--> // since jdk11 [1])
JVM_Method* --_from_compiled_entry-->
entry address

The only optimization I see is to remove the LambdaForm step and access
MemberName (ResolvedMemberName since jdk11) directly from MethodHandle.
But there would still be 3 dereferences involved:
MethodHandle --form-->
[Resolved]MemberName --vmtarget-->
JVM_Method* --_from_compiled_entry-->
entry address

The downside of such a removal would be the inability to rewrite individual
LambdaForms (e.g., to eliminate a redundant class initialization check)
w/o tracking all MethodHandles that use a particular LambdaForm.
Probably, we can live without that (especially in JIT-compiled code).

In total, it ends up as 4 indirect loads (3 selection steps + 1 load
from MH.type for type check) and I don't see a way to cut it down further.
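
To make the count concrete, here is a toy model of the shortened chain. The classes below are hypothetical stand-ins named after the steps in this post; they are not the real JDK or HotSpot types, and the counter is bumped by hand next to each field read.

```java
public class DereferenceChainModel {
    // Toy stand-ins named after the post; NOT the real JDK or HotSpot types.
    static final class Method {
        final long fromCompiledEntry;
        Method(long fromCompiledEntry) { this.fromCompiledEntry = fromCompiledEntry; }
    }

    static final class MemberName {
        final Method vmtarget;
        MemberName(Method vmtarget) { this.vmtarget = vmtarget; }
    }

    static final class Handle {
        final Object type;      // stands in for MH.type
        final MemberName form;  // shortened chain: form points straight at the MemberName
        Handle(Object type, MemberName form) { this.type = type; this.form = form; }
    }

    static int loads;

    // Walks the chain the way the post describes it, counting each dependent load.
    public static long resolveEntry(Handle mh, Object callSiteType) {
        Object type = mh.type;                   loads++;  // load for the type check
        if (type != callSiteType) {
            throw new IllegalArgumentException("type mismatch");
        }
        MemberName name = mh.form;               loads++;  // selection step 1
        Method method = name.vmtarget;           loads++;  // selection step 2
        long entry = method.fromCompiledEntry;   loads++;  // selection step 3
        return entry;
    }

    public static int countLoads() {
        Object type = new Object();
        Handle mh = new Handle(type, new MemberName(new Method(0x1234L)));
        loads = 0;
        resolveEntry(mh, type);
        return loads;
    }
}
```

Each load depends on the result of the previous one (apart from the type check), which is why the steps cannot simply be overlapped when the handle is not a constant.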

For example, MemberName is a sort of handle for the JVM-internal Method*.
The JVM keeps a table of all MemberName instances and iterates over them
when, for example, class redefinition happens. If the MemberName indirection
were eliminated, MethodHandle would point directly to JVM_Method and
the JVM would have to track all MethodHandle instances instead.

JVM_Method* is required for similar reasons.

The type check on MH can't be optimized further either.

So, I'm quite pessimistic about the prospects of speeding up invocations
on non-constant MethodHandles.

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8174749
Remi Forax
2018-02-02 13:03:35 UTC
Hi Charles,
Usually, it's because a non-constant method handle is not inlined into the call site,
so it's as fast as a function call or a method call when you ask not to inline.

A way to improve the perf is to profile the method handles that can be seen when doing an invokeExact,
and inline them if there are few of them, making invokeExact act as an n-morphic inlining cache (with an identity check instanceof a class check).

Obviously, it's also easy to emulate this kind of cache with an invokedynamic; I think Golo has such a cache (Golo lambdas are plain method handles),
and if you want to go fully circular, you can simulate invokedynamic with an invokeExact on a constant method handle :)
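
A minimal sketch of such an inlining cache, assuming a MutableCallSite whose first argument is the handle to call. The class name HandleInliningCache, the depth limit of 4, and the helper names are choices made for this illustration; this is not Golo's code and not what the JVM does internally, and it ignores thread safety.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.WrongMethodTypeException;

public class HandleInliningCache extends MutableCallSite {
    private static final int MAX_DEPTH = 4;  // arbitrary limit, chosen for illustration
    private static final MethodHandle FALLBACK;
    private static final MethodHandle SAME_HANDLE;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            FALLBACK = lookup.findVirtual(HandleInliningCache.class, "fallback",
                    MethodType.methodType(MethodHandle.class, MethodHandle.class));
            SAME_HANDLE = lookup.findStatic(HandleInliningCache.class, "sameHandle",
                    MethodType.methodType(boolean.class, MethodHandle.class, MethodHandle.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    private final MethodType targetType;
    private int depth;

    // The call site takes the handle as an extra leading argument: (MethodHandle, args...).
    public HandleInliningCache(MethodType targetType) {
        super(targetType.insertParameterTypes(0, MethodHandle.class));
        this.targetType = targetType;
        // Slow path: fallback() updates the cache, then the handle it returns is invoked.
        MethodHandle invoker = MethodHandles.dropArguments(
                MethodHandles.exactInvoker(targetType), 1, MethodHandle.class);
        setTarget(MethodHandles.foldArguments(invoker, FALLBACK.bindTo(this)));
    }

    // Identity check on the handle itself, rather than a class check on a receiver.
    static boolean sameHandle(MethodHandle expected, MethodHandle actual) {
        return expected == actual;
    }

    MethodHandle fallback(MethodHandle handle) {
        if (!handle.type().equals(targetType)) {
            throw new WrongMethodTypeException("expected " + targetType + " but got " + handle.type());
        }
        if (depth++ < MAX_DEPTH) {
            // Guarded fast path: 'handle' is bound into the tree, so it is a constant there.
            MethodHandle test = SAME_HANDLE.bindTo(handle);
            MethodHandle fastPath = MethodHandles.dropArguments(handle, 0, MethodHandle.class);
            setTarget(MethodHandles.guardWithTest(test, fastPath, getTarget()));
        } else {
            // Too many distinct handles: give up and call through a plain exact invoker.
            setTarget(MethodHandles.exactInvoker(targetType));
        }
        return handle;
    }

    static long total;

    static void add(long value) {
        total += value;
    }

    public static long demo() {
        try {
            MethodHandle add = MethodHandles.lookup().findStatic(HandleInliningCache.class, "add",
                    MethodType.methodType(void.class, long.class));
            // In real code this invoker would sit in a static final field or behind invokedynamic.
            MethodHandle call = new HandleInliningCache(add.type()).dynamicInvoker();
            total = 0;
            call.invokeExact(add, 1L);  // slow path, installs the guard
            call.invokeExact(add, 2L);  // identity check hits the cached handle
            return total;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

The point of the design is that the varying handle only feeds the identity test; the handle actually invoked on the fast path is the one captured when the guard was built.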

see you tomorrow,
Rémi

Remi Forax
2018-02-02 13:11:31 UTC
s/instanceof/instead of :)

Rémi

Paul Sandoz
2018-02-02 16:52:44 UTC
At some point in the future it may be possible, with the constant folding work, to express the declaration of a MH locally but have it stuffed in the constant pool (see amber constant-folding) if what the MH is derived from is constant, e.g. think of a language compiler intrinsic for ldc. That may improve some use cases, but if any input is not constant we are back to the slower path.

Paul.
Remi Forax
2018-02-02 17:57:17 UTC
Post by Paul Sandoz
At some point in the future it may be possible, with the constant folding work,
to express the declaration of a MH locally but it gets stuffed in the constant
pool (see amber constant-folding) if what the MH is derived from is constant.
e.g. think of a language compiler intrinsic for ldc.

Yes.

Post by Paul Sandoz
That may improve some use-cases but if any input is not constant we are back to the slower path.

You can put the non-constant method handle into an inlining cache and, magically, it becomes a constant;
see https://gist.github.com/forax/1e0734f9aa976eab8a1fe982371a44a7

Rémi