series of switchpoints or better

Discussion:

Jochen Theodorou

2016-10-05 12:37:21 UTC

Hi all,

I am constructing a new meta class system for Groovy (ok, I say that for
several years already, but bear with me) and I was wondering about the
actual performance of switchpoints.

In my current scenario I would need a way to say that a certain group of meta
classes got updated and that the method for this call site may need to be
reselected.

So if I have class A, class B and then I have a meta class for Object
and one for A.

If the meta class for A is changed, all handles operating on instances
of A may have to reselect. The handles for B and Object need not be
affected. If the meta class for Object changes, I need to invalidate all
the handles for A, B and Object.

Doing this with switchpoints means probably one switchpoint per
metaclass and a small number of meta classes per class (in total 3 in my
example). This would mean my MethodHandle would have to get through a
bunch of switchpoints, before it can do the actual method invocation.
And while switchpoints might be fast it does not sound good to me.
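
A minimal sketch of that first option, assuming one java.lang.invoke.SwitchPoint per meta class: the cached target for receivers of A is wrapped once for A's meta class and once for Object's, while B's only needs Object's. ChainedSwitchPoints, handleForA/handleForB and the string-returning targets are hypothetical stand-ins, not Groovy code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical model of the example: a meta class for Object and one for A.
// Each meta class owns one SwitchPoint; a handle is wrapped once per meta class
// in the receiver's lookup chain.
class ChainedSwitchPoints {
    static SwitchPoint objectMeta = new SwitchPoint(); // meta class for Object
    static SwitchPoint aMeta = new SwitchPoint();      // meta class for A

    static String cached(Object receiver) { return "cached"; }
    static String reselect(Object receiver) { return "reselect"; }

    private static MethodHandle find(String name) {
        try {
            return MethodHandles.lookup().findStatic(ChainedSwitchPoints.class, name,
                    MethodType.methodType(String.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wrap 'fast' once per switchpoint; invalidating any of them routes calls to 'slow'.
    static MethodHandle guard(MethodHandle fast, MethodHandle slow, SwitchPoint... chain) {
        MethodHandle h = fast;
        for (SwitchPoint sp : chain) {
            h = sp.guardWithTest(h, slow);
        }
        return h;
    }

    // Sites for A depend on A's and Object's meta classes; B has no meta class
    // of its own in the example, so its sites depend on Object's only.
    static MethodHandle handleForA() { return guard(find("cached"), find("reselect"), aMeta, objectMeta); }
    static MethodHandle handleForB() { return guard(find("cached"), find("reselect"), objectMeta); }

    static String call(MethodHandle h, Object receiver) {
        try {
            return (String) h.invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Invalidating only aMeta sends A's call sites to reselect while B's keep their cached target; invalidating objectMeta affects both.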

Or I can use one switchpoint for all method handles in the system, which
makes me wonder whether the call site ever gets JITed again after a meta
class change. The latter performance penalty is also not very attractive
to me.
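
For comparison, a sketch of the single-global-switchpoint option (all names hypothetical): each call site is a MutableCallSite guarded by the one global SwitchPoint, and its fallback relinks against whichever switchpoint is current. It shows the trade-off raised later in the thread: one change anywhere makes every site go through a relink.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;

// Hypothetical call site guarded by a single system-wide SwitchPoint.
class GlobalSwitchPointSite {
    static final MethodType TYPE = MethodType.methodType(String.class, Object.class);
    static volatile SwitchPoint global = new SwitchPoint();

    final MutableCallSite site = new MutableCallSite(TYPE);
    int relinks = 0; // how often this site rebuilt its target

    GlobalSwitchPointSite() { relink(); }

    static String target(Object receiver) { return "target:" + receiver; }

    // Fallback: relink against the current global switchpoint, then finish this call.
    String fallback(Object receiver) {
        relink();
        return target(receiver);
    }

    private void relink() {
        relinks++;
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle fast = l.findStatic(GlobalSwitchPointSite.class, "target", TYPE);
            MethodHandle slow = l.findVirtual(GlobalSwitchPointSite.class, "fallback", TYPE).bindTo(this);
            site.setTarget(global.guardWithTest(fast, slow));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Any meta class change: publish a fresh switchpoint first, then invalidate
    // the old one, which sends every call site in the system to its fallback.
    static synchronized void metaClassChanged() {
        SwitchPoint old = global;
        global = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }

    String call(Object receiver) {
        try {
            return (String) site.dynamicInvoker().invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Publishing the replacement before invalidating the old switchpoint means fallbacks that run after the change pick up the new one rather than repeatedly relinking against the dead one.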

So what is the way to go here? Or is there an even better way?

bye Jochen

Chris Seaton

2016-10-05 12:47:24 UTC

Hi Jochen,

I'm not an expert on the implementation of switch points, but my understanding is that they don't appear in the dynamically compiled machine code at all. They use the safe point mechanism of the VM (the same thing that does the stop-the-world in the garbage collectors) for which polling instructions are already there anyway.

http://chrisseaton.com/rubytruffle/icooolps15-safepoints/safepoints.pdf

See figure 6 (no significant difference in runtime with switch points there or not), and figure 9 (machine code contains no trace of them). So switch points aren't just fast - they don't take any time at all. (Ignore the references to Truffle if you aren't using that.)

I don't think any of this would change no matter how many of them you have.

I'm sure they do have an impact on interpreter performance, of course, where they can't be optimised away.

I suppose it could conceivably be the case that a great many switch points may start to upset the compiler in terms of things like inlining budgets? I'm not sure, but seems unlikely.

Chris

_______________________________________________
mlvm-dev mailing list
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev

Remi Forax

2016-10-05 13:53:40 UTC

Hi Jochen,
hi Chris,

Sent: Wednesday, 5 October 2016 14:47:24
Subject: Re: series of switchpoints or better
Hi Jochen,
I'm not an expert on the implementation of switch points, but my understanding
is that they don't appear in the dynamically compiled machine code at all. They
use the safe point mechanism of the VM (the same thing that does the
stop-the-world in the garbage collectors) for which polling instructions are
already there anyway.
http://chrisseaton.com/rubytruffle/icooolps15-safepoints/safepoints.pdf
See figure 6 (no significant difference in runtime with switch points there or
not), and figure 9 (machine code contains no trace of them). So switch points
aren't just fast - they don't take any time at all. (Ignore the references to
Truffle if you aren't using that.)
I don't think any of this would change no matter how many of them you have.

The only cost, if the code is JITed, is that the VM has to maintain a dependency list in order to know which JITed code should be marked as dead when the switchpoint is invalidated.
It's not usually a big deal and don't forget that using a MutableCallSite also creates the same dependency list.

I'm sure they do have an impact on interpreter performance, of course, where
they can't be optimised away.

It's just a volatile read in the interpreter; the cost is negligible compared to the cost of invoking the method handle itself.

I suppose it could conceivably be the case that a great many switch points may
start to upset the compiler in terms of things like inlining budgets? I'm not
sure, but seems unlikely.

No, as you said, a switchpoint is compiled to zero assembly code, and more generally method handles are not counted in the inlining budget.

Chris

Post by Jochen Theodorou
So what is the way to go here? Or is there an even better way?

You can crawl the hierarchy from the class that is changed down to all its subclasses, gather all the switchpoints and invalidate them all at once.
In that case, you will only have one switchpoint per metaclass.
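
A minimal sketch of that suggestion, assuming a hypothetical MetaClassNode type (not Groovy's MetaClass) that knows its direct subclasses: a change walks down from the modified meta class, swaps in a fresh SwitchPoint for each node (a switchpoint cannot be reset once invalidated), and invalidates the old ones together with SwitchPoint.invalidateAll.

```java
import java.lang.invoke.SwitchPoint;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical meta class node in a single-inheritance tree.
class MetaClassNode {
    final List<MetaClassNode> subclasses = new ArrayList<>();
    // Call sites for receivers of this class guard on this one switchpoint.
    volatile SwitchPoint switchPoint = new SwitchPoint();

    MetaClassNode(MetaClassNode superclass) {
        if (superclass != null) superclass.subclasses.add(this);
    }

    // Collect the switchpoints of 'root' and every descendant, replace them
    // with fresh ones, then invalidate the whole batch in one call.
    static synchronized void changed(MetaClassNode root) {
        List<SwitchPoint> old = new ArrayList<>();
        Deque<MetaClassNode> work = new ArrayDeque<>();
        work.push(root);
        while (!work.isEmpty()) {
            MetaClassNode mc = work.pop();
            old.add(mc.switchPoint);
            mc.switchPoint = new SwitchPoint();
            mc.subclasses.forEach(work::push);
        }
        SwitchPoint.invalidateAll(old.toArray(new SwitchPoint[0]));
    }
}
```

Because a change to Object's meta class also invalidates A's and B's switchpoints, a call site for A only needs to guard on A's switchpoint. If interfaces are included, the graph is no longer a tree, so a real walk would also need a visited set.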

see https://github.com/qmx/jsr292-cookbook/tree/master/metaclass

And don't be afraid of the number of switchpoints you use, Nashorn will use more than you :)

Rémi

Charles Oliver Nutter

2016-10-05 14:00:29 UTC

Hi Jochen!

Post by Jochen Theodorou
If the meta class for A is changed, all handles operating on instances of
A may have to reselect. the handles for B and Object need not to be
affected. If the meta class for Object changes, I need to invalidate all
the handles for A, B and Object.

This is exactly how JRuby's type-modification guards work. We've used this
technique since our first implementation of indy call sites.

Post by Jochen Theodorou
Doing this with switchpoints means probably one switchpoint per metaclass
and a small number of meta classes per class (in total 3 in my example).
This would mean my MethodHandle would have to get through a bunch of
switchpoints, before it can do the actual method invocation. And while
switchpoints might be fast it does not sound good to me.

From what I've seen, it's fine as far as hot performance. Adding complexity
to your handle chains likely impacts cold perf, of course.

Can you elaborate on the structure? JRuby has 6-deep (configurable)
polymorphic caching, with each entry being a GWT (guardWithTest, to check
the type) and an SP (SwitchPoint, to check for modification) before hitting
the plumbing for the method itself.
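
A rough sketch of that shape, assuming exact-class checks and a hypothetical resolved method in place of real method lookup: each cache entry is a MethodHandles.guardWithTest on the receiver's class wrapping a SwitchPoint.guardWithTest for that class, with at most six entries before the site stays on its slow path. This is an illustration, not JRuby's code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;
import java.lang.invoke.SwitchPoint;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical depth-limited polymorphic inline cache: per entry, a class test
// (GWT) plus that class's modification switchpoint (SP).
class PolymorphicSite {
    static final int MAX_DEPTH = 6; // the depth mentioned for JRuby in the thread
    static final MethodType TYPE = MethodType.methodType(String.class, Object.class);
    static final Map<Class<?>, SwitchPoint> SWITCH_POINTS = new HashMap<>();
    static final MethodHandle IS_CLASS, RESOLVED, FALLBACK;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            IS_CLASS = l.findStatic(PolymorphicSite.class, "isClass",
                    MethodType.methodType(boolean.class, Class.class, Object.class));
            RESOLVED = l.findStatic(PolymorphicSite.class, "resolved",
                    MethodType.methodType(String.class, Class.class, Object.class));
            FALLBACK = l.findVirtual(PolymorphicSite.class, "fallback", TYPE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    final MutableCallSite site = new MutableCallSite(TYPE);
    final List<Class<?>> cached = new ArrayList<>();
    int slowCalls = 0;

    PolymorphicSite() { relink(); }

    static boolean isClass(Class<?> c, Object receiver) { return receiver.getClass() == c; }

    // Hypothetical method resolution; a real runtime would consult its meta classes.
    static String resolved(Class<?> c, Object receiver) { return c.getSimpleName() + ":" + receiver; }

    static synchronized SwitchPoint switchPointFor(Class<?> c) {
        return SWITCH_POINTS.computeIfAbsent(c, k -> new SwitchPoint());
    }

    // Modifying class c invalidates only the cache entries for c.
    static synchronized void modified(Class<?> c) {
        SwitchPoint old = SWITCH_POINTS.remove(c);
        if (old != null) SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }

    // Rebuild the chain: one (class test, switchpoint guard) pair per cached class.
    private void relink() {
        MethodHandle fallback = FALLBACK.bindTo(this);
        MethodHandle chain = fallback;
        for (Class<?> c : cached) {
            MethodHandle guarded = switchPointFor(c).guardWithTest(RESOLVED.bindTo(c), fallback);
            chain = MethodHandles.guardWithTest(IS_CLASS.bindTo(c), guarded, chain);
        }
        site.setTarget(chain);
    }

    String fallback(Object receiver) {
        slowCalls++;
        Class<?> c = receiver.getClass();
        if (cached.contains(c) || cached.size() < MAX_DEPTH) {
            if (!cached.contains(c)) cached.add(c);
            relink(); // also refreshes entries whose switchpoint was invalidated
        }
        return resolved(c, receiver);
    }

    String call(Object receiver) {
        try {
            return (String) site.dynamicInvoker().invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Calling modified(String.class) only sends String receivers back through fallback; entries for other classes keep working.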

I will say that using SwitchPoints is FAR better than our alternative
mechanism: pinging the (meta)class each time and checking a serial number.
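
To make the contrast concrete, here is a sketch of the serial-number style of guard described above, built with MethodHandles.guardWithTest; Meta, serial and unchanged are hypothetical names. The test re-reads a volatile field on every call, which is the per-call cost a SwitchPoint is meant to remove from compiled code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical serial-number guard: compare the meta class's current serial
// with the one observed when the call site was linked.
class SerialNumberGuard {
    static final class Meta {
        volatile int serial; // bumped on every modification
        synchronized void modify() { serial++; }
    }

    // Executed on every call: a volatile load plus a compare.
    static boolean unchanged(Meta meta, int expected) {
        return meta.serial == expected;
    }

    static String cached(Object receiver) { return "cached"; }
    static String reselect(Object receiver) { return "reselect"; }

    static MethodHandle link(Meta meta) {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodType t = MethodType.methodType(String.class, Object.class);
            MethodHandle test = l.findStatic(SerialNumberGuard.class, "unchanged",
                    MethodType.methodType(boolean.class, Meta.class, int.class));
            // Bind the meta class and the serial number seen at link time.
            test = MethodHandles.insertArguments(test, 0, meta, meta.serial);
            return MethodHandles.guardWithTest(test,
                    l.findStatic(SerialNumberGuard.class, "cached", t),
                    l.findStatic(SerialNumberGuard.class, "reselect", t));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String call(MethodHandle h, Object receiver) {
        try {
            return (String) h.invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```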

Post by Jochen Theodorou
Or I can do one switchpoint for all methodhandles in the system, which
makes me wonder if after a meta class change the callsite ever gets Jitted
again. The later performance penalty is actually also not very attractive
to me.

We have fought to keep the JIT from giving up on us, and I believe that as
of today you can invalidate call sites forever and the JIT will still
recompile them (within memory, code cache, and other limits of course).

However, you'll be invalidating every call site for every modification. If
the system eventually settles, that's fine. If it doesn't, you're going to
be stuck with cold call site performance most of the time.

Post by Jochen Theodorou
So what is the way to go here? Or is there an even better way?

I strongly recommend the switchpoint-per-class granularity (or finer, like
switchpoint-per-class-and-method-name, which I am playing with now).
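
One way the finer granularity might look, as a sketch with hypothetical names: each class keeps a SwitchPoint per method name, and changing a method invalidates that name on the class and its descendants only, so call sites for other names are untouched.

```java
import java.lang.invoke.SwitchPoint;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class node holding one SwitchPoint per method name.
class MethodSwitchPoints {
    final List<MethodSwitchPoints> subclasses = new ArrayList<>();
    private final Map<String, SwitchPoint> byName = new HashMap<>();

    MethodSwitchPoints(MethodSwitchPoints superclass) {
        if (superclass != null) superclass.subclasses.add(this);
    }

    // A call site invoking 'name' on receivers of this class guards on this.
    synchronized SwitchPoint switchPointFor(String name) {
        return byName.computeIfAbsent(name, k -> new SwitchPoint());
    }

    // Adding, replacing or removing 'name' here affects this class and its
    // descendants, but only call sites for that name.
    static void methodChanged(MethodSwitchPoints cls, String name) {
        List<SwitchPoint> old = new ArrayList<>();
        collect(cls, name, old);
        SwitchPoint.invalidateAll(old.toArray(new SwitchPoint[0]));
    }

    private static void collect(MethodSwitchPoints cls, String name, List<SwitchPoint> out) {
        SwitchPoint sp;
        synchronized (cls) { sp = cls.byName.remove(name); }
        if (sp != null) out.add(sp); // the next switchPointFor(name) creates a fresh one
        for (MethodSwitchPoints sub : cls.subclasses) collect(sub, name, out);
    }
}
```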

- Charlie

John Rose

2016-10-05 20:22:34 UTC

Post by Charles Oliver Nutter
I will say that using SwitchPoints is FAR better than our alternative mechanism: pinging the (meta)class each time and checking a serial number.

This makes my day! That's exactly what SwitchPoints are designed to deliver. They are intended to hook into the same mechanism the JVM uses to invalidate code that has out-of-date devirtualized methods. (I.e., when you load the second definition overriding a method, you might cause some call sites to recompile. That stuff is too good not to share with language implementors.)

Post by Jochen Theodorou
Or I can do one switchpoint for all methodhandles in the system, which makes me wonder if after a meta class change the callsite ever gets Jitted again. The later performance penalty is actually also not very attractive to me.
Post by Charles Oliver Nutter
We have fought to keep the JIT from giving up on us, and I believe that as of today you can invalidate call sites forever and the JIT will still recompile them (within memory, code cache, and other limits of course).

I'm glad of that. It has been a fight to tune everything up just so. Can you say (or blog) more about what you had to tweak to keep the JIT happy? That may help other users, and/or help us make the JIT less irritable.

Post by Charles Oliver Nutter
However, you'll be invalidating every call site for every modification. If the system eventually settles, that's fine. If it doesn't, you're going to be stuck with cold call site performance most of the time.
Post by Jochen Theodorou
So what is the way to go here? Or is there an even better way?
Post by Charles Oliver Nutter
I strongly recommend the switchpoint-per-class granularity (or finer, like switchpoint-per-class-and-method-name, which I am playing with now).

For the classic devirtualization trick, the JVM uses something that works like a cross between a switchpoint per super-class and a switchpoint per method: When you load a new sub-class, each of its supers S is walked, causing a search for any devirtualized methods S.m in any compiled code. A compiled code blob ("nmethod") contains a list of dependencies which might (under suitable conditions) require it to be discarded and recompiled. This list can contain something like "S.m was devirtualized and hasn't been overridden yet", or something else like "switchpoint x has not been triggered yet".

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/08492e67bf32/src/share/vm/code/codeCache.cpp#l1180

The logic which handles switchpoints is (as Remi said) built on top of mutable call sites, which are handled in the JVM here:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/08492e67bf32/src/share/vm/code/dependencies.cpp#l1740
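
Since switchpoints are described here as built on top of mutable call sites, a simplified model of the idea can be written with MutableCallSite: the guard invokes a site whose target is a constant true, and invalidation swaps in a constant false and calls MutableCallSite.syncAll. MiniSwitchPoint is a hypothetical teaching class, not the JDK's SwitchPoint source.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MutableCallSite;

// Simplified, hypothetical model of a switchpoint built on a MutableCallSite.
// Single-use: once invalidated it stays invalid.
final class MiniSwitchPoint {
    private static final MethodHandle TRUE = MethodHandles.constant(boolean.class, true);
    private static final MethodHandle FALSE = MethodHandles.constant(boolean.class, false);

    // While valid, the site's target is the constant 'true'.
    private final MutableCallSite site = new MutableCallSite(TRUE);
    private final MethodHandle test = site.dynamicInvoker(); // type ()boolean

    boolean hasBeenInvalidated() { return site.getTarget() != TRUE; }

    // guardWithTest accepts a test whose parameters are a prefix of the
    // target's, so a zero-argument test can guard a handle of any type.
    MethodHandle guardWithTest(MethodHandle target, MethodHandle fallback) {
        return MethodHandles.guardWithTest(test, target, fallback);
    }

    static void invalidateAll(MiniSwitchPoint... points) {
        MutableCallSite[] sites = new MutableCallSite[points.length];
        for (int i = 0; i < points.length; i++) {
            sites[i] = points[i].site;
            sites[i].setTarget(FALSE); // the guard now always takes the fallback
        }
        // Force other threads to drop any cached view of the old targets.
        MutableCallSite.syncAll(sites);
    }
}
```

The JIT can treat the site's current target as a constant and fold the guard away; changing the target is what obliges the VM to discard compiled code that relied on it, through the dependency tracking John describes.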

I'm always glad when this stuff works like it's supposed to. It emboldens me to try more!

- John

Jochen Theodorou

2016-10-05 22:32:42 UTC

On 05.10.2016 22:22, John Rose wrote:
[...]

Post by John Rose
For the classic devirtualization trick, the JVM uses something that
works like a cross between a switchpoint per super-class and a
switchpoint per method: When you load a new sub-class, each of its
supers S is walked, causing a search for any devirtualized methods S.m
in any compiled code. A compiled code blob ("nmethod") contains a list
of dependencies which might (under suitable conditions) require it to be
discarded and recompiled. This list can contain something like "S.m was
devirtualized and hasn't been overridden yet", or something else like
"switchpoint x has not been triggered yet".
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/08492e67bf32/src/share/vm/code/codeCache.cpp#l1180
The logic which handles switchpoints is (as Remi said) built on top of
mutable call sites, which are handled in the JVM here:
http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/08492e67bf32/src/share/vm/code/dependencies.cpp#l1740
I'm always glad when this stuff works like it's supposed to. It emboldens me to try more!

I see... the problem is actually similar, except that I have to do
something like that not per "subclass added" event, but per "method
CRUD operation" event. And instead of going up to check for a
devirtualization, I have to actually propagate the change to all meta
classes of subclasses... and interface implementations (if the change was
made to an interface). So far I was thinking of making this lazy... but
maybe I should actually mark the classes as "dirty" eagerly... sorry...
not part of the discussion I guess ;)

bye Jochen

Charles Oliver Nutter

2016-10-05 22:51:25 UTC

Post by Jochen Theodorou
I see... the problem is actually similar, only that I do not have to do
something like that on a per "subclass added" event, but on a per "method
crud operation" event. And instead of going up to check for a
devirtualization, I have to actually propagate the change to all meta
classes of subclasses... and interface implementation (if the change was
made to an interface). So far I was thinking of making this lazy... but
maybe I should actually mark the classes as "dirty" eagerly... sorry... not
part of the discussion I guess ;)

Oh I think it is certainly relevant! JRuby does this invalidation eagerly,
but the cost can be high for changes to classes close to the root of the
hierarchy. You have fewer guards at each call site, though.

John's description of how HotSpot does this is also helpful; at least in
JRuby, searching up-hierarchy for overridden methods is just a name lookup
since Ruby does not overload. I've prototyped a similar system, with a
SwitchPoint per method, but ran into some hairy class structures that made
it complicated. The override search may be the answer for me.
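
A sketch of such an override search, assuming single inheritance and name-only lookup (no overloading) as described for Ruby; RClass, definer and define are hypothetical. Call sites guard on the switchpoint of the class where the method was found; defining a method searches up from the new definer for the class it shadows and invalidates only that (class, name) switchpoint.

```java
import java.lang.invoke.SwitchPoint;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Ruby-like class: methods looked up by name only.
class RClass {
    final RClass superclass;
    final Map<String, String> methods = new HashMap<>();
    private final Map<String, SwitchPoint> switchPoints = new HashMap<>();

    RClass(RClass superclass) { this.superclass = superclass; }

    // Where does a lookup of 'name' starting at this class end up? null if undefined.
    RClass definer(String name) {
        for (RClass c = this; c != null; c = c.superclass) {
            if (c.methods.containsKey(name)) return c;
        }
        return null;
    }

    // Call sites bound to this class's 'name' guard on this switchpoint.
    SwitchPoint switchPointFor(String name) {
        return switchPoints.computeIfAbsent(name, k -> new SwitchPoint());
    }

    void define(String name, String body) {
        // Redefining here invalidates sites bound to the old body; otherwise
        // search up the hierarchy for the definition being shadowed.
        RClass shadowed = methods.containsKey(name) ? this
                : (superclass == null ? null : superclass.definer(name));
        methods.put(name, body);
        if (shadowed != null) {
            SwitchPoint old = shadowed.switchPoints.remove(name);
            if (old != null) SwitchPoint.invalidateAll(new SwitchPoint[] { old });
        }
    }
}
```

This over-invalidates: sites for unrelated subclasses that were bound to the shadowed method relink too. Sites that cached a failed lookup are not handled in this sketch.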

- Charlie (mobile)

Jochen Theodorou

2016-10-05 23:26:01 UTC

On 06.10.2016 00:51, Charles Oliver Nutter wrote:
[...]

Post by Charles Oliver Nutter
JRuby does this invalidation
eagerly, but the cost can be high for changes to classes close to the
root of the hierarchy. You have fewer guards at each call site, though.

I think that is ok for Groovy.

There is one more special problem I have, though: per-instance meta
classes. So even if x and y have the same class as far as the JVM is
concerned, they can have differing meta classes. Which means a switchpoint
alone is not enough... well, I'm trying to get rid of that in the new MOP.
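
With per-instance meta classes the JVM class no longer determines dispatch, so a guard must also check which meta class the receiver carries. A sketch, assuming a hypothetical Obj type with a meta slot: an identity test on the meta class comes first, then that meta class's SwitchPoint.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

// Hypothetical receivers that each carry their own meta class reference.
class PerInstanceGuard {
    static final class Meta {
        final String label;
        volatile SwitchPoint switchPoint = new SwitchPoint();
        Meta(String label) { this.label = label; }
    }

    static final class Obj {
        Meta meta; // hypothetical per-instance meta class slot
        Obj(Meta meta) { this.meta = meta; }
    }

    static boolean hasMeta(Meta expected, Obj receiver) { return receiver.meta == expected; }
    static String dispatch(Meta meta, Obj receiver) { return meta.label; }
    static String slowPath(Obj receiver) { return "slow:" + receiver.meta.label; }

    static MethodHandle link(Meta meta) {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            MethodHandle test = l.findStatic(PerInstanceGuard.class, "hasMeta",
                    MethodType.methodType(boolean.class, Meta.class, Obj.class)).bindTo(meta);
            MethodHandle target = l.findStatic(PerInstanceGuard.class, "dispatch",
                    MethodType.methodType(String.class, Meta.class, Obj.class)).bindTo(meta);
            MethodHandle slow = l.findStatic(PerInstanceGuard.class, "slowPath",
                    MethodType.methodType(String.class, Obj.class));
            // Identity check first (which meta class?), then switchpoint (modified?).
            return MethodHandles.guardWithTest(test, meta.switchPoint.guardWithTest(target, slow), slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String call(MethodHandle h, Obj receiver) {
        try {
            return (String) h.invokeExact(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```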

Post by Charles Oliver Nutter
John's description of how Hotspot does this is also helpful; at least in
JRuby, searching up-hierarchy for overridden methods is just a name
lookup since Ruby does not overload.

not overloading solves many problems ;)

Post by Charles Oliver Nutter
I've prototyped a similar system,
with a SwitchPoint per method, but ran into some hairy class structures
that made it complicated. The override search may be the answer for me.

yeah, I can imagine.

bye Jochen

Charles Oliver Nutter

2016-10-06 03:34:13 UTC

Post by Jochen Theodorou
There is one more special problem I have though: per instance meta
classes. So even if a x and y have the same class as per JVM, they can have
differing meta classes. Which means a switchpoint alone is not enough...
well, trying to get rid of that in the new MOP.

JRuby also has per-instance classes (so-called "singleton classes"). We
treat them like any other class. HOWEVER...if there's a singleton class
that does not override any methods from the original class, it shares a
SwitchPoint until such time that it is modified.
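
A sketch of that sharing, with hypothetical names: an unmodified singleton hands out its original class's current SwitchPoint, and only its first modification gives it one of its own. The thread does not say how JRuby ends the sharing; here the shared switchpoint is invalidated at that point, because call sites bound to the singleton are guarding on it, and the original class receives a fresh one.

```java
import java.lang.invoke.SwitchPoint;

// Hypothetical class model with lazily separated singleton switchpoints.
// Not thread-safe; kept minimal for illustration.
class SketchClass {
    private final SketchClass original; // non-null only for singleton classes
    private SwitchPoint own;            // null while a singleton is unmodified

    SketchClass() { this(null); }

    private SketchClass(SketchClass original) {
        this.original = original;
        this.own = (original == null) ? new SwitchPoint() : null;
    }

    // Singletons of ordinary classes only, to keep the sketch small.
    SketchClass newSingleton() {
        if (original != null) throw new IllegalStateException("singleton of a singleton");
        return new SketchClass(this);
    }

    // An unmodified singleton hands out its original's current switchpoint.
    SwitchPoint switchPoint() { return own != null ? own : original.switchPoint(); }

    boolean sharesSwitchPoint() { return own == null; }

    void modify() {
        SwitchPoint old = switchPoint();
        if (own == null) {
            // First change to a singleton: its call sites guard on the shared
            // switchpoint, so that one must go; the original gets a fresh one.
            original.own = new SwitchPoint();
        }
        own = new SwitchPoint();
        SwitchPoint.invalidateAll(new SwitchPoint[] { old });
    }
}
```

Other unmodified singletons of the same class keep sharing, and follow the original to its new switchpoint.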

I've also considered caching singleton classes of various shapes, so we can
just choose based on known shapes...but never went further with that
experiment.

- Charlie

Mark Roos

2017-10-14 21:25:27 UTC
