Jato - Inline Caching [GSoC 11]: May 2011

Saturday, 28 May 2011

Inline Caching in HotSpot VM

The class CompiledIC represents a call-site with an inline cache.

cpu/x86/vm/LIR_Assembler_x86.cpp: emit_call() emits the machine code for a call.

ic_call() generates the native code for a virtual call with inline cache.

Clean state

This is the initial state for a compiled call-site.

The code generated is
mov imm32, eax
call _resolve_virtual_call_C

Initially this imm32 value is a placeholder with no meaning. When the site becomes monomorphic, it is set to the expected class.

Monomorphic state

share/vm/runtime/SharedRuntime.cpp:1268
The _resolve_virtual_call_C function gets the receiver (this) and the method handle by inspecting the stack and the relevant bytecodes. This code can be found in SharedRuntime::resolve_sub_helper(). The state of the IC is then made monomorphic by CompiledIC::set_to_monomorphic(), which patches the mov and call instructions. The imm32 value becomes a pointer to the class handle, and the target of the call instruction becomes the address of the method implementation for that class.

Megamorphic state

When a cache miss occurs, the inline cache enters the megamorphic state, where a full vtable lookup is performed.

The code which checks for a cache miss is generated by check_icache(). This code is present in all non-static methods. It loads this->class and compares it with the value in eax. If they are not equal, it jumps to handle_wrong_method_ic_miss(), which calls handle_ic_miss_helper(); that function performs the vtable lookup, patches the call-site to make it megamorphic, and calls the actual method.

For a megamorphic call-site, eax contains the method handle and the call target is a vtable lookup routine. The vtable lookup routine can use the "this" argument and the method handle to jump to the correct method.

Notes on Implementation

The remaining discussion sketches the call-site code at various times.

Clean state

mov #method, eax
call setup_ic

Monomorphic state

mov #class, eax
call method

Megamorphic state

mov #method, eax
call vtable_lookup

setup_ic:
    Lookup vtable
    if method not yet compiled {
        Jump to trampoline
        /* Leave the call-site untouched */
    } else {
        Patch the call-site
        Jump to the method
    }

vtable_lookup:
    jump to this->class->vtable->native_ptr[i]

inline_cache_check:
    cmp eax, this->class
    jne inline_cache_miss

inline_cache_miss:
    Patch call-site to megamorphic
    Lookup vtable and jump to it
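
The state transitions sketched above can be modelled in plain C. This is only an illustration under simplifying assumptions: every name in it (klass, object, call_site, invoke, and so on) is hypothetical, the "patching" is a field update instead of a rewrite of machine code, the class check runs at the call-site instead of in the callee's prologue, and the "method not yet compiled" branch of setup_ic is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Every name here is hypothetical: this models the sketch, not Jato's real structures. */
typedef struct object object;
typedef int (*method_fn)(object *self);

typedef struct klass {
    method_fn vtable[1];          /* one virtual method is enough for the model */
} klass;

struct object {
    const klass *cls;             /* plays the role of this->class */
};

enum ic_state { IC_CLEAN, IC_MONOMORPHIC, IC_MEGAMORPHIC };

typedef struct call_site {
    enum ic_state state;
    const klass *expected;        /* the class immediate that would sit in eax */
    method_fn target;             /* the patched call target */
    int vtable_lookups;           /* bookkeeping for the demonstration only */
} call_site;

/* Two sample receiver classes with different implementations of slot 0. */
static int apple_name(object *self)  { (void) self; return 1; }
static int orange_name(object *self) { (void) self; return 2; }

static const klass apple_class  = { { apple_name } };
static const klass orange_class = { { orange_name } };

static method_fn lookup_vtable(call_site *cs, object *self)
{
    cs->vtable_lookups++;
    return self->cls->vtable[0];
}

int invoke(call_site *cs, object *self)
{
    switch (cs->state) {
    case IC_CLEAN:
        /* setup_ic: look up the vtable and patch the site to monomorphic */
        cs->expected = self->cls;
        cs->target = lookup_vtable(cs, self);
        cs->state = IC_MONOMORPHIC;
        return cs->target(self);
    case IC_MONOMORPHIC:
        /* inline_cache_check: compare the cached class with this->class */
        if (self->cls == cs->expected)
            return cs->target(self);
        /* inline_cache_miss: patch the call-site to megamorphic */
        cs->state = IC_MEGAMORPHIC;
        cs->expected = NULL;
        cs->target = NULL;
        /* fall through */
    case IC_MEGAMORPHIC:
        /* vtable_lookup: always go through the receiver's vtable */
        return lookup_vtable(cs, self)(self);
    }
    assert(0 && "unreachable");
    return -1;
}
```

Starting from a call_site in IC_CLEAN, repeated calls with an apple receiver cost a single vtable lookup; the first call with an orange receiver is a miss that moves the site to IC_MEGAMORPHIC, after which every call performs a lookup.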

Saturday, 21 May 2011

invokeinterface

The invokeinterface bytecode is used to call an interface method. The mechanism used to implement it is slightly different from invokevirtual. We cannot assign a unique virtual_index to an interface method, because a class may implement two different interfaces whose methods would then conflict over the same virtual_index. So the virtual_index of an interface method varies from one implementing class to another (i.e. virtual_index depends on both the method and the class). How, then, can we find the virtual_index of a method given "this"? This information is stored in a hash table called the itable. Each bucket in an itable is a list of itable_entry structures. Each itable_entry has a c_method field which points to the vm_method structure of the implementation method (the method to be called). So an invokeinterface call boils down to:

Search this->itable[method->itable_index] for the required method. The itable_index for a method is the same across classes. The result of this search is c_method->virtual_index.
Call this->vtable[c_method->virtual_index].
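
The two steps above can be sketched in C roughly as follows. This is a simplified model, not Jato's code: the struct names carry a _model suffix to mark them as hypothetical, the i_method key field, the next pointer, and the bucket count are assumptions made for the illustration, and the bucket is walked linearly for clarity.

```c
#include <assert.h>
#include <stddef.h>

#define ITABLE_SIZE 4                         /* hypothetical bucket count */

typedef int (*method_fn)(void);

struct vm_method_model {                      /* stands in for vm_method */
    unsigned itable_index;                    /* same in every implementing class */
    unsigned virtual_index;                   /* specific to one class */
};

struct itable_entry_model {
    const struct vm_method_model *i_method;   /* interface method, the search key (assumed) */
    const struct vm_method_model *c_method;   /* implementation method in this class */
    const struct itable_entry_model *next;    /* next entry in the same bucket (assumed) */
};

struct vm_class_model {
    const struct itable_entry_model *itable[ITABLE_SIZE];
    method_fn *vtable;
};

/* Sample data: two interface methods that land in the same bucket. */
static int run_impl(void)   { return 10; }
static int close_impl(void) { return 20; }

static const struct vm_method_model iface_run   = { 1, 0 };
static const struct vm_method_model iface_close = { 1, 0 };
static const struct vm_method_model impl_run    = { 1, 3 };
static const struct vm_method_model impl_close  = { 1, 5 };

static method_fn sample_vtable[8] = {
    NULL, NULL, NULL, run_impl, NULL, close_impl, NULL, NULL
};

static const struct itable_entry_model entry_close = { &iface_close, &impl_close, NULL };
static const struct itable_entry_model entry_run   = { &iface_run, &impl_run, &entry_close };

static const struct vm_class_model sample_class = {
    { NULL, &entry_run, NULL, NULL },
    sample_vtable
};

/* Step 1: search the bucket. Step 2: return the vtable entry to call. */
method_fn resolve_interface_call(const struct vm_class_model *cls,
                                 const struct vm_method_model *i_method)
{
    const struct itable_entry_model *e;

    assert(cls != NULL && i_method != NULL);
    for (e = cls->itable[i_method->itable_index % ITABLE_SIZE]; e != NULL; e = e->next) {
        if (e->i_method == i_method)
            return cls->vtable[e->c_method->virtual_index];
    }
    return NULL;                              /* class does not implement the method */
}
```

Both sample interface methods share itable_index 1, so the search has to tell them apart inside one bucket before it can pick vtable slot 3 or slot 5.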

The LIR generated for invokeinterface is produced by the following code.

/* object class */
select_insn(s, tree, membase_reg_insn(INSN_MOV_MEMBASE_REG,
        call_target, offsetof(struct vm_object, class), call_target));

/* itable entry */
select_insn(s, tree, imm_reg_insn(INSN_ADD_IMM_REG,
        offsetof(struct vm_class, itable) + method->itable_index * sizeof(void *),
        call_target));

/* hidden parameter to the conflict resolution stub */
select_insn(s, tree, imm_reg_insn(INSN_MOV_IMM_REG,
        (unsigned long) method, eax));

/* invoke method */
call_insn = reverse_reg_insn(INSN_CALL_REG, call_target);

This call instruction transfers control to the itable conflict resolver generated by emit_itable_resolver_stub(). That function emits code to do a binary search on the appropriate list and jump to the correct vtable entry (see emit_itable_bsearch()).
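
As a rough illustration of what such a resolver computes, here is a binary search written in C instead of emitted machine code. It assumes the conflicting entries are kept sorted by the address of the interface method (the hidden parameter passed in eax); that ordering, the resolver_entry struct, and the itable_bsearch function name are assumptions made for this sketch and are not taken from the Jato sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of one conflict list, sorted by method_key. */
struct resolver_entry {
    uintptr_t method_key;       /* address of the interface method (hidden parameter) */
    unsigned virtual_index;     /* vtable slot of the implementation */
};

/* Returns the virtual_index for key, or -1 if the key is not in the table. */
int itable_bsearch(const struct resolver_entry *table, size_t n, uintptr_t key)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].method_key == key)
            return (int) table[mid].virtual_index;
        if (table[mid].method_key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

Because the set of entries is fixed once the class is linked, the emitted stub can hard-code the comparisons and jumps that this loop performs at run time.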

Friday, 20 May 2011

Details of Virtual Calls

A virtual call to a function looks like

invokevirtual class.method()

in bytecode. The class here is the declared (static) type of the reference, not its actual (run-time) type. The method to be called depends on the actual type, which is known only at run time.

Jato Internals for a virtual call

The VM maintains a vm_class structure for each class. Each vm_class structure contains a vtable, which is an array of pointers to the executable code of the methods in that class. The vtable is organized such that the index (into the vtable) of a virtual method is the same in the base class and all derived classes, i.e. if name() has index 10 in the Fruit class, its index is also 10 in Apple, Orange, or any other class derived from Fruit. This index is kept in the virtual_index field of the vm_method structure. So executing a virtual call involves:

Loading the correct vtable entry using the virtual_index.
Executing a call instruction to that address.
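
A small stand-alone C model may make the shared-index idea concrete. It is a sketch only: fruit_class, fruit_object, call_virtual and the two slot constants are hypothetical names, and the vtable has just two slots instead of the index 10 used in the example above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct fruit_object fruit_object;
typedef const char *(*virtual_fn)(const fruit_object *self);

typedef struct fruit_class {
    const virtual_fn *vtable;     /* array of pointers to method code */
} fruit_class;

struct fruit_object {
    const fruit_class *cls;       /* plays the role of this->class */
};

/* The same index names the same method in the base and derived classes. */
enum { KIND_INDEX = 0, NAME_INDEX = 1 };

static const char *fruit_kind(const fruit_object *self)  { (void) self; return "fruit"; }
static const char *fruit_name(const fruit_object *self)  { (void) self; return "Fruit"; }
static const char *apple_name(const fruit_object *self)  { (void) self; return "Apple"; }
static const char *orange_name(const fruit_object *self) { (void) self; return "Orange"; }

/* Derived classes inherit slot 0 and override slot 1. */
static const virtual_fn fruit_vtable[]  = { fruit_kind, fruit_name };
static const virtual_fn apple_vtable[]  = { fruit_kind, apple_name };
static const virtual_fn orange_vtable[] = { fruit_kind, orange_name };

static const fruit_class fruit_cls  = { fruit_vtable };
static const fruit_class apple_cls  = { apple_vtable };
static const fruit_class orange_cls = { orange_vtable };

/* Load the vtable entry selected by virtual_index, then call it. */
const char *call_virtual(const fruit_object *obj, int virtual_index)
{
    const virtual_fn *vtable;

    assert(obj != NULL && obj->cls != NULL);
    vtable = obj->cls->vtable;    /* this->class->vtable */
    return vtable[virtual_index](obj);
}
```

The caller only needs the index: call_virtual(&apple, NAME_INDEX) and call_virtual(&orange, NAME_INDEX) run different code even though the call-site is identical.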

The code for this can be found in arch/x86/insn-selector_32.brg:invokevirtual()

/* object reference */
call_target = state->left->reg1;

/* object class */
select_insn(s, tree, membase_reg_insn(INSN_MOV_MEMBASE_REG, call_target, offsetof(struct vm_object, class), call_target));

/* vtable */
select_insn(s, tree, membase_reg_insn(INSN_MOV_MEMBASE_REG, call_target, offsetof(struct vm_class, vtable), call_target));

/* native ptr */
select_insn(s, tree, imm_reg_insn(INSN_ADD_IMM_REG, method_offset, call_target));

/* invoke method */
call_insn = reverse_reg_insn(INSN_CALL_REG, call_target);

select_safepoint_insn(s, tree, call_insn);

Initially this call goes to the trampoline function jit_magic_trampoline(), which compiles the method body and patches the vtable entry to point directly to the compiled code. The vtable patching can be found in jit/vtable.c:fixup_vtable() (see also emit_trampoline()).
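
The compile-on-first-call behaviour can be imitated in C with a function pointer that overwrites itself. This is a toy model with hypothetical names (vtable, trampoline, jit_compile, install_trampoline, call_slot, times_compiled); it has a single vtable slot and a "compiler" that just returns a ready-made function, so it only shows the patching pattern, not what jit_magic_trampoline() or fixup_vtable() actually do internally.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*code_ptr)(void);

static code_ptr vtable[1];        /* single-slot vtable for the model */
static int compile_count;         /* how many times the "JIT" ran */

static int compiled_body(void) { return 42; }

/* Stand-in for compiling the method body; returns the native code address. */
static code_ptr jit_compile(void)
{
    compile_count++;
    return compiled_body;
}

/* First call lands here: compile, patch the vtable slot, run the method. */
static int trampoline(void)
{
    code_ptr native = jit_compile();

    assert(native != NULL);
    vtable[0] = native;           /* later calls bypass the trampoline */
    return native();
}

void install_trampoline(void)
{
    vtable[0] = trampoline;
    compile_count = 0;
}

int call_slot(void)      { return vtable[0](); }
int times_compiled(void) { return compile_count; }
```

After install_trampoline(), the first call_slot() compiles and patches, and every later call_slot() goes straight to the compiled body, so times_compiled() stays at 1.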

We can trace the output of different phases of Jato by using the option -Xtrace:jit. Below we show the output of different phases for the virtual call.

Fruit f;
.
.
f.name();

HIR

[main] INVOKEVIRTUAL:
[main] target_method: [0xa24d9d0 'jvm/MethodInvokeVirtualTest$Fruit.name()Ljava/lang/String;' (12)]
[main] args_list:
[main] ARG_THIS:
[main] arg_expression: [temporary reference 0xa415a60 (low)]
[main] result: [temporary reference 0xa4172e8 (low)]

The HIR is very "close" to Java bytecode. The only information available at this point is a method descriptor.

LIR

[main] [ 1 ] 2: push_reg r11 ; Push this to stack
[main] [ 1 ] 4: mov_membase_reg $0x0(r11), r11 ; r11 <- this->class
[main] [ 1 ] 6: mov_membase_reg $0x7c(r11), r11 ; r11 <- this->class->vtable
[main] [ 1 ] 8: add_imm_reg $0x30, r11 ; r11 <- &this->class->vtable[method_index]
[main] [ 1 ] 10: test_imm_memdisp $0x0, ($0xa0ae000)
[main] [ 1 ] 12: call_reg (r11) ; jump to the virtual method

The LIR is very close to machine code. The actual machine registers used are determined only during code generation. The LIR encodes the logic to look up the vtable entry and jump to it. The number $0x30 depends on the virtual method to be called (the virtual_index).
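
One plausible reading of the $0x30 constant is that it is the virtual_index scaled by the pointer size. Assuming the "(12)" printed after the method name in the HIR trace is the virtual_index, and assuming 4-byte vtable entries on 32-bit x86, the arithmetic is 12 * 4 = 48 = 0x30. The helper below is a hypothetical illustration of that calculation, not a function from Jato.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of a vtable slot: virtual_index scaled by the pointer size. */
size_t vtable_offset(unsigned virtual_index, size_t pointer_size)
{
    assert(pointer_size > 0);
    return (size_t) virtual_index * pointer_size;
}
```

Under the same assumptions, a 64-bit build would need an offset of 12 * 8 = 0x60 for the same method.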

Machine Code

[main] [ 1 ] 0xa775226c: 57 push %edi
[main] [ 1 ] 0xa775226d: 8b 3f mov (%edi),%edi
[main] [ 1 ] 0xa775226f: 8b 7f 7c mov 0x7c(%edi),%edi
[main] [ 1 ] 0xa7752272: 81 c7 30 00 00 00 add $0x30,%edi
[main] [ 1 ] 0xa7752278: f6 04 25 00 e0 0a 0a 00 testb $0x0,0xa0ae000(,%eiz,1)
[main] [ 1 ] 0xa7752280: ff 17 call *(%edi)

This code looks very similar to LIR except that the actual machine registers to be used have been determined.