The IR, for simplicity and efficiency reasons, largely follows the IL model, where only 32-bit and 64-bit integers are tracked as distinct types, while smaller integers exist only for storage locations and are implicitly widened on load and narrowed on store.
Thus, you will not see primitive arithmetic operations of, for example, type SHORT in the IR - they only exist for INT and LONG (ignoring BYREFs).
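
The widen-on-load, narrow-on-store model described above can be sketched in plain C++. This is only an illustrative model, not JIT code: `Slot16`, `LoadSigned16`, `LoadUnsigned16`, `Store16` and `AddShorts` are hypothetical names, and a 16-bit location is assumed for the small type.

```cpp
#include <cstdint>

// Raw bits of a 16-bit storage location. The type of the access, not the
// location itself, decides how the bits are extended when they are read.
using Slot16 = uint16_t;

// Signed small load: sign-extend to a 32-bit INT.
// (The uint16_t -> int16_t conversion wraps; this is guaranteed since C++20
// and is what mainstream compilers did before that.)
constexpr int32_t LoadSigned16(Slot16 slot) { return static_cast<int16_t>(slot); }

// Unsigned small load: zero-extend to a 32-bit INT.
constexpr int32_t LoadUnsigned16(Slot16 slot) { return slot; }

// Small store: keep only the low 16 bits of the INT value.
constexpr Slot16 Store16(int32_t value) { return static_cast<Slot16>(value); }

// "short + short": both operands are widened first, the ADD itself is an
// INT operation, and narrowing happens only when the result is stored.
constexpr Slot16 AddShorts(Slot16 a, Slot16 b)
{
    int32_t sum = LoadSigned16(a) + LoadSigned16(b);
    return Store16(sum);
}

static_assert(LoadSigned16(Store16(-1)) == -1, "signed load sign-extends");
static_assert(LoadUnsigned16(Store16(-1)) == 65535, "unsigned load zero-extends");
static_assert(LoadSigned16(AddShorts(30000, 10000)) == -25536, "store narrows the INT sum");
```

In this model the sum 40000 exists only as an `INT`; it wraps to -25536 when it is narrowed by the store and read back through a signed load.
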
The following is a list of IR nodes known to use small types and what semantics they have:
- `IND`s on the RHS of an assignment (and always in LIR): specify the width of the indirection. Signedness of the type determines whether the load will use sign extension or zero extension. These nodes always produce `INT`s.
- `IND`s on the LHS of an assignment (`STOREIND` in LIR): specify the width of the storage location.
- Relops (`EQ/NE/LE/GE/LT/GT`) of type `UBYTE` - xarch-specific lowering optimization that means the result doesn't need to be zero-extended (currently, it happens for a `STOREIND` user).
- `CALL`s - yet to be investigated. Interesting case: callees that do not normalize the return (i.e. native calls).
- `ASG`s - use the type of the LHS. Type of `ASG`s largely does not matter except in cases where the `ASG` is a setup arg where it keeps the illusion that the non-late arg "produces a value".
- `LCL_VAR`s that are `NormalizeOnLoad`: arguments, address-exposed locals, and promoted struct fields.
  - On the RHS: get wrapped in a `CAST` to the small type by morph and retyped as `INT`s. Also always produce `INT`s.
    - One interesting detail of `NormalizeOnLoad` variables is that, if the variable is used from a memory location, the "load" can be "out of bounds" (the location assigned to such a variable can be narrower than `sizeof(int)`), since it is performed by the `LCL_VAR` node, which is typed as `int` (the width of the `LCL_VAR` determines the width of the load). It should not cause problems as it is immediately extended, so the upper bits are never seen by anything other than the cast, and reading 2 or 3 bytes more from a stack location (where this variable lives) ought not to have other ill effects (except for perhaps confusing the debugger once in a while). Still, this does mean that optimizations operating with `NormalizeOnLoad` variables need to be mindful of this fact.
    - Because of this detail, value numbers given to `NormalizeOnLoad` locals are "wrong" in the sense that the `LCL_VAR` tree doesn't compute the narrow number it was given at a definition (which is the way they are numbered now), but rather it plus whatever random bytes will happen to be next to the local on the stack at the point of use.
  - On the LHS: remain typed small, mirroring the "width of the store" semantic that `IND`s on the LHS have.
- `LCL_VAR`s that are `NormalizeOnStore`:
  - On the RHS: get retyped as `INT`.
  - On the LHS: sometimes do not get retyped (CSE), sometimes do (importing IL locals), produce `INT` but without the extension - it is implicit that the upper bits are in a "good" state. `NormalizeOnStore` variables are always assigned a location size of `sizeof(int)` bytes, whether they're used from memory or from a register.
- Lowering (see `Lowering::OptimizeConstCompare`) transforms relops with casts to `UBYTE`/`BOOL` as the first operands in a unique way: it gets rid of the cast and retypes its source to the small type. Unlike all other known cases in the IR, this "small node" actually has a "true" small type, i.e. there are no implicit extensions allowed for it - they would in fact be incorrect. This is because the original cast would have zero-extended from the source, but now the value is used "as-is", with the upper bits preserved in case it ends up in a register. In theory this optimization could be applied to most operands, but currently it is only enabled for a few select ones. This optimization has one undesirable side-effect: it results in a larger encoding for the `test` instruction when the operands end up in `RSI`, `RDI`, `RBP` or `RSP` on Amd64. This optimization is also the reason why decomposition has to insert "redundant" casts to small types (via `DecomposeLongs::EnsureIntSized`) - they are no longer redundant, the types are actually small.
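
To make the difference between the `NormalizeOnLoad` and `NormalizeOnStore` entries in the list above more concrete, here is a minimal C++ model of the two strategies. It is a sketch under stated assumptions rather than JIT code: the local is assumed to be a signed 16-bit value, `Home32` stands for the 4 bytes that an `INT`-typed read of the local observes, and all function names are hypothetical.

```cpp
#include <cstdint>

// The 4 bytes seen by an INT-typed read of a small local, as raw bits.
// (Narrowing conversions to signed types below wrap; this is guaranteed
// since C++20 and is what mainstream compilers did before that.)
using Home32 = uint32_t;

// NormalizeOnStore: the definition extends the value, so every later use
// can read all 32 bits as-is, with no further extension.
constexpr Home32 StoreNormalizeOnStore(int32_t value)
{
    return static_cast<Home32>(static_cast<int32_t>(static_cast<int16_t>(value)));
}
constexpr int32_t LoadNormalizeOnStore(Home32 home)
{
    return static_cast<int32_t>(home);
}

// NormalizeOnLoad: the store is only as wide as the small type, so the
// upper 16 bits keep whatever happened to be there ("previous").
constexpr Home32 StoreNormalizeOnLoad(Home32 previous, int32_t value)
{
    return (previous & 0xFFFF0000u) | (static_cast<uint32_t>(value) & 0xFFFFu);
}
// The use reads all 4 bytes and relies on the cast to discard the upper
// bits, so it is the cast, not the read, that yields the correct value.
constexpr int32_t LoadNormalizeOnLoad(Home32 home)
{
    return static_cast<int16_t>(home);
}

static_assert(LoadNormalizeOnStore(StoreNormalizeOnStore(-2)) == -2, "extended at the store");
static_assert(StoreNormalizeOnLoad(0xDEAD0000u, -2) == 0xDEADFFFEu, "upper bits are left alone");
static_assert(LoadNormalizeOnLoad(StoreNormalizeOnLoad(0xDEAD0000u, -2)) == -2, "extended at the load");
```

Both strategies yield the same value at a use. They differ in where the extension happens and in whether the raw 32 bits can be trusted without it, which is the property the value numbering remark above is concerned with.
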