Representation analysis for code-switched LLMs

Code-Switching Reveals Language Anchoring in Multilingual LLMs

We show that code-switched inputs are internally pulled toward language-specific anchors, and use this signal to steer target-language token states with CANVAS.

Jeonghyun Park · Seunghyun Yoon · Yonghyun Jun · Hwanhee Lee
Chung-Ang University · Adobe Research

CS states are not neutral. Matched monolingual references reveal whether a mixed-language state follows the source or target side.
Grammar frame matters. Source-framed and target-framed CS inputs follow different internal representation trajectories.
Anchoring is actionable. CANVAS uses the source-side canvas from the same input to correct target-ward drift during prefill.
Overview

Why look inside code-switched representations?

Code-switching is structured multilingual language use, not random lexical noise. A model can answer a monolingual question yet fail when the same information need is expressed through a mixed-language form. We therefore ask where the mixed-language hidden state sits relative to its matched source-language and target-language counterparts.

Controlled code-switched QA

For each question, we compare source-only, target-only, source-framed CS, and target-framed CS variants. This keeps the factual query fixed while changing the language structure.

From diagnosis to intervention

Anchor Bias measures which monolingual reference the CS state approaches. CANVAS then uses the same internal signal to adjust hidden states without changing the input text or model weights.

Anchor Bias

CS representations move with the grammatical frame.

Anchor Bias compares the similarity of a CS representation to its source and target monolingual anchors. Positive values indicate source-oriented states; negative values indicate target-oriented states.

AB_l(q_cs) = cos(r_l(q_src), r_l(q_cs)) - cos(r_l(q_tgt), r_l(q_cs))
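As a minimal sketch of the formula above, the snippet below computes Anchor Bias for a single layer from three already-extracted representation vectors. How r_l is pooled from token states is not specified here, so the vectors are treated as given inputs; the function names cosine and anchor_bias and the toy 2-d vectors are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def anchor_bias(r_src: Sequence[float],
                r_tgt: Sequence[float],
                r_cs: Sequence[float]) -> float:
    """AB_l(q_cs) = cos(r_l(q_src), r_l(q_cs)) - cos(r_l(q_tgt), r_l(q_cs))."""
    return cosine(r_src, r_cs) - cosine(r_tgt, r_cs)


# Toy vectors (assumed for illustration): the CS state leans toward the target anchor.
r_src = [1.0, 0.0]
r_tgt = [0.0, 1.0]
r_cs = [3.0, 4.0]

ab = anchor_bias(r_src, r_tgt, r_cs)
print("source-oriented" if ab > 0 else "target-oriented", round(ab, 3))
```

In this toy case the CS vector is closer in angle to the target anchor (cosine 0.8) than to the source anchor (cosine 0.6), so the bias is negative. Repeating the computation at every layer would give one layer-wise trajectory per input.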
Layer-wise anchor bias across models for grammar-forced code-switched inputs.
Layer-wise anchor trajectories across models. The source-framed and target-framed conditions separate most clearly in upper layers, motivating CANVAS's upper-layer intervention window.
CANVAS

Source-canvas alignment during prefill.

CANVAS computes an in-context source-side canvas from the same CS input, estimates target-ward drift, and softly aligns target-language token representations toward that canvas in the upper layers.

1. Partition tokens. Identify source-language, target-language, and neutral tokens in the user question span.
2. Build source canvas. Mean-pool source-token hidden states in the control layers to obtain a contextual anchor.
3. Scale adaptively. Increase correction when the internal alignment score indicates stronger target orientation.
4. Interpolate hidden states. Steer only target-token representations, then decode from the adjusted prefill cache.

h_tilde(l,t) = (1 - alpha) h(l,t) + alpha c_src(l)
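The four steps and the interpolation rule above can be sketched for one control layer as follows. This is an illustrative sketch under stated assumptions, not the released implementation: token labels are assumed to be supplied by some language-identification step, the helper names are hypothetical, and the adaptive_alpha schedule (a logistic function with assumed alpha_max and sharpness parameters) is a stand-in, since the exact scaling rule is not given here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(states: Sequence[Vector]) -> Vector:
    """Step 2: average source-token hidden states into a canvas c_src(l)."""
    if len(states) == 0:
        raise ValueError("need at least one source-token state")
    return [sum(column) / len(states) for column in zip(*states)]


def adaptive_alpha(score: float, alpha_max: float = 0.5,
                   sharpness: float = 4.0) -> float:
    """Step 3 (hypothetical schedule): a more negative, i.e. more
    target-oriented, alignment score yields a larger alpha in (0, alpha_max)."""
    return alpha_max / (1.0 + math.exp(sharpness * score))


def canvas_layer(hidden: Sequence[Vector], labels: Sequence[str],
                 alpha: float) -> List[Vector]:
    """Step 4: h_tilde(l,t) = (1 - alpha) h(l,t) + alpha c_src(l),
    applied to target tokens only; other tokens are copied unchanged."""
    c_src = mean_pool([h for h, lab in zip(hidden, labels) if lab == "src"])
    adjusted = []
    for h, lab in zip(hidden, labels):
        if lab == "tgt":
            adjusted.append([(1.0 - alpha) * x + alpha * c
                             for x, c in zip(h, c_src)])
        else:
            adjusted.append(list(h))
    return adjusted


# Step 1 is assumed done: each token carries a "src", "tgt", or "neutral" label.
hidden = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0], [5.0, 5.0]]
labels = ["src", "src", "tgt", "neutral"]
print(canvas_layer(hidden, labels, alpha=0.5))
```

With these toy states the canvas is [2.0, 0.0], so the single target token moves from [0.0, 4.0] to [1.0, 2.0] while source and neutral tokens stay fixed. In a real model this update would run inside the upper-layer window during prefill, before decoding from the adjusted cache.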
Main Results

CANVAS improves direct CS answering.

We compare direct CS answering (Base) with a random-direction control (Rand) and the adaptive CANVAS method. The four condition columns (GF-SRC, GF-TGT, Random, Selective) report adaptive CANVAS minus Base F1.

Model            Base   Rand   CANVAS  GF-SRC  GF-TGT  Random  Selective
Llama3.2-1B      2.57   2.68   3.20    +0.08   +0.82   +1.01   +0.62
Mistral-7B       4.53   4.96   5.06    +0.14   +0.80   +0.65   +0.55
Aya-Expanse-8B   5.99   5.99   6.88    +0.62   +1.41   +0.63   +0.91
Llama3.1-8B      6.21   6.72   7.79    +1.21   +0.64   +2.15   +2.34
Qwen3-30B-A3B    6.91   6.93   8.11    +0.06   +2.48   +0.71   +1.55
Qwen3.5-27B      14.14  14.22  15.40   +0.28   +1.74   +0.96   +2.03
Qwen3.6-35B-A3B  15.80  16.07  17.53   +0.66   +3.58   +1.41   +1.28
Representation Movement

The correction is visible in hidden-state geometry.

We project CS representations onto the source-target axis and compare Base with CANVAS. The arrows show that the intervention moves CS centroids toward the source-side region. The adaptive coefficient alpha, in turn, grows for more target-heavy inputs.
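One simple way to read movement along a source-target axis is a scalar projection between the two monolingual centroids. The sketch below assumes that construction (the exact projection used for the figures is not stated here); axis_position, mu_src, and mu_tgt are hypothetical names, and the vectors are toy values rather than model outputs.

```python
from typing import Sequence


def axis_position(r: Sequence[float],
                  mu_src: Sequence[float],
                  mu_tgt: Sequence[float]) -> float:
    """Scalar position of r along the target-to-source axis.

    Returns 0.0 at the target centroid and 1.0 at the source centroid;
    components orthogonal to the axis are ignored.
    """
    axis = [s - t for s, t in zip(mu_src, mu_tgt)]
    length_sq = sum(a * a for a in axis)
    if length_sq == 0.0:
        raise ValueError("source and target centroids coincide")
    return sum((x - t) * a for x, t, a in zip(r, mu_tgt, axis)) / length_sq


# Toy centroids and CS states (assumed values, not measurements).
mu_src, mu_tgt = [2.0, 0.0], [0.0, 0.0]
base_state = [0.5, 1.0]
canvas_state = [1.5, 1.0]

shift = axis_position(canvas_state, mu_src, mu_tgt) - axis_position(base_state, mu_src, mu_tgt)
print("moved toward source" if shift > 0 else "moved toward target")
```

A positive shift means the adjusted state sits closer to the source centroid along the axis than the Base state does, which is the direction of movement the arrows describe.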

Representation movement from Base to CANVAS along the source-target axis.
Centroid movement from Base to CANVAS in the SRC-TGT representation plane.
Adaptive alpha selected by CANVAS for source-token ratio bins.
Adaptive strength increases for more target-heavy inputs.