Concept Tokens: Learning behavioral embeddings through concept definitions

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/54653

Title:	Concept Tokens: Learning behavioral embeddings through concept definitions
Authors:	Sastre, Ignacio; Rosá, Aiala
Type:	Preprint
Publication date:	2026
Abstract:	We propose Concept Tokens, a lightweight method that adds a new special token to a pretrained LLM and learns only its embedding from multiple natural language definitions of a target concept, where occurrences of the concept are replaced by the new token. The LLM is kept frozen and the embedding is optimized with the standard language-modeling objective. We evaluate Concept Tokens in three settings. First, we study hallucinations in closed-book question answering on HotpotQA and find a directional effect: negating the hallucination token reduces hallucinated answers mainly by increasing abstentions, whereas asserting it increases hallucinations and lowers precision. Second, we induce recasting, a pedagogical feedback strategy for second language teaching, and observe the same directional effect. Moreover, compared to providing the full definitional corpus in-context, concept tokens better preserve compliance with other instructions (e.g., asking follow-up questions). Finally, we include a qualitative study with the Eiffel Tower and a fictional "Austral Tower" to illustrate what information the learned embeddings capture and where their limitations emerge. Overall, Concept Tokens provide a compact control signal learned from definitions that can steer behavior in frozen LLMs.
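The training recipe summarized in the abstract (freeze the LLM, add one token, and optimize only that token's embedding with the language-modeling loss on definitions in which the concept is replaced by the token) can be illustrated with a toy sketch. This is not the authors' code, and everything below is an assumption made for illustration: the "LM" is a frozen bigram-style model whose next-token logits are W·h for the current token's embedding h, the token name `<concept>` and the helpers `replace_concept` and `learn_concept_embedding` are hypothetical, and positions whose target is the new token are skipped because the toy has no output row for it. It uses only the Python standard library.

```python
import math
import random

NEW_TOKEN = "<concept>"  # hypothetical name for the added special token


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def replace_concept(definitions, concept):
    """Replace every occurrence of the concept word with the new token."""
    return [[NEW_TOKEN if tok == concept else tok for tok in sent] for sent in definitions]


def learn_concept_embedding(definitions, concept, vocab, W, lr=0.1, epochs=300, seed=0):
    """Toy sketch: fit only the new token's input embedding e; W stays frozen.

    Assumed toy model: next-token logits = W @ h, h = embedding of the current token.
    With no attention, e only affects predictions made right after the new token,
    so only those positions depend on e and enter the loss.
    """
    d = len(W[0])
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 0.01) for _ in range(d)]  # the only trainable parameter
    corpus = replace_concept(definitions, concept)
    # Indices of the tokens that follow each occurrence of the new token.
    targets = [vocab[nxt] for sent in corpus for cur, nxt in zip(sent, sent[1:])
               if cur == NEW_TOKEN and nxt != NEW_TOKEN]
    if not targets:
        raise ValueError("concept never appears before a known token")
    # Empirical distribution of those targets (y_bar).
    y_bar = [0.0] * len(W)
    for t in targets:
        y_bar[t] += 1.0 / len(targets)
    history = []
    for _ in range(epochs):
        probs = softmax([sum(w * x for w, x in zip(row, e)) for row in W])
        # Mean cross-entropy (language-modeling loss) over the relevant positions.
        history.append(-sum(math.log(probs[t]) for t in targets) / len(targets))
        # Gradient w.r.t. e is W^T (p - y_bar); the frozen W receives no update.
        diff = [p - y for p, y in zip(probs, y_bar)]
        grad = [sum(diff[i] * W[i][j] for i in range(len(W))) for j in range(d)]
        e = [x - lr * g for x, g in zip(e, grad)]
    return e, history


# Usage with a tiny made-up vocabulary and a random frozen output matrix.
vocab_words = ["the", "is", "tall", "iron", "in", "paris", "a", "tower"]
vocab = {w: i for i, w in enumerate(vocab_words)}
rng = random.Random(42)
d = 8
W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in vocab_words]
definitions = [
    ["eiffel", "is", "tall"],
    ["eiffel", "is", "in", "paris"],
    ["the", "eiffel", "tower", "is", "iron"],
]
e, history = learn_concept_embedding(definitions, "eiffel", vocab, W)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because this toy model conditions only on the current token, the learned vector is pulled toward reproducing the empirical distribution of words that follow the concept in the definitions (here "is" twice and "tower" once). In an actual frozen LLM, the same language-modeling gradient would also reach the embedding through attention from every later position in each definition.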
Description:	Accepted for publication at the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), San Diego, California, July 2-7, 2026.
In:	Computer Science (Computation and Language), arXiv:2601.04465, Jan 2026.
Funders:	ANII Master's scholarship POS_FMV_2023_1_1012622.
Citation:	Sastre, I. and Rosá, A. Concept Tokens: Learning behavioral embeddings through concept definitions [Preprint]. Published in: Computer Science (Computation and Language), arXiv:2601.04465, Jan 2026. DOI: https://doi.org/10.48550/arXiv.2601.04465.
License:	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)
Appears in collections:	Academic and scientific publications - Instituto de Computación

Files in this item:

File	Description	Size	Format
SR26.pdf	Preprint	373.81 kB	Adobe PDF


This item is licensed under a Creative Commons license.