
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/54653
Title: Concept Tokens: Learning behavioral embeddings through concept definitions
Authors: Sastre, Ignacio; Rosá, Aiala
Type: Preprint
Publication date: 2026
Abstract: We propose Concept Tokens, a lightweight method that adds a new special token to a pretrained LLM and learns only its embedding from multiple natural language definitions of a target concept, where occurrences of the concept are replaced by the new token. The LLM is kept frozen and the embedding is optimized with the standard language-modeling objective. We evaluate Concept Tokens in three settings. First, we study hallucinations in closed-book question answering on HotpotQA and find a directional effect: negating the hallucination token reduces hallucinated answers mainly by increasing abstentions, whereas asserting it increases hallucinations and lowers precision. Second, we induce recasting, a pedagogical feedback strategy for second language teaching, and observe the same directional effect. Moreover, compared to providing the full definitional corpus in-context, concept tokens better preserve compliance with other instructions (e.g., asking follow-up questions). Finally, we include a qualitative study with the Eiffel Tower and a fictional "Austral Tower" to illustrate what information the learned embeddings capture and where their limitations emerge. Overall, Concept Tokens provide a compact control signal learned from definitions that can steer behavior in frozen LLMs.
Description: Accepted for publication in: 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), San Diego, California, July 2-7, 2026.
In: Computer Science (Computation and Language), arXiv:2601.04465, Jan 2026.
Funding: Beca Maestría ANII POS_FMV_2023_1_1012622.
Citation: Sastre, I. and Rosá, A. Concept Tokens: Learning behavioral embeddings through concept definitions [Preprint]. Published in: Computer Science (Computation and Language), arXiv:2601.04465, Jan 2026. DOI: https://doi.org/10.48550/arXiv.2601.04465.
License: Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND 4.0)
Appears in collections: Publicaciones académicas y científicas - Instituto de Computación
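
The training setup summarized in the abstract (a frozen model, one new token, and only that token's embedding updated with the language-modeling loss) can be illustrated with a toy sketch. This is not the authors' implementation: the "frozen model" here is a single fixed output projection rather than a pretrained LLM, so the new token only influences the immediately following prediction, and the names `FROZEN_OUT`, `train_concept_embedding`, and the target ids are hypothetical choices made for illustration.

```python
import math

# Hypothetical frozen output projection (vocab size 4, embedding dim 3).
# It stands in for the frozen LLM and is never updated during training.
FROZEN_OUT = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, -1.0],
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_concept_embedding(frozen_out, targets, steps=200, lr=0.2):
    """Learn only the new token's embedding by gradient descent.

    targets: ids of the tokens that follow the concept token in the
    (toy) definitions. The loss is the mean next-token cross-entropy.
    Returns the learned embedding and the loss recorded before each step.
    """
    dim = len(frozen_out[0])
    emb = [0.0] * dim  # the only trainable parameters
    losses = []
    for _ in range(steps):
        logits = [sum(w * e for w, e in zip(row, emb)) for row in frozen_out]
        probs = softmax(logits)
        losses.append(-sum(math.log(probs[t]) for t in targets) / len(targets))
        # Gradient of the cross-entropy w.r.t. the embedding:
        # sum over vocab of (p_v - onehot_v) * frozen_out[v], averaged over targets.
        grad = [0.0] * dim
        for t in targets:
            for v, row in enumerate(frozen_out):
                err = probs[v] - (1.0 if v == t else 0.0)
                for d in range(dim):
                    grad[d] += err * row[d] / len(targets)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb, losses


if __name__ == "__main__":
    # Toy data: in four "definitions", the concept token is followed
    # by token 2 three times and by token 1 once.
    emb, losses = train_concept_embedding(FROZEN_OUT, [2, 2, 1, 2])
    print("first loss:", losses[0], "last loss:", losses[-1])
    print("learned embedding:", emb)
```

With a real pretrained model the same idea would presumably amount to extending the vocabulary by one token and restricting gradient updates to that token's embedding row, but those details are assumptions here and are not specified in this record.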

Files in this item:
File        Description    Size         Format
SR26.pdf    Preprint       373.81 kB    Adobe PDF


This item is licensed under a Creative Commons license.