
Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/54653
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sastre, Ignacio | -
dc.contributor.author | Rosá, Aiala | -
dc.date.accessioned | 2026-04-28T17:42:07Z | -
dc.date.available | 2026-04-28T17:42:07Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Sastre, I. and Rosá, A. Concept Tokens: Learning behavioral embeddings through concept definitions [Preprint]. Published in: Computer Science (Computation and Language), arXiv:2601.04465, Jan 2026. DOI: https://doi.org/10.48550/arXiv.2601.04465 | es
dc.identifier.uri | https://hdl.handle.net/20.500.12008/54653 | -
dc.description | Accepted for publication in: 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), San Diego, California, July 2-7, 2026. | es
dc.description.abstract | We propose Concept Tokens, a lightweight method that adds a new special token to a pretrained LLM and learns only its embedding from multiple natural language definitions of a target concept, where occurrences of the concept are replaced by the new token. The LLM is kept frozen and the embedding is optimized with the standard language-modeling objective. We evaluate Concept Tokens in three settings. First, we study hallucinations in closed-book question answering on HotpotQA and find a directional effect: negating the hallucination token reduces hallucinated answers mainly by increasing abstentions, whereas asserting it increases hallucinations and lowers precision. Second, we induce recasting, a pedagogical feedback strategy for second language teaching, and observe the same directional effect. Moreover, compared to providing the full definitional corpus in-context, concept tokens better preserve compliance with other instructions (e.g., asking follow-up questions). Finally, we include a qualitative study with the Eiffel Tower and a fictional "Austral Tower" to illustrate what information the learned embeddings capture and where their limitations emerge. Overall, Concept Tokens provide a compact control signal learned from definitions that can steer behavior in frozen LLMs. | es
dc.description.sponsorship | ANII Master's scholarship POS_FMV_2023_1_1012622. | es
dc.format.extent | 18 p. | es
dc.format.mimetype | application/pdf | es
dc.language.iso | en | es
dc.relation.ispartof | Computer Science (Computation and Language), arXiv:2601.04465, Jan 2026. | es
dc.rights | Works deposited in the Repository are governed by the Ordinance on Intellectual Property Rights of the Universidad de la República (Res. No. 91 of C.D.C. of 8/III/1994 – D.O. 7/IV/1994) and by the Ordinance of the Open Repository of the Universidad de la República (Res. No. 16 of C.D.C. of 07/10/2014) | es
dc.title | Concept Tokens: Learning behavioral embeddings through concept definitions | es
dc.type | Preprint | es
dc.contributor.filiacion | Sastre Ignacio, Universidad de la República (Uruguay). Facultad de Ingeniería. | -
dc.contributor.filiacion | Rosá Aiala, Universidad de la República (Uruguay). Facultad de Ingeniería. | -
dc.rights.licence | Creative Commons Attribution - NonCommercial - NoDerivatives license (CC BY-NC-ND 4.0) | es
Appears in collections: Publicaciones académicas y científicas - Instituto de Computación
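The core idea in the abstract (keep the model frozen, add one new token, and train only that token's embedding row with the standard language-modeling loss on definitions in which the concept word is replaced by the token) can be sketched on a toy model. This is a minimal numpy illustration, not the authors' implementation: the tiny bigram model, vocabulary size, learning rate, and "definition" sequences are all invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: tokens 0..V-1 are "pretrained"; token V is the new concept token.
V, d = 8, 16
E = rng.normal(0, 0.1, (V + 1, d))   # embedding table; only row V gets trained
W = rng.normal(0, 0.1, (d, V + 1))   # frozen output projection

# "Definitions" as token sequences where the concept is replaced by token V.
defs = [[1, V, 2, 3], [4, V, 2, 5], [V, 2, 3, 6]]

def loss_and_grad(e_new):
    """Mean next-token cross-entropy; gradient only w.r.t. the new embedding."""
    E[V] = e_new
    total, grad, n = 0.0, np.zeros(d), 0
    for seq in defs:
        for cur, nxt in zip(seq, seq[1:]):
            logits = E[cur] @ W
            p = np.exp(logits - logits.max())
            p /= p.sum()
            total += -np.log(p[nxt])
            n += 1
            if cur == V:  # only positions holding the new token update its row
                err = p.copy()
                err[nxt] -= 1.0
                grad += W @ err
    return total / n, grad / n

# Plain gradient descent on the single embedding row; everything else frozen.
e_new = rng.normal(0, 0.1, d)
loss0, _ = loss_and_grad(e_new)
for _ in range(200):
    _, g = loss_and_grad(e_new)
    e_new -= 0.5 * g
loss1, _ = loss_and_grad(e_new)
print(loss0, loss1)
```

Because the loss is convex in the single embedding row for a fixed output projection, the loss on the definition corpus decreases monotonically; in the paper's setting the same one-row optimization runs through a full frozen transformer instead of this bigram stand-in.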

Files in this item:
File | Description | Size | Format
SR26.pdf | Preprint | 373.81 kB | Adobe PDF


This item is licensed under a Creative Commons license.