Memory Tokens: Large Language Models can generate reversible sentence embeddings

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12008/54654

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sastre, Ignacio	-
dc.contributor.author	Rosá, Aiala	-
dc.date.accessioned	2026-04-28T17:42:47Z	-
dc.date.available	2026-04-28T17:42:47Z	-
dc.date.issued	2025	-
dc.identifier.citation	Sastre, I. and Rosá, A. Memory Tokens: Large Language Models can generate reversible sentence embeddings [Preprint]. Published in: Proceedings of the First Workshop on Large Language Model Memorization (L2M2), Vienna, Austria, August 2025, pp. 183–189.	es
dc.identifier.uri	https://hdl.handle.net/20.500.12008/54654	-
dc.description.abstract	In this work, we observe an interesting phenomenon: it is possible to generate reversible sentence embeddings that allow an LLM to reconstruct the original text exactly, without modifying the model’s weights. This is achieved by introducing a special memory token, whose embedding is optimized through training on a fixed sequence. When prompted with this embedding, the model reconstructs the fixed sequence exactly. We evaluate this phenomenon across English and Spanish datasets, sequences of up to approximately 240 tokens, and model scales ranging from 100M to 8B parameters. Notably, Llama 3.1 8B successfully reconstructs all tested sequences. Our findings highlight an interesting capability of LLMs and suggest potential applications in memory-based retrieval, compression, and controlled text generation.	es
dc.description.sponsorship	ANII Master's Scholarship POS_FMV_2023_1_1012622.	es
dc.format.extent	7 p.	es
dc.format.mimetype	application/pdf	es
dc.language.iso	en	es
dc.rights	Works deposited in the Repository are governed by the Intellectual Property Rights Ordinance of the Universidad de la República (Res. No. 91 of the C.D.C., 8/III/1994 – D.O. 7/IV/1994) and by the Open Repository Ordinance of the Universidad de la República (Res. No. 16 of the C.D.C., 07/10/2014).	es
dc.title	Memory Tokens: Large Language Models can generate reversible sentence embeddings	es
dc.type	Preprint	es
dc.contributor.filiacion	Sastre Ignacio, Universidad de la República (Uruguay). Facultad de Ingeniería.	-
dc.contributor.filiacion	Rosá Aiala, Universidad de la República (Uruguay). Facultad de Ingeniería.	-
dc.rights.licence	Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0)	es
Appears in collections:	Academic and scientific publications - Instituto de Computación

Files in this item:

File	Description	Size	Format
SR25.pdf	Preprint	371.17 kB	Adobe PDF


This item is licensed under a Creative Commons license.