LLMs · Medium impact · For Dev · arXiv LLMs · June 11, 2026

Influcoder: Distilling Decoders' Gradient Influence Rankings into an Encoder for Data Attribution

Influcoder is a method to efficiently estimate influence rankings of training samples on LLM outputs by distilling gradient-based influence functions into a compact encoder.
Signal strength: 3.4/5 · arXiv LLMs

TL;DR

Influcoder is a method to efficiently estimate influence rankings of training samples on LLM outputs by distilling gradient-based influence functions into a compact encoder.

What happened

Researchers proposed Influcoder, an approach to influence-based data attribution for LLM training data that aims to be scalable, fast, and storage-efficient. Rather than computing gradient-based influence scores directly, it trains an encoder to approximate the influence rankings that those scores produce for decoder LLMs.
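The distillation idea can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the paper's implementation: a linear model with squared loss stands in for the decoder, the teacher score is a single-checkpoint gradient dot product (TracIn-style) rather than whatever influence estimator the authors use, and the hypothetical student is a linear encoder whose embedding dot product is trained with a pairwise RankNet-style logistic loss to reproduce the teacher's ordering of training samples for a query.

```python
import math
import random

# Toy stand-in for a decoder: linear model y_hat = w . x with loss 0.5 * (y_hat - y)^2.
# All names and data here are hypothetical and only illustrate ranking distillation.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def loss_grad(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w."""
    r = dot(w, x) - y
    return [r * xi for xi in x]

def teacher_scores(w, train, query):
    """Teacher: gradient-similarity influence of each training sample on the query
    (single-checkpoint, TracIn-style: grad_train . grad_query)."""
    gq = loss_grad(w, *query)
    return [dot(loss_grad(w, x, y), gq) for x, y in train]

def ranking(scores):
    """Sample indices ordered from most to least influential."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def features(sample):
    x, y = sample
    return list(x) + [y]

def embed(E, f):
    """Hypothetical student encoder: a single linear map E."""
    return [dot(row, f) for row in E]

def student_scores(E, train, query):
    """Student score for each training sample: (E f_train) . (E f_query)."""
    eq = embed(E, features(query))
    return [dot(embed(E, features(s)), eq) for s in train]

def pairwise_loss_and_grad(E, train, query, teacher):
    """Mean RankNet-style logistic loss over pairs the teacher orders i above j,
    together with its gradient with respect to E."""
    pairs = [(i, j) for i in range(len(train)) for j in range(len(train))
             if teacher[i] > teacher[j]]
    q = features(query)
    eq = embed(E, q)
    grad = [[0.0] * len(row) for row in E]
    total = 0.0
    for i, j in pairs:
        fi, fj = features(train[i]), features(train[j])
        ei, ej = embed(E, fi), embed(E, fj)
        margin = dot(ei, eq) - dot(ej, eq)
        total += math.log1p(math.exp(-margin))
        coef = -1.0 / (1.0 + math.exp(margin))  # d loss / d margin
        for k, row in enumerate(E):
            for c in range(len(row)):
                # d[(E a) . (E q)] / dE[k][c] = (E q)[k] * a[c] + (E a)[k] * q[c]
                d_i = eq[k] * fi[c] + ei[k] * q[c]
                d_j = eq[k] * fj[c] + ej[k] * q[c]
                grad[k][c] += coef * (d_i - d_j)
    n = len(pairs)
    return total / n, [[g / n for g in row] for row in grad]

def distill(w, train, query, emb_dim=2, lr=0.01, steps=100, seed=0):
    """Full-batch gradient descent on the pairwise loss; returns E and the loss history."""
    rng = random.Random(seed)
    dim = len(features(train[0]))
    E = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(emb_dim)]
    teacher = teacher_scores(w, train, query)
    history = []
    for _ in range(steps):
        loss, grad = pairwise_loss_and_grad(E, train, query, teacher)
        history.append(loss)
        E = [[e - lr * g for e, g in zip(er, gr)] for er, gr in zip(E, grad)]
    return E, history

W = [0.5, -0.5]
TRAIN = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([0.5, 0.5], 0.0),
         ([1.0, 1.0], 0.5), ([-1.0, 0.5], 1.0)]
QUERY = ([1.0, 0.5], 0.0)

if __name__ == "__main__":
    E, history = distill(W, TRAIN, QUERY)
    print("teacher ranking:", ranking(teacher_scores(W, TRAIN, QUERY)))
    print("student ranking:", ranking(student_scores(E, TRAIN, QUERY)))
    print(f"pairwise loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Once a student like this is trained, ranking the training set for a new query needs only one encoder pass per sample plus dot products, and the training-sample embeddings can be computed once and reused. A real student would be a neural text encoder rather than a single linear map.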

Why it matters

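Traditional influence-function methods are computationally expensive and storage-heavy, largely because they operate on per-sample gradients whose size scales with the model's parameter count. By learning an encoder that approximates their rankings, Influcoder aims to make data attribution and data filtering practical on large-scale LLM datasets.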
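A back-of-the-envelope calculation shows why storage becomes the bottleneck. The sizes below (one million training samples, a 1B-parameter decoder, a 768-dimensional embedding, 2 bytes per value) are hypothetical, and the sketch assumes the naive approach of caching one full gradient per training sample.

```python
# Hypothetical sizes chosen for illustration; none come from the paper.
NUM_SAMPLES = 1_000_000          # training examples to attribute over
NUM_PARAMS = 1_000_000_000       # a 1B-parameter decoder
EMBED_DIM = 768                  # width of a compact encoder embedding
BYTES_PER_VALUE = 2              # fp16/bf16 storage

def index_bytes(num_samples, values_per_sample, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to store one vector of `values_per_sample` entries per sample."""
    return num_samples * values_per_sample * bytes_per_value

gradient_index = index_bytes(NUM_SAMPLES, NUM_PARAMS)   # 2 * 10**15 bytes, about 2 PB
embedding_index = index_bytes(NUM_SAMPLES, EMBED_DIM)   # 1_536_000_000 bytes, about 1.5 GB

# Per query, each training sample also costs a length-NUM_PARAMS dot product in the
# gradient case versus a length-EMBED_DIM dot product in the embedding case.
print(f"full gradients: {gradient_index / 1e15:.1f} PB")
print(f"embeddings:     {embedding_index / 1e9:.2f} GB")
print(f"ratio:          {gradient_index // embedding_index:,}x")
```

In practice many gradient-based attribution methods compress gradients with random projections before caching them, which shrinks the first figure by orders of magnitude; the comparison only shows how storage scales with parameter count versus embedding width.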
