KVCache.php
Purpose
KVCache.php implements an interleaved Key-Value (KV) cache for autoregressive transformer inference. The cache stores the K and V vectors of tokens that have already been processed, so each decoding step can attend over them without reallocating memory or recomputing earlier keys and values.
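The file does not spell out what "interleaved" means at the memory level. The sketch below shows one possible reading, in which each token's K row is followed directly by its V row in a single flat buffer. The function names and the layout itself are assumptions for illustration, not the layout confirmed for the C library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: offsets (in floats) of token $t's K and V rows,
 * ASSUMING a flat buffer laid out as [K0, V0, K1, V1, ...] with
 * $headDim floats per row. The real library may use a different layout.
 */
function interleavedOffsets(int $t, int $headDim): array
{
    $kOffset = 2 * $t * $headDim;   // K row of token t
    $vOffset = $kOffset + $headDim; // V row directly follows its K row
    return ['k' => $kOffset, 'v' => $vOffset];
}

/** Total number of floats needed for $cap tokens under the same assumption. */
function interleavedSize(int $cap, int $headDim): int
{
    return 2 * $cap * $headDim;
}

// Token 3 with headDim = 4: K starts at float 24, V at float 28.
$o = interleavedOffsets(3, 4);
printf("k=%d v=%d size=%d\n", $o['k'], $o['v'], interleavedSize(8, 4));
```

Under this assumed layout, one allocation of 2 * cap * headDim floats covers both K and V, and a token's key and value sit next to each other in memory.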
Key Components
Classes, Functions, Methods with Signatures
- KVCache
  - Class: final class KVCache
    - A final class representing the KV cache.
  - Constructor (__construct): public function __construct(int $cap, int $headDim)
    - Initializes a new KV cache with a specified capacity ($cap) and head dimension ($headDim). It allocates memory using an FFI call to kvcache_create.
  - append: public function append(Tensor $k, Tensor $v): void
    - Appends one or more new K,V token rows to the cache. Uses FFI to call kvcache_append.
  - len: public function len(): int
    - Returns the number of tokens currently stored in the cache.
  - reset: public function reset(): void
    - Resets the cache to an empty state without deallocating memory by calling kvcache_reset.
  - Destructor (__destruct): public function __destruct()
    - Frees the allocated memory for the KV cache when the object is destroyed.
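To make the method surface above concrete, here is a minimal pure-PHP stand-in that mirrors the documented signatures. It is a sketch under stated assumptions: SketchKVCache is a hypothetical name, rows are plain float arrays rather than Tensor objects, nothing is delegated to the kvcache_* FFI functions, and the overflow and shape checks are illustrative rather than behavior confirmed for the real class.

```php
<?php
declare(strict_types=1);

/**
 * Pure-PHP stand-in mirroring the documented methods of KVCache.
 * Hypothetical: the real class takes Tensor arguments and calls into C.
 */
final class SketchKVCache
{
    /** @var list<list<float>> cached key rows, in append order */
    private array $keys = [];
    /** @var list<list<float>> cached value rows, in append order */
    private array $values = [];

    public function __construct(private int $cap, private int $headDim)
    {
        if ($cap <= 0 || $headDim <= 0) {
            throw new InvalidArgumentException('cap and headDim must be positive');
        }
    }

    /** Appends one or more K,V rows; $k and $v are lists of rows. */
    public function append(array $k, array $v): void
    {
        if (count($k) !== count($v)) {
            throw new InvalidArgumentException('K and V must have the same number of rows');
        }
        // Assumed behavior: refuse to grow past the fixed capacity.
        if (count($this->keys) + count($k) > $this->cap) {
            throw new OverflowException('cache capacity exceeded');
        }
        foreach ($k as $i => $row) {
            if (count($row) !== $this->headDim || count($v[$i]) !== $this->headDim) {
                throw new InvalidArgumentException('row width must equal headDim');
            }
            $this->keys[] = $row;
            $this->values[] = $v[$i];
        }
    }

    public function len(): int
    {
        return count($this->keys);
    }

    /** Empties the cache; capacity and head dimension are kept. */
    public function reset(): void
    {
        $this->keys = [];
        $this->values = [];
    }
}

$cache = new SketchKVCache(4, 2);
$cache->append([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.8]]); // two tokens at once
$cache->append([[1.0, 1.0]], [[0.1, 0.9]]);                          // one more token
echo $cache->len(), "\n"; // 3
$cache->reset();
echo $cache->len(), "\n"; // 0
```

The usage lines follow the typical inference pattern: several rows appended at once for the prompt, then one row per generated token, with reset between sequences.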
Important Variables and Constants
- None explicitly defined in the class. The class relies on FFI calls to manage internal state, which is encapsulated within the kvcache_create, kvcache_append, and related functions.
Inputs / Outputs
For ML Components (Autoregressive Transformer)
- Inputs:
  - K and V: Instances of the Tensor class representing the Key and Value vectors to be stored in the cache.
- Outputs:
  - No explicit outputs; the methods modify the internal state of the KV cache.
For Utility Files
- This is a utility file, but it does not take or return any parameters directly. It operates on an instance of KVCache.
Dependencies
- Internal Dependencies:
  - The class relies on FFI (Foreign Function Interface) calls to a C library for managing the KV cache.
- External Dependencies:
  - The class depends on the TensorEngine class, which is assumed to provide access to FFI functions for tensor operations and error handling.
Usage Notes
Integration with the Framework
- This file should be integrated into the larger machine learning framework as a component of an autoregressive transformer model. It manages the attention cache during inference.
- The append method adds new tokens to the cache, and it must be called in sequence order so that cached rows stay in token order.
Edge Cases
- If memory allocation fails during construction ($ffi->kvcache_create returns NULL), a runtime exception is thrown.
- Calling methods on an uninitialized or already freed KVCache instance could lead to undefined behavior. Always ensure the cache is properly initialized before use, and reset it before reusing it for a new sequence.
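The two edge cases above can be guarded against with a small wrapper around the native handle. The following is a sketch, not the file's actual code: GuardedHandle and its methods are hypothetical names, and the $allocate callable stands in for the FFI call to kvcache_create, returning null on failure in this illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper showing two guards: fail fast when allocation
 * returns null, and refuse access after the handle has been released.
 */
final class GuardedHandle
{
    private ?object $handle;

    /** @param callable $allocate stand-in for the kvcache_create FFI call */
    public function __construct(callable $allocate)
    {
        $handle = $allocate();
        if ($handle === null) {
            throw new RuntimeException('kvcache_create returned NULL');
        }
        $this->handle = $handle;
    }

    /** Returns the live handle, or throws if it was already freed. */
    public function get(): object
    {
        if ($this->handle === null) {
            throw new LogicException('cache handle already freed');
        }
        return $this->handle;
    }

    /** Drops the handle; calling this more than once is harmless. */
    public function free(): void
    {
        $this->handle = null;
    }
}

// A failed allocation surfaces as an exception instead of a null handle.
try {
    new GuardedHandle(fn() => null);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Turning a freed handle into a PHP exception keeps the failure inside PHP rather than passing an invalid pointer to native code.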
Performance Considerations
- The KV cache is designed to minimize memory allocations and computations, making it efficient for real-time inference in transformer-based models.
- The append method does not perform O(seq²) operations on scores, which can be a bottleneck when attention is recomputed over the full sequence at every step. This makes the KVCache particularly useful for large sequence lengths.
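To illustrate why a cache avoids O(seq²) score computation per step, the sketch below computes attention for a single new query against the rows already cached, so the work per generated token is linear in the number of cached tokens. The function attendOne is a hypothetical helper using plain arrays, and the 1/sqrt(d) scaling follows standard scaled dot-product attention, which is assumed here rather than confirmed by the source.

```php
<?php
declare(strict_types=1);

/**
 * Attention for ONE new query against n cached rows: n dot products,
 * a softmax over n scores, and a weighted sum of n value rows.
 * Hypothetical helper; plain arrays stand in for the cached tensors.
 */
function attendOne(array $q, array $keys, array $values): array
{
    $n = count($keys);
    if ($n === 0 || $n !== count($values)) {
        throw new InvalidArgumentException('need matching, non-empty K and V');
    }

    // One scaled dot product per cached key (assumed 1/sqrt(d) scaling).
    $scale = 1.0 / sqrt(count($q));
    $scores = [];
    foreach ($keys as $k) {
        $dot = 0.0;
        foreach ($q as $j => $qj) {
            $dot += $qj * $k[$j];
        }
        $scores[] = $dot * $scale;
    }

    // Numerically stable softmax over the n scores.
    $max = max($scores);
    $weights = array_map(fn(float $s): float => exp($s - $max), $scores);
    $sum = array_sum($weights);

    // Weighted sum of the cached value rows.
    $out = array_fill(0, count($values[0]), 0.0);
    foreach ($values as $i => $v) {
        foreach ($v as $j => $vj) {
            $out[$j] += ($weights[$i] / $sum) * $vj;
        }
    }
    return $out;
}
```

With n cached tokens and head dimension d, each call does on the order of n * d multiply-adds, so the n-by-n score matrix for the whole sequence is never built during decoding.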