Blog

Making the Intractable Possible: A 10,000-Image Context Window - Polar Search Product Roadmap

Four Phases to Product Development at Polar Search

Fusion, MoE, RL, GNN, Sparsity with Gradient Flow, Neuromorphic

Our goal is to build AI for Commerce. We are taking a crawl, walk, run approach, releasing a series of models specifically designed to help sellers!

Polar Search, AI for Commerce, is designed from the ground up to offer 100 images per inference today in Fusion V1, with plans to allow 10,000 images per inference when Fusion V2 is released soon. This makes it possible to input a large set of images and create a full eCommerce manifest, including:

  • Products separated into image sets
  • Product identification
  • Product category
  • CSV manifest creation
  • Live price lookup

When we launch our LLM later this year, we plan to offer a 40M-token context window that can accommodate up to 25,000 images per batch.
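To make the manifest output concrete, here is a minimal sketch of what CSV manifest creation might look like. The column names (`product_id`, `category`, `image_files`, `live_price_usd`) and sample values are illustrative assumptions, not Polar Search's actual schema.

```python
import csv
import io

def write_manifest(clusters):
    """Write one CSV row per detected product cluster.

    Hypothetical sketch: field names and values are illustrative only.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["product_id", "category", "image_files", "live_price_usd"]
    )
    writer.writeheader()
    for c in clusters:
        writer.writerow({
            "product_id": c["product_id"],
            "category": c["category"],
            "image_files": ";".join(c["images"]),  # images grouped into one product set
            "live_price_usd": c["price"],
        })
    return buf.getvalue()

# Toy example: one cluster of two images identified as the same product.
clusters = [
    {"product_id": "SKU-001", "category": "footwear",
     "images": ["img_017.jpg", "img_018.jpg"], "price": 59.99},
]
print(write_manifest(clusters))
```

In a real pipeline the cluster list would come from the model's image grouping and the price from the live lookup step; the CSV serialization itself is the only part sketched here.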

Fusion V1: The First AI for Commerce Model (100 Images)

  • 10-model ViT and LLM fusion: Combines popular AI models into a single fused model via a GNN
  • ViT models, LLMs, and OCR: Swin, ViT, DINO, Color, GPT, Claude, Gemini, Gemma 3, Phi-4, Nova Pro, InternVL OCR
  • Context window up to 100 images: 10x more than current LLM technologies, enabling significantly longer context windows for retail product images
  • Outputs Product Clusters: In seconds, groups even the most nuanced products together into individual product folders
  • 100k Parameters: Our fusion GNN is trained on 10 locked models and develops a "product intuition" by examining the affinity matrix produced by the 10 ViT feature sets
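The affinity-matrix idea in the last bullet can be sketched minimally: assuming each locked backbone emits one embedding per image, pairwise cosine similarities give one affinity matrix per backbone, and the fusion input averages them. The toy 3-dimensional embeddings and the plain averaging are illustrative assumptions; the actual features and fusion weights are not public.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def fused_affinity(per_model_embeddings):
    """Average the per-backbone affinity matrices into one fused matrix.

    per_model_embeddings: one list of image embeddings per locked backbone.
    """
    n = len(per_model_embeddings[0])  # number of images
    fused = [[0.0] * n for _ in range(n)]
    for embs in per_model_embeddings:  # one entry per backbone
        for i in range(n):
            for j in range(n):
                fused[i][j] += cosine(embs[i], embs[j])
    m = len(per_model_embeddings)
    return [[v / m for v in row] for row in fused]

# Two toy "backbones", three images each; images 0 and 1 look alike.
embeddings = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]],
    [[0.8, 0.2, 0.0], [0.7, 0.3, 0.0], [0.1, 0.9, 0.0]],
]
aff = fused_affinity(embeddings)
```

Because both backbones agree that images 0 and 1 are similar, the fused matrix carries that consensus, which is the signal a downstream GNN could learn to cluster on.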

Fusion V2: Ultra Long Context AI for Commerce (10,000 Images)

  • Parallelizes and Expands V1: Runs 100 parallel sets of 100 images to support 10,000-image inference in real time
  • Expanded Vision Models: Mixture of Experts, Gating, Embeddings for Seller, Sale, Budget, and Quality, BCE, RL, GAT, GAE, Unsupervised Reconstruction Loss
  • Context window up to 10,000 images: 1000x more than current LLM technologies, Polar Search allows significantly longer context windows for retail product images
  • 50M Parameters: This "fully baked" version of our groundbreaking GNN Fusion model pushes the technical capabilities further
  • Continuous Learning: LoRA, distillation, retraining
  • Sparsity RL: Gradients flow upstream through a sparse attention matrix, allowing super-long image context windows with full attention across all input images
  • Graph Attention: Introduces a graph attention sparsity model to minimize the number of tokens, using RL-trained channels
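The parallelization in the first bullet above can be sketched as simple batch sharding: 10,000 images split into 100 independent sets of 100, each of which fits the V1 context window. The `shard` helper name and the fixed batch size are illustrative assumptions about the scheme, not the production scheduler.

```python
def shard(images, batch_size=100):
    """Split an image list into fixed-size batches for parallel inference.

    Illustrative sketch: each batch fits the V1 100-image context window
    and can be dispatched to its own worker.
    """
    return [images[i:i + batch_size] for i in range(0, len(images), batch_size)]

# 10,000 image ids become 100 parallel batches of 100.
image_ids = [f"img_{i:05d}" for i in range(10_000)]
batches = shard(image_ids)
```

Each batch is independent at this stage, so the batches can run concurrently and their per-batch cluster outputs merged afterward.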

Multimodal LLM for Commerce: Mega Context AI for Commerce (40M Tokens)

  • Ultra Long Context Windows: Reaching for 100M tokens and 100,000 images with full-attention inference in seconds
  • Expanded Vision Models: Mixture of Experts, LoRA, BCE, rewards and losses, RL, long context, Sparsity RL, Transformer, MLP Vocab, GAT, GAE
  • Introducing Text: Transformer based architecture connects to the advanced "front end" established in Fusion V2
  • Vocabulary MLP: Stores the depth of 60,000-token attention in a multidimensional latent space
  • Vocabulary Channels: Separate channels to allow directed graph affinity to capture State, Distance, Ownership
  • Graph Attention: Example 1: "The Lion is next to the tree on fire and is roaring" vs Example 2: "The roar came from inside the cave" 
    • Example 1: 
      • Vocab Channel 1: Verb Noun Relation
        • Lion <- Roaring (directional graph edge)
        • Tree <- Fire (directional graph edge)
      • Vocab Channel 2: Physical Distance
        • Lion - Tree (scalar graph edge)
      • Vocab Channel 3: Contains
        • no edges - padding
    • Example 2: 
      • Vocab Channel 1: Verb Noun Relation
        • no edges - padding
      • Vocab Channel 2: Physical Distance
        • roar - cave (scalar graph edge) 
      • Vocab Channel 3: Contains
        • cave <- roar (roar is inside cave)
  • Fully Differentiable: RL-defined channels (text features) reduce attention cost through parallelism
    • 4 Parallel channels to capture deeper text relationships
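The two worked examples above can be sketched as per-channel edge lists, one small graph per vocabulary channel. The channel names and tuple encodings here (directed `(source, target)` pairs for relations and containment, a scalar weight for distance) mirror the examples but are illustrative assumptions about the representation, which is not public.

```python
# Example 1: "The Lion is next to the tree on fire and is roaring"
# Directed edges are (source, target): "Lion <- Roaring" is ("roaring", "lion").
example_1 = {
    "verb_noun_relation": [("roaring", "lion"), ("fire", "tree")],
    "physical_distance":  [("lion", "tree", 1.0)],  # scalar edge (toy weight)
    "contains":           [],                        # no edges - padding
}

# Example 2: "The roar came from inside the cave"
example_2 = {
    "verb_noun_relation": [],                        # no edges - padding
    "physical_distance":  [("roar", "cave", 1.0)],
    "contains":           [("roar", "cave")],        # roar is inside cave
}

# Every channel is present in every example (empty channels are padded), so
# the four channels can be attended to in parallel over short edge lists
# instead of one dense token-by-token attention matrix.
channel_names = {"verb_noun_relation", "physical_distance", "contains"}
for channels in (example_1, example_2):
    assert set(channels) == channel_names
```

The design point the examples make is that the same sentence content lands in different channels depending on meaning: "roaring" is a verb-noun edge in Example 1, while "roar" becomes a containment edge in Example 2.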

Neuromorphic Chips: Continuous Learning with Resistance

  • Reduced cost: Printing our LLM onto an Intel Hala Point chip enables extreme cost efficiency when running the model
  • Continuous Learning: Using resistance in physical chips can protect parameter spaces from catastrophic forgetting
  • Speed: Inference times will be dramatically reduced by eliminating the need for GPUs; chips with the LLM hardcoded enable lightning-fast inference