Blog


Polar Search Founder Bradley Shervheim Exclusive Interview with AI Press Room

AIPressRoom

https://aipressroom.com/

Exclusive Interview with CEO / Founder / Chief Scientist

Jeff, CMO, AIPressRoom

  • Founders from YC, Techstars & A16Z-backed companies have participated in these interviews

Section 1: Founding Story & Industry Challenges

1. What inspired the creation of Polar Search, and what problem were you aiming to solve?

Polar Search is the natural evolution of 17 years and three different attempts at venture-backed startups. The name Polar Search was coined from Polar Harmonic Transforms: the concept of converting an image into a harmonic wave in the polar coordinate system and using that to compare product images. Today I am narrowly focused on creating AI for Commerce. The goal is to develop a suite of tools for sellers that automates grouping product images, identifying the products in the images, finding and reading tags, categorizing products, and creating product manifests from raw product images, all with no human input. I achieve this by training neural networks to develop the intuition to understand seller intentions with no human labels.

This has been an arc of creation that started in 2008 with my first angel-backed startup. What set me on this AI journey has humble beginnings in trying to identify auto parts for sale online using images and barcodes. I raised $200k for that startup, GreenYard Auto. We had initial commercial success, reaching over $500k in revenue in 2011, but the AI industry was nascent and identifying auto parts in real time was technically impossible in 2012. Ten years later, after the advent of LLMs, I realized machine learning had matured enough to make this task possible for the first time. In 2023 I launched an AI company, Shopik, which could take a photo of a retail product and identify it for ecommerce sellers. That company achieved $100k in revenue in its first year, which was the market validation I needed to pursue Polar Search.

The key learning from Shopik was that even if I can identify products from an image, the major bottleneck is processing the images in bulk. Inventory management is a two-step process: 1) select the product photos, 2) describe the product photos. Polar Search addresses the manual labor involved in the former: selecting and sorting the billions of retail photos taken annually. It is a complementary business and the natural evolution of my two prior startups.

2. How does Polar Search address efficiency challenges in e-commerce and auction cataloging?

Human labor is a major expense for ecommerce listings worldwide. The manual process of dragging photos into Amazon, eBay, or Facebook Marketplace is a universal challenge across industry verticals, from small companies to massive corporations. Take Ross Dress for Less: 60 million products across 2,000 stores, and almost no inventory online. This is due to the enormous overhead of image processing and labeling required. It's no surprise that even a local yard sale stays offline; the burden of photograph sorting and description writing is nonsensical for a temporary sale with fast-moving inventory and low prices. Polar Search's core AI for Commerce is delivered via "product intuition". Understanding the concept of a "sale" or a "product" is the key issue at hand. This cannot be done with hand-crafted algorithms or clever heuristics. Rather, it requires advanced neural network architecture and a dedicated team with the sole focus of bringing these technologies to market. Polar's AI is trained on vast volumes of retail images, videos, and descriptions. This purpose-built AI is dedicated to understanding the "commercial intent" of the seller, not the simple "object detection" that saturates the market today.

When a person takes photographs of products in a sale, they have the intuition to demonstrate an aspect of an item for sale; it is always contextual and requires inferring across more than the nearest-neighbor image. The seller deliberately photographs various angles and zoomed details for every product image in every sale. For example, a seller takes a photo of a MacBook on a table in front of a wall, next to the power cord. A person naturally understands that the computer is for sale in the image, not the table it sits on, and that the power cord is an included accessory. Modern AI misses these subtle cues because it is not trained with commercial intent. The inefficiency in ecommerce comes from the high-dimensional complexity of grouping and identifying product inventory. The "easy part" is acquiring lots of inventory for sale, as evidenced by my first startup, where I had over 100,000 products. The hard part is rapidly creating an inventory manifest from scratch, and then making use of that digital inventory to market and advertise the products online. As much as 10% of all online revenue goes toward the inventory management process. Polar Search aims squarely at automating the manual labor of sorting new photos of retail products and identifying the products and their labels in the images. By focusing on this bottleneck, we offer immediate value to any client onboarding new inventory to sell online.

3. What were the biggest obstacles in developing AI-driven product clustering and identification?

Context windows in AI are too short: you can only upload a few images. Additionally, LLMs like ChatGPT are generalists, and vision CNN/transformer models like Swin don't have a "product" focus. I largely underestimated the challenge at hand in 2008, and again in 2023, but not in 2025; I have licked my wounds and sharpened my pencil. I am building foundational AI models for commerce, not just using other generic models. I started with complex heuristics, then transitioned to fusion models, and now I am embarking on groundbreaking large neural networks. This AI is trained from the ground up on commerce images exclusively. ChatGPT (and other LLMs) cannot do the task at hand, because context windows today are limited to 100,000-200,000 tokens, i.e. roughly 10 images maximum. Additionally, training as a generalist means that multimodal LLMs lack the "commercial intuition" required to be reliable for commerce. A typical client sale may have 10,000 new product photos, and they are all related. This is a massive quadratic-scaling challenge. Polar Search is developing neural networks that tackle it with extremely long context windows for high volumes of high-resolution images. This requires that I stretch AI further than it has ever gone before. As a byproduct of tackling this enormous scaling challenge, I have begun publishing my key findings in public journals and on our blog. I want to share this progress with the world, and whenever possible I am going open source with our models, with full source code. This code will carry a dual-use license that gives academics unlimited access while restricting commercial use. For the highly proprietary elements, I am also publishing papers so the world can benefit with us. My current research focuses on extremely long context windows, up to 40 million tokens at inference.

Given this goal, I am hard at work defining a new paradigm in reverse differentiable sparsity, with gradient flow back through an affinity matrix/attention matrix. This requires rethinking the foundations of both transformers and graph neural networks. If you are passionate about fully differentiable gradient flow through attention mechanisms, please email me at hello@polarsearch.io.
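To make the idea of gradient flow through a pruned attention matrix concrete, the straight-through trick is one simple way to apply a hard sparsity decision on the forward pass while letting gradients flow through the dense affinity matrix on the backward pass. A minimal PyTorch sketch of that general idea follows; `sparse_attention_st` and its top-k masking are illustrative assumptions, not Polar Search's actual architecture:

```python
import torch

def sparse_attention_st(q, k, v, keep):
    """Toy sparse attention with straight-through gradients.

    Forward pass: only the top-`keep` affinities per query survive (hard pruning).
    Backward pass: gradients flow through the *dense* affinity matrix, so even
    pruned edges receive a learning signal. Illustrative sketch only.
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5                 # dense affinity matrix
    mask = torch.zeros_like(scores)
    mask.scatter_(-1, scores.topk(keep, dim=-1).indices, 1.0)   # keep top-k per row
    soft = torch.softmax(scores, dim=-1)                        # dense (backward path)
    hard = torch.softmax(scores.masked_fill(mask == 0, float("-inf")), dim=-1)
    attn = hard.detach() + soft - soft.detach()                 # forward: hard, backward: soft
    return attn @ v
```

A real linear-scaling scheme would also need to avoid materializing the dense score matrix, which this sketch deliberately does not attempt.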

4. How has your experience in the Techstars Los Angeles Accelerator influenced Polar Search’s growth?

Techstars LA powered by JP Morgan has been a rocket-powered catalyst. The peer teams are inspirational, the leadership is second to none, and the experience is humbling. Even with my overwhelming expectations coming into the program, I was leveled by the caliber of peers I collaborated with on a daily basis. In addition to the human capital, Techstars' partnership with Google meant I was able to access an additional $100,000 in Google Cloud Platform credits. I currently have access to state-of-the-art H100 GPUs for training our proprietary models thanks to Techstars. Without Techstars' backing, these credits would never have been possible. Matt Kozlov is a visionary leader, and I am grateful for his leadership while at Techstars Los Angeles powered by JP Morgan.

Section 2: Technology & Business Impact

5. Can you explain how Polar Search's AI-powered image recognition and clustering system works?

The core technology today focuses on clustering same-product images from a live stream of 10,000 input images by leveraging a fusion model powered by a graph neural network. I started with 10 parallel vision and language transformers to build a set of affinity matrices that capture the edge weights between each pair of product images (nodes). These affinity matrices are then combined with a GNN that fuses the outputs into a single affinity matrix, trained against ground truth with binary cross-entropy (BCE). I have over 30,000 product images already organized into clustered sets. By training this model to recognize sets of product images, the graph network inherently discerns seller purpose and commercial intent. The result? Clients will be able to separate a stream of 10,000 product images into product sets: for example, images 1-5 are the same product, 6-13 are the same product. This first model focuses exclusively on the sorting and clustering task. Product identification and a suite of other capabilities in the Polar Sales Platform are coming next.

Salesforce's BLIP-2 employs a similar fusion model: a neural network that combines two frozen models (a ViT and an LLM). I have taken the fusion-model concept and pushed it much further. The planned initial release includes the following advanced technology baked into this model: reinforcement learning and mixture of experts, reinforcement-learning gating on the fusion model output prior to the mixture of experts, and a primary gate trained to "know which model to choose" upstream of the 10 parallel models. These innovations let the model leverage the best ViTs and multimodal LLMs at a fraction of the cost. I additionally plan to include a randomly initialized model that uses distillation learning to backpropagate the "gaps" into its latent space, making up for the shortfalls of the other vision models. Other key innovations include continuous embeddings to pass in the concepts of "seller", "sale", "quality", and even "budget".

The intention is to allow a seller to define a set dollar budget for the task, with the model trained to deliver the maximum-quality result while holding that budget in focus. Training this model involves cutting-edge loss and reward functions that draw on more than just the ground-truth examples. I am releasing a series of white papers on Medium, Substack, GitHub, arXiv, and Papers With Code in the coming weeks. The goal is to open source as much as possible and develop a community around this technology. I also plan to share my core discoveries around graph neural networks, sparse attention via differentiable attention gradients, and transformers. I hope to develop a vibrant engineering and data science team motivated to solve the hardest AI challenges. Please check out our blog for regular technology updates at https://www.polarsearch.io/blog, and feel free to contact hello@polarsearch.io for more information.
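As a rough illustration of the training target described above, a weighted fusion of per-model affinity matrices can be scored against the ground-truth cluster adjacency with BCE. This NumPy sketch stands in for the real GNN fusion, which is far richer; the function names and the hand-picked weights are illustrative assumptions:

```python
import numpy as np

def fuse_affinities(affinities, weights):
    """Fuse k per-model affinity matrices into one edge-probability matrix.

    `affinities` has shape (k, n, n): one n x n affinity matrix per upstream
    vision/language model. `weights` is a per-model score (learned in practice,
    hand-picked here). Illustrative sketch, not the production fusion GNN.
    """
    logits = np.tensordot(weights, affinities, axes=1)  # weighted sum over the k models
    return 1.0 / (1.0 + np.exp(-logits))                # sigmoid -> P(edge = same product)

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy against the ground-truth cluster adjacency."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))
```

With a ground-truth adjacency whose entry (i, j) is 1 when images i and j belong to the same product set, minimizing this loss pushes the fused matrix toward block structure, which is exactly what downstream clustering needs.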

The models have successfully clustered and identified products with high precision and reliable confidence scoring, and I have proved that modeling "seller intent" is possible. During this research and development, another surprising insight emerged: general intelligence. It turns out that when you train a model to have human intuition around product sales, it also develops broad general skills beyond clustering and identification of products. This leads me to share big news: I intend to train a multimodal large language model dedicated to commerce. The momentum is barreling toward releasing a comprehensive model that a seller can talk to in plain English, and that can answer nuanced product requests such as:

  • "Take a look at these 5,000 product images, print out a sale manifest that I can upload into the auction site, and then put the images into folders with labels so I can easily upload them to Amazon, eBay, and my own website."

This model will be groundbreaking, leveraging an unsupervised graph autoencoder (GAE) and graph attention network (GAT) with reconstruction loss. The executive functions on top of this GAE/GAT will include clustering and pruning. Additionally, I am planning a fully connected layer between the vocabulary token embedding matrix and the first transformer attention block, allowing sparse attention to backpropagate gradients through the attention heads/affinity matrices. This will enable significantly longer context windows and more images per inference. By applying aggressive pruning on the forward pass and full differentiability during backpropagation (including through the sparse attention routing gate), we will see dramatic improvements in performance over the state-of-the-art LLMs on the market today. One key to the commercial success of this new large model will be successfully implementing an upstream MLP with a soft connection into the attention layer, allowing full gradient flow upstream of the attention head. Through aggressive forward-pass pruning, we can gate sparsity before the quadratic attention, allowing linear scaling for even the longest context windows. This will come to fruition in the form of larger product images and more product images per set, powered by longer context windows. Our audacious goal is to feed in 40 million image-patch tokens on a single forward pass and have high confidence in the general inference across all products in the set. It took me an entire year to label and inventory 20,000 car parts for sale online. My goal is to create a platform that can do the same work in a few seconds.
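The GAE reconstruction objective mentioned above can be sketched in miniature: encode nodes with one normalized-adjacency step, decode edges by inner product, and score the reconstruction with BCE. A minimal NumPy sketch under those assumptions, with the clustering and pruning executive functions omitted:

```python
import numpy as np

def gae_reconstruct(X, A, W):
    """One-layer graph autoencoder with an inner-product decoder.

    Encode: Z = ReLU(A_norm @ X @ W), where A_norm is the symmetrically
    normalized adjacency with self-loops. Decode: sigmoid(Z @ Z.T) gives
    edge probabilities, scored with BCE against the observed edges.
    Minimal illustrative sketch of the GAE idea only.
    """
    A_sl = A + np.eye(A.shape[0])                  # add self-loops
    d = A_sl.sum(axis=1)
    A_norm = A_sl / np.sqrt(np.outer(d, d))        # D^-1/2 (A + I) D^-1/2
    Z = np.maximum(A_norm @ X @ W, 0.0)            # node embeddings
    A_hat = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))       # reconstructed edge probabilities
    eps = 1e-7
    loss = -np.mean(A_sl * np.log(A_hat + eps)
                    + (1 - A_sl) * np.log(1 - A_hat + eps))
    return A_hat, float(loss)
```

In a trained version, `W` would be optimized to drive the reconstruction loss down, and the learned embeddings `Z` are what the clustering step would consume.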

6. What makes Polar Search’s technology unique compared to traditional manual product sorting?

Manual sorting is a process required of all sellers worldwide for all ecommerce. Even if the seller has a barcode system, they still need to import and integrate that data. Polar Search unlocks all commerce image data by allowing the seller to scan the image, not the barcode. Product lookups, barcode research, and inventory systems all lack the core ability to use the image itself, rather than an identification number, as the identifier.

7. How do you see AI-driven automation reshaping product management in e-commerce and retail?

The world is currently experiencing a trillion-dollar shift in ecommerce. Due to advancements in AI, I now expect that "all" items for sale will be online, not just the expensive or easy-to-list items. This will require an entire digital sales economy to emerge overnight. I anticipate that within the next 5 years technology will advance beyond images, so that a seller could make a video of an entire estate sale, upload the media file, and in seconds have a live auction online, with no human input. Companies like Ross Dress for Less and Nordstrom Rack will re-attempt inventory digitization, not to sell online but to list local inventories online. Polar Search plans to be one of the primary technology companies facilitating this online migration for all the items that are still offline due to supply chain bottlenecks and manual data entry requirements.

8. What are the main benefits for businesses implementing Polar Search’s technology?

Increased revenue and increased velocity (inventory turns). I originally built this business intending to reduce labor and overhead expenses for sellers. The unexpected discovery is that clients choose to keep all employees and typically add headcount to grow revenue rather than shrinking labor costs. Additionally, clients have fun using the tech; it's really impressive to watch an AI scan and sort a vast collection of product images in real time. Finally, the significant reduction in labor required to list items online enables a dramatic increase in online inventory.

Section 3: Personal Insights & Growth Journey

9. What have been the biggest challenges in scaling Polar Search, and how did you overcome them?

Acknowledging that data science research needs to be decoupled from engineering. Re-aligning the core focus of the whole company to concentrate narrowly on scientific research was the biggest breakthrough, and that hurdle has now been overcome.

10. AI-powered product discovery can face concerns about accuracy and bias—how does Polar Search ensure reliability and fairness?

The beauty of our graph-based approach, the "product graph", is that bias and accuracy can be audited by examining the edges and weights that contributed to the decision-making process. This is a huge advantage of our GAT/GAE technology over competitors that are "black box". Our technology stands up to bias scrutiny and allows for oversight and deep auditing of decisions: "why it chose what it did". Best of all, we can train and modify traits of the model based on that oversight.
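An edge-level audit of a grouping decision can be as simple as listing the strongest intra-cluster edges in the fused affinity matrix. The helper below is a hypothetical sketch of that idea, not the product's actual audit tooling:

```python
import numpy as np

def top_contributing_edges(affinity, cluster, k=3):
    """List the k strongest intra-cluster edges behind a grouping decision.

    `affinity` is the fused n x n edge-weight matrix; `cluster` is the list of
    image indices the model grouped together. Exposing concrete edges and
    weights is what makes a graph approach auditable, unlike a black-box score.
    Hypothetical helper for illustration only.
    """
    edges = [(i, j, float(affinity[i, j]))
             for ai, i in enumerate(cluster)
             for j in cluster[ai + 1:]]
    return sorted(edges, key=lambda e: e[2], reverse=True)[:k]
```

An auditor can then ask, for any cluster, which image pairs (and at what weight) drove the decision, and spot-check those pairs by eye.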

11. What does success look like for you and Polar Search in the next five years?

Our long-term success hinges on our ability to provide reliable confidence scores with our AI for Commerce product suite. Hallucinations are a prevalent reality in AI, and we must enable our sellers to trust our responses. Success is providing quality product sorting technology to sellers with 99% accuracy, and reliable flags for the 1% that is incorrect. These real-time audit capabilities will define our ability to succeed over the next 5 years.

12. What does a typical day look like for your team, and how do you stay motivated?

Up very early in the morning, whiteboard marker in hand, exploring visual pathways to push the envelope of neural network modeling. Prioritizing implicit differentiability and soft lossiness over compressive, reductive hard projections so that gradients can flow, even at the expense of explicit relationships, and pushing the boundary of abstract meaning. The key is to infer the relationships of relationships without being able to "see" the nodes and edges in an intuitive way; I am looking at second-order relationships as a primary objective. Throughout the day I convert these raw ideas into Jupyter notebooks to test and validate new expressive concepts on our A100s, with active research throughout the day to stay aware of all new computer-vision releases on arXiv. The motivation comes from the momentum and sheer capabilities of the technology, but also the novelty of solving something new and incredibly technically challenging.

A typical day includes training and inference on A100/H100 GPUs (40GB, 80GB, and MEGA). Having access to such comprehensive GPUs is thrilling, and it allows rapid innovation and success. Each day unlocks new and unforeseen results. It is an exciting time to be alive: science has found a new mystery to crack, and I am ambitiously participating in my niche to contribute to the rising tide of AGI for mankind. As I crash late at night, in front of the same whiteboard, the black Expo marker still in hand, I am inspired, in awe, and can't wait to start the next day again.