
Evaluating AI Search Providers

taika · Creator of Token Radar · 6 min read

I evaluated several AI search providers, including the new OpenAI Web Search API, to compare them against my current provider Perplexity.

Introduction

AI-powered search is revolutionizing how developers access information and build intelligent applications. Unlike traditional search engines that rely solely on keyword matching and link analysis, AI-powered search understands context, can synthesize information from multiple sources, and provides more relevant results for complex queries.

With the growing number of providers in this space, choosing the right one for your specific needs can significantly impact both your user experience and your budget.

As a developer building Token Radar, a site that provides information about various cryptocurrencies, I needed to find the most effective AI search solution for generating accurate and up-to-date crypto-related content.

TL;DR

Here's a quick summary of my findings:

| Provider | Best For | Cost/1k | Key Advantage | Main Drawback |
| --- | --- | --- | --- | --- |
| Openperplex GPT-4o-Mini | General Use | $12 | Best quality/cost ratio | |
| Perplexity Sonar Pro | Premium Needs | $14.65 | Most accurate results | Higher cost |
| Linkup Standard | Budget Option | $5 | Great value | No search filters |
| OpenAI Web Search, Exa | Not Recommended | $5-40 | Good documentation | Expensive, lower quality |

For Token Radar's specific use case, I decided to use Linkup Standard for most tasks and Openperplex GPT-4o-Mini for questions where up-to-date information is crucial.

I will keep Perplexity Sonar Pro as a backup.

Key Players in the Evaluation

The evaluation included several providers:

  • Perplexity: Sonar and Sonar Pro Models [link]
  • OpenAI: Web Search API [link]
  • Linkup: Standard and Deep Models [link]
  • Openperplex: Various Models [link]
  • Others: Exa, Critique Labs, and Jina

Price Ranges

The pricing landscape for AI search providers varies dramatically, ranging from as low as $5 to as high as $2,000 per 1,000 requests.

Here's how the major providers stack up:

| Provider (different models) | Cost (USD per 1,000 requests) |
| --- | --- |
| Perplexity | $5-15 |
| OpenAI | $25-40 |
| Linkup | $5-50 |
| Openperplex | $12-96 |
| Exa | $5-15 |
| Critique Labs | $0.125 |
| Jina | $2,000 |

Test Methodology

I used three different types of questions, each with a practical use case in my app Token Radar, to evaluate the providers:

  1. FAQ Questions: Questions about specific tokens to generate automated FAQ content for token pages.
  2. Preview Questions: Questions about tokens that are not yet released or may never be released.
  3. Price Increase Questions: Questions about tokens that have had significant price movements.

To test the providers, I created a simple Python script for each provider and each question type. The script outputs the results as markdown files for later analysis.

Python script for evaluating AI search providers.
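The script itself isn't shown here, so the following is a minimal sketch of that kind of evaluation harness: one stubbed provider call, one markdown file per question type. The `ask_provider` function and the example questions are placeholders, not the actual code or prompts used in the tests.

```python
# Sketch of an evaluation script: ask each question type, save the
# answer as a markdown file for later side-by-side comparison.
from pathlib import Path

# Illustrative questions in the spirit of the three test categories.
QUESTIONS = {
    "faq": "What is the use case of the Bitcoin token?",
    "preview": "Will the Dolomite token actually be released?",
    "price_increase": "Why did this token's price increase recently?",
}

def ask_provider(provider: str, question: str) -> str:
    """Placeholder for a real API call (Perplexity, Linkup, Openperplex, ...)."""
    return f"[{provider}] answer to: {question}"

def run_eval(provider: str, out_dir: str = "results") -> list[str]:
    """Run all question types against one provider, write one .md per answer."""
    Path(out_dir).mkdir(exist_ok=True)
    written = []
    for qtype, question in QUESTIONS.items():
        answer = ask_provider(provider, question)
        path = Path(out_dir) / f"{provider}_{qtype}.md"
        path.write_text(f"# {question}\n\n{answer}\n")
        written.append(str(path))
    return written
```

One script per provider and question type, as described above, then amounts to calling `run_eval` with different provider names and comparing the resulting markdown files.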

Rating System Explained

For each test, I evaluated responses using two methods:

  1. AI Rating (1-3 scale): I used Claude 3.7 Sonnet to analyze responses based on accuracy, relevance, and completeness. A score of 1 indicates poor quality, while 3 represents excellent quality.
  2. Manual Rating (1-3 scale): I personally reviewed each response, considering factors like factual accuracy, usefulness for the intended purpose, and clarity of information.

I also calculated the cost per 1000 requests for each provider and considered this in my final rankings.
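To make the cost-versus-quality trade-off concrete, here is a small sketch of how ratings and cost per 1,000 requests can be folded into a single value ranking. The sample numbers are illustrative stand-ins, not the full test data, and the scoring formula is my own simplification rather than the exact method used in the evaluation.

```python
# Sketch: rank providers by average rating per dollar of 1k-request cost.
# Entries: (provider, ai_rating 1-3, manual_rating 1-3, cost_per_1k_usd)
results = [
    ("Openperplex GPT-4o-Mini", 3, 3, 12.00),
    ("Perplexity Sonar Pro",    3, 3, 14.65),
    ("Linkup Standard",         3, 2, 5.00),
    ("Jina deep-research",      3, 3, 2000.00),
]

def value_score(ai: int, manual: int, cost: float) -> float:
    """Average of the two ratings divided by cost: quality per dollar."""
    return ((ai + manual) / 2) / cost

ranked = sorted(results, key=lambda r: value_score(r[1], r[2], r[3]),
                reverse=True)
```

With numbers like these, a cheap-but-good option such as Linkup Standard rises to the top even when a $2,000/1k model scores perfectly on quality.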

Results Breakdown

Preview Questions

On Token Radar, I also list tokens that are not yet released or may never be released. For these tokens, I ask two key questions:

  1. Will there actually be a token released?
  2. Is there an airdrop that users can participate in?

These questions can be challenging for search providers since many of these newer tokens are very niche and not well documented.

Check out the Dolomite token page for a live example.


Key findings from preview question testing:

  • Linkup Standard led with perfect AI scores and great value ($5/1k)
  • OpenAI and Perplexity Pro tied for accuracy but at higher costs ($10-35/1k)
  • Openperplex models performed well but were less consistent

Preview Results

Jina's deep-research model showed perfect scores but came with significant drawbacks:

  • Very slow processing
  • Very expensive at $2,000 per 1k requests

Note: I eliminated models that significantly underperformed or were too expensive before moving on to the other question types.

FAQ Questions

For this test, I asked various questions about specific tokens to generate automated FAQ content for token pages. The questions covered topics like the use case and technical details that users commonly want to know about a token.

Check out the Bitcoin token page for a live example.


The standout performer was Openperplex's GPT-4o-Mini model, achieving consistent high scores (3/3) across both AI and manual evaluations. Perplexity Sonar Pro also performed well, though at a higher cost point ($14.65 per 1k requests).

FAQ Results

Price Increase Questions

For significant token price movements, I want to understand the underlying causes. This type of query presents a unique challenge because only recent events are relevant: historical price movements from months or years ago don't help explain current volatility.


The final test revealed:

  • Openperplex's GPT-4o-Mini and Linkup Standard maintained their strong performance
  • Most providers struggled with consistency in this category, showing varying performance between AI and manual ratings
  • Perplexity and Openperplex were the only providers offering filters to restrict results to recent events, giving them an advantage in this category

Price Increase Results
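For reference, Perplexity documents a `search_recency_filter` parameter on its chat-completions API for exactly this kind of time-sensitive query. The sketch below only builds the request payload (no network call); the model name and exact field names are taken from Perplexity's docs as I understand them, so verify them against the current API reference before relying on this.

```python
# Sketch: a Perplexity-style request payload restricted to recent results.
# Field names ("search_recency_filter", model "sonar-pro") are assumptions
# based on Perplexity's published API docs; check them before use.
def build_price_question_payload(token: str, recency: str = "week") -> dict:
    return {
        "model": "sonar-pro",
        "messages": [
            {
                "role": "user",
                "content": f"Why did the price of {token} increase recently?",
            },
        ],
        "search_recency_filter": recency,  # e.g. "day", "week", "month"
    }
```

A payload like this would then be POSTed to the provider's chat-completions endpoint with your API key; the recency filter is what keeps months-old price history out of the answer.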

Conclusions

Based on the comprehensive evaluation:

Best Overall Value: Linkup Standard

  • Good performance
  • Reasonable pricing ($5 per 1k requests)
  • Good balance of quality and cost

Best Performance: Openperplex with GPT-4o-Mini

  • Top rankings in multiple categories
  • Competitive pricing ($12 per 1k requests)
  • Consistent quality across different question types

Premium Option: Perplexity Sonar Pro

  • More expensive but with reliable performance
  • Good for cases requiring premium quality and the most up-to-date information
  • Strong recency filters for time-sensitive queries

Final Thoughts

For Token Radar, I've implemented a hybrid approach:

  1. I will use Linkup Standard for the FAQ and Preview questions, because it's the most cost effective and performs well.
  2. For the Price Increase questions, I will use Openperplex GPT-4o-Mini because it provides up-to-date information.
  3. I will also implement strategies to minimize costs while maintaining freshness of information.
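The hybrid approach above boils down to a small routing decision per question type. This is a simplified illustration (the provider identifiers are my own labels, not official model IDs), with the premium option as a fallback for anything unexpected:

```python
# Sketch of the hybrid routing: cheap provider for FAQ and preview
# questions, a recency-aware provider for price-movement questions.
def pick_provider(question_type: str) -> str:
    routes = {
        "faq": "linkup-standard",
        "preview": "linkup-standard",
        "price_increase": "openperplex-gpt-4o-mini",
    }
    # Fall back to the premium backup (Perplexity Sonar Pro) for
    # any question type not explicitly routed.
    return routes.get(question_type, "perplexity-sonar-pro")
```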

Remember that your specific use case might require different trade-offs between cost, performance, and specific features like recency filters. I recommend running your own small-scale tests with your specific use cases before committing to a provider.