Select the search type
  • Site
  • Web
Search

RAG Scraper: AI Enhancements Documentation

The RAG Scraper supports optional AI-powered features designed to transform raw scraped data into structured, enriched, and analysis-ready insights. These enhancements help automate classification, interpretation, and advanced querying of your scraped content—making it even more valuable for Retrieval-Augmented Generation (RAG) systems, chatbots, or business intelligence pipelines.


1. AI Enhancements Overview

RAG Scraper includes an optional AI module that can:

  • Analyze scraped text and images

  • Categorize or tag results

  • Perform custom analyses via dynamic prompts

These enhancements are optional and configurable, offering flexibility for both lightweight use cases and advanced AI pipelines.


2. Why AI Matters in Scraping

Scraping collects data.
AI makes it usable.

Without AI:

  • You get raw HTML or plain text

  • Post-processing is manual and error-prone

With AI:

  • Content can be automatically structured, classified, or enriched

  • Enables faster integration into RAG workflows, chatbots, dashboards, or search engines

  • Supports semantic search, pattern recognition, and decision support


3. Setting Up AI Enhancements

To enable AI features:

  1. Navigate to Settings → AI Enhancements

  2. Toggle Enable AI Processing to “On”

  3. Input your OpenAI-compatible API Key

  4. Set your desired AI enhancement modules (see Section 5)

AI calls are made securely using your provided key and are never stored.


4. What is an API Key? How Do I Get One?

An API Key is a secure token that lets RAG Scraper communicate with external AI services (such as OpenAI, Anthropic, or custom endpoints).

To get an API key:

🔐 Keep your API key private. It provides billing-level access to your AI provider.


5. Configurable AI Modules

You can selectively enable one or more of the following enhancements:

Nutritional Analysis

  • Extracts calorie count, macros, and ingredients from food-related data

  • Great for restaurant menus or meal directories

Price Analysis

  • Evaluates pricing patterns, detects outliers, and calculates average values

  • Supports currency normalization and region-based adjustments

Cuisine Classification

  • Uses NLP to classify restaurants or dishes into cuisine types (e.g., Thai, Mediterranean)

  • Outputs top cuisines with confidence scores

Multi-modal Analysis

  • Accepts images alongside text (if scraped)

  • Uses vision models to infer dish type, decor, or menu layout

Pattern Learning

  • Identifies repeating structures in semi-structured data (e.g., listing templates)

  • Learns to tag and separate repeating data for consistent output

Dynamic Prompts

  • Define your own AI instructions using prompt templates

  • Example:

    Summarize the top 3 user complaints in this review text. 

6. What is CONFIDENCE_THRESHOLD and Why It Matters

The CONFIDENCE_THRESHOLD is a numeric value (between 0.0 and 1.0) that determines whether an AI-generated result is included in the final output.

Example:

  • A cuisine classifier returns:

    • Italian: 0.85

    • Mexican: 0.22

  • If CONFIDENCE_THRESHOLD = 0.75, only “Italian” will be included.

Why it’s important:

  • Reduces noise from low-confidence guesses

  • Improves data quality for downstream applications

  • Balances precision and recall depending on your use case


7. What are CUSTOM_QUESTIONS and Why They’re Powerful

CUSTOM_QUESTIONS allow you to define specific AI queries to run against every scraped item.

Examples:

  • “Is this product vegan?”

  • “What is the average customer sentiment?”

  • “Does the description suggest the product is handmade?”

Benefits:

  • Tailors the scraping process to your business logic

  • Turns unstructured data into insightful, task-specific results

  • Reduces the need for separate post-processing pipelines

You can define these questions as a list in the configuration panel. Each one will be interpreted and answered by the selected AI engine.


For full implementation support or enterprise customization, please contact support@AgileAIDev.com.