The RAG Scraper supports optional AI-powered features designed to transform raw scraped data into structured, enriched, and analysis-ready insights. These enhancements help automate classification, interpretation, and advanced querying of your scraped content, making it more valuable for Retrieval-Augmented Generation (RAG) systems, chatbots, or business intelligence pipelines.
1. AI Enhancements Overview
RAG Scraper includes an optional AI module that can:
- Analyze scraped text and images
- Categorize or tag results
- Perform custom analyses via dynamic prompts
These enhancements are optional and configurable, offering flexibility for both lightweight use cases and advanced AI pipelines.
2. Why AI Matters in Scraping
Scraping collects data.
AI makes it usable.
Without AI:
With AI:
- Content can be automatically structured, classified, or enriched
- Enables faster integration into RAG workflows, chatbots, dashboards, or search engines
- Supports semantic search, pattern recognition, and decision support
3. Setting Up AI Enhancements
To enable AI features:
- Navigate to Settings → AI Enhancements
- Toggle Enable AI Processing to “On”
- Input your OpenAI-compatible API Key
- Set your desired AI enhancement modules (see Section 5)
AI calls are made securely using your provided key and are never stored.
4. What is an API Key? How Do I Get One?
An API Key is a secure token that lets RAG Scraper communicate with external AI services (such as OpenAI, Anthropic, or custom endpoints).
To get an API key:
🔐 Keep your API key private. It provides billing-level access to your AI provider.
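If you script around RAG Scraper, one common way to keep the key private is to read it from an environment variable and mask it before it reaches any log. The sketch below is an illustration only, not part of RAG Scraper: the variable name `RAG_SCRAPER_API_KEY` and both helper functions are hypothetical.

```python
import os


def load_api_key(var_name="RAG_SCRAPER_API_KEY"):
    """Read the API key from the environment (hypothetical variable name)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


def mask_key(key, visible=4):
    """Hide all but the last few characters so the key is safe to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Example with a made-up key; a real key should never appear in source code.
masked = mask_key("sk-test-1234")
```

Keeping the key out of source files and configuration exports means it is less likely to end up in version control or shared screenshots.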
5. Configurable AI Modules
You can selectively enable one or more of the following enhancements:
✅ Nutritional Analysis
- Extracts calorie count, macros, and ingredients from food-related data
- Great for restaurant menus or meal directories
✅ Price Analysis
- Evaluates pricing patterns, detects outliers, and calculates average values
- Supports currency normalization and region-based adjustments
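To give a feel for what averaging and outlier detection can involve, here is a minimal Python sketch. It is not RAG Scraper's own implementation: the function name `summarize_prices`, the choice of the interquartile-range (IQR) rule, and the sample prices are all assumptions made for illustration.

```python
from statistics import mean, quantiles


def summarize_prices(prices, k=1.5):
    """Return the average price and IQR-based outliers (hypothetical helper)."""
    q1, _, q3 = quantiles(prices, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Anything outside [low, high] is treated as an outlier.
    outliers = [p for p in prices if p < low or p > high]
    return {"average": mean(prices), "outliers": outliers}


# Made-up menu prices in a single currency; 48.0 is flagged as an outlier.
summary = summarize_prices([9.5, 10.0, 11.0, 12.5, 10.5, 48.0])
```

The IQR rule is only one option; a different rule may suit price lists that are small or heavily skewed.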
✅ Cuisine Classification
- Uses NLP to classify restaurants or dishes into cuisine types (e.g., Thai, Mediterranean)
- Outputs top cuisines with confidence scores
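The idea of "top cuisines with confidence scores" can be pictured as ranking labels by score. The sketch below assumes the classifier returns a mapping from cuisine name to a score between 0.0 and 1.0; that output shape, the function name `top_cuisines`, and the sample scores are hypothetical.

```python
def top_cuisines(scores, k=3):
    """Return the k highest-scoring (cuisine, confidence) pairs, best first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]


# Made-up scores for a single restaurant listing.
ranked = top_cuisines(
    {"Thai": 0.82, "Mediterranean": 0.11, "Vietnamese": 0.05, "Indian": 0.02},
    k=2,
)
```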
✅ Multi-modal Analysis
- Accepts images alongside text (if scraped)
- Uses vision models to infer dish type, decor, or menu layout
✅ Pattern Learning
- Identifies repeating structures in semi-structured data (e.g., listing templates)
- Learns to tag and separate repeating data for consistent output
✅ Dynamic Prompts
6. What is CONFIDENCE_THRESHOLD and Why It Matters
The CONFIDENCE_THRESHOLD is a numeric value (between 0.0 and 1.0) that determines whether an AI-generated result is included in the final output.
Example:
Why it’s important:
- Reduces noise from low-confidence guesses
- Improves data quality for downstream applications
- Balances precision and recall depending on your use case
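The filtering behavior described in this section can be sketched in a few lines of Python. This is an illustration under assumptions, not the product's code: the result shape (a dict with a `confidence` field), the example value 0.7, and the choice to keep results whose score equals the threshold are all hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.7  # example value only; tune it for your data


def filter_by_confidence(results, threshold=CONFIDENCE_THRESHOLD):
    """Keep only results whose confidence meets the threshold (assumed >=)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [r for r in results if r["confidence"] >= threshold]


# Made-up AI results for one scraped item.
results = [
    {"label": "Thai", "confidence": 0.91},
    {"label": "Vegan-friendly", "confidence": 0.42},
]
kept = filter_by_confidence(results)
```

Raising the threshold favors precision (fewer, more reliable results); lowering it favors recall (more results, including weaker guesses).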
7. What are CUSTOM_QUESTIONS and Why They’re Powerful
CUSTOM_QUESTIONS allow you to define specific AI queries to run against every scraped item.
Examples:
Benefits:
- Tailors the scraping process to your business logic
- Turns unstructured data into insightful, task-specific results
- Reduces the need for separate post-processing pipelines
You can define these questions as a list in the configuration panel. Each one will be interpreted and answered by the selected AI engine.
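Conceptually, running a list of questions against every scraped item amounts to pairing each question with the item's content. The sketch below shows one way this could look; the sample questions, the prompt wording, and the `build_prompts` helper are hypothetical and do not describe how RAG Scraper builds its prompts internally.

```python
# Hypothetical questions; replace them with ones that match your business logic.
CUSTOM_QUESTIONS = [
    "Does this restaurant offer vegetarian options?",
    "Is outdoor seating mentioned?",
]


def build_prompts(item_text, questions):
    """Build one prompt per question for a single scraped item."""
    return [
        "Answer using only the content below.\n\n"
        f"Content:\n{item_text}\n\n"
        f"Question: {question}"
        for question in questions
    ]


# Made-up scraped text for one item.
prompts = build_prompts("Family-run bistro with a garden patio.", CUSTOM_QUESTIONS)
```

Each prompt would then be sent to the selected AI engine, giving one answer per question per item.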
For full implementation support or enterprise customization, please contact support@AgileAIDev.com.