Listings Crawler Plugin

The Listings Crawler Plugin is a powerful automation tool for Osclass that enables you to extract and store classified listings from external websites directly into your marketplace database or JSON files. Save countless hours of manual data entry by automatically crawling product listings, real estate ads, vehicle classifieds, or any structured content from the web and preparing it for import into your Osclass site.

This plugin requires a solid understanding of HTML and CSS to properly configure and build crawlers. Ensure that your server can access the target URL, as some sites may have security measures that prevent crawling.

The plugin does not import data into Osclass. Its purpose is to prepare data in JSON or database storage for further processing.

Key Features

Intelligent Content Extraction

CSS Selector-Based Mapping: Precisely target and extract any data field using CSS selectors with visual page structure analysis
Dual Crawling Modes:

Follow Links Mode: Extract from search/listing pages, then crawl individual item detail pages separately
Direct Extraction Mode: Grab all data directly from a single page when complete details are available

Multi-Source Support: Use comma-separated selectors for fallback options (OR logic) to ensure data capture
Smart Image Handling: Automatically detects and extracts multiple images per listing (both src and data-src attributes)
URL Analysis Tool: Built-in analyzer to examine page structure, test URL accessibility, and identify CSS selectors before crawling
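
The comma-separated fallback idea can be sketched as follows. This is a minimal illustration, not the plugin's code: it assumes the selectors are tried left to right and the first non-empty result wins, and it uses a plain dictionary (the hypothetical `fake_page`) in place of a real selector engine.

```python
from typing import Callable, Optional


def first_match(selector_list: str, lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try each comma-separated selector in order; return the first non-empty text."""
    for selector in (part.strip() for part in selector_list.split(",")):
        if not selector:
            continue  # ignore empty entries such as a trailing comma
        value = lookup(selector)
        if value and value.strip():
            return value.strip()
    return None


# Hypothetical stand-in for a selector engine: selector -> matched text.
fake_page = {"h1.title": "", ".item-name": "  Mountain bike  "}

result = first_match("h1.title, .item-name, h2", fake_page.get)
print(result)  # Mountain bike
```

Here `h1.title` matches nothing useful, so the second selector supplies the title and `h2` is never consulted.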

Advanced Field Mapping (25+ Data Fields)

Extract and map comprehensive listing data including:

Core Information: Title, description, locale/language
Pricing Details: Price amount, currency with customizable delimiter parsing
Visual Content: Multiple images with thumbnail and full-size support
Contact Information: Name, email, phone with privacy visibility controls
Location Data: Country, region, city, city area, ZIP code, full address
Categorization: Automatic category assignment or default values
Temporal Data: Publish date, expiration date with flexible format support
Unique Identification: Automatic URL-based unique ID generation for deduplication

Flexible Configuration Options

Static Default Values: Set fallback values for any field using simple double-quote syntax (e.g., "For Sale", "US", or a quoted default email address)
Category & Location Intelligence: Automatically assign categories by name or ID, or use default mappings
Currency & Price Parsing: Built-in delimiter support for accurate price extraction from combined price/currency strings
Item Wrapper Filtering: Ignore specific page elements (like related items or user boxes) to avoid data conflicts
Relative Selector Evaluation: All selectors evaluated relative to item wrapper for precise targeting
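
Two of the options above lend themselves to a short sketch: telling a quoted static value apart from a selector, and splitting a combined price/currency string on a delimiter. The function names are hypothetical, and the sketch assumes a comma is a thousands separator; the plugin's actual parsing rules may differ.

```python
from typing import Optional, Tuple


def resolve_static(mapping: str) -> Optional[str]:
    """Return the literal text if the mapping is wrapped in double quotes, else None."""
    mapping = mapping.strip()
    if len(mapping) >= 2 and mapping.startswith('"') and mapping.endswith('"'):
        return mapping[1:-1]
    return None  # not a static value, so the caller treats it as a CSS selector


def split_price(raw: str, delimiter: str = " ") -> Tuple[Optional[float], Optional[str]]:
    """Split a combined string such as '1,250.00 EUR' into (amount, currency)."""
    amount, currency = None, None
    for part in raw.strip().split(delimiter):
        try:
            # Assumption: commas are thousands separators and can be dropped.
            amount = float(part.replace(",", ""))
        except ValueError:
            if part:
                currency = part  # any non-numeric piece is taken as the currency
    return amount, currency


print(resolve_static('"For Sale"'))
print(split_price("1,250.00 EUR"))
print(split_price("USD|99", delimiter="|"))
```

The delimiter is passed in explicitly because source sites separate amount and currency in different ways.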

Automated Workflow & Scheduling

Cron Integration: Schedule automatic crawling runs via Osclass cron jobs for hands-free operation
Smart Deduplication: Update existing listings or skip duplicates based on unique URL-generated identifiers
Full Refresh Mode: Option to completely replace old data with fresh crawls on each run
Configurable Limits: Control items per run (recommended: up to 50) and total storage capacity (auto-cleanup of oldest items)
Update vs. Skip Logic: Choose whether to update existing items with new data or skip them entirely
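
How the update/skip choice, full refresh, and storage limits might interact is sketched below. The item fields `uid` and `fetched`, the function name, and the default of 1000 stored items are all assumptions made for illustration; only the 50-items-per-run figure comes from the description.

```python
from typing import Dict, List


def merge_run(stored: Dict[str, dict], crawled: List[dict], *,
              update_existing: bool, full_refresh: bool = False,
              max_per_run: int = 50, max_stored: int = 1000) -> Dict[str, dict]:
    """Merge one crawl run into storage and return the new storage contents."""
    # Full refresh discards everything stored before this run.
    result = {} if full_refresh else dict(stored)
    for item in crawled[:max_per_run]:
        key = item["uid"]
        if key in result and not update_existing:
            continue  # skip mode: keep the stored copy untouched
        result[key] = item  # new item, or update mode overwriting the old copy
    # Retention: keep only the newest max_stored items by fetch time.
    newest = sorted(result.values(), key=lambda it: it["fetched"], reverse=True)[:max_stored]
    return {it["uid"]: it for it in newest}


stored = {"a": {"uid": "a", "title": "old", "fetched": 1}}
crawled = [
    {"uid": "a", "title": "new", "fetched": 5},
    {"uid": "b", "title": "other", "fetched": 6},
]

skipped = merge_run(stored, crawled, update_existing=False)
updated = merge_run(stored, crawled, update_existing=True)
trimmed = merge_run(stored, crawled, update_existing=False, max_stored=1)
print(skipped["a"]["title"], updated["a"]["title"], list(trimmed))
```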

Server-Friendly Operation

Adjustable Request Delays: Set pause intervals (100-2000ms recommended) between HTTP calls to respect target servers
Custom User Agents: Configure browser mimicking to improve compatibility and avoid blocks
Custom Headers: Add authentication or special headers via JSON configuration for protected content
Rate Limiting: Built-in protections to prevent server overload on both source and target systems
Server Accessibility Testing: Verify your server can access target URLs before setting up crawlers
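
The request settings above can be pictured with a small sketch. It assumes the custom headers are supplied as a JSON object of name/value strings; the helper names, the user agent string, and the `X-Api-Key` header are hypothetical. The requests are only built here, not sent.

```python
import json
import time
import urllib.request
from typing import Iterable, Iterator


def build_request(url: str, user_agent: str, headers_json: str = "{}") -> urllib.request.Request:
    """Build a request with a custom user agent plus extra headers given as JSON."""
    headers = {"User-Agent": user_agent}
    headers.update(json.loads(headers_json))  # e.g. '{"X-Api-Key": "secret"}'
    return urllib.request.Request(url, headers=headers)


def paced(urls: Iterable[str], delay_ms: int) -> Iterator[str]:
    """Yield URLs one by one, pausing delay_ms between consecutive ones."""
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_ms / 1000.0)
        yield url


urls = ["https://example.com/ad/1", "https://example.com/ad/2"]
requests = [
    build_request(url, "Mozilla/5.0 (compatible; ExampleCrawler/1.0)", '{"X-Api-Key": "secret"}')
    for url in paced(urls, delay_ms=100)
]
print([request.full_url for request in requests])
```

Each request would then be passed to `urllib.request.urlopen`; the pause sits between calls so the source server is never hit back to back.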

Flexible Storage Options

Database Storage: Store crawled items in dedicated database table (t_crw_item) for structured access
JSON File Storage: Alternative file-based storage in oc-content/uploads/crawler for portability
API Access: Secure API key system for programmatic data extraction and integration
Retention Management: Automatic cleanup of oldest items when storage limits are reached
Structured Data View: Browse extracted listings with searchable table including ID, title, category, contact info, images, and fetch date

Contact Data Management

Smart Email Generation: Automatically generate random emails from extracted domains (e.g., extract @gmail.com and create a random address at that domain)
Privacy Controls: Configure email and phone visibility (public/private, yes/no, 1/0)
Fallback Contacts: Set default contact information when none is found on source pages
Email Domain Extraction: Intelligent domain detection for generating valid email addresses
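
A rough sketch of the contact handling described above: find a domain in the scraped text, generate a random address at that domain, fall back to a default when no domain is found, and normalize the visibility values. The regular expression, the 10-character local part, and the function names are assumptions, not the plugin's implementation.

```python
import random
import re
import string
from typing import Optional

# Matches "@domain.tld" and captures the domain part.
DOMAIN_RE = re.compile(r"@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def extract_domain(text: str) -> Optional[str]:
    match = DOMAIN_RE.search(text)
    return match.group(1).lower() if match else None


def random_email(domain: str, length: int = 10) -> str:
    alphabet = string.ascii_lowercase + string.digits
    local = "".join(random.choice(alphabet) for _ in range(length))
    return f"{local}@{domain}"


def contact_email(text: str, fallback: str) -> str:
    """Generate an address at the extracted domain, or use the fallback contact."""
    domain = extract_domain(text)
    return random_email(domain) if domain else fallback


def is_visible(value: str) -> bool:
    """Normalize public/private, yes/no, and 1/0 to a boolean."""
    return value.strip().lower() in {"public", "yes", "1"}


generated = contact_email("Write to seller @gmail.com", "contact@example.com")
print(generated.endswith("@gmail.com"), is_visible("private"))
```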

Built-in Analysis & Testing Tools

URL Analysis Feature: Test any URL before crawling to verify:

Server accessibility (HTTP status codes, error detection)
Response data quality (text extraction validation)
Complete page structure with CSS selector hierarchy
Element counts and direct text content
Selector identification for easy mapping

Visual Selector Browser: See all available CSS selectors on target pages with counts
Fetch Validation: Detect 4xx/5xx errors, "Forbidden" responses, and security blocks before setup
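
The kind of checks listed above can be approximated in a few lines. This sketch works on a status code and response body that have already been fetched; the marker words other than "forbidden" and the minimum-length threshold are assumptions, and a legitimate page that happens to contain a marker word would be flagged too.

```python
from typing import List

# Assumption: phrases that often indicate a block page.
BLOCK_MARKERS = ("forbidden", "access denied", "captcha")


def check_fetch(status: int, body: str, min_text_length: int = 50) -> List[str]:
    """Return a list of problems found in a fetched response (empty means usable)."""
    problems = []
    if 400 <= status < 500:
        problems.append(f"client error {status}")
    elif 500 <= status < 600:
        problems.append(f"server error {status}")
    lowered = body.lower()
    for marker in BLOCK_MARKERS:
        if marker in lowered:
            problems.append(f"body contains '{marker}'")
    if len(body.strip()) < min_text_length:
        problems.append("response body is very short")
    return problems


print(check_fetch(200, "<html>" + "listing text " * 20 + "</html>"))
print(check_fetch(403, "Forbidden"))
```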

Data Quality & Validation

HTML Sanitization: Automatic HTML cleanup and conversion to clean text
Date Format Parsing: Supports Y-m-d and Y-m-d H:i:s timestamp formats
Unique ID Generation: MD5 hash-based unique identifiers from URLs for reliable deduplication
Multi-Image Support: Extract all available images (12+ images per listing supported)
Locale Support: Multi-language listing extraction capability
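
Two of these checks are easy to illustrate: an MD5-based identifier derived from the item URL, and date parsing in the two supported formats with a fallback to the current time (the fallback is stated in the technical specifications). The function names are hypothetical, and whether the plugin normalizes the URL before hashing is not stated, so this sketch hashes it as given.

```python
import hashlib
from datetime import datetime
from typing import Optional

DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")  # Y-m-d H:i:s and Y-m-d


def unique_id(url: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def parse_date(raw: str, now: Optional[datetime] = None) -> datetime:
    """Parse a supported date format, or fall back to the current timestamp."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    return now or datetime.now()


url = "https://example.com/ad/1"
print(unique_id(url) == unique_id(url))   # same URL, same ID
print(parse_date("2026-04-28 10:30:00"))
print(parse_date("2026-04-28"))
```

Because the identifier depends only on the URL, crawling the same listing twice yields the same ID, which is what makes deduplication possible.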

Perfect For

Marketplace Aggregators: Build comprehensive classified platforms by extracting from multiple sources
Content Migration: Transfer listings from old platforms to prepare for Osclass import
Competitor Analysis: Monitor and extract competitor listings for market research
Real Estate Portals: Aggregate property listings from various sources into centralized database
Automotive Marketplaces: Extract vehicle listings automatically (cars, motorcycles, etc.)
Job Boards: Crawl and aggregate job postings from multiple sites
Price Comparison Sites: Extract product data with pricing for comparison engines
Data Warehousing: Store classified ad data for analytics and business intelligence

How It Works

Configure Crawler: Set up crawler with target URL and extraction mode (follow links or direct)
Map Fields: Define CSS selectors for each data field you want to extract (title, price, images, etc.)
Test & Analyze: Use built-in URL analyzer to verify server can access target and identify selectors
Set Schedule: Configure cron for automatic runs or execute manually from admin panel
Monitor Extraction: View extracted items in structured table with all data fields
Access Data: Use stored database records or JSON files for import into Osclass listings
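
The difference between the two extraction modes can be sketched with a toy model. Everything here is hypothetical: a fetched page is represented as a dictionary with `links` (detail-page URLs) and `items` (already-extracted field dictionaries), and `fake_site` replaces real HTTP requests.

```python
from typing import Callable, Dict, List

# Hypothetical parsed-page structure: {"links": [...], "items": [...]}.
Page = Dict[str, list]


def run_crawler(start_url: str, fetch: Callable[[str], Page],
                follow_links: bool, max_items: int = 50) -> List[dict]:
    listing = fetch(start_url)
    if not follow_links:
        # Direct mode: everything needed is on the listing page itself.
        return listing["items"][:max_items]
    collected = []
    for link in listing["links"][:max_items]:
        detail = fetch(link)  # follow-links mode: one extra request per item
        if detail["items"]:
            collected.append({**detail["items"][0], "url": link})
    return collected


fake_site = {
    "https://example.com/search": {
        "links": ["https://example.com/ad/1", "https://example.com/ad/2"],
        "items": [{"title": "Bike (summary)"}, {"title": "Car (summary)"}],
    },
    "https://example.com/ad/1": {"links": [], "items": [{"title": "Bike", "price": "120 EUR"}]},
    "https://example.com/ad/2": {"links": [], "items": [{"title": "Car", "price": "9000 EUR"}]},
}

direct = run_crawler("https://example.com/search", fake_site.__getitem__, follow_links=False)
followed = run_crawler("https://example.com/search", fake_site.__getitem__, follow_links=True)
print(direct)
print(followed)
```

Follow-links mode costs one request per item but reaches fields that only appear on detail pages, which is why the per-run limit matters more in that mode.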

Technical Specifications

Supported Formats: HTML pages with structured content accessible via HTTP/HTTPS
Selector Engine: Standard CSS selectors (the :has and :not pseudo-classes and the ~ and + sibling combinators are not supported)
Selector Keywords: "this" keyword to reference item wrapper element itself
Data Validation: Automatic HTML sanitization and text conversion for clean data
Date Parsing: Supports Y-m-d and Y-m-d H:i:s formats, falls back to current timestamp
Image Formats: Extracts both src and data-src attributes (lazy loading support)
Locale Support: Multi-language listing extraction with locale field mapping
Storage Formats: MySQL database tables or JSON file storage
API Integration: RESTful API with key-based authentication
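
The lazy-loading support mentioned above can be illustrated with the standard library HTML parser. This sketch assumes that `data-src` should win over `src` when both are present (lazy-loading pages often put a placeholder in `src`) and that duplicates are dropped; the plugin may instead keep both values.

```python
from html.parser import HTMLParser
from typing import List


class ImageCollector(HTMLParser):
    """Collect image URLs from <img> tags, reading data-src as well as src."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: prefer the lazy-loading attribute over a possible placeholder.
        url = attributes.get("data-src") or attributes.get("src")
        if url and url not in self.images:
            self.images.append(url)


html = (
    '<div class="gallery">'
    '<img src="/a.jpg">'
    '<img src="placeholder.gif" data-src="/b.jpg">'
    '<img data-src="/a.jpg" />'
    "</div>"
)

collector = ImageCollector()
collector.feed(html)
print(collector.images)  # ['/a.jpg', '/b.jpg']
```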

Use Cases

Automated Extraction: Schedule nightly crawls to keep your database fresh with new listings
Bulk Data Migration: One-time extraction of large listing databases from external sources
Competitive Intelligence: Track changes and updates from competitor websites over time
Multi-Source Aggregation: Combine listings from dozens of sources into single database
Data Enrichment: Supplement existing data with additional fields from external sources
Market Research: Collect pricing and inventory data for analysis
Archive & Backup: Create regular snapshots of external listing data

Crawler Management Interface

Multiple Crawlers: Create and manage unlimited crawlers for different sources
Crawler List View: See all configured crawlers with ID, name, URL, and status
Individual Settings: Each crawler has independent configuration for fields, limits, and scheduling
Items Overview: Browse all extracted items across all crawlers in unified table
Detail View: Examine complete extracted data for individual items including all 25+ fields
Easy Editing: Modify crawler configuration anytime without losing extracted data

Requirements

Osclass 8.x or higher
PHP 7.4 or newer with cURL support
MySQL database access for database storage mode
Server with cron job capability (for automated scheduled runs)
Write permissions to oc-content/uploads/crawler (for JSON storage mode)
Target websites must be accessible from your server (no JavaScript-required pages or Cloudflare-protected sites)
Server must support outbound HTTP/HTTPS requests

Limitations & Important Notes

JavaScript-rendered content cannot be crawled (server-side HTML only)
Sites with Cloudflare protection or similar security services may block requests
Advanced CSS selectors (:has, :not, ~, +) are not supported
Plugin extracts and stores data only - separate import step required to create actual Osclass listings
Recommended maximum of 50 items per crawl run for optimal performance
URL must be accessible from server backend (client-side JavaScript cannot be executed)

Transform your Osclass marketplace into a powerful data aggregator with the Listings Crawler Plugin. Extract thousands of listings from any website and store them ready for import with just a few CSS selectors.

Product description last updated on 28 April 2026.


Product features and functionality

Basic documentation included

Requires PHP skills

Coding skills recommended

Recommended for advanced Osclass users

Advanced installation (need more skills)

No dependency on 3rd party services

MB Themes

Premium developer

221 products


Product support includes

Direct support from Adrian Brezak, founder of MB Themes and developer maintaining these products in production

12 months access to support and latest updates

Support can be extended anytime for 35% of base price (+12 months)

Availability of seller to answer questions

Answer technical queries about product features

Assistance with reported bugs or issues

Help with installation if a problem occurs

Product in English language (other locales provided by community)

Proven support scale: 9,200 resolved tickets and 47,000 support messages

Long-term maintenance track record: 2,200+ updates released across products

Updates are based on customer support cases, Osclass core changes, PHP/MySQL updates, and real-world usage feedback

Public support reputation: see verified customer reviews on Trustpilot

Support does not include

Customization service, custom work or feature requests

Support on free/gratis plugins delivered with premium themes

Installation service

Translation and localization services

Support quality, trust and engineering proof

Seller updated this product 2 times

Seller rating is 4.7 of 5 - Excellent (583 reviews)

Member since 2017

Support available in:

English

Czech

Slovak

This product is not compatible with WordPress. All our themes and plugins work exclusively with Osclass.


Frequently asked questions

Question: What does Listings Crawler Plugin do in a real classifieds workflow?

Answer: The Listings Crawler Plugin is a powerful automation tool for Osclass that enables you to extract and store classified listings from external websites directly into your marketplace database or JSON files.

Question: When is Listings Crawler Plugin the right choice?

Answer: It is useful for teams that prefer a tested implementation path over ad-hoc custom development. Setup details for Listings Crawler Plugin may differ in production.

Question: Which setup step is most important for Listings Crawler Plugin?

Answer: Enable core options first, then validate the main user flow and the admin settings save cycle before enabling advanced features. Setup details for Listings Crawler Plugin may differ in production.

Question: How should compatibility be checked for Listings Crawler Plugin?

Answer: Validate plugin behavior after Osclass core updates and PHP upgrades, then review changelog-dependent configuration changes. Setup details for Listings Crawler Plugin may differ in production.

Question: Can Listings Crawler Plugin affect page performance?

Answer: Monitor load time and database queries on pages affected by plugin hooks, then optimize configuration based on real traffic patterns. Setup details for Listings Crawler Plugin may differ in production.

Question: What are common issues with Listings Crawler Plugin?

Answer: Common causes are missing prerequisites, cached outdated settings, and conflicts with custom forms or third-party overrides. Setup details for Listings Crawler Plugin may differ in production.

Question: What is the recommended migration path for Listings Crawler Plugin?

Answer: Update in controlled steps, retest the primary business flow, and keep a rollback package ready before production deployment. Setup details for Listings Crawler Plugin may differ in production.

Changelog - Product updates history

Fixed labels in settings.
Corrected unique id generation when follow links is set to off.
The MD5 hash of the unique ID can now be turned off or on; when it is off, the sanitized URL is used instead (constant in index).
Many minor improvements.

Initial plugin release


Verified & Genuine Reviews

All reviews on OsclassPoint come from real customers who have purchased the product. Only verified buyers can leave a rating or review.

To maintain quality and accuracy, every review is moderated before being published.

No reviews have been added yet.


€ 49.99

Created by best developers

Regular updates and bug fixes

Premium support services


Price is in Euros

MB Themes

I am Adrian Brezak, founder of MB Themes and developer of Osclass plugins and themes for classifieds platforms. I focus on maintaining and improving compatibility, payment integrations, SEO features, performance, spam protection, and marketplace monetization across releases. 9,200+ support tickets resolved · 47,000+ customer messages handled · 2,200+ product updates and compatibility fixes.