Recently I have been working on a project for one of our large clients: bringing content from the websites of their partner stores into Osclass. Let us call the client Global Marketplace, and the partner websites Green Store, Blue Market, and Red Shop. On paper it looked simple. In the real world, this was one of those jobs where small details can break everything if you rush 🙂.
In this article I share what really worked for us when using the Listings Crawler Plugin together with the Ad Importer Plugin (CSV, XML, JSON). I also include the mistakes we made and what I would do differently if I started the same migration again.
Short result: with this setup we brought in around 100,000 listings within 2 days, while keeping the front office responsive and without any negative impact on normal customer traffic.
Global Marketplace had existing categories, custom fields, a location tree, and active sellers. The partner websites had different structures, naming standards, and data quality. One website had the price in a separate field, another had the price mixed into a currency string, and the third had missing location data in many records.
If we had imported everything as-is, the Osclass site would have become messy very quickly. Search relevance would be weak, category pages would contain records of mixed quality, and the support team would spend too much time on manual cleanup.
So our first decision was an important one: the crawler should not be used as a "blind collector". The crawler had to produce a structured, normalized source dataset, and the importer had to apply strict rules before any listing went live.
I know many people want to start directly with a big volume. We did the opposite. First we created a pilot with 300 listings from each partner source. This pilot helped us verify:
This pilot saved us.

We found two critical issues early: duplicate records caused by URL parameters, and broken image URLs caused by lazy-load attributes.
For crawling we used follow-links mode in most cases, because listing pages and detail pages had different structures. We also tested direct extraction mode on one partner source where all fields were already visible in the list page cards.
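Both issues can be handled before anything reaches the importer. Below is a minimal Python sketch, assuming records carry a `source_url` field: `canonical_url` strips tracking parameters, trailing slashes, and fragments so one listing gets one key, and `ImageCollector` prefers lazy-load attributes over placeholder `src` values. The parameter list `IGNORED_PARAMS`, the attribute names in `LAZY_ATTRS`, and all function names are hypothetical; check what each partner site actually uses.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Hypothetical: query parameters that never change which listing a URL points to.
IGNORED_PARAMS = {"ref", "sessionid", "fbclid", "gclid"}


def canonical_url(url: str) -> str:
    """Normalize a detail-page URL so tracking parameters do not create duplicate records."""
    parts = urlsplit(url.strip())
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith("utm_") and k.lower() not in IGNORED_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment: "#gallery" and similar anchors point to the same listing.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def dedupe(records):
    """Keep the first record per canonical source_url."""
    seen, unique = set(), []
    for rec in records:
        key = canonical_url(rec["source_url"])
        if key not in seen:
            seen.add(key)
            unique.append({**rec, "source_url": key})
    return unique


# Hypothetical list; check which attribute each partner's lazy-load script really uses.
LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original")


class ImageCollector(HTMLParser):
    """Collect real image URLs, preferring lazy-load attributes over placeholder src values."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        values = dict(attrs)
        for name in LAZY_ATTRS + ("src",):
            value = values.get(name)
            # Skip inline data: placeholders such as 1x1 transparent GIFs.
            if value and not value.startswith("data:"):
                self.images.append(urljoin(self.base_url, value))
                break
```

Sorting the remaining query parameters means `?a=1&b=2` and `?b=2&a=1` collapse to the same key, which is usually what you want for detail pages.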
Main practical rules we applied:
One very important thing: the crawler plugin prepares data; it does not import directly into the Osclass listing table. This separation was very useful for us because we could inspect the data before publishing anything. Our core field mapping looked like this:
source_url -> unique_id
headline -> title
content_html -> description
price_text -> price + currency parser
city_name -> city (with fallback region)
category_label -> category map table
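The mapping above can be expressed as one small transform per record. This is a sketch under assumptions, not the plugins' own logic: `parse_price` uses a heuristic where the last `.` or `,` counts as a decimal separator only when one or two digits follow it, `CURRENCY_SYMBOLS` is a placeholder table, and `region_name` is a hypothetical crawler field used as the city fallback.

```python
import html
import re
from decimal import Decimal, InvalidOperation

# Hypothetical symbol table; extend it with whatever currencies the partners actually use.
CURRENCY_SYMBOLS = {"€": "EUR", "$": "USD", "£": "GBP"}
KNOWN_CODES = set(CURRENCY_SYMBOLS.values())


def detect_currency(text: str, default: str) -> str:
    for symbol, code in CURRENCY_SYMBOLS.items():
        if symbol in text:
            return code
    for code in re.findall(r"\b[A-Z]{3}\b", text):
        if code in KNOWN_CODES:
            return code
    return default


def parse_price(price_text, default_currency="EUR"):
    """Split strings like '1.299,00 €' or 'USD 1,299.50' into (Decimal, ISO currency code)."""
    if not price_text or not price_text.strip():
        return None, None
    digits = re.sub(r"[^\d.,]", "", price_text)
    last_sep = max(digits.rfind("."), digits.rfind(","))
    # Heuristic: the last separator is decimal only when 1-2 digits follow it,
    # so "1.299" is read as one thousand two hundred ninety-nine.
    if last_sep != -1 and len(digits) - last_sep - 1 in (1, 2):
        number = re.sub(r"[.,]", "", digits[:last_sep]) + "." + digits[last_sep + 1:]
    else:
        number = re.sub(r"[.,]", "", digits)
    try:
        return Decimal(number), detect_currency(price_text, default_currency)
    except InvalidOperation:  # e.g. "Call for price" leaves no digits at all
        return None, None


def map_record(raw: dict, category_map: dict) -> dict:
    """Apply the source -> Osclass field mapping to one crawled record."""
    price, currency = parse_price(raw.get("price_text"))
    return {
        "unique_id": raw["source_url"],
        "title": html.unescape(raw.get("headline", "")).strip(),
        "description": raw.get("content_html", ""),
        "price": price,
        "currency": currency,
        # "region_name" is a hypothetical crawler field used as the fallback.
        "city": raw.get("city_name") or raw.get("region_name"),
        "category_id": category_map.get(raw.get("category_label", "").strip().lower()),
    }
```

Using `Decimal` instead of `float` keeps prices like `1299.50` exact, which matters when you later compare imported prices against the partner source.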
For category and location mapping we maintained simple lookup rules that were independent of each source website's naming. This means "Cars", "Vehicles", and "Auto" from different partners all ended up in one expected Osclass category.
After the crawler output looked good, we moved to Ad Importer. The importer was configured as a strict quality gate. I think this is where many migrations fail, because they treat the importer only as a connector.
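A lookup like this can live in a plain dictionary. The sketch below assumes hypothetical category IDs (`12`, `15`) and normalizes labels (case, accents, whitespace) before matching; anything it cannot map goes to an `unmapped` list for manual review instead of being guessed.

```python
import unicodedata
from typing import List, Optional

# Hypothetical Osclass category IDs; take the real ones from the target installation.
CATEGORY_SYNONYMS = {
    12: {"Cars", "Vehicles", "Auto"},
    15: {"Bikes", "Bicycles"},
}


def normalize_label(label: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so partner labels compare cleanly."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


# Flatten synonyms into one label -> category_id dictionary for constant-time lookups.
LOOKUP = {normalize_label(s): cid for cid, labels in CATEGORY_SYNONYMS.items() for s in labels}


def map_category(label: str, unmapped: List[str]) -> Optional[int]:
    """Return the target category ID, or record the label for manual review."""
    cid = LOOKUP.get(normalize_label(label))
    if cid is None:
        unmapped.append(label)  # never guess: unmapped labels are reviewed by hand
    return cid
```

The same pattern works for locations: one synonym table per level of the location tree, with the unmapped list reviewed after each pilot run.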
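Ad Importer's own settings are configured in the plugin, but the idea of a strict gate is easy to show outside it. This is a hypothetical pre-import check in Python, assuming mapped records with the field names used earlier; `REQUIRED` and the price rule are illustrative assumptions, not the plugin's actual rules. Every rejected record keeps its reasons, so the rejects can be fixed at the source instead of by hand in the admin panel.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; align them with your actual import profile.
REQUIRED = ("unique_id", "title", "description", "category_id", "city")


@dataclass
class GateResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, [reasons]) pairs


def _blank(value) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def validate(record: dict) -> list:
    """Return the list of reasons a record must not be imported (empty list = OK)."""
    reasons = [f"missing {name}" for name in REQUIRED if _blank(record.get(name))]
    price = record.get("price")
    if price is not None and price <= 0:
        reasons.append("non-positive price")
    return reasons


def quality_gate(records) -> GateResult:
    result = GateResult()
    for rec in records:
        reasons = validate(rec)
        if reasons:
            result.rejected.append((rec, reasons))
        else:
            result.accepted.append(rec)
    return result
```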
Our import profile included:
We intentionally did not run one massive import call for all records.

Instead we used controlled waves. The first wave imported around 10,000 records, the next wave ran after validation, and then we moved to larger batches once our confidence grew.
People ask me if 100,000 listings can be imported safely. Yes, but only with discipline. The biggest risk is not the importer itself, but its side effects: image downloads, search index update cost, and cache invalidation spikes.
What helped us keep the site stable:
We also monitored frontend response time every 5 minutes. If it started to rise too much, we reduced the next batch size. This dynamic approach worked better than fixed giant jobs.
Big import projects are not only about data. They are about the operating system, the database, cron timing, and business priorities, all at the same time.
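A wave plan like this can be sketched as a generator that starts small and grows. The growth factor and cap below are assumptions for illustration, and `import_fn` and `check_fn` are hypothetical callbacks standing in for "run the import" and "validate before continuing"; only the 10,000 first wave comes from our project.

```python
from typing import Callable, Iterator, Sequence


def plan_waves(records: Sequence, first_wave: int = 10_000,
               growth: float = 2.0, max_wave: int = 40_000) -> Iterator[Sequence]:
    """Yield consecutive slices: a small first wave, then progressively larger ones."""
    start, size = 0, first_wave
    while start < len(records):
        yield records[start:start + size]
        start += size
        size = min(int(size * growth), max_wave)


def run_waves(records: Sequence, import_fn: Callable[[Sequence], None],
              check_fn: Callable[[], bool], **kwargs) -> int:
    """Import wave by wave; stop as soon as validation fails. Returns records imported."""
    done = 0
    for wave in plan_waves(records, **kwargs):
        import_fn(wave)
        done += len(wave)
        if not check_fn():
            break  # stop and inspect before the next, larger wave
    return done
```

With 100,000 records and these defaults the waves are 10,000, 20,000, 40,000, and the remaining 30,000.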
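The "reduce the next batch when the site slows down" rule can be written as a tiny controller. This is a sketch with assumed thresholds (halve above 1.5x the baseline, grow 25% below 1.1x) and hypothetical bounds; `measure_response_ms` simply times one full GET with the standard library and is one possible probe, not the monitoring we actually used.

```python
import time
import urllib.request


def measure_response_ms(url: str, timeout: float = 10.0) -> float:
    """Time one full GET of a storefront page, in milliseconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return (time.monotonic() - start) * 1000


def next_batch_size(current: int, response_ms: float, baseline_ms: float,
                    min_size: int = 500, max_size: int = 20_000) -> int:
    """Shrink the next batch when the storefront slows down, grow it slowly when healthy."""
    ratio = response_ms / baseline_ms
    if ratio > 1.5:      # noticeably slower than normal: halve the batch
        size = current // 2
    elif ratio < 1.1:    # close to normal: grow by 25%
        size = int(current * 1.25)
    else:                # in between: hold steady
        size = current
    return max(min_size, min(max_size, size))
```

Cutting fast and growing slowly means one slow measurement is enough to back off, while recovering to large batches takes several healthy intervals in a row.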
No partner source was perfect. Some records had titles that were too short, many contained duplicated template text, some had outdated phone numbers, and many images were of poor quality. If we had pushed this directly to production, user trust would have dropped very fast.
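Two of these problems, short titles and repeated template text, can be detected automatically per partner. In this sketch, a line that appears in at least 30% of one partner's descriptions is treated as template text and stripped; both that share and `MIN_TITLE_CHARS` are assumed values you would tune on the pilot data.

```python
from collections import Counter

MIN_TITLE_CHARS = 15  # hypothetical threshold; tune it on pilot data


def find_template_lines(descriptions, min_share=0.3):
    """Return lines that appear in at least min_share of one partner's descriptions."""
    counts = Counter()
    for text in descriptions:
        # Count each distinct line once per description.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})
    threshold = max(2, int(len(descriptions) * min_share))
    return {line for line, n in counts.items() if n >= threshold}


def clean_description(text, template_lines):
    """Drop template lines and keep the listing-specific content."""
    kept = [line for line in text.splitlines() if line.strip() not in template_lines]
    return "\n".join(kept).strip()


def title_ok(title):
    return len(title.strip()) >= MIN_TITLE_CHARS
```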
Practical fixes we used:
Another thing: the partner websites changed their HTML layout twice during the project. Because of that, we added a quick selector health check before every larger crawl run. That small check prevented silent failures.
For Google visibility, quantity alone is not enough. Imported content must still meet quality standards and match search intent. Thin, duplicated pages can hurt the whole project if you publish blindly.
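One simple form of such a health check is to run extraction on a handful of known sample pages and measure how often each field comes back filled. In this sketch `extract` is a hypothetical stand-in for whatever extraction you run (it is not the crawler plugin's API), and the 90% fill rate is an assumed threshold.

```python
from typing import Callable, Dict, Iterable, List


def field_fill_rates(records: List[dict], fields: Iterable[str]) -> Dict[str, float]:
    """Share of records in which each field is present and non-empty."""
    total = len(records) or 1
    return {f: sum(1 for r in records if r.get(f) not in (None, "")) / total for f in fields}


def selector_health_check(sample_pages: List[str], extract: Callable[[str], dict],
                          fields: Iterable[str], min_fill: float = 0.9) -> Dict[str, float]:
    """Return the fields whose fill rate dropped below min_fill (empty dict = healthy)."""
    records = [extract(page) for page in sample_pages]
    rates = field_fill_rates(records, fields)
    return {f: rate for f, rate in rates.items() if rate < min_fill}
```

If the returned dictionary is not empty, the larger crawl should not start until the selectors for those fields are fixed.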
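To find thin duplicates before they go live, one common technique is comparing word shingles with Jaccard similarity. This sketch assumes listings are given as an ID-to-description dictionary and uses an assumed 0.8 threshold. It compares every pair, which is fine for a pilot of a few thousand records; at 100,000 listings you would need something like MinHash with locality-sensitive hashing instead.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word sequences; punctuation and case are ignored."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(listings: dict, threshold: float = 0.8) -> list:
    """Return pairs of listing IDs whose descriptions overlap at or above the threshold."""
    sets = {lid: shingles(text) for lid, text in listings.items()}
    return [(a, b) for a, b in combinations(sorted(sets), 2)
            if jaccard(sets[a], sets[b]) >= threshold]
```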
What we did for better search quality:
The biggest SEO win was not "more pages". The biggest win was a cleaner structure, better relevance on search result pages, and unique, high-quality content. This improved both organic sessions and conversion quality.
I also recommend these two videos for the practical setup flow and real examples:
If I had to summarize this migration in one sentence: success came from a controlled process, not from one magic setting. We treated crawling and importing as a data engineering workflow, not as a one-click action 🚀.
For the Global Marketplace project, this approach gave us stable scale, better listing quality, and much less manual admin work after launch. If you are planning a similar project, start small, verify everything, and grow in waves. It is slower in the first week, but much faster in the long run.