Recently we received feedback from multiple customers about a bad experience with location data for the Czech Republic and Egypt, so I decided to analyze the current logic for building location datasets and improve it, aiming for easier installation, a better overall user experience, and better data quality. This was not about adding more rows for marketing numbers. It was about making country, region, and city trees trustworthy when someone publishes a listing or filters by location.
Updated packages are available at osclass-classifieds.com/geolocation and in the Osclass backoffice under Market > Locations. Same dataset, refreshed generation rules, cleaner SQL output.
After the refresh, the package covers 252 countries, 3,871 regions, and about 2.3 million cities, with coordinates wherever the source data provides them, so map and distance features can work.
Support tickets were very specific. Czech sites showed regions sometimes in English, sometimes in Czech, and sometimes as half-transliterated labels that did not match official administrative names. Egypt and other Arab markets had regions in Arabic, but cities still appeared as Latin transliterations in the dropdown. The UAE had cases where the native label used the wrong script (a Russian or Greek variant instead of Arabic). Algeria had many empty native values. The Indonesia SQL file was huge, much bigger than the United States one, which made imports slow on shared hosting.

These are not cosmetic issues. A wrong location tree breaks trust in search filters, hurts SEO for location pages, and creates extra admin work when moderators have to fix listings manually.
The location import still targets the standard Osclass tables: t_country, t_region, and t_city. The main visible fields remain s_name, s_slug, and b_active, plus the phone code and currency at the country level. Coordinates stay on city rows (d_coord_lat, d_coord_long).
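Because coordinates live on city rows, distance features can be computed directly from d_coord_lat and d_coord_long. Below is a minimal sketch using the generic great-circle (haversine) formula; it is not necessarily what Osclass or any plugin does internally. The dict-shaped rows, the `cities_within` helper, and the approximate Prague/Brno coordinates are illustrative assumptions. Rows without coordinates are skipped, since not every city has them.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cities_within(origin, cities, radius_km):
    """Hypothetical helper: names of cities within radius_km of origin.
    Rows are dicts shaped like t_city rows; rows missing coordinates are skipped."""
    result = []
    for city in cities:
        lat, lon = city.get("d_coord_lat"), city.get("d_coord_long")
        if lat is None or lon is None:
            continue
        if distance_km(origin["d_coord_lat"], origin["d_coord_long"], lat, lon) <= radius_km:
            result.append(city["s_name"])
    return result


# Approximate coordinates, for illustration only.
prague = {"s_name": "Praha", "d_coord_lat": 50.0755, "d_coord_long": 14.4378}
cities = [
    prague,
    {"s_name": "Brno", "d_coord_lat": 49.1951, "d_coord_long": 16.6068},
    {"s_name": "Unknown village", "d_coord_lat": None, "d_coord_long": None},
]
print(cities_within(prague, cities, 100))  # ['Praha'] (Brno is roughly 185 km away)
```

In production you would usually pre-filter with a bounding box in SQL before computing exact distances, but that is outside the scope of this sketch.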
Important change: we now populate s_name_native on country, region, and city rows, but only for countries where a non-Latin script is part of normal local usage. If a native value does not exist or would just be duplicate noise, the column is omitted from the generated SQL for that row. The primary s_name stays readable for the admin UI and Latin-friendly workflows.
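A minimal sketch of the "omit the column" rule, assuming rows are Python dicts keyed by column name and that "duplicate noise" means a native value identical to s_name. `build_insert` and `sql_literal` are hypothetical helpers with simplified escaping; a real generator should rely on proper quoting for its target database.

```python
def sql_literal(value):
    """Render a value as a SQL literal (simplified escaping, sketch only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table, row):
    """Build one INSERT, dropping s_name_native when it is empty or just repeats s_name."""
    cols = dict(row)  # copy so the caller's row is not modified
    native = (cols.get("s_name_native") or "").strip()
    if not native or native == (cols.get("s_name") or "").strip():
        cols.pop("s_name_native", None)
    names = ", ".join(cols)
    values = ", ".join(sql_literal(v) for v in cols.values())
    return f"INSERT INTO {table} ({names}) VALUES ({values});"


print(build_insert("t_city", {"s_name": "Brno", "s_slug": "brno", "b_active": True,
                              "s_name_native": "Brno"}))
# INSERT INTO t_city (s_name, s_slug, b_active) VALUES ('Brno', 'brno', 1);
print(build_insert("t_city", {"s_name": "Cairo", "s_slug": "cairo", "b_active": True,
                              "s_name_native": "القاهرة"}))
```

Because the column is simply absent from the statement, the database default applies for that row instead of a fake or repeated native label.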
Before the refresh, the same country could mix admin labels from different naming layers. A user sees "Zlín" in the region list but expects "Zlínský kraj". That happens when a global region source wins over local admin naming.
We changed generation so that region names are resolved per country from authoritative admin records first, with a fallback only when needed. Result: more consistent local naming in the region dropdown and fewer random English leftovers in an otherwise local tree.
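The resolution order can be sketched as a simple per-code lookup: the local admin record wins, and the global source fills only the gaps. This is an illustration, not the generator's actual code; the ISO 3166-2 style keys and the English fallback label are example data, and `resolve_region_names` is a hypothetical name. Returning the list of fallback codes makes the remaining English leftovers easy to review.

```python
def resolve_region_names(region_codes, admin_names, global_names):
    """Resolve one display name per region code.

    Authoritative local admin records win; the global source is only a
    fallback for codes the admin records do not cover. Also returns the
    codes that fell back, so they can be reviewed later."""
    resolved, fell_back = {}, []
    for code in region_codes:
        if admin_names.get(code):
            resolved[code] = admin_names[code]
        elif global_names.get(code):
            resolved[code] = global_names[code]
            fell_back.append(code)
    return resolved, fell_back


admin = {"CZ-72": "Zlínský kraj"}
global_source = {"CZ-72": "Zlín", "CZ-64": "South Moravian"}
names, fallbacks = resolve_region_names(["CZ-72", "CZ-64"], admin, global_source)
print(names)      # {'CZ-72': 'Zlínský kraj', 'CZ-64': 'South Moravian'}
print(fallbacks)  # ['CZ-64']
```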
The earlier logic treated the first non-Latin alternate name as the native name. That fails in multilingual datasets. One pattern we fixed: the row for an Arab country contains Arabic, Russian, Greek, and other exonyms in the same alternate-name list, so picking the first match produces the wrong native label.
New rule: for selected countries, native extraction prefers the expected script family for that country (Arabic for the Gulf/Levant/North Africa set, Cyrillic for Russia/Belarus/Ukraine/Bulgaria, CJK for China/Japan/Taiwan/Hong Kong/Macau, and so on). If a name in the preferred script exists, we use it. If not, we keep only the primary name and avoid a fake native column.
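A sketch of script-aware native extraction, assuming alternate names arrive as a plain list of strings. The Unicode block ranges are simplified, the country-to-script map is an illustrative subset rather than the generator's real configuration, and trying Tifinagh after Arabic for Algeria/Morocco is an assumption based on the "Arabic first, Tifinagh where relevant" row in the comparison table.

```python
# Simplified Unicode block ranges per script family (not exhaustive).
SCRIPT_RANGES = {
    "Arabic": [(0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF),
               (0xFB50, 0xFDFF), (0xFE70, 0xFEFF)],
    "Cyrillic": [(0x0400, 0x052F)],
    "Greek": [(0x0370, 0x03FF)],
    "CJK": [(0x3040, 0x30FF), (0x3400, 0x4DBF), (0x4E00, 0x9FFF)],
    "Tifinagh": [(0x2D30, 0x2D7F)],
}

# Illustrative subset of country -> preferred scripts, in priority order.
PREFERRED_SCRIPTS = {
    "AE": ["Arabic"], "EG": ["Arabic"], "SA": ["Arabic"],
    "DZ": ["Arabic", "Tifinagh"], "MA": ["Arabic", "Tifinagh"],
    "RU": ["Cyrillic"], "BY": ["Cyrillic"], "UA": ["Cyrillic"], "BG": ["Cyrillic"],
    "CN": ["CJK"], "JP": ["CJK"], "TW": ["CJK"], "HK": ["CJK"], "MO": ["CJK"],
}


def char_script(ch):
    """Return the script family of one character, or None if it is not tracked."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def is_in_script(name, script):
    """True when every letter in `name` belongs to `script`.
    Non-letters (spaces, digits, combining marks) are ignored."""
    letters = [ch for ch in name if ch.isalpha()]
    return bool(letters) and all(char_script(ch) == script for ch in letters)


def pick_native(country_code, alternates):
    """Pick a native name in the country's expected script instead of the
    first non-Latin alternate. Returns None when no such name exists, so the
    caller can omit s_name_native rather than store a wrong-script value."""
    for script in PREFERRED_SCRIPTS.get(country_code, []):
        for alt in alternates:
            if is_in_script(alt, script):
                return alt
    return None


alternates = ["Dubai", "Дубай", "Ντουμπάι", "دبي"]
print(pick_native("AE", alternates))  # دبي (the old "first non-Latin" rule picked Дубай)
```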
Indonesia was the extreme outlier. Too many micro-places with near-zero population were included, so the SQL file grew far beyond a practical install size.
We applied stronger population filtering for large countries and a country-specific threshold for Indonesia (minimum population 1 instead of the global 5). This keeps meaningful places while cutting noise. In the live package, the Indonesia file dropped from the multi-megabyte class to a practical size for normal hosting.
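A per-country threshold boils down to a global default plus an override table. In this sketch the numbers are copied from the paragraph above; `Place`, `min_population`, and `filter_places` are hypothetical names, and any additional rules the real generator applies to large countries are not shown.

```python
from dataclasses import dataclass

# Values as quoted in the text: a global minimum plus per-country overrides.
# Only the Indonesia entry comes from the text; the table shape is an assumption.
DEFAULT_MIN_POPULATION = 5
MIN_POPULATION_BY_COUNTRY = {"ID": 1}


@dataclass
class Place:
    country_code: str
    name: str
    population: int


def min_population(country_code):
    """Threshold for one country: override if present, otherwise the global default."""
    return MIN_POPULATION_BY_COUNTRY.get(country_code, DEFAULT_MIN_POPULATION)


def filter_places(places):
    """Keep places whose population meets their country's threshold."""
    return [p for p in places if p.population >= min_population(p.country_code)]


places = [Place("ID", "A", 0), Place("ID", "B", 3), Place("CZ", "C", 3), Place("CZ", "D", 10)]
print([p.name for p in filter_places(places)])  # ['B', 'D']
```

Keeping the thresholds in data rather than in code makes it cheap to tune one outlier country without touching the rest of the world.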
We fixed output cases where a country had no regions or cities but the SQL still printed empty statement tails (lines containing only a semicolon). Imports and manual review are cleaner now. The batch refresh list is also synced with the current country registry, so deprecated country codes still get a refreshed country row instead of staying stale forever.
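The empty-tail fix amounts to "emit nothing for an empty section" instead of printing an INSERT header followed by a lone semicolon. A minimal sketch, assuming multi-row INSERT output; `build_multi_insert` and `render_package` are hypothetical helpers, and the quoting handles only plain strings.

```python
def quote(text):
    """Single-quote a string for SQL, doubling embedded quotes (sketch only)."""
    return "'" + str(text).replace("'", "''") + "'"


def build_multi_insert(table, columns, rows):
    """One multi-row INSERT, or None when there are no rows.
    Returning None (instead of 'INSERT ... VALUES' plus a lone ';')
    keeps empty sections out of the file entirely."""
    if not rows:
        return None
    tuples = ",\n".join("(" + ", ".join(quote(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{tuples};"


def render_package(country_sql, region_sql, city_sql):
    """Join sections in country, region, city order, skipping empty ones."""
    return "\n\n".join(s for s in (country_sql, region_sql, city_sql) if s) + "\n"


# A country with no regions or cities now yields just the country statement.
package = render_package(
    build_multi_insert("t_country", ["s_name"], [("Holy See",)]),
    build_multi_insert("t_region", ["s_name"], []),
    build_multi_insert("t_city", ["s_name"], []),
)
print(package)
```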
Regeneration on shared hosting was too slow even for small countries, because preprocessing ran over the whole global dataset on every run. We moved the heavy steps to the selected-country scope and added a quiet batch mode. Practical effect: single-country refresh is usable again for support and iterative fixes.
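Scoping preprocessing to selected countries means filtering the raw source as early as possible, before any expensive step. This sketch assumes a tab-separated source dump with the country code in a known column (as in GeoNames-style exports); the column index, `rows_for_countries`, `preprocess`, and the `quiet` flag are hypothetical and only illustrate the idea.

```python
import csv
import io
import sys


def rows_for_countries(lines, selected, country_col):
    """Yield only rows whose country code is in `selected`.

    Filtering first means the expensive steps (name resolution, script
    detection, population filtering) never touch the rest of the world."""
    wanted = {code.upper() for code in selected}
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) > country_col and row[country_col].upper() in wanted:
            yield row


def preprocess(lines, selected, country_col, quiet=False):
    """Hypothetical entry point; quiet=True suppresses progress output for batch runs."""
    rows = list(rows_for_countries(lines, selected, country_col))
    if not quiet:
        print(f"{len(rows)} rows kept for {', '.join(sorted(selected))}", file=sys.stderr)
    return rows


sample = io.StringIO("1\tBrno\tCZ\n2\tCairo\tEG\n3\tPraha\tCZ\n")
rows = preprocess(sample, ["CZ"], country_col=2, quiet=True)
print([r[1] for r in rows])  # ['Brno', 'Praha']
```

Because `rows_for_countries` is a generator, a real dump can be streamed from disk without loading the whole world into memory.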
| Market | Level | Before (typical issue) | After (expected behavior) | Why it helps |
|---|---|---|---|---|
| Czechia | Region | Mixed English/Czech admin labels (e.g. a city-style name in the region list) | Consistent local admin region naming | Users recognize official kraj names in filters |
| Egypt | Region, city | Arabic regions, but cities shown only in Latin script | Arabic native name on regions and cities where available | Better local UX for Arabic-interface sites |
| UAE | City | Native label sometimes a Russian or Greek exonym | Arabic native preferred for AE | Avoids the wrong script in bilingual markets |
| Russia / Belarus | Country, region, city | Native column missing or inconsistent | Cyrillic s_name_native aligned on all levels | Consistent Cyrillic display on Russian, Belarusian, and Ukrainian sites |
| Algeria / Morocco | City | Empty native value or wrong script picked | Arabic first, Tifinagh where relevant | More complete Maghreb localization |
| Indonesia | City volume | Very large SQL file, slow import | Low-value micro-places removed | Faster install, lighter DB, same practical coverage |
If you maintain custom cities manually, document them before a full re-import of the same country. Depending on your workflow, a replacement import can overwrite existing rows, including your manual changes.
We ran structural SQL checks on a random sample of refreshed country packages: one country statement per file, valid statement order (country, then region, then city), no orphan semicolon blocks, and balanced statement structure. The sample batch passed. This does not replace testing on your own server, but it caught real regressions from earlier generator output.
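Checks like these are easy to reproduce on your side before importing. Below is a rough sketch that mirrors the list above; it is not the script we used. The statement splitter is a simplified scanner that only understands single-quoted literals with `''` escaping (not a full SQL parser), and matching table names by suffix (to tolerate a prefix such as `oc_`) is an assumption.

```python
import re

# Expected statement order inside one country package.
TABLE_ORDER = ["t_country", "t_region", "t_city"]
# Single-quoted SQL literal where '' is an escaped quote.
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
INSERT_TARGET = re.compile(r"INSERT\s+INTO\s+`?(\w+)`?", re.IGNORECASE)


def split_statements(sql):
    """Split on ';' outside single-quoted literals.
    Returns (statements, ends_inside_literal)."""
    statements, buf, in_quote, i = [], [], False, 0
    while i < len(sql):
        ch = sql[i]
        if ch == "'" and in_quote and sql[i + 1:i + 2] == "'":
            buf.append("''")  # escaped quote, stay inside the literal
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statements.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        statements.append(tail)
    return statements, in_quote


def check_package(sql):
    """Return a list of structural problems in one country SQL file."""
    problems = []
    statements, unterminated = split_statements(sql)
    if unterminated:
        problems.append("unterminated string literal")
    elif sql.strip() and not sql.rstrip().endswith(";"):
        problems.append("last statement is not terminated")
    country_count, last_pos = 0, -1
    for stmt in statements:
        if not stmt:
            problems.append("orphan semicolon")
            continue
        bare = STRING_LITERAL.sub("''", stmt)  # ignore parentheses inside literals
        if bare.count("(") != bare.count(")"):
            problems.append("unbalanced parentheses: " + stmt[:40])
        if bare.upper().endswith("VALUES"):
            problems.append("INSERT without rows: " + stmt[:40])
        match = INSERT_TARGET.match(stmt)
        if not match:
            continue
        table = match.group(1).lower()
        # Suffix match tolerates a table prefix such as oc_ (assumption).
        pos = next((i for i, t in enumerate(TABLE_ORDER) if table.endswith(t)), None)
        if pos is None:
            continue
        if pos == 0:
            country_count += 1
        if pos < last_pos:
            problems.append(f"{table} statement out of order")
        last_pos = max(last_pos, pos)
    if country_count != 1:
        problems.append(f"expected 1 country statement, found {country_count}")
    return problems


good = (
    "INSERT INTO t_country (s_name) VALUES ('Czechia');\n"
    "INSERT INTO t_region (s_name) VALUES ('Zlínský kraj');\n"
    "INSERT INTO t_city (s_name) VALUES ('Zlín'), ('O''Hare (test)');\n"
)
print(check_package(good))  # []
print(check_package("INSERT INTO t_country (s_name) VALUES ('X');\n;\n"))  # ['orphan semicolon']
```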
We also reviewed known edge cases from tickets: Czech region naming consistency, Arabic script preference, Cyrillic consistency for Russia/Belarus/Ukraine, and the Indonesia file size. Those were the direct drivers of this refresh.
Import only the geography you actually operate in. A full-world import is rarely needed and often hurts performance on small servers. A good tree for 5-20 countries beats a bloated global tree nobody uses.
For multilingual sites, keep theme and language packs aligned with your location strategy: the native column helps with display, but UI translation and slug strategy still matter for SEO URLs.
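One way to keep display and URLs separate is to choose the label per interface language while always building URLs from s_slug. A minimal sketch: the `NATIVE_DISPLAY_LANGUAGES` set, the "ar_EG"-style locale format, and `location_label` are assumptions for illustration, not Osclass APIs. Since s_name_native is omitted on many rows, the fallback to s_name is essential.

```python
# Hypothetical set of UI language prefixes whose sites should show native names.
NATIVE_DISPLAY_LANGUAGES = {"ar", "ru", "be", "uk", "bg", "zh", "ja"}


def location_label(row, ui_locale):
    """Label for dropdowns and listing pages.

    s_name_native is optional (omitted for many rows), so always fall back to s_name."""
    language = ui_locale.split("_")[0].lower()
    native = row.get("s_name_native")
    if language in NATIVE_DISPLAY_LANGUAGES and native:
        return native
    return row["s_name"]


cairo = {"s_name": "Cairo", "s_slug": "cairo", "s_name_native": "القاهرة"}
print(location_label(cairo, "ar_EG"))  # القاهرة
print(location_label(cairo, "en_US"))  # Cairo
```

The URL keeps using `cairo["s_slug"]` in both cases, so switching the display language does not change indexed location URLs.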
This refresh came from real customer pain, not from an abstract "data update day". We optimized generation rules, consolidated naming behavior, improved native script handling, reduced noisy city volume in Indonesia, and made SQL imports more predictable. If you still see a wrong location label after import, send us the country code, the level (country/region/city), and an example label. That helps us tune the rules with evidence, not guesses.
I will keep monitoring feedback from Czech, Arab, and Cyrillic markets specifically, because they gave the strongest signals that the previous logic was good enough for import but not good enough for daily user experience 🙂