Our team is working on a unique service that combines the capabilities of a web archive (like archive.org) with a search engine.
Our experience building the Archivarix website recovery service allowed us to start working on something bigger.
We classify and index all retrieved data to make it easy to search.
The data is never deleted and is stored in a format convenient for further processing.
Saved sites are technically static. Tools such as Archivarix CMS allow you to view and edit them as a single site, add dynamic functionality, combine data from different sites, and perform the necessary optimizations without any technical knowledge.
Since the launch of the Archivarix site restoration project in 2017, we have also been collecting data from live sites in parallel.
We have been collecting historical site metrics and domain information going back to 2009, and we update this data every day.
The site content that we process for full-text search and content classification dates back to 1996.
Our database contains historical data for more than 350 million domains.
We already run more than 50 spider and Archivarix processing servers.
Our servers download over 100 GB of website content from the Internet every day.
Every day we also collect and analyze about 50 GB of domain and site metrics from various sources, some of which are listed below.