SamKnows Shield

Our servers, data collection, caching, analytics cluster and centralised management tools.

Global platform

The SamKnows Shield consists of three parts: the globally distributed, highly available SamKnows One Global Platform, which provides services such as our web servers, data collection, caching and our analytics cluster; our test servers; and our Internal Platform Management Infrastructure, which consists of centralised management tools such as monitoring, authentication, deployment and server management services.

Test Servers

Test servers are simple endpoints for the SamKnows measurement agents to conduct tests against. There are currently around 500 test servers deployed worldwide. They are deployed close to the measurement agents so that tests can achieve maximum throughput and minimum latency.

More information on test servers can be found in their dedicated documentation.

SamKnows One global platform

The SamKnows One Global Platform is a series of servers spread across the globe running our core services. Critical services such as web servers, data collection servers and metadata databases are replicated across the globe in order to provide good geographical locality and low latency. We also have a small number of big data clusters spread geographically for latency, availability and resilience. Almost all of our Global Platform runs on bare metal servers, although some lightweight services such as our web servers run on virtual machines. We have long-term agreements with data centers across the globe that give us access to high-powered servers at a low cost. With extensive internal capacity planning, automated provisioning of new services and our strong relationships with our data center and server providers across the globe, we have found bare metal to be both more cost efficient and more performant than cloud services.

Databases

SamKnows One is powered by two main data stores, built on two very different platforms.

Firstly, a MySQL 5.7 relational database is used to power a global metadata store. This holds information about the measurement agents (e.g. Whiteboxes, router apps, mobile apps, web apps), users, access rights, test schedules and much more. This database scales according to the number of measurement agents and users deployed on the platform. Whilst it is a large database (numbering hundreds of millions of records), it is very small in comparison to the measurement database. It is also small enough to reside on traditional database servers and allow for real-time interaction. This MySQL cluster consists of around 15 servers for resilience, redundancy and low latency to services across the globe.

The measurement database is where the test results from all measurement agents globally are stored. An extremely large volume of measurement data is generated daily, and this is only expected to increase. This volume of data warrants a different type of database for storage and querying. SamKnows uses Hadoop as a distributed data store for the measurement data, and a distributed query engine called Presto (developed by Facebook) to allow us to work with the data. Each cluster has a high degree of internal fault tolerance, and we also maintain multiple clusters for latency, maintenance and resilience purposes. More information on our big data platform can be found here.

This approach is both extremely scalable and performant for this use case. The use of Hadoop allows us to scale the measurement data store horizontally, simply by adding more servers to the cluster.

Web servers

Our web servers are distributed geographically and act almost as a CDN. They power the SamKnows One web application and all of the internal and external APIs that are available to clients or that power SamKnows One itself.

SamKnows One is made available to end users via a web application. This is hosted globally on web servers in four continents, ensuring low latency and fast response times to users, no matter where they are located. The web application’s frontend is written in HTML5, and utilises the Vue.js JavaScript library. The frontend communicates with the backend solely by making API calls to endpoints written in PHP 7.1. All web services are delivered over HTTP and secured with TLS 1.2. We use gdnsd to handle geographic load balancing and failover.
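As an illustration of the client side of this arrangement, the following is a minimal sketch of an API client that refuses to negotiate anything older than TLS 1.2. It uses only the Python standard library; the helper names make_tls_context and fetch are hypothetical and are not part of any SamKnows API, and no real endpoint URL is assumed.

```python
import ssl
import urllib.request


def make_tls_context() -> ssl.SSLContext:
    """Return a client context that verifies certificates and rejects TLS < 1.2."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """GET a URL over HTTPS using the hardened context (illustrative helper)."""
    if not url.startswith("https://"):
        raise ValueError("only https:// URLs are allowed")
    with urllib.request.urlopen(
        url, context=make_tls_context(), timeout=timeout
    ) as response:
        return response.read()
```

A caller would pass its own HTTPS endpoint to fetch; plain http:// URLs are rejected before any network traffic is sent.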

We also have separate web servers that solely serve analytics requests that result in metric data queries. These are hosted on more powerful machines that can handle streaming much larger amounts of data and are located with our analytics clusters.

Data collection servers

Data collection servers (DCS) are the gateway between the routers and the backend SamKnows infrastructure. Routers and Whiteboxes report their test results to the DCS and retrieve configuration updates from these servers.

These servers are hosted globally by SamKnows in key geographies, typically close to the Whiteboxes and measurement agents that they will be interacting with. Additional data collection servers may be deployed in key markets at customers’ request or where network topology requires it. As standard, measurement agents report results and send a heartbeat to the SamKnows infrastructure once an hour. This interval is configurable.
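The reporting interval can be pictured with a short sketch. The one-hour default comes from the text above; the function names and the five-minute grace period are assumptions made for illustration and do not describe the actual agent or DCS logic.

```python
from datetime import datetime, timedelta

# Hourly default as stated in the text; configurable per deployment.
DEFAULT_INTERVAL = timedelta(hours=1)
# Hypothetical tolerance before a missing heartbeat is treated as overdue.
DEFAULT_GRACE = timedelta(minutes=5)


def next_report_time(
    last_report: datetime, interval: timedelta = DEFAULT_INTERVAL
) -> datetime:
    """Return when the next report/heartbeat is due."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return last_report + interval


def is_overdue(
    last_report: datetime,
    now: datetime,
    interval: timedelta = DEFAULT_INTERVAL,
    grace: timedelta = DEFAULT_GRACE,
) -> bool:
    """True if no report has arrived within the interval plus the grace period."""
    return now > next_report_time(last_report, interval) + grace
```

A shorter interval, for example timedelta(minutes=15), can be passed to both functions to model a deployment that has changed the default.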

Data Collection Servers also handle software updates and test schedule management (using LMAP) for routers and Whiteboxes.

Data is processed and transferred from Data Collection Servers through our data ingestion pipeline into our SamKnows One Big Data clusters.

Internal platform management infrastructure

In order to support the SamKnows One Global Platform, we have automated the provisioning of services and servers using Puppet, which allows us to add new machines quickly. All servers are centrally monitored by our Nagios monitoring system, and they are managed and deployed to using Puppet and our CI/CD platform. Together, this tooling allows us to constantly review utilisation for capacity planning and makes provisioning new servers a very smooth experience.

Puppet management

SamKnows controls its server fleet centrally using a popular open-source management tool called Puppet (https://puppetlabs.com). Puppet allows the SamKnows infrastructure team to easily manage hundreds of test and infrastructure servers, ensuring that all servers are kept up to date with patches and correctly configured to requirements. The Puppet configuration is held in a Git repository, allowing for full version control and change visibility. Access to the Puppet repository requires either two-factor authentication or private key SSH authentication. Developed in Ruby, Puppet uses a low-overhead agent installed on each server that regularly communicates with the controlling SamKnows server to check for updates and ensure the integrity of the configuration.

This method of managing our test servers allows us to deal with the large number of test servers without affecting the user’s performance in any way. We are also able to quickly and safely make changes to large parts of our test server fleet while ensuring that only the relevant test servers are updated.
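The general idea behind this style of management, comparing a declared desired state with what is actually on a server and changing only what differs, can be sketched as follows. This is a simplified illustration of desired-state reconciliation, not Puppet's implementation or its manifest language; the setting names in the usage note are invented.

```python
from typing import Dict, List, Optional, Tuple

# (setting, current value or None if absent, desired value)
Change = Tuple[str, Optional[str], str]


def plan_changes(desired: Dict[str, str], actual: Dict[str, str]) -> List[Change]:
    """List the settings whose actual value differs from the desired one."""
    changes: List[Change] = []
    for key in sorted(desired):
        if actual.get(key) != desired[key]:
            changes.append((key, actual.get(key), desired[key]))
    return changes


def apply_changes(actual: Dict[str, str], changes: List[Change]) -> Dict[str, str]:
    """Return a copy of the actual state with the planned changes applied."""
    updated = dict(actual)
    for key, _old, new in changes:
        updated[key] = new
    # Settings not named in the desired state are left untouched.
    return updated
```

For example, with a desired state of {"ntp": "enabled"} and an actual state of {"ntp": "disabled", "editor": "vim"}, the plan contains a single change for ntp, and planning again after applying it yields no further changes. That idempotence is what makes repeated agent check-ins safe.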

Nagios monitoring

All of our internal infrastructure and SamKnows-controlled test servers are monitored by our Nagios monitoring system. More information on this can be found in our platform monitoring documentation.

Redundancy, resilience and failover

SamKnows configures all of its servers and services in a fault-tolerant manner. In some cases this is achieved at the application layer, and in other cases we rely on load balancing techniques (either hardware load balancers or DNS-based load balancing). In every case, SamKnows requires that multiple distinct instances of each server exist in separate physical locations in order to be able to guarantee the resilience of the platform. All of our critical services have provider-redundancy and geo-redundancy.
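The redundancy requirement above can be expressed as a simple check over an inventory of service instances. The sketch below is illustrative only: the Instance record, the threshold of two, and the provider and location names used in the example are assumptions, not a description of SamKnows tooling.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Instance:
    service: str
    provider: str  # hosting provider (hypothetical identifier)
    location: str  # physical site (hypothetical identifier)


def redundancy_gaps(instances: List[Instance]) -> List[str]:
    """Report services lacking geo-redundancy or provider-redundancy."""
    by_service: Dict[str, List[Instance]] = {}
    for inst in instances:
        by_service.setdefault(inst.service, []).append(inst)

    gaps: List[str] = []
    for service in sorted(by_service):
        group = by_service[service]
        if len({i.location for i in group}) < 2:
            gaps.append(f"{service}: needs instances in at least two locations")
        if len({i.provider for i in group}) < 2:
            gaps.append(f"{service}: needs at least two providers")
    return gaps
```

For instance, a service with two instances at one provider in two different cities would pass the location check but be flagged for missing provider-redundancy.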

In almost all situations failover is automatic in order to minimise downtime, and the SamKnows infrastructure team is always alerted to issues so that redundancy can be restored. For more information, please see our documentation on platform monitoring.

More information on the redundancy of individual services is given throughout the infrastructure documentation.

Infrastructure hosting

SamKnows stores data on servers located in PCI DSS compliant datacentres. Access is restricted to a limited number of authorised SamKnows employees.

Core infrastructure servers are currently deployed in the United Kingdom, France, Germany, Australia (Sydney), Singapore, the United States, and Canada. We divide our services into geographic zones so that users interact with the most local services. Each zone has internal redundancy, and if a service fails across an entire zone, automatic failover to another region occurs (using geo-DNS).
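The zone selection described above can be sketched as a preference list per user zone, walked until a healthy zone is found. This is only a model of the decision, not gdnsd configuration; the zone names and their ordering are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical zones: each user zone lists itself first, then fallbacks.
FAILOVER_ORDER: Dict[str, List[str]] = {
    "europe": ["europe", "north-america", "asia-pacific"],
    "north-america": ["north-america", "europe", "asia-pacific"],
    "asia-pacific": ["asia-pacific", "europe", "north-america"],
}


def resolve_zone(user_zone: str, healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy zone in the user's preference order, if any."""
    for zone in FAILOVER_ORDER.get(user_zone, []):
        # A zone with no reported health status is treated as unavailable.
        if healthy.get(zone, False):
            return zone
    return None
```

While the local zone is healthy, users stay on it; only when the service is down across the whole zone does the lookup fall through to the next region in the list.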

Data is not stored at rest outside of the EU or GDPR 'third countries', and all data is encrypted at rest and in transit.