Storage Platform

Software-defined storage & resilience

Data is your most valuable asset.
Hardware failure is not a matter of if, but when.

The difference between data loss and data safety lies not in marketing claims or SLAs but in architectural decisions made from day one. The whitesky storage platform is engineered for reality: disks fail, servers fail, networks partition, and maintenance must happen without downtime.

This page explains how whitesky delivers data safety by design, from high-level principles to the technical foundations underneath.


Engineering for reality, not best-case scenarios

Traditional storage platforms are often optimized for benchmarks and ideal conditions. whitesky starts from a different assumption: failure is expected.

Every architectural choice is shaped by this premise:

  • Safety takes precedence over raw efficiency
  • Scalability is built in, not retrofitted
  • Failure is isolated, not amplified
  • Recovery is automatic, not heroic

Architecture determines resilience. Hardware fails. Power fluctuates. Networks partition.
Your storage platform must handle these realities transparently.


whitesky storage at a glance

The whitesky storage platform is a distributed, software-defined system designed to withstand cascading failures that plague traditional infrastructure.

Even during:

  • disk failures
  • server outages
  • network partitions
  • planned maintenance

the platform maintains data availability and consistency.

Platform layers

  • Compute layer: Virtual machine orchestration and workload management with seamless failover.
  • Block storage layer: High-performance virtual disks with distributed transaction logging and cheap snapshotting.
  • Object storage backend: Erasure-coded data distribution across fault domains with automatic self-healing.
  • Backup layer: Independent snapshot architecture using S3-compatible, immutable storage.

Multiple layers of fault tolerance

Device-level protection: beyond RAID

Traditional RAID protects against disk failures inside a single server, but the RAID controller and the server itself remain single points of failure. whitesky uses erasure coding instead.

Data is split into fragments with calculated redundancy. These fragments are distributed across different physical disks.
Even multiple simultaneous disk failures do not result in data loss.
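
To illustrate the principle, here is a minimal sketch using single XOR parity, the simplest form of erasure coding. It is an illustration only: this page does not state which code whitesky uses, and a single parity fragment can repair just one lost fragment, whereas the platform is described as surviving several. All function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)               # ceiling division
    padded = data.ljust(size * k, b"\0")    # zero-pad to a multiple of k
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def reconstruct(fragments: list[Optional[bytes]]) -> list[Optional[bytes]]:
    """Rebuild at most one missing fragment (marked None) from the others."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


stored = encode(b"whitesky storage", k=4)   # 4 data fragments + 1 parity
stored[2] = None                            # simulate a failed disk
repaired = reconstruct(stored)
recovered = b"".join(repaired[:4]).rstrip(b"\0")
```

Here `recovered` equals the original bytes even though one fragment was lost. Codes with more redundancy fragments extend the same idea to several simultaneous failures.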

Server-level protection: distributed by design

Fragments are deliberately spread across different physical servers, so no single server ever holds the only copy of any data.

When a server fails:

  • data remains accessible
  • fragments are automatically rebuilt
  • redistribution happens without manual intervention

For decision makers: losing disks or servers does not mean losing data.
Failure is routine — not catastrophic.


Storage blocks: independent failure domains

Instead of monolithic storage clusters, whitesky uses storage blocks.

A storage block is a small, independent failure domain:

  • 3 to 6 storage servers per block
  • always tolerates the loss of 1 full server
  • also tolerates multiple simultaneous disk failures
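
The server-loss guarantee can be checked with simple arithmetic. The sketch below assumes a hypothetical policy of k data fragments plus m redundancy fragments, spread as evenly as possible over the servers in a block. The values are illustrative, not whitesky's actual policy.

```python
import math


def survives_server_loss(servers: int, k: int, m: int) -> bool:
    """True if any single server can fail without losing data.

    Assumes the k + m fragments of an object are spread as evenly as
    possible, so the busiest server holds ceil((k + m) / servers) of them.
    Data stays readable as long as at least k fragments survive.
    """
    worst_case_lost = math.ceil((k + m) / servers)
    return (k + m) - worst_case_lost >= k


# Hypothetical 4+2 policy across the block sizes named above (3 to 6 servers).
results = {n: survives_server_loss(n, k=4, m=2) for n in range(3, 7)}

# A hypothetical 4+1 policy on 3 servers would not meet the guarantee.
too_thin = survives_server_loss(3, k=4, m=1)
```

Under these assumptions every block size from 3 to 6 servers passes with a 4+2 policy, while 4+1 on 3 servers fails because one server would hold two of the five fragments.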

Why this matters

  • Failure containment: Failures stay inside one block. There is no cascading “blast radius” across your entire cloud.
  • Predictable recovery: Each block has known recovery characteristics. No surprises during incidents.
  • Autonomous operation: Each block operates independently. Issues in one block never impact the operational state of others.

Multiple storage blocks combine into a full cloud location, enabling scale without sacrificing safety.

Scaling without compromising safety

Linear scale-out architecture

Capacity and performance scale by adding storage blocks. No re-architecture. No migration events. No redesign.

You can:

  • start with a single block for edge or small deployments
  • grow to dozens of blocks for regional data centers

Each block adds predictable capacity, performance, and fault tolerance.

Efficient storage economics

Erasure coding overhead can be as low as 33% of usable capacity.

This is dramatically more efficient than triple replication, which carries 200% overhead: every usable byte consumes three bytes of raw capacity. You get enterprise-grade safety without hyperscaler-level storage waste.
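
The arithmetic behind these figures, as a short sketch. The fragment counts are illustrative assumptions; this page does not state which policy yields 33%.

```python
def overhead(data_fragments: int, redundancy_fragments: int) -> float:
    """Extra raw capacity needed, as a fraction of usable capacity."""
    return redundancy_fragments / data_fragments


def raw_needed(usable_tb: float, data_fragments: int, redundancy_fragments: int) -> float:
    """Raw capacity (TB) required to store usable_tb of data."""
    return usable_tb * (1 + overhead(data_fragments, redundancy_fragments))


ec_overhead = overhead(6, 2)          # 6 data + 2 redundancy fragments: about 0.33
replica_overhead = overhead(1, 2)     # triple replication: 1 copy + 2 extra copies

ec_raw = raw_needed(100, 6, 2)        # roughly 133 TB raw for 100 TB usable
replica_raw = raw_needed(100, 1, 2)   # 300 TB raw for 100 TB usable
```

For the same 100 TB of usable data, this hypothetical 6+2 policy needs less than half the raw capacity that triple replication does.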


Built-in data protection

Native backup integration

VM snapshots are stored directly in S3-compatible object storage. The backup layer is fully independent from the primary block storage layer.

If primary storage is impacted, backups remain intact and accessible.

Immutable snapshot design

Once written, snapshots cannot be modified or deleted through normal operational paths. This protects against:

  • accidental deletion
  • ransomware attacks targeting backup repositories

Rapid recovery

Because backup and storage are integrated, recovery does not depend on external systems. This shortens recovery time and makes tighter recovery time objectives (RTO) achievable.


Deployment models

Hyper-converged deployment

Compute and storage run on the same physical servers.

Best suited for:

  • smaller clusters
  • edge locations
  • cost-efficient regional deployments

Benefits:

  • lower hardware footprint
  • simplified operations
  • reduced capital investment

Disaggregated deployment

Dedicated compute nodes and dedicated storage nodes operate independently.

Best suited for:

  • large production environments
  • performance-sensitive workloads
  • independent scaling of compute and storage

This model simplifies failure handling and maintenance at scale.


Storage media configurations

Full flash

All layers run on SSD:

  • write buffers
  • distributed transaction logs
  • metadata services
  • erasure-coded storage layer

Delivers:

  • lowest latency
  • consistent performance
  • throughput bounded by network, not disks

Hybrid (flash + HDD)

Flash accelerates hot paths:

  • write buffers
  • metadata
  • cache

HDDs store cold data economically.

Delivers:

  • strong cost-per-TB efficiency
  • flash-like performance for active datasets
  • intelligent background data placement

Both configurations provide identical data safety guarantees.


Optional security: protection against physical theft

When enabled, all storage devices use encrypted filesystems. Encryption keys are stored in TPM 2.0 hardware modules.

There are:

  • no centralized key vaults
  • no shared secrets
  • no single point of compromise

Encryption is transparent to workloads and requires no application changes.

Security guarantee: Physical theft of disks does not result in data access. Without TPM-secured keys, stolen devices contain only encrypted fragments that cannot be reassembled.


Designed for real-world operations

Maintenance without downtime

Rolling upgrades allow software updates without service interruption. Servers can be replaced transparently while workloads keep running.

Self-healing architecture

Background agents continuously:

  • monitor health
  • repair fragments
  • rebalance capacity
  • verify data integrity

No manual intervention is required.

Operational simplicity

The platform absorbs complexity internally. Operators work with predictable states instead of emergency procedures.

This replaces heroic troubleshooting with calm, repeatable operations.


Technical deep dive (for architects & engineers)

Object storage backend

Core components:

  • OSDs storing object fragments on HDD or SSD
  • Arakoon clusters providing distributed consensus and metadata
  • Namespace managers tracking object locations
  • Stateless proxies for client access
  • Background maintenance agents for repair and rebalancing

Write path:

  • object split into fragments
  • erasure coding applied
  • fragments distributed across fault domains
  • metadata stored in namespace manager

Read path:

  • fragment locations resolved
  • missing fragments reconstructed automatically if needed

Redundancy policies define how many node and disk failures are tolerated.
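
The write and read paths above can be sketched with a toy model. It treats fragments as already erasure-coded and models only placement and location lookup. The class, its fields, and the round-robin placement are assumptions for illustration, not the real component APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Toy model of the write and read paths; not the real component API."""
    domains: list[str]   # fault domains, for example servers
    k: int               # minimum number of fragments needed to read an object
    osds: dict[str, dict] = field(default_factory=dict)        # domain -> fragments
    namespace: dict[str, list] = field(default_factory=dict)   # object -> locations

    def write(self, name: str, fragments: list[bytes]) -> None:
        """Place encoded fragments round-robin and record their locations."""
        locations = []
        for index, fragment in enumerate(fragments):
            domain = self.domains[index % len(self.domains)]
            self.osds.setdefault(domain, {})[(name, index)] = fragment
            locations.append((index, domain))
        self.namespace[name] = locations

    def read(self, name: str, failed: set[str]) -> dict[int, bytes]:
        """Resolve locations and return the fragments that are still reachable."""
        available = {
            index: self.osds[domain][(name, index)]
            for index, domain in self.namespace[name]
            if domain not in failed
        }
        if len(available) < self.k:
            raise IOError("too few fragments survive to reconstruct the object")
        return available   # a real decoder would rebuild missing fragments from these


backend = ToyBackend(domains=["server-a", "server-b", "server-c"], k=4)
backend.write("vm-disk-0001", [b"f0", b"f1", b"f2", b"f3", b"p0", b"p1"])
reachable = backend.read("vm-disk-0001", failed={"server-b"})
```

With one of three servers down, four of the six fragments remain reachable, which meets the assumed threshold of k = 4. Losing a second server would drop below it and the read would fail.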


Virtual block device layer

Virtual disks are exposed via a custom protocol.

Key characteristics:

  • log-structured object aggregation
  • cheap snapshots and clones
  • logical-block-to-object mapping stored in metadata servers
  • distributed transaction log that protects in-flight data
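
A minimal sketch of why a log-structured layout makes snapshots cheap: writes are only ever appended, so a snapshot needs to record nothing more than a position in the log. This illustrates the general technique under that assumption. It is not the whitesky implementation, the names are hypothetical, and a real system would look blocks up through a metadata index rather than scanning the log.

```python
from typing import Optional


class LogStructuredDisk:
    """Toy append-only disk: writes never overwrite earlier entries."""

    def __init__(self) -> None:
        self.log: list[tuple[int, bytes]] = []   # (logical block, data)
        self.snapshots: dict[str, int] = {}      # snapshot name -> log length

    def write(self, block: int, data: bytes) -> None:
        """Append the new block contents to the end of the log."""
        self.log.append((block, data))

    def snapshot(self, name: str) -> None:
        """Record the current log position; no data is copied."""
        self.snapshots[name] = len(self.log)

    def read(self, block: int, snapshot: Optional[str] = None) -> Optional[bytes]:
        """Return the newest version of a block, optionally as of a snapshot."""
        end = len(self.log) if snapshot is None else self.snapshots[snapshot]
        for logged_block, data in reversed(self.log[:end]):
            if logged_block == block:
                return data
        return None   # block was never written


disk = LogStructuredDisk()
disk.write(7, b"before")
disk.snapshot("nightly")
disk.write(7, b"after")

current = disk.read(7)                      # newest contents
frozen = disk.read(7, snapshot="nightly")   # contents as of the snapshot
```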

Each virtual disk:

  • is owned by exactly one volume driver
  • can fail over to another driver automatically
  • supports live migration during maintenance

Ownership fencing ensures split-brain conditions cannot corrupt data.
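
One common way to implement ownership fencing is an epoch (fencing token) that increases on every ownership change, with writes carrying a stale epoch rejected. The sketch below illustrates that general technique. This page does not describe whitesky's actual mechanism, and all names are hypothetical.

```python
class StaleOwnerError(Exception):
    """Raised when a driver that lost ownership tries to write."""


class FencedDisk:
    """Toy virtual disk that accepts writes only from its current owner."""

    def __init__(self) -> None:
        self.epoch = 0        # incremented on every ownership change
        self.owner = None
        self.blocks = {}      # logical block -> data

    def acquire(self, driver: str) -> int:
        """Hand ownership to a driver and return its fencing token."""
        self.epoch += 1
        self.owner = driver
        return self.epoch

    def write(self, token: int, block: int, data: bytes) -> None:
        """Apply a write only if the token matches the current epoch."""
        if token != self.epoch:
            raise StaleOwnerError(
                f"token {token} is stale, current epoch is {self.epoch}"
            )
        self.blocks[block] = data


disk = FencedDisk()
old_token = disk.acquire("driver-1")
disk.write(old_token, 0, b"v1")

new_token = disk.acquire("driver-2")   # failover: driver-1 is presumed dead
disk.write(new_token, 0, b"v2")

try:
    disk.write(old_token, 0, b"late write from driver-1")
    rejected = False
except StaleOwnerError:
    rejected = True
```

If the old driver was only partitioned rather than dead, its late write is refused, so the data written by the new owner stays intact.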


Why customers trust whitesky storage

  • Failure-aware architecture: Built from first principles to isolate, contain, and recover automatically.
  • Sovereignty and control: Transparent operation under your control, aligned with European data sovereignty.
  • Scale without compromise: From edge to data center with consistent safety characteristics.

whitesky does not avoid failure — it engineers for it.