What is Diffusion?

Diffusion (in full, the "Diffusion Server") is a high-performance publish / subscribe server that provides scalable real-time data streaming. The platform picks up where libraries like socket.io leave off, addressing the complex architectural challenges that arise when operating at scale. Even small Diffusion clusters can handle hundreds of thousands of client connections.

Because Diffusion is WebSocket based, it is ideally positioned to be used at the edge of an enterprise application to service fast, high volume message transmission across the web. It can integrate with other messaging platforms, for instance extending Apache Kafka, to ingest data from within an enterprise and distribute it beyond.

Messages are commonly in JSON format, but the platform supports numeric, string, and binary message formats, as well as time series variations of each of these.

When Diffusion handles messages it applies intelligent compression, conflation, and delta-streaming algorithms to minimise bandwidth. Proprietary flow control and back pressure monitoring algorithms are further applied to prevent network congestion and maintain high data throughput in an elastic manner.
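
Diffusion's own algorithms are proprietary, so the following is only a generic sketch of two of the ideas named above. Conflation is shown as a queue that keeps just the newest pending value per topic, and delta streaming as sending only the fields that changed. The names `ConflatingQueue`, `json_delta`, and `apply_delta` are hypothetical and are not part of any Diffusion SDK.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # sentinel so that a stored None is not confused with "absent"


class ConflatingQueue:
    """Hypothetical per-client outbound queue: one pending value per topic."""

    def __init__(self) -> None:
        self._pending: Dict[str, Any] = {}

    def offer(self, topic: str, value: Any) -> None:
        # Overwriting an existing key keeps its position and drops the stale value.
        self._pending[topic] = value

    def drain(self) -> List[Tuple[str, Any]]:
        """Return and clear everything waiting to be sent."""
        items = list(self._pending.items())
        self._pending.clear()
        return items


def json_delta(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Any]:
    """Describe how to turn `previous` into `current` (top-level keys only)."""
    changed = {k: v for k, v in current.items() if previous.get(k, _MISSING) != v}
    removed = [k for k in previous if k not in current]
    return {"set": changed, "remove": removed}


def apply_delta(previous: Dict[str, Any], delta: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the current value on the receiving side."""
    result = dict(previous)
    result.update(delta["set"])
    for key in delta["remove"]:
        result.pop(key, None)
    return result
```

With this sketch, a slow client that misses an intermediate price only ever receives the latest one, and a client that already holds the previous value receives just the changed fields rather than the whole message.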

Although deployment at the edge is one common application, Diffusion may also be employed internally within an enterprise, or at any scale, to service specific messaging needs. Developer-friendly SDKs simplify integration across a wide range of platforms and languages.

Data structure

Within Diffusion, data is represented in a topic tree - a hierarchical structure of message nodes or "topics". The structure is completely dynamic, scalable, and can be user defined to suit a particular use case.

Illustration of a very simple topic tree in Diffusion Server, showing a hierarchy of topics
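
To make the idea of a topic tree concrete, here is a minimal in-memory sketch, assuming topics are addressed by "/"-separated paths and that a node may or may not hold a value. The `TopicNode` and `TopicTree` classes and the example paths are illustrative assumptions, not the Diffusion API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional, Tuple


@dataclass
class TopicNode:
    value: Optional[Any] = None  # None means no topic is bound at this path
    children: Dict[str, "TopicNode"] = field(default_factory=dict)


class TopicTree:
    """Hypothetical hierarchical store keyed by paths such as 'a/b/c'."""

    def __init__(self) -> None:
        self._root = TopicNode()

    def set(self, path: str, value: Any) -> None:
        # Intermediate nodes are created on demand, so the tree grows dynamically.
        node = self._root
        for part in path.split("/"):
            node = node.children.setdefault(part, TopicNode())
        node.value = value

    def get(self, path: str) -> Optional[Any]:
        node: Optional[TopicNode] = self._root
        for part in path.split("/"):
            node = node.children.get(part)
            if node is None:
                return None
        return node.value

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, Any]]:
        """Yield (path, value) for every topic at or below `prefix`."""
        start: Optional[TopicNode] = self._root
        if prefix:
            for part in prefix.split("/"):
                start = start.children.get(part)
                if start is None:
                    return
        yield from self._walk(start, prefix)

    def _walk(self, node: TopicNode, path: str) -> Iterator[Tuple[str, Any]]:
        if node.value is not None:
            yield path, node.value
        for name in sorted(node.children):
            child_path = f"{path}/{name}" if path else name
            yield from self._walk(node.children[name], child_path)
```

For example, after setting `sports/football/score` and `sports/tennis/score`, `walk("sports")` yields both topics while leaving any `markets/...` branch untouched, which is the kind of branch-level selection a hierarchy makes cheap.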

Data transformation

Central to the server function is the ability to shape data to suit individual clients or categories of client. New topics can be generated dynamically based upon in-flight data. These reference topics are defined using a rich "topic view" DSL. Topic views can be used to extract, combine, or expose data in different ways.
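
The sketch below illustrates the concept of deriving reference topics from source topics. It is not the topic view DSL itself, and in Diffusion the server keeps such derived topics up to date as the source data changes, whereas this is a one-shot calculation. The function `derive_reference_topics`, its parameters, and the sensor paths are all hypothetical.

```python
from typing import Any, Callable, Dict

Topics = Dict[str, Any]


def derive_reference_topics(
    source: Topics,
    source_prefix: str,
    target_prefix: str,
    transform: Callable[[Any], Any],
) -> Topics:
    """Map every topic under `source_prefix` to a derived topic under `target_prefix`.

    The derived value is `transform(source_value)`, so a view can expose
    just part of the original data. The source topics are left unchanged.
    """
    prefix = source_prefix.rstrip("/") + "/"
    target = target_prefix.rstrip("/") + "/"
    derived: Topics = {}
    for path, value in source.items():
        if path.startswith(prefix):
            derived[target + path[len(prefix):]] = transform(value)
    return derived


source_topics = {
    "sensors/kitchen": {"temp_c": 21.5, "battery": 0.9},
    "sensors/garage": {"temp_c": 12.0, "battery": 0.4},
    "admin/config": {"poll_seconds": 5},
}

# Expose only the temperature reading under a separate branch.
public_topics = derive_reference_topics(
    source_topics, "sensors", "public/temperature", lambda v: v["temp_c"]
)
```

Here `public_topics` contains `public/temperature/kitchen` and `public/temperature/garage`, each holding only the temperature. A class of client could then be granted access to that branch without seeing battery levels or the `admin` branch.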

Considerations and applications

Real time data streaming presents technical challenges when building an application, particularly if starting with core technologies. These challenges quickly become apparent as an application scales.

Diffusion aims to address these challenges in a developer-friendly manner.

Example applications

  • A financial trading platform pushing live market data to thousands of concurrent users
  • A live sports or gaming application streaming real-time updates to 100,000+ connected clients
  • A collaborative tool (a shared whiteboard or document editor) requiring instant synchronization
  • An IoT dashboard displaying sensor data from thousands of devices simultaneously

In each case there are technical and architectural factors to consider in order to ensure suitable performance and scalability. Some examples:

  • Robust support for many connections, with finite server resources
  • Horizontal scaling, with state synchronisation across scaled nodes
  • Reliable message ordering and delivery
  • Support for direct point-to-point message transmission in addition to publish / subscribe
  • A security model, with access controls for different classes of client
  • Runtime monitoring and provision of detailed live metrics

Addressing any of these requires considerable design and development effort. While platforms like Redis and Kafka offer pub/sub solutions, they are not optimised for high-concurrency data streaming.

The Diffusion platform steps into this space and offers solutions in each area.

Summary

The Diffusion feature set is designed to support the heavy lifting of real-time messaging and data streaming.

Key features

  • Rich in-flight message transformation. New topics can be dynamically derived, transformed, or combined based upon data encountered in an inbound topic message (or categories of message).
  • Proprietary bandwidth and flow control technology. Message compression, conflation, and delta streaming are combined with interactive monitoring and flow control to minimise bandwidth and maximise data throughput.
  • Very low latency. Measured in the millisecond range, even under high volume.
  • Robust permission and access control model. This allows very fine-grained control of data visibility (and API feature accessibility) for different classes of client.
  • Wide-ranging platform and language support. (Please refer to the SDKs menu to the left.)
  • Flexible procurement and deployment options. The Diffusion Server can be run natively on premises on Linux, macOS, or Windows, or in Dockerised environments, as a single server or as a cluster. A hosted offering is also available in Diffusion Cloud, minimising set-up time and providing a browser-based control console.