Consistent Hashing: An Overview and Implementation in Golang

In this post, we delve into the details of consistent hashing and show how to implement it in Golang.

By Jethro Magaji

Ever wondered how big tech websites like Netflix, Twitch, and Facebook handle huge amounts of traffic and data without slowing down or crashing? One technique that helps make this possible is consistent hashing. As a developer, understanding consistent hashing is a useful skill to have in your toolbox.

In this tutorial, we'll give you an in-depth overview of consistent hashing and walk you through implementing it in Golang.

What Is Hashing?

Hashing is a technique used to map data of an arbitrary size to data of a fixed size. In simple terms, it maps keys to values. The values are called hash codes or hash values.

A hash function is used to generate the hash code: it takes the key and returns the hash value. A good hash function has the following properties (illustrated in the sketch after this list):

  • Uniform distribution: It should map keys to hash values uniformly. Each hash value should have an equal chance of being the output of the hash function.

  • Deterministic: The same key should always map to the same hash value.

  • Efficiently computable: It should be fast to compute the hash value for any key.
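
As a quick illustration, here is a minimal sketch of these properties using Go's hash/crc32 package (the same hash family used in the implementation later in this post). The sample keys are arbitrary:

package main

import (
	"fmt"
	"hash/crc32"
)

func main() {
	// Deterministic: the same key always yields the same hash value.
	fmt.Println(crc32.ChecksumIEEE([]byte("user:42"))) // identical on every run

	// Fixed size: a key of any length maps to a uint32.
	fmt.Println(crc32.ChecksumIEEE([]byte("a much, much longer key than the first one")))
}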

Hashing is used for several purposes, like:

  • Hash tables: To map keys to values for faster lookups.

  • Cryptography: For message authentication and digital signatures.

  • Caching: To map keys to cached data.

What Is Consistent Hashing?

Consistent hashing is a special kind of hashing technique. It maps keys onto a fixed ring of hash values in such a way that the mapping between keys and nodes stays largely stable even when the number of nodes changes.

What Problem Does Consistent Hashing Solve?

Consistent hashing solves the rehashing problem in distributed caching, where data must be spread across the physical nodes of a cluster. With a regular hash function (for example, hash(key) mod N), adding or removing one node changes the mapping of nearly every key.

Consistent hashing minimizes the number of keys that need to be remapped when a node is added or removed. It is useful in distributed caching and database systems where the number of nodes changes frequently.
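
To see the scale of the problem, here is a small sketch (the sample keys and cluster sizes are arbitrary assumptions) that counts how many keys change owners under naive modulo hashing when a 4-node cluster shrinks to 3 nodes:

package main

import (
	"fmt"
	"hash/crc32"
)

func main() {
	keys := []string{"alpha", "beta", "gamma", "delta", "epsilon", "zeta"}
	moved := 0
	for _, k := range keys {
		h := crc32.ChecksumIEEE([]byte(k))
		// Owner under hash(key) mod N, before and after removing one node.
		before := h % 4
		after := h % 3
		if before != after {
			moved++
		}
	}
	// Most keys typically move; a consistent hash ring keeps this close
	// to 1/N of the keys instead.
	fmt.Printf("%d of %d keys moved\n", moved, len(keys))
}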

It also reduces the problems caused by hash collisions. A hash collision occurs when two keys map to the same hash value and normally requires extra work to resolve; on a consistent hash ring, keys that hash to the same point are simply stored on the same node.

How Does Consistent Hashing Work?

Hashing is a way of mapping data of arbitrary size to fixed-size values. A hash function takes input data and converts it into a hash value. The hash table stores the data using the hash value as the key.

Distributed hashing is an extension of hashing used in distributed data stores. It allows data to be distributed across multiple nodes.

Image by Abhinav Singh

In consistent hashing, the hash values are arranged in a ring. Each key is mapped to one of the hash values on the ring.

When a node's point is added to or removed from the ring, only the keys that mapped to that point are remapped. All other keys keep their assignment.

The key's hash value determines which node the data is stored on. With a naive scheme, adding or removing nodes from the cluster reshuffles the mapping of keys to nodes wholesale, which can require massive data movement; the ring structure is what lets consistent hashing avoid this.

Each node in the cluster receives one or more points on the ring. The system maps the key to a point on the ring and stores it on the node that contains that point.

When a new node is added, it takes over some points on the ring. Only the keys that map to those points move to the new node. The keys mapped to other points stay where they are.

Similarly, when a node is removed, its points on the ring are taken over by the remaining nodes. But keys mapped to other points are unaffected.
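
To make the ring concrete, here is a minimal, self-contained sketch, separate from the gRPC implementation later in this post. Each node gets several points on the ring, and a key is served by the first point clockwise from its hash; the replica count and the point-naming scheme are illustrative assumptions:

package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// replicas is an illustrative choice: the number of points each node
// receives on the ring.
const replicas = 3

func main() {
	nodes := []string{"Node-A", "Node-B", "Node-C"}

	// Place `replicas` points per node on the ring.
	pointToNode := map[uint32]string{}
	var ring []uint32
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", n, i)))
			pointToNode[p] = n
			ring = append(ring, p)
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i] < ring[j] })

	// A key belongs to the first point clockwise from its hash.
	key := "key1"
	h := crc32.ChecksumIEEE([]byte(key))
	idx := sort.Search(len(ring), func(i int) bool { return ring[i] >= h })
	if idx == len(ring) {
		idx = 0 // wrap around to the start of the ring
	}
	fmt.Printf("%q (hash %d) -> %s\n", key, h, pointToNode[ring[idx]])
}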

Where Is Consistent Hashing Used?

Many real-world applications use consistent hashing to distribute data across a cluster of servers. Some major use cases include:

Distributed Caching

Consistent hashing is a popular technique for distributed caching systems like Memcached and Dynamo. In these systems, the cache is spread across many servers. When a request for a key arrives, consistent hashing determines which server holds the required data, allowing the overall cache to scale to handle more requests.

Distributed Storage

Distributed storage systems like Cassandra, DynamoDB, and Voldemort also use consistent hashing. In these systems, data is partitioned across many servers, and consistent hashing maps each piece of data to the servers that store it. When servers are added or removed, consistent hashing minimizes the amount of data that must be remapped to different servers.

Load Balancing

Consistent hashing is commonly used for load balancing in distributed systems. It lets you add or remove nodes from the cluster without affecting too many keys: with a traditional hash function, adding or removing a node changes the mapping of nearly every key, while consistent hashing keeps remapping to a minimum.

Peer-to-Peer Networks

Some peer-to-peer networks use consistent hashing to map nodes to the key space. This allows nodes to join and leave the network with minimal disruption: new nodes take over responsibility for keys that were previously assigned to nodes that left.

DNS

Some DNS infrastructure uses consistent hashing to map domain names to DNS servers. If a server goes down, its keys are reassigned to the remaining servers with minimal changes, which provides fault tolerance.

Implementing Consistent Hashing in Golang

To implement consistent hashing in Golang, you'll need the following installed: Go itself, the protoc compiler, and the Go protobuf and gRPC plugins (protoc-gen-go and protoc-gen-go-grpc).
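
If you don't have the plugins yet, they can be installed with the standard commands (assuming Go and the protoc compiler are already on your PATH):

go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest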

Step 1: Define the Protocol Buffers (proto) file

Before we begin, let's define the following terms:

Protocol Buffers (protobuf): Protocol Buffers, or protobuf, is a data serialization format developed by Google. It offers a language-agnostic way to define data structures, enabling efficient serialization and deserialization across different programming languages.

gRPC: gRPC, which stands for gRPC Remote Procedure Calls, is an open-source RPC framework, also developed by Google. It uses protobuf as its interface definition language, describing the structure of data exchanged between servers and clients.

Now, create a file named consistency.proto and add the following code:

syntax = "proto3";

package consistency;

// Required by protoc-gen-go; "main" keeps the generated code in the same
// Go package as the server and client code in this tutorial.
option go_package = "./;main";

service Node {
  rpc GetNodeForRequest (NodeRequest) returns (NodeResponse);
  rpc AddNodeForRequest (NodeRequest) returns (Empty);
  rpc RemoveNodeForRequest (NodeRequest) returns (Empty);
}

message NodeRequest {
  string key = 1;
  string node = 2;
  string ip = 3;
}

message NodeResponse {
  string node = 1;
}

message Empty {}

Run the following command to generate Go files from the proto file:

protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative consistency.proto

This command generates consistency.pb.go and consistency_grpc.pb.go.

Step 2: Implement Consistent Hashing and gRPC Service

Let's break down this step into three parts for better understanding.

Part 1: Initialization and Type Definitions

In this part, we initialize the HashRing struct, which represents the consistent hash ring. The struct includes fields for maintaining nodes, their corresponding IP addresses, keys, and a memberlist for handling membership in the distributed system.

package main

//import these packages
import (
	"context"
	"hash/crc32"
	"log"
	"net"
	"sort"
	"sync"
	"time"

	"github.com/hashicorp/memberlist"
	"google.golang.org/grpc"
)

// HashRing represents the consistent hash ring
type HashRing struct {
	sync.RWMutex
	Nodes     []string
	nodeToIP  map[string]string
	nodeToKey map[string]uint32
	mlist     *memberlist.Memberlist
}

// NodeService represents the gRPC service for nodes
type NodeService struct {
	// Embedding the generated UnimplementedNodeServer is required by
	// protoc-gen-go-grpc so NodeService satisfies the NodeServer interface.
	UnimplementedNodeServer
	HashRing *HashRing
}
  • HashRing struct: It includes a mutex for thread safety, a slice to keep track of nodes, maps for storing node-to-IP and node-to-key mappings, and a memberlist for handling membership.
  • NodeService struct: It represents the gRPC service, embeds the generated UnimplementedNodeServer (required by protoc-gen-go-grpc), and holds a reference to the HashRing struct.

Part 2: The Functions

Here, we define the functions that perform actions on the hash ring, such as adding nodes, removing nodes, and getting the node responsible for a given key.

// AddNode adds a new node to the hash ring
func (hr *HashRing) AddNode(node string, ip string) {
	hr.Lock()
	defer hr.Unlock()

	if _, ok := hr.nodeToKey[node]; ok {
		return // Node already exists
	}

	key := crc32.ChecksumIEEE([]byte(node))
	hr.nodeToIP[node] = ip
	hr.nodeToKey[node] = key
	hr.Nodes = append(hr.Nodes, node)

	// Keep the slice ordered by ring position (hash), not by name:
	// GetNode's binary search depends on this ordering.
	sort.Slice(hr.Nodes, func(i, j int) bool {
		return hr.nodeToKey[hr.Nodes[i]] < hr.nodeToKey[hr.Nodes[j]]
	})

	// Update memberlist (skipped when the ring is used without one)
	if hr.mlist != nil {
		if _, err := hr.mlist.Join([]string{ip}); err != nil {
			log.Fatalf("Failed to join memberlist: %v", err)
		}
	}
}

// RemoveNode removes a node from the hash ring
func (hr *HashRing) RemoveNode(node string) {
	hr.Lock()
	defer hr.Unlock()

	if _, ok := hr.nodeToKey[node]; !ok {
		return // Node does not exist
	}

	delete(hr.nodeToIP, node)
	delete(hr.nodeToKey, node)

	// Remove the node from the slice
	for i := len(hr.Nodes) - 1; i >= 0; i-- {
		if hr.Nodes[i] == node {
			hr.Nodes = append(hr.Nodes[:i], hr.Nodes[i+1:]...)
		}
	}

	// Update memberlist. Leave takes a timeout and returns an error; note
	// that it broadcasts this process's own departure from the cluster.
	if hr.mlist != nil {
		if err := hr.mlist.Leave(10 * time.Second); err != nil {
			log.Fatalf("Failed to leave memberlist: %v", err)
		}
	}
}

// GetNode returns the node responsible for the given key
func (hr *HashRing) GetNode(key string) string {
	hr.RLock()
	defer hr.RUnlock()

	if len(hr.Nodes) == 0 {
		return "" // empty ring
	}

	hash := crc32.ChecksumIEEE([]byte(key))
	// Find the first node clockwise from the key's position on the ring;
	// this relies on hr.Nodes being sorted by hash (see AddNode).
	index := sort.Search(len(hr.Nodes), func(i int) bool {
		return hr.nodeToKey[hr.Nodes[i]] >= hash
	})

	if index == len(hr.Nodes) {
		index = 0 // wrap around the ring
	}

	return hr.Nodes[index]
}

// GetNodeForRequest returns the node responsible for the given gRPC request
func (ns *NodeService) GetNodeForRequest(ctx context.Context, req *NodeRequest) (*NodeResponse, error) {
	node := ns.HashRing.GetNode(req.Key)
	return &NodeResponse{Node: node}, nil
}

// AddNodeForRequest adds a new node for the given gRPC request
func (ns *NodeService) AddNodeForRequest(ctx context.Context, req *NodeRequest) (*Empty, error) {
	ns.HashRing.AddNode(req.Node, req.Ip)
	return &Empty{}, nil
}

// RemoveNodeForRequest removes a node for the given gRPC request
func (ns *NodeService) RemoveNodeForRequest(ctx context.Context, req *NodeRequest) (*Empty, error) {
	ns.HashRing.RemoveNode(req.Node)
	return &Empty{}, nil
}
  • AddNode, RemoveNode, GetNode: These functions handle modifications to the hash ring (you can exercise them on their own; see the sketch after this list).
  • GetNodeForRequest, AddNodeForRequest, RemoveNodeForRequest: These functions are gRPC service methods for corresponding client requests.
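
Because AddNode and RemoveNode only touch the memberlist when one is configured (the nil checks above), you can exercise the ring logic on its own before bringing gRPC into the picture. This helper is an illustrative extra, not part of the tutorial's files:

// sanityCheck exercises the ring logic without memberlist or gRPC.
func sanityCheck() {
	hr := &HashRing{
		nodeToIP:  make(map[string]string),
		nodeToKey: make(map[string]uint32),
	}
	hr.AddNode("Node-A", "127.0.0.1")
	hr.AddNode("Node-B", "127.0.0.1")
	log.Printf("key1 -> %s", hr.GetNode("key1"))

	hr.RemoveNode("Node-A")
	// With a single node left, every key maps to it.
	log.Printf("key1 -> %s (after removing Node-A)", hr.GetNode("key1"))
}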

Part 3: The Main Function

This part includes the main function where we set up the memberlist, create the gRPC server, register the service, and start the server.

func main() {
	// Example usage
	hashRing := &HashRing{
		nodeToIP:  make(map[string]string),
		nodeToKey: make(map[string]uint32),
	}

	// Create and start memberlist
	config := memberlist.DefaultLocalConfig()
	config.Name = "localhost" // Set a unique name for the node
	list, err := memberlist.Create(config)
	if err != nil {
		log.Fatalf("Failed to create memberlist: %v", err)
	}
	hashRing.mlist = list

	// Create and register gRPC server
	server := grpc.NewServer()
	nodeService := &NodeService{HashRing: hashRing}
	RegisterNodeServer(server, nodeService)

	// Start gRPC server
	listener, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("Failed to listen: %v", err)
	}
	defer listener.Close()

	log.Println("Server listening on :50051")
	if err := server.Serve(listener); err != nil {
		log.Fatalf("Failed to serve: %v", err)
	}
}
  • Initialization: We create a HashRing instance and configure a memberlist.
  • Server Setup: We create a gRPC server, register the NodeService, and start listening for connections on port 50051.

Create a file named consistency.go, and add the code snippets of the 3 parts into the file for the complete implementation.
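
If the directory isn't a Go module yet, initialize one and let the toolchain fetch github.com/hashicorp/memberlist and google.golang.org/grpc (the module name below is just an example):

go mod init example.com/consistenthash
go mod tidy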

Step 3: Build and Run

Run the following command to build and run the gRPC server. The generated files must be compiled alongside it, since everything lives in package main:

go run consistency.go consistency.pb.go consistency_grpc.pb.go

This will start the gRPC server on localhost:50051.

Step 4: Test with a gRPC Client

Create a file named client.go to test the gRPC server. In this code, we add two nodes ("Node-A" and "Node-B"), ask which node owns "key1", remove Node-A, and then ask which node owns "key2":

package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/grpc"
)

func main() {
	// Connect to the gRPC server
	conn, err := grpc.Dial("localhost:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}
	defer conn.Close()

	client := NewNodeClient(conn)

	// Add nodes
	client.AddNodeForRequest(context.Background(), &NodeRequest{Node: "Node-A", Ip: "127.0.0.1"})
	client.AddNodeForRequest(context.Background(), &NodeRequest{Node: "Node-B", Ip: "127.0.0.1"})

	// Get node for key1
	key1 := "key1"
	response1, err := client.GetNodeForRequest(context.Background(), &NodeRequest{Key: key1})
	if err != nil {
		log.Fatalf("Error getting node for key1: %v", err)
	}
	fmt.Printf("Key '%s' belongs to Node: %s\n", key1, response1.Node)

	// Remove Node-A
	client.RemoveNodeForRequest(context.Background(), &NodeRequest{Node: "Node-A"})

	// Get node for key2
	key2 := "key2"
	response2, err := client.GetNodeForRequest(context.Background(), &NodeRequest{Key: key2})
	if err != nil {
		log.Fatalf("Error getting node for key2: %v", err)
	}
	fmt.Printf("After removing Node-A, key '%s' belongs to Node: %s\n", key2, response2.Node)
}

Run the client in a second terminal, again compiling the generated files alongside it (client.go is also in package main):

go run client.go consistency.pb.go consistency_grpc.pb.go

Your output should look like this:

Key 'key1' belongs to Node: Node-A
After removing Node-A, key 'key2' belongs to Node: Node-B

This output indicates the node assignment for each key before and after removing a node. The specific output will depend on the keys and nodes you choose during the execution.

Conclusion

Now you understand the basics of consistent hashing and have built a simple implementation of a consistent hash ring in Golang. Consistent hashing is a clever solution for mapping keys to nodes in a distributed system.

You've seen how it handles node additions and removals gracefully without disrupting the entire mapping. While this was just a basic example, the concepts you learned apply to more complex, real-world systems.

So keep practicing, and before you know it, you'll be designing and building distributed systems that leverage consistent hashing to deliver scalable, highly available services.