Friday 17th May 2024
Ho Chi Minh, Vietnam

1. What is Resiliency in Microservices?

Resiliency in microservices refers to the system’s ability to maintain functionality and recover quickly from failures or disruptions. Since microservices architecture involves breaking down an application into smaller, independent services, ensuring resiliency becomes crucial to maintain overall system stability.

In essence, resiliency in microservices involves a combination of design principles, architectural choices, and implementation strategies to ensure that the system can withstand failures and continue functioning as expected or with minimal disruption.

We will walk through some popular patterns for achieving resiliency in microservices, using scenarios from an e-commerce application built with the Java Spring framework. The application depends on a Verification Service, which verifies newly registered users.

2. Circuit breaker

The circuit breaker pattern wraps requests to an external service and tracks their failures. When failures cross a threshold, the circuit breaker opens, and all subsequent requests immediately return an error instead of reaching the unhealthy service. The breaker keeps rejecting calls until the service becomes healthy again, which also keeps one failing service from dragging down the services that depend on it.

A circuit breaker supports the notion of a fallback: a default code path executed when the circuit is open or the call fails (a non-2xx response or a timeout). Opening the circuit stops further requests from reaching the troubled service or resource, which prevents further harm and system overload. The thresholds that decide when the circuit opens and when it closes again depend on each library's implementation.
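To make the open/closed behavior concrete, here is a minimal sketch of the state machine in plain Java. It is an illustration under stated assumptions, not how Hystrix or any other library implements it: the class name SimpleCircuitBreaker, the consecutive-failure threshold, and the fixed open window are all hypothetical choices, and the sketch is not thread-safe.

```java
import java.util.function.Supplier;

/** Minimal, single-threaded circuit breaker sketch (hypothetical class, not a library API). */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long openMillis;      // how long the circuit stays open before a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    State state() {
        return state;
    }

    /** Runs the call through the breaker; returns the fallback when the call fails or is rejected. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < openMillis) {
                return fallback.get();  // fail fast without touching the remote service
            }
            state = State.HALF_OPEN;    // open window elapsed: allow one trial call
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0;    // a success closes the circuit
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;     // trip (or re-trip) the breaker
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }
}
```

Once the breaker is open, call returns the fallback without invoking the remote call at all, which is the "fail fast" behavior described above. Real libraries typically use richer thresholds, such as failure rates over a sliding window.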

Below is how we apply the Hystrix library to the Verification Service client to enable a circuit breaker with a fallback method:

VerificationServiceClient.java

@FeignClient(value = "VerificationServiceClient", url = "${kyc-service.url}", fallback = VerificationServiceClientFallback.class)
public interface VerificationServiceClient {
  @PostMapping(value = "/kyc/{userName}", produces = MediaType.APPLICATION_JSON_VALUE)
  String verify(@PathVariable String userName);
}

VerificationServiceClientFallback.java

@Component
public class VerificationServiceClientFallback implements VerificationServiceClient {

  @Override
  public String verify(String userName) {
    return ApplicationUtil.FALLBACK_MESSAGE;
  }
}

You can learn more about Hystrix Circuit Breaker via this article.

3. Bulkhead pattern

Bulkhead pattern improves system resilience by creating separate, isolated compartments for different parts of a system. It’s inspired by the bulkheads on ships that prevent the entire vessel from flooding if one section is breached.

Consider a scenario in our e-commerce application: the Verification Service has become very slow while many requests to register new user accounts are arriving. Threads pile up waiting on the Verification Service, so the slowdown affects not only the registration feature but also other features such as searching for products or placing an order.

The fix is to isolate resources: why not allocate a bounded share of threads to the registration feature? Registration then cannot consume every thread in the system, and a slow Verification Service no longer hangs all other requests.

We will apply a bulkhead to calls to the Verification Service using the Resilience4j library, as below. Say our system can handle 100 threads; we want to allow at most 30 concurrent requests to the registration feature, even though 100 threads are available.

VerificationServiceClient.java

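@FeignClient(value = "VerificationServiceClient", url = "${kyc-service.url}")
public interface VerificationServiceClient {
  @PostMapping(value = "/kyc/{userName}", produces = MediaType.APPLICATION_JSON_VALUE)
  String verify(@PathVariable String userName);
}

VerificationService.java

@Service
public class VerificationService {
  @Autowired
  private VerificationServiceClient verificationServiceClient;

  @Bulkhead(name = "verificationService", fallbackMethod = "fallback", type = Bulkhead.Type.SEMAPHORE)
  public String verify(String userName) {
    return verificationServiceClient.verify(userName);
  }

  // The fallback lives in the same class and takes the original arguments plus the cause.
  private String fallback(String userName, Throwable cause) {
    throw new IllegalArgumentException("Verification Service is not available", cause);
  }
}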

application.yml

resilience4j.bulkhead:
  instances:
    verificationService:
      maxConcurrentCalls: 30
      maxWaitDuration: 10ms

The maxWaitDuration configuration applies when an additional request arrives while all 30 permits are in use: the caller waits at most 10 ms for a permit to become free, and the request fails if none does.
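The effect of these two settings can be sketched with a plain java.util.concurrent.Semaphore. This is only an illustration of the semaphore-style bulkhead idea, not Resilience4j's implementation; the class name SemaphoreBulkhead and the choice of IllegalStateException for rejected calls are assumptions made for the example.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Semaphore-based bulkhead sketch (hypothetical class, not the Resilience4j API). */
class SemaphoreBulkhead {
    private final Semaphore permits;   // one permit per allowed concurrent call
    private final long maxWaitMillis;  // how long a caller may wait for a permit

    SemaphoreBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    /** Runs the call if a permit becomes free within the wait time; otherwise rejects it. */
    <T> T execute(Supplier<T> call) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a bulkhead permit", e);
        }
        if (!acquired) {
            throw new IllegalStateException("Bulkhead is full");
        }
        try {
            return call.get();
        } finally {
            permits.release();  // always hand the permit back, even when the call throws
        }
    }
}
```

Because the permit count is fixed, a slow downstream service can tie up at most maxConcurrentCalls callers; everyone else is rejected after a short wait instead of blocking a thread indefinitely.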

You can learn more about Bulkhead via this article.

4. Rate limiter

Rate limiter controls the number of requests or operations a system can handle within a specified time frame. It’s employed to prevent excessive traffic, abuse, or overloading of resources, ensuring fair usage and stability of the system.

Imagine our e-commerce platform cooperates with an affiliate platform that helps us onboard new users. To support this, we expose a RESTful API, consumed by the affiliate platform, that triggers a RegistrationService to register a new user account. To guard against malicious or excessive requests from the affiliate platform, we allow this API to be called at most 5 times per second.

RegistrationService.java

public class RegistrationService {
  @RateLimiter(name = "registrationUser")
  void registerUser(UserRegistrationDto userRegistrationDto) {
    ...
  }
}

application.yml

resilience4j.ratelimiter:
  instances:
    registrationUser:
      limitForPeriod: 5
      limitRefreshPeriod: 1s
      timeoutDuration: 1s

The limitForPeriod and limitRefreshPeriod configurations together determine the rate (5 requests per second). The timeoutDuration configuration specifies the time we are willing to wait to acquire permission from the RateLimiter before erroring out.
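As a rough illustration of how a per-period limit behaves, here is a small fixed-window limiter in plain Java. It is a simplified sketch, not Resilience4j's algorithm: the class name FixedWindowRateLimiter and the injectable clock are assumptions for the example, and the sketch rejects immediately rather than waiting, so it has no equivalent of timeoutDuration.

```java
import java.util.function.LongSupplier;

/** Fixed-window rate limiter sketch (hypothetical class, not the Resilience4j API). */
class FixedWindowRateLimiter {
    private final int limitForPeriod;       // permits available in each window
    private final long refreshPeriodNanos;  // window length
    private final LongSupplier nanoClock;   // injectable clock, e.g. System::nanoTime
    private long windowStart;
    private int usedInWindow = 0;

    FixedWindowRateLimiter(int limitForPeriod, long refreshPeriodNanos, LongSupplier nanoClock) {
        this.limitForPeriod = limitForPeriod;
        this.refreshPeriodNanos = refreshPeriodNanos;
        this.nanoClock = nanoClock;
        this.windowStart = nanoClock.getAsLong();
    }

    /** Returns true if the caller may proceed, false if the current window is exhausted. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        if (now - windowStart >= refreshPeriodNanos) {
            windowStart = now;  // the period has elapsed: start a fresh window
            usedInWindow = 0;
        }
        if (usedInWindow < limitForPeriod) {
            usedInWindow++;
            return true;
        }
        return false;
    }
}
```

With a limit of 5 and a period of one second, the first five calls in a window succeed, the sixth is rejected, and permits become available again once the period has elapsed.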

5. Retry pattern

Microservices operate in distributed environments where interactions between services happen over networks, which introduces possibilities of failures such as network issues, service unavailability, timeouts, or temporary resource constraints. The retry pattern addresses these transient failures by automatically retrying failed requests or operations within a certain time frame, aiming to achieve a successful outcome eventually.

For example, we can apply the retry pattern when calling the Verification Service using Spring Retry, as below.

VerificationService.java

@Service
public class VerificationService {

    private final VerificationServiceClient verificationServiceClient;

    @Autowired
    public VerificationService(VerificationServiceClient verificationServiceClient) {
        this.verificationServiceClient = verificationServiceClient;
    }

    // @Retryable alone handles the retries; wrapping the call in a RetryTemplate
    // as well would nest one retry loop inside another and multiply the attempts.
    // The annotation only takes effect when @EnableRetry is present on a configuration class.
    @Retryable(value = {Exception.class}, maxAttempts = 3, backoff = @Backoff(delay = 3000))
    public String verify(String username) {
        return verificationServiceClient.verify(username);
    }
}

In the above example, we have annotated the method with @Retryable, which indicates that this method should be retried if an exception is thrown. We have also configured a maximum of 3 attempts (counting the initial call) and a delay of 3000 milliseconds (3 seconds) between attempts.
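For readers who want to see what such an annotation does conceptually, the loop below sketches retry with a fixed delay in plain Java. It is an illustration only, not Spring Retry's implementation: the class FixedBackoffRetry and its parameters are hypothetical names that mirror the maxAttempts and delay attributes, and only unchecked exceptions are retried here.

```java
import java.util.function.Supplier;

/** Retry-with-fixed-delay sketch (hypothetical helper, not the Spring Retry API). */
class FixedBackoffRetry {

    /** Tries the call up to maxAttempts times in total, sleeping delayMillis between attempts. */
    static <T> T execute(Supplier<T> call, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();  // success: stop retrying
            } catch (RuntimeException e) {
                last = e;           // remember the failure in case this was the final attempt
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);  // fixed backoff before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted during backoff", ie);
                    }
                }
            }
        }
        throw last;  // every attempt failed: surface the last error to the caller
    }
}
```

A transient failure that clears up before the attempts run out is invisible to the caller, while a persistent failure is still reported after the final attempt. Retries suit operations that are safe to repeat; a non-idempotent call may need extra care.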

6. Conclusion

Resiliency patterns such as circuit breakers, rate limiters, bulkheads, and retries are essential tools for fortifying microservices architectures. Used deliberately, they shield the system against failures and disturbances and contribute significantly to the reliability, fault tolerance, and stability of distributed systems.

By implementing these patterns, microservices can better withstand unforeseen challenges, maintain system integrity, and ensure a consistent and dependable user experience despite varying conditions and disruptions.
