Friday 17th May 2024
Ho Chi Minh, Vietnam

1. Bulkhead pattern and Resilience4j

The Bulkhead pattern improves system resilience by isolating parts of a system in separate compartments, so that a failure or slowdown in one part cannot exhaust the resources the others need. It’s inspired by the bulkheads on ships that prevent the entire vessel from flooding if one section is breached.
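To make the idea concrete, here is a minimal sketch in plain Java that is not taken from the sample project. It models two hypothetical compartments, each with its own small thread pool; the class name, pool sizes, and task bodies are assumptions chosen for illustration. Saturating the slow compartment leaves the other one free to answer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of compartment isolation; not part of the sample project.
final class CompartmentDemo {

  /** Returns true if the healthy compartment answers while the slow one is saturated. */
  static boolean healthyCompartmentStillResponds() {
    ExecutorService slowPool = Executors.newFixedThreadPool(2);    // compartment for a slow dependency
    ExecutorService healthyPool = Executors.newFixedThreadPool(2); // compartment for everything else
    CountDownLatch release = new CountDownLatch(1);
    try {
      // Saturate the slow compartment: 2 tasks block its threads, 2 more queue up behind them.
      for (int i = 0; i < 4; i++) {
        slowPool.submit(() -> {
          release.await();
          return "slow";
        });
      }
      // The healthy compartment has its own threads, so it is unaffected.
      Future<String> answer = healthyPool.submit(() -> "healthy");
      return "healthy".equals(answer.get(1, TimeUnit.SECONDS));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException | TimeoutException e) {
      return false;
    } finally {
      release.countDown(); // unblock the slow tasks so both pools can shut down
      slowPool.shutdown();
      healthyPool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(healthyCompartmentStillResponds());
  }
}
```

Had both kinds of task shared a single two-thread pool, the second submission would have queued behind the blocked ones, which is the flooding the pattern is meant to prevent.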

Resilience4j is a lightweight fault-tolerance library for Java that helps developers build resilient applications. It provides tools for handling and recovering from failures gracefully, including:

  1. Circuit Breakers: help prevent system overload and cascading failures by short-circuiting calls to a failing service.
  2. Retry: lets developers define and control how retries are performed when calling a potentially failing operation.
  3. Rate Limiting: limits the rate at which certain operations can be executed, preventing overuse of resources.
  4. Bulkheading: isolates parts of the system to prevent failures in one component from affecting others.
  5. Timeouts: set time limits for operations to avoid waiting indefinitely for a response.

In this article, we build a sample e-commerce system that uses an external KYC API to verify newly registered users. We apply a Bulkhead to the VerificationUserService with the Resilience4j library so that, when the KYC API is very slow, only a few concurrent requests from the registration feature can reach it.

2. Sample project

Tech stack: JDK 11, Gradle, Spring Boot, Spring Feign, Resilience4j, WireMock.

Bulkhead is applied to the VerificationUserService through the configuration below, which means:

  • We allow a maximum of 5 concurrent requests to the verification service.
  • maxWaitDuration applies when an additional request arrives for the verification service while all 5 permits are in use: the request waits up to 2000ms for a permit, and if none becomes free in that time it is rejected and the fallback is used.

application.yml

resilience4j.bulkhead:
  instances:
    verificationUserService:
      maxConcurrentCalls: 5
      maxWaitDuration: 2000ms
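
Conceptually, these two settings behave like a counting semaphore with a timed acquire. The sketch below is a simplified stand-in written with java.util.concurrent.Semaphore. It is not Resilience4j's implementation, and the class and method names are hypothetical. In Resilience4j itself, a rejected call surfaces as a BulkheadFullException, which the annotation's fallbackMethod then handles; the sketch just calls a fallback supplier directly.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical, simplified bulkhead for illustration; not Resilience4j's actual code.
final class SemaphoreBulkheadSketch {
  private final Semaphore permits;
  private final Duration maxWaitDuration;

  SemaphoreBulkheadSketch(int maxConcurrentCalls, Duration maxWaitDuration) {
    this.permits = new Semaphore(maxConcurrentCalls, true); // fair: waiters are served in arrival order
    this.maxWaitDuration = maxWaitDuration;
  }

  /** Runs the call if a permit is obtained within maxWaitDuration; otherwise runs the fallback. */
  <T> T execute(Supplier<T> call, Supplier<T> fallback) {
    boolean acquired;
    try {
      acquired = permits.tryAcquire(maxWaitDuration.toMillis(), TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return fallback.get();
    }
    if (!acquired) {
      return fallback.get(); // bulkhead is full and the wait timed out
    }
    try {
      return call.get();
    } finally {
      permits.release(); // always hand the permit back, even if the call throws
    }
  }

  int availablePermits() {
    return permits.availablePermits();
  }

  public static void main(String[] args) {
    // Same numbers as the configuration above: 5 permits, 2000ms wait.
    SemaphoreBulkheadSketch bulkhead = new SemaphoreBulkheadSketch(5, Duration.ofMillis(2000));
    System.out.println(bulkhead.execute(() -> "verified", () -> "fallback"));
  }
}
```

With a free permit the call runs immediately, so the wait only matters once all permits are taken.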

VerificationServiceClient.java

@FeignClient(value = "VerificationServiceClient", url = "${kyc-service.url}")
public interface VerificationServiceClient {
  @PostMapping(value = "/kyc/{userName}", produces = MediaType.APPLICATION_JSON_VALUE)
  String verify(@PathVariable String userName);
}

VerificationUserService.java

@Service
public class VerificationUserService {

  private static final Logger LOGGER = LoggerFactory.getLogger(VerificationUserService.class);
  private final VerificationServiceClient verificationServiceClient;

  @Autowired
  public VerificationUserService(VerificationServiceClient verificationServiceClient) {
    this.verificationServiceClient = verificationServiceClient;
  }

  // Guarded by the "verificationUserService" bulkhead instance from application.yml.
  @Bulkhead(name = "verificationUserService", fallbackMethod = "verifyDefault")
  public String verify(String userName) {
    return verificationServiceClient.verify(userName);
  }

  // Fallback: same parameters as verify(), plus the exception that triggered it.
  public String verifyDefault(String userName, Throwable throwable) {
    return ApplicationUtil.FALLBACK_MESSAGE;
  }
}

To experiment with how Bulkhead works, we will write an integration test with a few scenarios. We will use WireMock to mock the KYC API and add a delay to its response. We also use CompletableFuture to simulate concurrent requests to the verification service.

private void simulateApiRequest() {
  CompletableFuture.supplyAsync(() -> verificationUserService.verify("thoai"));
}
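
One caveat: supplyAsync without an explicit executor normally runs on the common ForkJoinPool, whose default parallelism depends on the number of available processors, so on a small machine fewer requests than intended may actually be in flight at once. A possible alternative, sketched here with a hypothetical helper that is not part of the sample project, is to give the simulated requests a dedicated pool with one thread per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical test helper; the class and method names are assumptions for illustration.
final class ConcurrentRequestSimulator {

  /** Starts the given call `requests` times, each on its own thread, and returns the futures. */
  static <T> List<CompletableFuture<T>> startAll(int requests, Supplier<T> call) {
    ExecutorService pool = Executors.newFixedThreadPool(requests);
    List<CompletableFuture<T>> futures = new ArrayList<>();
    for (int i = 0; i < requests; i++) {
      futures.add(CompletableFuture.supplyAsync(call, pool));
    }
    pool.shutdown(); // already-submitted tasks still run to completion
    return futures;
  }

  public static void main(String[] args) {
    // Stand-in for verificationUserService.verify("thoai"): report the worker thread name.
    List<CompletableFuture<String>> futures =
        startAll(5, () -> Thread.currentThread().getName());
    futures.forEach(f -> System.out.println(f.join()));
  }
}
```

In the tests, the supplier would be the call to verificationUserService.verify("thoai"), and the returned futures can be joined at the end if the background results matter.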

VerificationServiceTest.java

@SpringBootTest
@EnableConfigurationProperties
@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = {Application.class})
@AutoConfigureWireMock(port = 8082)
public class VerificationServiceTest {

  @Autowired
  private VerificationUserService verificationUserService;
  ...

}

Scenario 1: the fallback method is triggered (thanks to Bulkhead) when the KYC API is slow (3000ms > maxWaitDuration = 2000ms) and there are too many concurrent requests (6 requests > maxConcurrentCalls = 5).

  @Test
  public void fallBackWillBeTriggered_when_kycApiIsSlow_and_tooManyRequests() throws InterruptedException {
    WireMock.stubFor(WireMock.post(WireMock.urlPathEqualTo("/kyc/thoai"))
      .willReturn(WireMock.aResponse()
        .withBody(ApplicationUtil.SUCCESS_MESSAGE)
        .withFixedDelay(3000)
        .withStatus(200)));

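    // 5 background requests take all 5 bulkhead permits.
    for (int i = 1; i < 6; i++) {
      simulateApiRequest();
    }

    // Give the background requests time to start before sending the 6th.
    Thread.sleep(1000);
    String response = verificationUserService.verify("thoai");
    // JUnit expects (expected, actual).
    Assertions.assertEquals(ApplicationUtil.FALLBACK_MESSAGE, response);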
  }

Scenario 2: the fallback method is not triggered when the KYC API is slow (3000ms > maxWaitDuration = 2000ms) but there are not too many concurrent requests (3 requests < maxConcurrentCalls = 5).

  @Test
  public void fallBackWillNotBeTriggered_when_kycApiIsSlow_but_notTooManyRequests() throws InterruptedException {
    WireMock.stubFor(WireMock.post(WireMock.urlPathEqualTo("/kyc/thoai"))
      .willReturn(WireMock.aResponse()
        .withBody(ApplicationUtil.SUCCESS_MESSAGE)
        .withFixedDelay(3000)
        .withStatus(200)));

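    // 2 background requests plus the one below: 3 in total, under the limit of 5.
    for (int i = 1; i < 3; i++) {
      simulateApiRequest();
    }

    Thread.sleep(1000);
    String response = verificationUserService.verify("thoai");
    // JUnit expects (expected, actual).
    Assertions.assertEquals(ApplicationUtil.SUCCESS_MESSAGE, response);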
  }

Scenario 3: the fallback method is not triggered when there are too many concurrent requests (8 requests > maxConcurrentCalls = 5) but the KYC API is not very slow (1000ms < maxWaitDuration = 2000ms).

  @Test
  public void fallBackWillNotBeTriggered_when_tooManyRequests_but_kycApiIsNotSlow() throws InterruptedException {
    WireMock.stubFor(WireMock.post(WireMock.urlPathEqualTo("/kyc/thoai"))
      .willReturn(WireMock.aResponse()
        .withBody(ApplicationUtil.SUCCESS_MESSAGE)
        .withFixedDelay(1000)
        .withStatus(200)));

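    // 7 background requests: 5 take permits, 2 wait for a permit to be released.
    for (int i = 1; i < 8; i++) {
      simulateApiRequest();
    }

    Thread.sleep(1000);
    String response = verificationUserService.verify("thoai");
    // JUnit expects (expected, actual).
    Assertions.assertEquals(ApplicationUtil.SUCCESS_MESSAGE, response);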
  }
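
The timing behind the three scenarios can be checked with a little arithmetic. The helper below is a hypothetical, deterministic model and not part of the sample project. It assumes the background requests all start at t = 0, the final request arrives after the 1000ms sleep, permits are handed out in arrival order, and there is no network or scheduling overhead. It returns how long the final request would have to wait for a permit.

```java
import java.util.PriorityQueue;

// Hypothetical timing model of the test scenarios; all names are assumptions.
final class BulkheadTimeline {

  /** Milliseconds the final request must wait for a permit under the idealized model. */
  static long requiredWaitMs(int backgroundRequests, long callMs, long arrivalMs,
                             int maxConcurrentCalls, long maxWaitMs) {
    // Each entry is the time at which one permit becomes free; all are free at t = 0.
    PriorityQueue<Long> permitFreeAt = new PriorityQueue<>();
    for (int i = 0; i < maxConcurrentCalls; i++) {
      permitFreeAt.add(0L);
    }
    // Background requests arrive at t = 0, so their wait equals the earliest free time.
    for (int i = 0; i < backgroundRequests; i++) {
      long freeAt = permitFreeAt.peek();
      if (freeAt <= maxWaitMs) {
        permitFreeAt.poll();
        permitFreeAt.add(freeAt + callMs); // permit is held for the duration of the call
      }
      // Otherwise the background request is rejected and no permit changes hands.
    }
    return Math.max(0L, permitFreeAt.peek() - arrivalMs);
  }

  public static void main(String[] args) {
    System.out.println(requiredWaitMs(5, 3000, 1000, 5, 2000)); // Scenario 1
    System.out.println(requiredWaitMs(2, 3000, 1000, 5, 2000)); // Scenario 2 (3 requests in total)
    System.out.println(requiredWaitMs(7, 1000, 1000, 5, 2000)); // Scenario 3
  }
}
```

Under these assumptions, Scenarios 2 and 3 need no waiting at all, while Scenario 1 needs 2000ms, which is exactly the configured maxWaitDuration. In a real run, HTTP and scheduling overhead presumably push the KYC calls slightly past 3000ms, which would tip the final request into the fallback, but the margin looks thin; a longer WireMock delay would likely make that test less timing-sensitive.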

3. Conclusion

The Bulkhead pattern shows how much compartmentalization contributes to system stability and reliability. By isolating components and limiting the scope of failures, it improves fault tolerance in microservice architectures.

The pattern borrows the idea of compartments from naval engineering and applies it to software design; in this article we implemented it with Resilience4j. Because it limits the cascading impact of failures, it helps microservice-based applications keep operating when one dependency misbehaves.

Overall, the Bulkhead pattern is a foundational building block for robust and dependable microservice architectures. The sample project can be found on GitHub.
