Handling external API errors: A transactional approach

https://thoughtbot.com/blog/handling-external-api-errors-a-transactional-approach · scraped

Error handling and fault tolerance are often neglected aspects of development. How much does it cost to fix errors due to a poorly implemented error handling strategy or a complete lack thereof? How many API integrations are poorly put together, disregarding what can go wrong?
How much data do we have to fix due to catastrophic events of cascading errors that could have been prevented with well-thought-out code?

Let’s get down to the basics. This post is about building integrations with any system, third-party or not, over the network. In a previous post, we discussed the resumable error handling strategy and in what situations it can be helpful. Now, let’s discuss the transactional strategy.

## When to choose a transactional strategy?

Let’s start with some recommendations to make the distinction between strategies clear. Choose a transactional error handling strategy when:

- The workflow is composed of steps that need to be committed together - it’s all or nothing;
- There is tight coupling between the steps;
- You can’t bear temporary inconsistency;
- The external service allows undoing or rolling back side effects;
- The workflow has just a few steps and API requests (this is common but not a hard requirement).

## The example

In our example, we’re logging a list of order line items to the Modern Treasury API, where we have a “ledger account” for a “buyer”. Logging a line item creates a ledger transaction object from the buyer’s ledger account to the vendor’s ledger account.

Important: “Ledger transaction” is a local object within Modern Treasury that represents a transaction between two ledger accounts. It has no relation whatsoever with the transactional approach described in this article.

Let’s imagine the following code:

```ruby
def log_order
  total_amount = order_line_items.sum(&:amount)

  if has_enough_funds?(total_amount) # Issues a synchronous HTTP request
    order_line_items.each do |line_item|
      result = log_line_item(line_item) # Issues a synchronous HTTP request

      line_item.update!( # Local database update
        external_transaction_id: result.transaction_id
      )
    end

    :ok
  else
    :not_enough_funds
  end
end
```

This code is a rough draft of what we need to do, with no regard for an error-handling strategy.
It’s not unusual to have multiple types of API requests in a transactional workflow, but for simplicity’s sake we’re using a single API request inside a loop.

Note: Due to obvious limitations, the code snippets will get more and more dense as we apply fixes and add new features. We will discuss means of abstraction in a future article, so bear with me!

## Transactional or resumable?

When designing client API code, the first question to ask is “should it be transactional or resumable?” That can be determined by looking at the concept being modeled.

Can we partially log an order with one or more line items? The answer is no. We can’t afford temporary inconsistency because the customer should never look at their order and find line items missing, and that must hold at any arbitrary point in time. It’s all or nothing.

In resumable workflows, however, temporary inconsistency is bearable, and eventual consistency is reached through multiple retries in the worst-case scenario.

## ACID database transaction

For our code to be transactional, we must wrap our database commands in an ACID transaction, since the code sends `UPDATE` statements to the underlying database connection. We want to guarantee all database commands are rolled back if something goes wrong.

```ruby
def log_order
  # If the Ruby code raises an exception, the database
  # issues a `ROLLBACK` statement
  ApplicationRecord.transaction do
    # Code goes here
  end
end
```

Our example issues a single kind of `UPDATE` statement, but there could be several. Even with a single update, a transaction is still useful. Database transactions are not a solution to all data consistency problems, but they are the proper solution to our “all or nothing” use case.

Are we done yet? No! ACID transactions are only concerned with local database commands. We must still roll back the external state created by HTTP requests.

## External transactions

The Sagas pattern instructs on how to roll back external transactions.
Most Google results for “sagas pattern” will mention event-based microservices communicating through message brokers, which is not the case here. We’re referring to any code that interacts with external APIs. The core concept, however, still applies: if the transaction orchestrator (our code) detects an error condition, compensating HTTP requests must be emitted to undo the changes made by the preceding HTTP requests.

Let’s apply this improvement to our code:

```ruby
def log_order
  ApplicationRecord.transaction do
    total_amount = order_line_items.sum(&:amount)

    if has_enough_funds?(total_amount)
      begin
        order_line_items.each do |line_item|
          result = log_line_item(line_item) # Issues a synchronous HTTP request
          line_item.update!(external_transaction_id: result.transaction_id)
        end
      rescue => e
        order_line_items.each do |line_item|
          if line_item.external_transaction_id.present?
            rollback_line_item(line_item.external_transaction_id)
          end
        end

        raise e
      end

      :ok
    else
      :not_enough_funds
    end
  end
end
```

We’ve introduced a method call, `rollback_line_item`, to roll back the logged line items so far when encountering an error condition. For simplicity’s sake, the loop that logs the line items is assertive, and there’s no specific error condition to check other than rescuing exceptions.

That’s a significant first step, but we must be mindful of API semantics, leading us to our next topic.

### Designing external rollbacks

What should the implementation of `rollback_line_item` look like? That depends on our API features, which should be carefully assessed.

In Modern Treasury, we can’t delete a ledger transaction, but we can archive it, which seems like an excellent way to revert our operation and make its implementation more robust.

To roll back a ledger transaction, we must ensure it’s created in a pending state because posted ones are immutable and can’t be rolled back. Also, we need to add a commit step that will move pending ledger transactions to posted.
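
To make that lifecycle concrete, here is a minimal in-memory sketch of the rules just described: a pending ledger transaction can be either posted or archived, while a posted one is immutable. `FakeLedger`, its method names, and `ImmutableTransactionError` are hypothetical stand-ins for illustration only, not Modern Treasury’s client API, and treating archived transactions as final is also an assumption of this model.

```ruby
# Hypothetical in-memory stand-in for the external ledger API.
# It models only the status rules described in the text.
class FakeLedger
  class ImmutableTransactionError < StandardError; end

  def initialize
    @statuses = {}
    @next_id = 0
  end

  # Creates a ledger transaction in the pending state and returns its id.
  def create_pending
    @next_id += 1
    id = "lt_#{@next_id}"
    @statuses[id] = :pending
    id
  end

  # Commit: pending -> posted.
  def post(id)
    transition(id, :posted)
  end

  # Rollback: pending -> archived.
  def archive(id)
    transition(id, :archived)
  end

  def status(id)
    @statuses.fetch(id)
  end

  private

  # Only pending transactions may change status (assumption of this model).
  def transition(id, new_status)
    current = @statuses.fetch(id)
    unless current == :pending
      raise ImmutableTransactionError, "#{id} is already #{current}"
    end

    @statuses[id] = new_status
  end
end

ledger = FakeLedger.new
first = ledger.create_pending
second = ledger.create_pending

ledger.archive(first) # Rolled back while still pending
ledger.post(second)   # Committed

begin
  ledger.archive(second) # Too late: posted transactions are immutable
rescue FakeLedger::ImmutableTransactionError => e
  puts e.message
end
```

Under these rules, the orchestrator has to create every line item as pending first, because only a pending transaction remains eligible for either outcome.
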

Let’s change our orchestrator code, renaming `log_line_item` to `log_pending_line_item` and adding a commit step:

```ruby
def log_order
  ApplicationRecord.transaction do
    unless all_logged?(order_line_items)
      total_amount = order_line_items.sum(&:amount)

      if has_enough_funds?(total_amount)
        begin
          order_line_items.each do |line_item|
            result = log_pending_line_item(line_item) # Issues a synchronous HTTP request
            line_item.update!(external_transaction_id: result.transaction_id)
          end
        rescue => e
          # ...
        end
      else
        return :not_enough_funds
      end
    end
  end

  # Commit step here
  order_line_items.each do |line_item|
    commit_line_item(line_item) # Issues a synchronous HTTP request
  end

  :ok
end
```

The commit step should run after all ledger transactions are logged, outside the ACID transaction. We can’t commit ledger transactions as they are logged because earlier ones could no longer be rolled back if the current one results in an error. Also, we added an `unless all_logged?(order_line_items)` check for idempotency’s sake to avoid double logging.

If you can replace a bunch of API requests with a single bulk request, by all means do it! A single request is generally safer and more atomic than multiple requests, and it also simplifies client code. At the time of writing, Modern Treasury (our example API) does not have a ledger transaction bulk API.

Requirements will vary from API to API, so the main takeaway is to look up your API docs and carefully plan your implementation with error handling and fault tolerance in mind.

## Handling concurrency

There’s a critical path in our code subject to race conditions. Note the following `if` condition:

```ruby
if has_enough_funds?(total_amount) # Issues a synchronous HTTP request
  # ...
end
```

Let’s assume our buyer has a $10 balance. What if a web request that spends the full $10 is issued twice, and both make it to the `if` condition simultaneously? Yes, both would resolve to truthy and run the same code.
The user would be spending what they don’t have ($20 instead of $10), which would result in a negative balance of -$10.

The first question to ask is “does my API have concurrency handling features?” In the case of Modern Treasury, the answer is yes. When logging a ledger transaction, we can submit balance check parameters to lock on what the current balance should be after the operation. With that, the API simulates the operation and returns an error code if the after-balance differs from the one provided; otherwise, it goes ahead and performs the operation.

We can send the following parameters along with our JSON payload:

```plain text
{
  "available_balance": {
    "eq": WHAT_THE_BALANCE_SHOULD_BE_AFTER_THE_OPERATION
  }
}
```

This feature renders our `has_enough_funds?` check useless because now the balance would be checked implicitly when logging each ledger transaction. If we raise an exception when the balance check fails, our code already knows how to roll back. Therefore, our code can be simplified, and we can also detect the specific exception to return the appropriate error condition:

```ruby
rescue InsufficientFundsError
  return :not_enough_funds
end
```

If the API doesn’t provide concurrency features, a possible solution is to use row-level locking to serialize concurrent requests:

```ruby
order_line_items.sort.first.with_lock do
  # All code goes here
end
```

`order_line_items.sort.first.with_lock` would replace `ApplicationRecord.transaction`, as it has the same functionality but with row-level locks on top.

## Making commit and rollback fault-tolerant

The API portion of our code now has commit and rollback steps, but they are unreliable. Be mindful that any code can fail, especially when making network calls. If either commit or rollback fails, our data would be inconsistent, and rerunning the code wouldn’t correct it.

Designing commits and rollbacks as units that can be independently retried solves our problem, so let’s offload both steps to background jobs.

```ruby
def log_order
  failed_ledger_transaction_ids = []
  result = :ok
  error = nil # Declared here so it stays visible after the transaction block

  ApplicationRecord.transaction do
    unless all_logged?(order_line_items)
      # ...
      begin
        order_line_items.each do |line_item|
          response = log_pending_line_item(line_item) # Issues a synchronous HTTP request
          line_item.update!(external_transaction_id: response.transaction_id)
        end
      rescue => e
        order_line_items.each do |line_item|
          if line_item.external_transaction_id.present?
            failed_ledger_transaction_ids << line_item.external_transaction_id
          end
        end

        error = e
        result = e.is_a?(InsufficientFundsError) ? :not_enough_funds : :error

        # Roll back the local updates; the transaction block swallows this exception
        raise ActiveRecord::Rollback
      end
    end
  end

  if result == :ok
    # Commit step here
    order_line_items.each do |line_item|
      # Issues an asynchronous HTTP request
      commit_line_item_async(line_item)
    end
  else
    # Rollback step here
    failed_ledger_transaction_ids.each do |external_ledger_transaction_id|
      # Issues an asynchronous HTTP request
      rollback_line_item_async(external_ledger_transaction_id)
    end
  end

  result == :error ? raise(error) : result
end
```

That makes our code more reliable, given that most failures in both steps are likely transient and would succeed within a few retries. We’re, of course, assuming the background job system has retries baked in.

## Takeaways

Working with external APIs takes a lot of work. The more critical our workflow is, the more important it is to have a solid error handling/fault tolerance strategy.

- Pay close attention to the concept being modeled to decide whether to go with a transactional or resumable error handling strategy;
- The transactional strategy requires API calls to be synchronous because we need to decide whether to commit or roll back everything;
- How we design and implement external rollbacks will always depend on particular API features and semantics;
- Always have at least a rollback step for external API interactions. The commit step may sometimes be implicit and will depend on API semantics;
- External API commit and rollback steps can generally be offloaded to background jobs for increased fault tolerance;
- Idempotency is important for critical workflows;
- Properly handling and limiting concurrency is also important;
- See if your API provides features to handle concurrency; otherwise, try local solutions such as row-level locks or advisory locks.
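
The points about fault tolerance and idempotency can be illustrated with a small, self-contained sketch. It assumes nothing about a particular job library: `with_retries`, `TransientError`, and `FlakyRollbackClient` are hypothetical names, and the retry loop only stands in for the behavior a background job system would normally provide.

```ruby
# Hypothetical error class for failures that are worth retrying
# (timeouts, connection resets, and similar).
class TransientError < StandardError; end

# Retries the block up to max_attempts times on TransientError.
# A real background job system would also wait between attempts.
def with_retries(max_attempts:)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue TransientError
    retry if attempts < max_attempts
    raise
  end
end

# Hypothetical client whose rollback fails a fixed number of times
# before succeeding, simulating a flaky network.
class FlakyRollbackClient
  attr_reader :archived_ids, :calls

  def initialize(failures_before_success:)
    @failures_left = failures_before_success
    @archived_ids = []
    @calls = 0
  end

  # Idempotent: archiving an id that is already archived is a no-op,
  # so retrying after an ambiguous failure is safe.
  def rollback(id)
    @calls += 1
    if @failures_left > 0
      @failures_left -= 1
      raise TransientError, "network hiccup"
    end

    @archived_ids << id unless @archived_ids.include?(id)
    :ok
  end
end

client = FlakyRollbackClient.new(failures_before_success: 2)
outcome = with_retries(max_attempts: 5) { client.rollback("lt_1") }

# Running the step again must not archive the same transaction twice.
with_retries(max_attempts: 5) { client.rollback("lt_1") }
```

Because the rollback step is idempotent, it does not matter whether a retry happens after a real failure or after a response that was merely lost: the end state is the same either way.
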
