Our primary banking web application had become unsustainable: it had intractable architectural problems with no realistic path to improvement.
Client-side performance could not scale. The application's client-side performance was sufficient for users in their first couple of years of active use. As an account's data grew into the multi-megabyte range, performance degraded. The application was built with a critical architectural flaw: it assumed that the entirety of a user's account data would be available within the client. Many features relied on this assumption (transaction search, reporting and graphs, transaction sorting, linking to a transaction from support messages, etc.), and breaking it would require fundamentally rethinking these features.
Bespoke in-house framework. The client-side application was built on a bespoke in-house framework similar to Backbone. Unlike Backbone it was undocumented, unmaintained, and had many fundamental problems.
- "Model" components made no distinction between "collections" and "objects", and had no schema. This made them confusing to work with, as half of the API was irrelevant depending on whether the content was a collection or an object.
- The HTTP transport component was chatty by default and would constantly fire off unexpected requests. It used attributes of the associated "model" component to infer how to make requests, making it difficult to work with HTTP services which didn't operate under those same assumptions (which was the case more often than not).
- "View" components were tightly coupled to "Model" components (a FooView would render a FooModel), making it difficult to build reusable UIs. For example, a transaction list View might contain multiple kinds of transaction-like Models, but would be hardwired to a single Model type. This kind of coupling required complex overrides of core framework logic to work around.
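To make the View/Model coupling concrete, here is a minimal TypeScript sketch of the alternative: a list renderer that depends only on a small row shape, with each model adapted to that shape. Every name here (TransactionRow, CardPayment, BankTransfer, renderTransactionList) is hypothetical and chosen for illustration; none of it comes from our framework.

```typescript
// The view depends only on this shape, not on any specific model class.
interface TransactionRow {
  id: string;
  label: string;
  amountCents: number;
}

// Two unrelated "transaction-like" models (hypothetical).
interface CardPayment { id: string; merchant: string; amountCents: number; }
interface BankTransfer { id: string; toAccount: string; amountCents: number; }

// Small adapters turn each model into the shape the view understands.
// Both are treated as outgoing money, so the amount is negated.
const fromCardPayment = (p: CardPayment): TransactionRow => ({
  id: p.id, label: p.merchant, amountCents: -p.amountCents,
});
const fromBankTransfer = (t: BankTransfer): TransactionRow => ({
  id: t.id, label: t.toAccount, amountCents: -t.amountCents,
});

function formatAmount(cents: number): string {
  const sign = cents < 0 ? "-" : "";
  return `${sign}$${(Math.abs(cents) / 100).toFixed(2)}`;
}

// The list renders any mix of rows; it never sees the original models.
function renderTransactionList(rows: TransactionRow[]): string {
  return rows.map((r) => `${r.label}: ${formatAmount(r.amountCents)}`).join("\n");
}
```

With this shape, adding a third transaction-like model means writing one more adapter rather than overriding the list's rendering logic.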
On top of all of these frustrating architectural problems, the lack of documentation made for a steep learning curve. New engineers would spend weeks running into confusing issues before feeling anywhere near confident in working on features.
Obsolete technologies. The framework was built on top of MooTools, which heavily modifies the native runtime environment by extending native classes and exposing a large number of globals. Using MooTools as the core made it difficult to use most other open source client-side technologies (which I'll talk about later). Aside from the technical issues, MooTools made hiring difficult. Nobody building state-of-the-art web applications wants to go anywhere near MooTools.
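As a generic illustration of why extending native classes causes trouble for other code, consider the TypeScript sketch below. It is not MooTools code and does not reproduce the specific errors we hit; the helper name lastItem and the functions around it are made up for the example. It only shows the general mechanism: a property assigned to a native prototype becomes visible to unrelated code that enumerates arrays.

```typescript
// Hypothetical library code: adds a helper to every array by assigning
// directly to the native prototype. Plain assignment creates an
// enumerable property.
function installHelper(): void {
  (Array.prototype as any).lastItem = function (this: unknown[]) {
    return this[this.length - 1];
  };
}

// Undo the modification so the example leaves the runtime clean.
function removeHelper(): void {
  delete (Array.prototype as any).lastItem;
}

// Unrelated code that assumes an array only has index keys.
function collectKeys(items: unknown[]): string[] {
  const keys: string[] = [];
  for (const key in items) {
    keys.push(key); // for...in also visits inherited enumerable properties
  }
  return keys;
}
```

Before installHelper() runs, collectKeys(["a", "b"]) sees only the index keys; afterwards it also sees lastItem, even though the calling code never asked for it.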
MooTools used its own package manager called Packager, which was built with PHP. Packager was awful to work with so we wrote our own drop-in replacement in Ruby called Plums. Only one person on the team really understood how Plums worked, and nobody knew how Packager worked. Whenever a module packaging error occurred people would struggle for hours to resolve the issue.
I can't describe how bleak it is trying to build a state-of-the-art product on these tools when open source alternatives are blossoming around every corner. The problems our in-house tools were solving had been solved over and over, and in better ways than our own.
Poor separation of domains. The middle tier of the banking application was a Ruby application. The client made its requests to this Ruby application, which would then do its own work and eventually make requests to the backend services.
The fundamental problem was that the Ruby application should have been a lean proxy to the services, but was instead doing far too much work. Engineers would commonly rewrite client requests at this layer before sending them to the services, and also rewrite service responses before sending them back to the client. This was sometimes done to accommodate the inflexibility of the client-side framework, and other times to get something done sooner rather than waiting for the backend team to have time to build it out in the services layer.
A side effect of this middle tier was that important details would often get dropped between the client and the services. For instance, HTTP headers set in the client were almost never sent to the services, which made many requests less efficient because things like If-Modified-Since couldn't be taken into account. Another common issue was that the middle-tier endpoint for a specific request wouldn't have taken every HTTP response code from the service into account, so basic things like 429 Too Many Requests (aka rate-limiting) responses would appear in the client application as 500 Server Error, or get swallowed altogether and appear as 200 with an empty body.
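The two failure modes above (dropped request headers and rewritten status codes) can be sketched as a pair of small functions that a lean proxy would apply. This is an illustration under stated assumptions, not our middle tier: that was a Ruby application, and TypeScript is used here only for consistency with the client-side code. The allowlist contents and all names (HeaderMap, ServiceResponse, forwardRequestHeaders, relayResponse, proxyRequest) are hypothetical.

```typescript
type HeaderMap = Record<string, string>;

interface ServiceResponse {
  status: number;
  headers: HeaderMap;
  body: string;
}

// Illustrative allowlist of client headers the proxy passes along,
// including the conditional-request headers that were being dropped.
const FORWARDED_REQUEST_HEADERS = ["accept", "if-modified-since", "if-none-match"];

// Copy allowlisted headers, normalizing names to lowercase
// (HTTP header names are case-insensitive).
function forwardRequestHeaders(clientHeaders: HeaderMap): HeaderMap {
  const forwarded: HeaderMap = {};
  for (const name of Object.keys(clientHeaders)) {
    const lower = name.toLowerCase();
    if (FORWARDED_REQUEST_HEADERS.indexOf(lower) !== -1) {
      forwarded[lower] = clientHeaders[name];
    }
  }
  return forwarded;
}

// Relay the service's status, headers, and body unchanged, so a 429 or a
// 304 reaches the client as-is instead of becoming a 500 or an empty 200.
function relayResponse(service: ServiceResponse): ServiceResponse {
  return { status: service.status, headers: { ...service.headers }, body: service.body };
}

// The whole proxy step: forward headers, call the service, relay the result.
// `callService` stands in for whatever HTTP client the middle tier uses.
async function proxyRequest(
  clientHeaders: HeaderMap,
  callService: (headers: HeaderMap) => Promise<ServiceResponse>,
): Promise<ServiceResponse> {
  return relayResponse(await callService(forwardRequestHeaders(clientHeaders)));
}
```

The design choice worth noting is that the proxy has no per-endpoint knowledge of status codes at all, so there is nothing to forget when a service starts returning a new one.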
The organizational impact of this middle-tier was largely unacknowledged but immense. We essentially had two APIs: one official API with documentation and processes for proposal, critique, and acceptance, the other completely ad hoc and undocumented. Instead of bringing API needs to the backend team, this team was self-servicing half-baked workarounds.
Humans and Feelings. Everything about the application was bespoke – many people in the organization had put months and even years into inventing it from scratch. Navigating how to speak to all of the above issues and proposing to replace it while honoring their work was a daunting task.
Explaining the Problem to My Team. I didn't go after every problem I saw, but focused on the recurring pain points we all dealt with. We regularly had to write complex workarounds to use the same UI component in multiple places in the application. The process could take hours and often resulted in a long tail of odd UI and state bugs. I built a very small proof of concept application which showed how React could render any component in any nested structure any number of times, with state changes propagating to every instance from the central state tree. That one little example made most of the team understand how much better things could be.
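The idea behind that proof of concept can be sketched without React or Redux at all. The TypeScript below is not the original demo and does not use either library's API; it is a hand-rolled, hypothetical store (createStore, renderBalance, and AppState are made-up names) that shows the core property: every rendered instance is a function of one shared state, so a single update reaches all of them.

```typescript
interface AppState {
  balanceCents: number;
}

type Listener = (state: AppState) => void;

// A tiny stand-in for a central state tree: one value, many subscribers.
function createStore(initial: AppState) {
  let state = initial;
  const listeners: Listener[] = [];
  return {
    getState: (): AppState => state,
    subscribe(listener: Listener): void {
      listeners.push(listener);
      listener(state); // render once with the current state
    },
    update(next: AppState): void {
      state = next;
      listeners.forEach((listener) => listener(state));
    },
  };
}

// A "component" is a pure function of state, so it can be rendered
// in any number of places without extra wiring.
const renderBalance = (s: AppState): string =>
  `Balance: $${(s.balanceCents / 100).toFixed(2)}`;

// Two instances of the same component in different parts of the UI.
const store = createStore({ balanceCents: 1000 });
const headerOutput: string[] = [];
const sidebarOutput: string[] = [];
store.subscribe((s) => { headerOutput.push(renderBalance(s)); });
store.subscribe((s) => { sidebarOutput.push(renderBalance(s)); });

// One state change; both instances re-render from the same source.
store.update({ balanceCents: 2500 });
```

After the update, the last entry in both headerOutput and sidebarOutput reflects the new balance, with no instance-specific synchronization code.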
Explaining the Problem to the Company. Once the team was on board I put together a talk, "The Future of Our Web Applications", which was presented at our quarterly engineering conference. I spelled out in detail how much time and effort we were putting into dealing with bespoke technologies instead of shipping user-facing features. I outlined each of the things which were failing us, what we would replace them with, and how that would lead to more Product focus.
Incremental change and then rebuilding in parallel. Our goal was to incrementally rebuild the application in place so that we could continuously ship to users without blocking Product work.
We sliced off all of the standalone screens and rebuilt them, so navigating between pages would be navigating between client-side applications. The vast majority of the banking application, unfortunately, existed in the transaction sidebar (bill pay, ACH transfer, instant payments, support chats, etc.). In order to incrementally rebuild these features we'd have to ship one sidebar feature at a time in the same client-side environment as the MooTools application. We ran into a wall of incompatibility. Some of the native browser objects React depended on were modified by the MooTools runtime, so React would throw errors while rendering. Our in-house MooTools router fought with React-Router for control in unexpected ways. Making progress became impossible.
After getting stalled for about two months we made the decision to rebuild all of the sidebar features in one go and ship them together; our users would stop getting updates for a while, but they'd get the fully-functioning rebuild much sooner. This would also allow the entire team to return to working on new features sooner, given the slow pace of the incremental rebuild work.
Move functionality to the services. Many of the application's fundamental problems were solved by replacing our tools with React (+ Redux), etc., and using npm instead of Plums. The poor separation of domains was a different story. Nobody on the team wanted to continue making the same mistakes within our middle-tier Ruby application. As we worked through the rebuilt UI we collaborated closely with Data and Backend engineers to move functionality into the services. The Search feature got a dedicated backend service (so now mobile clients get the same search results as the web app), new transaction paging endpoints were created, all new middle-tier endpoints were lean pass-through proxies straight to the services, and some features got dropped because the cost to replicate them wasn't worth the tradeoff.
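To show what paging buys the client, here is a minimal TypeScript sketch of cursor-based paging over a transaction list. The actual endpoints are not described in this post, so everything here is an assumption made for illustration: the cursor scheme, the field names, and the getTransactionPage function, which stands in for a service call by paging over an in-memory array.

```typescript
interface Transaction {
  id: string;
  amountCents: number;
}

interface TransactionPage {
  items: Transaction[];
  nextCursor: string | null; // null means there are no more pages
}

// Stand-in for a paging endpoint. `all` plays the role of the service's
// data store; the cursor is the id of the last item the client has seen.
function getTransactionPage(
  all: Transaction[],
  cursor: string | null,
  limit: number,
): TransactionPage {
  let start = 0;
  if (cursor !== null) {
    const index = all.findIndex((t) => t.id === cursor);
    if (index === -1) {
      // Unknown cursor: return an empty page rather than restarting.
      return { items: [], nextCursor: null };
    }
    start = index + 1;
  }
  const items = all.slice(start, start + limit);
  const hasMore = start + items.length < all.length;
  return {
    items,
    nextCursor: hasMore ? items[items.length - 1].id : null,
  };
}
```

A client built this way holds one page at a time and asks for the next with nextCursor, so its memory use no longer grows with the age of the account.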
The new React + Redux application has proven to be a huge boon to productivity and overall happiness. We ship features and changes faster and with fewer bugs. Because we're using standard tools we're able to take advantage of the wider ecosystem, which has led to the adoption of ESLint, Webpack, charting libraries, validation libraries, and so on. These tools have helped us to ship higher quality code while focusing solely on user-facing features in support of the larger product vision.