Saturday, 28 December 2019

Extending Legacy Software

I'm a big fan of productivity software, organising and prioritising tasks and trying to avoid being one of those people who commit to tasks and then have permanently forgotten about it a week later! Over the last couple of years I've tried out a number of different productivity tools: Trello, ToDoIst and even just Windows sticky notes to keep track of my tasks. However while these tools have their merits, I've not been able to stick to them having found minor annoyances or lack of intuitive UI leading me to keep adding tasks, get overwhelmed by them and ditching the tool for something else.

In my first full-time job after University I worked in a support team supporting a number of applications in various languages - C#, ASP, HTML, JS, Business Objects, SQL Server transformation services amongst others. I'd studied Java as my primary programming language throughout University and was determined to move into the Java development team, but needed to keep my Java skills fresh. To this end, I started a side-task of writing a Java productivity tool - a To Do List application.

The advantage of writing my own productivity tool being that I could tailor it to my preferences, although I like to think that it's pretty intuitive in its own right. I offered it out to other people and got some pretty good reviews!


It allows easy reordering of priorities, a log to add updates of activity against a task, an archive of completed tasks. Over time I added other facilities, "Alerts" that remind you about a task or change its priority at a certain time (eg if it's getting close to a deadline).

So I've gone back to using this tool recently, and have actually found it suits my needs and I've pretty sure it's the tool I'll stick with for the forseeable future.

However I found myself wanting another feature in the tool - scheduled To Dos. Essentially there are tasks I need to do once a week, month or whatever but there is nothing in the tool along these lines, so I've ended up creating a lot of recurring reminders in my outlook calendar, and then manually creating the To Dos when the reminders fire. However I also have a of meetings and other stuff in my calendar, which means that I get so many reminders I often have to mass delete them and it makes it hard to work out which reminders I still care about and which are just noise.

So ideally building these recurring tasks into the tool would mean I can ditch the Outlook reminders, and things will appear at the top of my list at the right time without me having to do anything.

However this tool is something I wrote as a fresh graduate developer 12 years ago! I've since studied and passed Java programmer, web developer, business component developer and enterprise architect certifications, spent thousands of hours reading and improving, and have an extra 12 years experience working with Java software! So my skills now are somewhat more developed than they were 12 years ago. Functionally, I don't remember ever finding a bug in the software while using it; poorly written software doesn't automatically mean it doesn't work, in fact some of the most successful products I've worked with have been written pretty badly.

However, poorly written software can be very difficult to extend over time. Some of the issues with this tool I found on initially reviewing code are:
  • Bespoke persistence mechanism
I'd wanted to avoid needing any extra software (other than a JVM) to run the To Do list - it's pretty much guaranteed that if you write a tool for other people to use, and it doesn't just work with no set up, hardly anyone will bother with it. So the persistence mechanism is just an ASCII file that the application creates in the same directory as the executable JAR file. This in itself is reasonable, however the implementation is a custom XML encoder and decoder, not using any known library (eg JAXB), and while the code isn't that hard to understand, it's not something you'd want to touch lightly, as there's a high risk of something breaking if you try to modify the way the encoding works, especially given any changes need to be backwards compatible with existing To Do data files.
  • Logic in GUI (lack of layered architecture)
A well known principle in software is MVC, and that the UI should be as dumb as possible. The To Do List app has a large "God" class "MainGui.java", which extends JFrame and is nearly 1000 lines of code, not just dealing with rendering the screen but also has methods related to alerts (which don't appear on the screen), file saving and loading and so on. In reality this class should be small, and just deal with rendering the screen and recognising user actions and delegating to the correct services.
  • Massive code duplication
I was a little bit surprised that even 12 years ago I wasn't better at recognising code duplication! There is copious amounts of repetitive code like:


panelHeader = new JPanel();
panelDesc = new JPanel();
panelPriority = new JPanel();
panelComments = new JPanel();
panelButtons = new JPanel();

panelHeader.setPreferredSize(new Dimension(470,50));
panelDesc.setPreferredSize(new Dimension(470,35));
panelPriority.setPreferredSize(new Dimension(470,35));
panelComments.setPreferredSize(new Dimension(470,300));
panelButtons.setPreferredSize(new Dimension(470,35));

lHeader = new JLabel("Create a New To Do");
lHeader.setFont(new Font("Comic Sans MS" , Font.BOLD, 24));

panelHeader.add(lHeader);
There's a general style of individually creating and configuring every sub component in panels, in panels that often follow almost exactly the same style. This code can be massively cut down with the use of utility methods like:
public static JPanel createLabelPanel(int width, int height, int fontSize, String text) {

    JPanel panel = createRedBorderedPanel(width, height);
    JLabel label = new JLabel(text);
    panel.setLayout(new GridBagLayout());
    label.setFont(new Font("Comic Sans MS", Font.PLAIN, fontSize));
    panel.add(label);

    return panel;
}
This creates a whole panel based on a few parameters and a few sensible assumptions (like which font to use). Generally any repetitive code as per the previous example should set off alarm bells that there's a better way of doing it!
  • Odd "domain" model
Similar to the issue with an overcomplicated GUI class, the domain model isn't well separated either. With a To Do application you'd expect domain obejcts like ToDo, Alert, Comment, but then there are odd extra classes like "ToDoHolder", which is a Decorator of sorts but has a strange mix of responsibilities to create GUI representations of a To Do while also adding extra business logic to To Dos (that as per DDD ought to live on the To Do itself). Putting GUI representations here assumes clients want a certain way to represent a To Do, but in reality we shouldn't make assumptions about the client - for example it would be nice at some point to have a web interface for the To Do application, which has no use for JPanels!
  • Dodgy code style - naming conventions, use of break with labels
There is plenty of dodgy beginner style coding, large laborious methods which are hard to follow, and even the use of labels and "break" within a while loop which I'm not sure there's ever a legitimate use for:


whileLoop:
while(i.hasNext())
{
   ToDoHolder holder = (ToDoHolder)i.next();
   
   if (e.getSource() == holder.getBPriority()) //change of priority for this todo   {
      new PriorityChange(holder, this);
      break whileLoop;
   }
//....
}

(Not sure it even makes any difference here!)

  • Dependency on IDE build process
The Original app was written in a specific IDE (I don't remember which one now) and reliant on that IDE to export a runnable JAR file. Of course these days everyone should be using a tool that can build independent of an IDE, such as Maven. Digging out a 12 year old project and trying to find the IDE that built it and assuming it is backwards compatible with itself 12 years ago isn't ideal!
  • No Unit Tests

Unit testing wasn't the all consuming religion it is now when the app was written, I don't remember even talking about it during my University course - but unit tests would be useful in this situation to give confidence in refactoring any existing areas.

That said - I am in favour of being selective of where unit tests are used. They take time to write, and especially in a project like this where I am the only developer, the time saved by avoiding bugs but having to write the tests in the first place vs time saved by not bothering and just fixing any bugs would have been a net loss. As Victor Rentea says, they should be written "where the fear is" - ie the areas where otherwise it would be too scary to touch the code, a few tests for the more complicated logic may be beneficial, but writing for ~100% code coverage is a poor use of time for a questionable goal.

  • Use of Outdated APIs
The Java Calendar class is no longer recommended, from Java 8 a new Date/Time API has been released that addresses a lot of the well-documented shortcomings of the Calendar and Date APIs. However old applications that predate more modern and better APIs often are stuck on older APIs that are no longer improved and have been wholesale replaced. This is the case with the To Do application which uses the Calendar APIs.

How to extend the functionality?

The key thing is to see if there are natural integration points. In this instance, we want a scheduling system to feed in new To Do items at the appropriate times. For the most part, the new code doesn't need to interact with the existing code - we can take inspiration from DDD and treat the new part of the application as a new bounded context and develop it in isolation, and only interact with the rest of the application at well defined and understood points.

This follows the idea of "ports and adaptors" between bounded contexts, the domain logic is shielded and the scheduling context is developed as if in isolation, and we simply find the best port and adaptor to link the two together.



















In terms of integration points, the change is fairly straightforward:

Point 1 is fairly straightforward, there is a section of code that builds up the main menu, so we can just stick another menu item and action listener there:


createScheduledTodo = new JMenuItem("Create Scheduled To Do...");
viewScheduledTodos = new JMenuItem("View Scheduled To Dos...");



optionsMenu.add(updateDescriptionsMenuItem);
optionsMenu.add(deleteMenuItem);
optionsMenu.add(filterSortMenuItem);
optionsMenu.add(createAlertsMenuItem);
optionsMenu.add(viewAlertsMenuItem);
optionsMenu.add(createScheduledTodo);
optionsMenu.add(viewScheduledTodos);

createScheduledTodo.addActionListener(this);
viewScheduledTodos.addActionListener(this);


Point 2 is a little trickier, but actually not too hard, given that we can take inspiration for other code in the application that does the same things. New To Dos are created manually from a separate screen in the application:




So whatever happens when "Create this To Do" is clicked, is the same processing the scheduler needs to do when a new To Do is required:

ToDoItem item = new ToDoItem(priority, desc);
ToDoHolder holder = new ToDoHolder(item);

//create audit infoString audit;
audit = "To Do Created. " +
      "\nDescription : " + desc +
      "\nPriority : " + ToDoItem.priorityAsString(priority) +
      "\nComments : " + taComments.getText().trim();

holder.addToLog(ToDoHolder.LOG_AUDIT, audit);

main.addToDo(holder);
Whether this is good or bad, it's relatively simple and something that we can replicate in the new scheduling functionality and have confidence that it works. We can also implement a couple of Utility methods to tidy it up a bit and get rid of (or at least hide) the "ToDoHolder" weirdness:

private void processScheduledToDo(ScheduledTodo scheduledTodo) {

    if(scheduledTodo.getNextFireDate() == null) {
        return; //shouldn't happen but ignore.    }

    if(scheduledTodo.getNextFireDate().before(Calendar.getInstance())) {
        ToDoItem toDoItem = new ToDoItem(scheduledTodo.getToDoPriority().getIndex(), scheduledTodo.getTitle());
        toDoItem.addToLog(ToDoItem.LOG_AUDIT, "To Do item generated from schedule");
        toDoItem.addToLog(ToDoItem.LOG_AUDIT, toDoItem.getDescription());
        ToDoUtilities.createNewToDoItem(toDoItem);

        updatePersistenceModelAfterFire(scheduledTodo);
        GuiUtils.showInformation("Scheduled ToDo Created", "Created new Scheduled ToDo: " + scheduledTodo.getTitle());
        return;
    }
}

 
So in this instance extending the software was quite easy. Obviously there are a few caveats with this approach:

  • If you need to change the functionality of an existing software feature you may not be able to isolate your new code from the legacy code.
  • If you are working on a legacy project for an extended length of time, there may be more value in refactoring the existing code.

To Do Application is available on GitHub: Link

Thursday, 20 July 2017

Does Event-Based Data Synchronisation with Microservices Work?

Microservices in conjunction with DDD subscribe to the idea that each microservice should hold its own data store and be the "source of truth" for that information. Problems arise however when other bounded contexts need a copy of that information - how does bounded context Y obtain a copy of bounded context X's data?

One strategy for doing this is event-based data synchronisation. X broadcasts a message to a queue (or topic) for Y to pick up changes to the data structure.

The onus therefore is on Y to interpret this information and store a copy of it in its own database to make use of later on. This seems to have several serious flaws:

1. Complexity of piecing data back together

In simple cases, like taking a copy of country codes, piecing the data back together may not be difficult. On the other hand, if Y needs complex information from X to perform it's function or security policies, things can escalate very quickly. Data like effective dates, join relationships between entities and hierarchies can be very complex and piecing this information back together perfectly is no easy task.

2. Out of order events

Given that scalability is one of the key aims of microservices, it is very likely that multiple message receivers will be called into action to deal with messages put onto a queue. This therefore introduces a race condition and possibilities of messages arriving out of order ie:

* Message 126: Vehicle 123 assigned to Order xyz
* Message 120: Vehicle 123 Created

Here a message for the creation of vehicle 123 is not received until after the message to take some action with that vehicle - ie processing is attempted before it even exists. There are various strategies to deal with this, although none trivial or neat. One potential idea is to simply allow message 126 processing to fail and return to the queue until message 120 has completed, though this is dubious and depends on the specifics of the queue infrastructure.

3. Publication of Bad Messages

Event-Based synchronisation assumes that all messages make sense and will be interpreted successfully by the receiver. However in reality it is very possible that a message with bad information may go out due to a coding bug, or perhaps badly validated user input. In this case how should we handle the bad message? If it goes to a dead letter queue what do we do? Republish it? But other events may have superseded it in the meantime meaning republishing it would corrupt the data further. Alternatively messages could be sequenced (although this may not be reliably possible) and the onus put on the receiving microservice to sort out the mess, which is also non-trivial.

4. Omitted Messages

Given the "I'm a microservice and don't care about anyone else" philosophy, it would be very easy for X to make a change to its domain and forget that other microservices are interested in the domain or resulting data change and make no attempt to publish any necessary events in the new code paths that have been introduced. If this code makes it into production, the result is that Y and possibly dozens or hundreds of other microservices are left with bad data.

How can hundreds of microservices with bad copies of data from X be reset with the correct information? Throwing into the mix that these copies of data from X may be structured slightly differently to X, and slightly differently to each other, meaning each microservice needs a bespoke individual script or strategy to fix their data. Multiple this as well by data sourced from other bounded contexts A, B and C, and the complexity of fixing data in production seems enormous.


Possible Solutions

Context Enrichment

One variation of this pattern suggests that the consuming bounded context receives minimal information from the publisher in a message and calls back into the publisher for any other information necessary. Does this alleviate the problems above?

Problem (1) - possibly. If the subscriber receives messages knowing that record R1 has changed, it can call back to the publisher for a new copy of record R1. This is simpler than if the publisher has several different events relating to different fields of record R1, which would all need separate handlers and update methods. With context enrichment the subscriber can have a single update mechanism, rather than a collection of granular event handlers.

Problem (2) - possibly. If a message related to a vehicle that has not yet been created is received, the receiver could call back and ask for the record related to that vehicle before it attempts any further processing. This case would be ok, however a more subtle out-of-order case may not be so easy for the receiver to know how to compensate for.

Problem (3) - Limited help. If the subscriber assumes the data received is correct it could still make mistaken assumptions (eg and update the wrong record). However messages with less information in them have the advantage of having less scope to be incorrect.

Problem (4) - No. If a message is not received, the subscriber has no realistic opportunity to do anything about it.

Global Read-Only Data Store

This Article describes the idea of application database and integration databases as separate concepts. Application databases are one per bounded context, with each bounded context owning that data, almost as "session" data. The data is replicated similar to CQRS style to the integration database which all contexts can access in a read-only fashion.

This removes the majority of complexity around message synchronisation, message versioning, failing messages and bounded context data that goes out of sync.

On the other hand it introduces database schema coupling, meaning that changes to the integration schema must be carefully managed. For example to change a table structure, it may be necessary to run new and old versions of the table in parallel for a while to allow systems using the old structure to migrate, before the old structure is removed.