Monthly Archives: March 2016

Functional DAOs: Amazingly easy way to make Java faster and less hungry for RAM

Some weeks ago I did a code review and I saw something like this in the source:

class UserExporter {
    private final User2CsvConverter user2CsvConverter = new User2CsvConverter();

    private final UserDao userDao;

    public UserExporter(final UserDao userDao) {
        this.userDao = userDao;
    }

    public List<String> getCsvLines() {
        final List<User> userList = userDao.loadAllUsers();
        final List<String> lines = new ArrayList<>();
        for (User userItem : userList) {
            lines.add(user2CsvConverter.convert(userItem));
        }

        return lines;
    }
}

So there’s some data storage with “Users”. The userDao (DAO = Data Access Object) is responsible for getting the data and then returning it as a list of Users.
The next step loops thru all those User items in the list we just got from the DAO and executes some magic on it, i.e. in this case it converts each User object into a String, obviously in order to export all those this line into a CSV file.
Each CSV line will, again, be added to a List of Strings.
Later on the actual CSV export happens:

    int process() {
        final List<String> csvLines = userExporter.getCsvLines();

        int linesWritten = 0;
        csvWriter.writeLine("First Name;Last Name;Email");
        for (String line : csvLines) {
            csvWriter.writeLine(line);
            linesWritten++;
        }

        return linesWritten;
    }

So the List that the UserExporter.getCsvLines() method returned will, again, be looped in order to write each line to some external output device, e.g. a file or System.out.
—-
The code is straightforward, it has a clean layering (DAO abstracting the storage, CsvWrite abstracting the actual output device) and it even has 100% test coverage.
So what the heck could I possibly be complaining about, right?

Well, first of all not only have I seen a lot of similar code like this but I actually used to code like this myself.
Also, this is obviously a simplification of the real code (which I cannot show you, of course).

There are 2 important things to know about the real code:
1) The real data storage is pretty big so in some cases the DAO may return a lot of items.
2) The real code does much more magic on the lists.
As a result I know that we shot the server a bunch of times when we started big exports. It struggled for hours and then failed with OutOfMemory Errors. (And as web client users got impatient they repeated the export some more times only to find out that it – surprisingly – didn’t make the server work faster.)

Someone suggested the usual “we need more RAM” thing but there’s an easier and better solution at hand.
The biggest problem are all those ArrayLists that are being created all over the place.
The DAO creates one, passes it to the CsvConverter which “only” loops it but creates its own ArrayList and so on.
Everytime you use up a lot of memory for those Arrays and a lot of CPU for looping.

For the complete original, imperatively coded example, cf. https://github.com/acnebs-com/clean-code/tree/master/functional-dao/src/main/java/com/acnebs/posts/functionaldao/imperative

So how can we fix it?
Let’s get rid of the lists!
How do we do it?
We use functional style programming.

Here’s how you can use Java 8 to get an equivalent solution with NO (in numbers: 0) Lists at all.
Instead of a returning a List, let’s make the DAO take a Java 8 Consumer that will be called back for each “User” item:

interface UserDao {
    void loadAllUsers(Consumer<User> userConsumer);
}

Now the Exporter can do this:

    public void getCsvLines(final Consumer<String> lineConsumer) {
        userDao.loadAllUsers(
                (user) -> {
                    final String line = user2CsvConverter.convert(user);
                    forEachLine.accept(line);
                }
        );
    }

It calls the DAO and for each User item the dao “returns” (line 3), i.e. has to “offer”, the converter can do it’s magic (line 4). No List, no loop needed.
The CSV line that is being created for each User item doesn’t get added to a list either. Instead it will be “returned” or “offered”, well, handed over to a Consumer that is called back for each CSV line (line 5).

This String Consumer is passed to this method by the outer layer of this application:

    void process() {
        csvWriter.writeLine("First Name;Last Name;Email");
        userExporter.getCsvLines(csvWriter::writeLine);
    }

Again, no Lists, no loops.

When you use this approach you kind of execute the “front end”/output code when you access your data while still keeping a clean architecture.

Or, as another way to look at it: In imperative programming, when your method returns a value, you force the method’s clients to “live” with this value no matter what. If that return value happens to be a big Collection then there’s no other way for the client but to use this Collection with high memory consumption and all.
But if your method takes a Consumer instead then the clients can choose how to handle the data. If all they need is just a list after all then they can pass a plain and simple consumer that will do the trick. Actually that’s what I do in my UnitTests (line 4+5):

    @Test
    public void testLoadAllUsers_ok() throws Exception {
        UserDaoJsonImpl dao = new UserDaoJsonImpl("/users-integration-test.json");
        List<User> data = new ArrayList<>();
        dao.loadAllUsers(data::add);
        assertEquals(3, data.size());
    }

But if they want some special handling, like in this example application, then the clients will have the freedom to choose a more clever way.

This gets even more exciting when you think of how to handle special cases:

    public List<User> loadAllUsers() {
        try {
            final URL resource = this.getClass().getResource(path);
            if (resource == null) {
                throw new FileNotFoundException(path);
            } else {
                final InputStream is = resource.openStream();
                final ObjectMapper mapper = new ObjectMapper();
                return mapper.readValue(is, new TypeReference<List<User>>() {});
            }
        } catch (IOException e) {
            throw new RuntimeException("loadAllUsers: IOException e", e);
        }
    }

In this solution, when an IOException occurs, the loadAllUsers method throws a RuntimeException hence forcing each client code to handle it.
Alternatively the method could have caught the IOException and logged a messages to a logging framework.
Or it could have swallowed the IOException exception hence leaving no clue for client code about what happened.
Whatever the solution we choose – the client code will just have to accept it.

Now think of giving the client code freedom to choose how to handle special cases:

    @Override
    public void loadAllUsers(final Consumer<User> userConsumer) {
        loadAllUsers(
                userConsumer,
                e -> {throw new RuntimeException("loadAllUsers: Exception e: " + e.getMessage(), e);}
        );
    }

    public void loadAllUsers(final Consumer<User> userConsumer, final Consumer<Exception> errorConsumer) {
        try {
            final URL resource = this.getClass().getResource(path);
            if (resource == null) {
                throw new FileNotFoundException(path);
            } else {
                final InputStream is = resource.openStream();
                final ObjectMapper mapper = new ObjectMapper();
                final TypeReference<User> valueTypeRef = new TypeReference<User>() {};
                final JsonNode jn = mapper.readValue(is, JsonNode.class);
                final Spliterator<JsonNode> jsonNodeSpliterator = jn.spliterator();

                StreamSupport.stream(jsonNodeSpliterator, true)
                        .forEach(jsonNode -> {
                            try {
                                final JsonParser jsonParser = jsonNode.traverse(mapper);
                                final User user = jsonParser.readValueAs(valueTypeRef);
                                userConsumer.accept(user);
                            } catch (IOException e) {
                                errorConsumer.accept(e);
                            }
                        });
            }
        } catch (IOException e) {
            onError.accept(e);
        }
    }

In line 28 an IOException is caught. The client of this method can now choose what to do in such an event, cf. line 5:
It can choose to throw re-throw the Exception, aborting the whole export if only 1 User item is broken, as it does in the example above.
Or it could choose to ignore broken items, just log them and continue:

    @Override
    public void loadAllUsers(final Consumer<User> userConsumer) {
        loadAllUsers(
                userConsumer,
                e -> {LOGGER.error("An oopsie occurred: " + e.getMessage(), e);}
        );
    }

Or, how about a progress indicator that should be triggered on each item in order to output some info?
Just add some code for that to your Consumer.

    public void getCsvLines(final Consumer<String> lineConsumer, final Consumer<User> progressConsumer) {
        userDao.loadAllUsers(
                (user) -> {
                    progressConsumer.accept(user);
                    final String line = user2CsvConverter.convert(user);
                    lineConsumer.accept(line);
                }
        );
    }

For the full code please cf. https://github.com/acnebs-com/clean-code/tree/master/functional-dao/src/main/java/com/acnebs/posts/functionaldao/functional8

OK, I can hear you: “We’d like to use Java 8 but they don’t let us”.
Fair enough, but actually I have been lying to you – there’s hardly any “real” Java 8 in the solution I’m proposing.
You can use the same approach and architecture in Java as early as probably 1.0.
Sure, there’s no Consumer interface in Java 7 but so what: just create it yourself.
Actually I should have done that in my Java 8 code, too, because now my public methods have a dependency to java.util.function.
This is generally not the cleanest way to do it, just as you should not use Maps, Lists or Sets in your public method signatures.

interface UserConsumer {
    void doOnUser(User user);
}

//...

interface LineConsumer {
    void doOnLine(String line);
}

//...

class UserExporter {
    private final User2CsvConverter user2CsvConverter = new User2CsvConverter();

    private final UserDao userDao;

    public UserExporter(final UserDao userDao) {
        this.userDao = userDao;
    }

    public int getCsvLines(final LineConsumer lineConsumer) {
        final int out[] = new int[]{0};

        userDao.loadAllUsers(
                new UserConsumer() {
                    @Override
                    public void doOnUser(final User user) {
                        final String line = user2CsvConverter.convert(user);
                        lineConsumer.doOnLine(line);
                        out[0] = out[0] + 1;
                    }
                }
        );

        return out[0];
    }
}

So you can use the functional way in Java 7, too. You just cannot use lambdas.
And, the code looks somewhat awkward, right?
(For more details cf. the complete Java 7 solution)

That raises an interesting question:
If high memory consumption and expensive loops in Java have been bothering us for years while there has been a way to solve it using Callback interfaces – why aren’t we using them in Java 7 all the time?
Is it because developers are too lazy to write those extra lines of code? Hmm.

Maybe it’s just because it feels wrong.
The Java 7 solution just looks unnatural, unfamiliar, it feels like you’re coding wrongly.
So is using your intuition the problem?

Of course you should listen to your intuition.
When you start coding you need to realize what you don’t know in order to find out what to learn.
You will travel from this “unconscious incompetence” to “conscious incompetence”, then “conscious competence” until you eventually know all those things while you’re not even aware of them (“unconscious competence”). (See Four stages of competence for more about this)

It’s this unconscious competence that’s yelling at you from the inside when you’re about to do something awkward, weird, and most of the time this intuition is right: you’re about to do something wrong, so stop it.
But sometimes it’s just another unconscious incompetence.
I learned imperative style programming some centuries ago so it feels natural and familiar. Functional is new so of course it feels strange at first.

And, programming is using a language to utter ideas. So just like human languages, programming languages influence the way you think. If your programming language has no “natural” support for functional programming it’s no wonder you don’t think of it all the time.

So I think we should go easy on ourselves. It’s OK to do things like imperative programming for a while and then improve your style later.
If it wasn’t OK how would we ever start? There will always be a next big thing we cannot afford to wait for.

In fact, I try utilizing this effect: I try to make the good things easy, familiar and natural to use while making the bad things hard.
For example, decoupling by using interfaces is a good thing.
I encourage my developers to create dependencies only toward interfaces. That’s why I want interfaces to be named in a pretty way, like “Exporter”, so it “feels right” to use them. If they were called “IExporter” or “ExporterInterface” then developers might avoid using them.
Implementations, however, should be called in a way that makes developers know immediately that they should not bind their code directly to something like an “ExporterJava8FooBarUsingJsonCsvImpl”.

What do you think?
Please leave a comment and let me know.

If you like this post then please share it.

Happy hacking!