Lessons Learned: Firebase Database


Introduction

If you are here, you probably already know about Firebase. In case you don’t, here is a short description: Firebase is a backend-as-a-service platform, probably the most popular one in 2017. It provides several services, such as analytics, authentication, a near-realtime database, notifications, and cloud functions, and it supports several platforms and languages, such as Android, iOS, JavaScript, and Unity. The focus of this article is the near-realtime database (Firebase Database): its benefits, the problems we had, and what we learned.

You have probably already faced the problem of modeling your database in Firebase and, let us guess, it was not as simple as you thought it would be. In our case, we spent a lot of time discussing how it should be structured, thinking about the relations and entities. The problem is that we, and probably you too, come from a relational world, where everything is about the schema and data normalization. In Firebase, on the other hand, you structure your data after your views, and denormalization is perfectly fine.

For a long time the only option for serious data storage was a relational database, so we are very used to thinking in a relational way. However, you have surely heard the term NoSQL lately: it refers to alternative databases (mostly open-source) that have been gaining attention thanks to the way they handle data to solve different problems (e.g. internet search, large-scale web applications, social networks). And, as you can imagine, Firebase Database is a NoSQL database that stores its data as a single JSON tree.

NoSQL databases are worth using in situations that require a different kind of data interaction, or that demand the use of a cluster. They also behave differently from relational databases. For example, full ACID guarantees are hard to provide in their usual environment (i.e. clusters), and they operate without a schema, so we need to understand the nature of the data and how we want to manipulate it before modeling. But let me stop here; interested readers can find more information about the NoSQL world in the reference at the end of this post.

Now let’s review some good practices for the Firebase Database.

How to structure your data

Denormalized data is very common in Firebase, and it is something we were not accustomed to. Our experience with relational databases taught us to avoid duplicated data, but in Firebase duplication saves a lot of time by reducing the number of queries. For example, listing 1 shows the first model we came up with for an events application, where we tried to model the database with relational thinking.

Listing 1: Our first database model

{
    "eventInformation": {
        "bannerImageUrl" : "http://images..../MC-2018.jpg",
        "title" : "Mobile Congress 2018",
        "date" : 1501502400
    },
    "users": {
        "TPjbF78fKmdvRIC3Awzo03ygm692" : {
            "displayName" : "Juan Valencia",
            "email" : "jvalencia@belatrixsf.com",
            "phoneNumber" : "956068715",
            "photoUrl" : "https://.../s96-c/photo.jpg"
        },
        ...
    },
    "sessions": {
        "77Sg79oDbJSfVTC8kPWpjywi7263" : {
            "capacity" : 100,
            "name" : "Introduction to Firebase",
            "registrations" : {
                "TPjbF78fKmdvRIC3Awzo03ygm692" : true,
                "2xJxDomSYAMZO82nMu2egf0Vh0Z2": true
            }
        }
    }
}

There are a couple of problems with this approach. First, every time we retrieve the information of a session, we also get all of its registrations (imagine 100+ registrations) because they are nested inside it (deep data is an anti-pattern in Firebase). Second, to avoid duplicated data we stored only the user keys in the registrations node (keys can be created manually, but they are usually generated, e.g. 77Sg79oDbJSfVTC8kPWpjywi7263), which forces us to run one extra query per registered user to get their information (i.e. joins done by the client).

Listing 2: Shallow structure

{
    "registrations" : {
        "77Sg79oDbJSfVTC8kPWpjywi7263" : {
            "TPjbF78fKmdvRIC3Awzo03ygm692" : {
                "displayName" : "Juan Valencia",
                "email" : "jvalencia@belatrixsf.com",
                "phoneNumber" : "956068715",
                "photoUrl" : "https://.../s96-c/photo.jpg"
            },
            "2xJxDomSYAMZO82nMu2egf0Vh0Z2" : {
                "displayName" : "Alberto Garcia",
                "email" : "agarcia@belatrixsf.com",
                "phoneNumber" : "956765726",
                "photoUrl" : "https://.../s86-c/photo.jpg"
            }
        }
    }
}

To fix that, we have to remove the registrations from the sessions node and make them a new, independent node. That avoids retrieving unnecessary data when we query each session’s details, and keeps the structure shallow. Moreover, if we want to display the information of each registered user, we can avoid several join queries by duplicating the user data in each registration node (see listing 2). Duplicated data can improve performance by reducing the number of queries. The price is that a change to a user now has to be written to several locations at once.
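As a minimal sketch of that restructuring, the plain JavaScript below takes data shaped like listing 1 and produces a shallow registrations node like the one in listing 2, copying each registered user's profile under the session key. The helper name buildRegistrations and the short sample keys (u1, u2, s1) are ours and only serve the illustration; no Firebase SDK is involved.

```javascript
// Hypothetical helper (not part of any Firebase SDK): builds the
// denormalized "registrations" node from data shaped like listing 1.
function buildRegistrations(db) {
  const registrations = {};
  for (const [sessionKey, session] of Object.entries(db.sessions || {})) {
    registrations[sessionKey] = {};
    for (const userKey of Object.keys(session.registrations || {})) {
      const user = (db.users || {})[userKey];
      if (user) {
        // Copy (duplicate) the profile so that a single read returns
        // everything the attendee list has to display.
        registrations[sessionKey][userKey] = { ...user };
      }
    }
  }
  return registrations;
}

// Sample data with made-up short keys.
const sampleDb = {
  users: {
    u1: { displayName: "Juan Valencia", email: "jvalencia@belatrixsf.com" },
    u2: { displayName: "Alberto Garcia", email: "agarcia@belatrixsf.com" }
  },
  sessions: {
    s1: {
      capacity: 100,
      name: "Introduction to Firebase",
      registrations: { u1: true, u2: true }
    }
  }
};

const shallow = buildRegistrations(sampleDb);
console.log(JSON.stringify(shallow, null, 2));
```

The copy is deliberate: the attendee list of a session can be rendered from one node, at the cost of having to keep every copy up to date.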

In the end, it comes down to how the application is going to display the data, and to keeping a balance between duplicated data and consistency.

Keeping consistency

Every time you modify duplicated data, it may be left inconsistent by concurrent access or by an error in one of the operations. To maintain consistency, Firebase provides two features: transactions and multi-path updates. A transaction operates on a single location at a time (e.g. sessions, or users in the example database); if the data at that location changes while the transaction is running, its update function is run again with the new value (multiple attempts). A multi-path update combines several writes into one atomic operation (all or nothing) and can touch different root nodes (e.g. users, sessions, etc.) at the same time.

Listing 3: Transaction over the registration reference

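import com.google.firebase.database.DataSnapshot;
import com.google.firebase.database.DatabaseError;
import com.google.firebase.database.MutableData;
import com.google.firebase.database.Transaction;

registrationsRef.runTransaction(new Transaction.Handler() {
    @Override
    public Transaction.Result doTransaction(MutableData mutableData) {
        // Here goes your logic over mutableData (registrations reference).
        // This method may run more than once, so keep it free of side effects.
        return Transaction.success(mutableData);
    }

    @Override
    public void onComplete(DatabaseError error, boolean committed, DataSnapshot snapshot) {
        // Transaction completed: error is null on success, and committed
        // tells whether the new value was written.
    }
});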
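To make the "multiple attempts" behaviour more tangible, here is a toy in-memory model of an optimistic retry loop in plain JavaScript. It is not the Firebase SDK and does not claim to reproduce its internals; the class ToyStore, its version counter, and the onAttempt hook are assumptions made up for this sketch. It only shows why the update function may be called more than once.

```javascript
// Toy in-memory store; NOT the Firebase SDK. It mimics the idea of
// "apply the update function, commit if nobody else wrote in the meantime,
// otherwise run the function again with the fresh value".
class ToyStore {
  constructor(value) {
    this.value = value;
    this.version = 0;
  }

  write(value) {
    this.value = value;
    this.version += 1;
  }

  // updateFn receives the current value and returns the new one.
  // onAttempt is a hypothetical hook used to simulate another client
  // writing between the read and the commit.
  transaction(updateFn, onAttempt = () => {}, maxAttempts = 5) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      const seenVersion = this.version;
      const next = updateFn(this.value);
      onAttempt(attempt);
      if (this.version === seenVersion) {
        this.write(next);
        return { committed: true, attempts: attempt, value: this.value };
      }
      // Someone else changed the data: loop and try again.
    }
    return { committed: false, attempts: maxAttempts, value: this.value };
  }
}

const counter = new ToyStore(10);
const result = counter.transaction(
  current => (current || 0) + 1,
  attempt => {
    // Simulate a concurrent writer during the first attempt only.
    if (attempt === 1) counter.write(20);
  }
);
console.log(result);
```

In this run the first attempt is discarded because another write arrived in between, and the second attempt increments the fresh value instead of overwriting it.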

Listing 3 shows the basic skeleton of a transaction. One advantage of transactions over multi-path updates is that you get the current database values in the doTransaction callback. But, as mentioned before, a transaction does not work across different nodes: if you run it on the sessions node, you cannot use the same transaction to update the users or the eventInformation nodes. In contrast, listing 4 shows a multi-path update on our example database that modifies two root nodes, “/users” and “/registrations”, at the same time.

Listing 4: Multi-path update

import java.util.HashMap;
import java.util.Map;

// Setting a path to null deletes the data at that path.
Map<String, Object> childUpdates = new HashMap<>();
childUpdates.put("/users/TPjbF78fKmdvRIC3Awzo03ygm692", null);
childUpdates.put("/registrations/.../TPjbF78fKmdvRIC3Awzo03ygm692", null);
databaseRef.updateChildren(childUpdates);
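The same mechanism is what keeps duplicated data in sync. As a hedged sketch in plain JavaScript, the helper below collects every path that holds a copy of a user's display name into one update object. The function name buildDisplayNameFanOut, the new name "Juan C. Valencia", and the assumption that the client already knows which sessions the user is registered for are all ours.

```javascript
// Hypothetical helper: gathers every location that stores a copy of the
// user's display name into a single object keyed by absolute path.
function buildDisplayNameFanOut(userKey, sessionKeys, newName) {
  const updates = {};
  updates[`/users/${userKey}/displayName`] = newName;
  for (const sessionKey of sessionKeys) {
    updates[`/registrations/${sessionKey}/${userKey}/displayName`] = newName;
  }
  return updates;
}

const updates = buildDisplayNameFanOut(
  "TPjbF78fKmdvRIC3Awzo03ygm692",
  ["77Sg79oDbJSfVTC8kPWpjywi7263"], // sessions the user is registered for (assumed known)
  "Juan C. Valencia"                // made-up new value
);
console.log(updates);
// With the JavaScript SDK the object would then be sent in one call,
// e.g. firebase.database().ref().update(updates).
```

Because all paths travel in one update, either every copy of the name changes or none does.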

An extra tip

With Firebase, most of the application’s logic lives in the client. However, at some point you may want to move part of the code to the backend, for example because the same logic is duplicated in several clients, or because an intensive task is better run in the cloud. In this context, Firebase Cloud Functions (which is still in beta) lets you automatically run backend code in response to events triggered by the different Firebase features and by HTTPS requests.

Imagine the following situation in the events application: we want to know the number of registrations for each session, in order to prevent it from rising above the session’s capacity. In a relational database we would probably execute a select count(*) query and compare the value with the capacity; in Firebase we can avoid such a query and read the value right away. To make that happen, we have to update a count value every time a user registers for or unregisters from a session. This count value will be a new attribute of each session.

Listing 5: Count registrations cloud function

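// Written against the beta firebase-functions API, where the handler
// receives a single event object.
const functions = require("firebase-functions");
const admin = require("firebase-admin");
admin.initializeApp(functions.config().firebase);

exports.countRegistrations = functions.database.ref("/registrations/{sid}/{uid}").onWrite(event => {
  const sid = event.params.sid;
  const countRef = admin.database().ref(`/sessions/${sid}/count`);

  return countRef.transaction(current => {
    if (event.data.exists() && !event.data.previous.exists()) {
      // A registration was created.
      return (current || 0) + 1;
    } else if (!event.data.exists() && event.data.previous.exists()) {
      // A registration was deleted.
      return (current || 0) - 1;
    }
    // Otherwise (an existing registration was modified) nothing is returned,
    // which aborts the transaction and leaves the count untouched.
  }).then(() => {
    console.log("Counter updated.");
  });
});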

Using cloud functions, we can define a listener in the backend that updates the count value every time the registrations node changes (see the code in listing 5). We use the onWrite callback to listen for create, update, and delete events, and start an action right after one occurs. The conditions inside the transaction ensure that modifications (update events) do not change the value of the count attribute.

The string “/registrations/{sid}/{uid}” defines the path for the event listener, where the sid and uid fields between braces act as wildcards. Thus, our cloud function countRegistrations listens for all database modifications under that path. When it fires, it reads the sid, which is the session key (the uid is the user’s key), and runs a transaction to modify the count attribute of that session (the call returns a promise). Finally, it logs the message “Counter updated.”.
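The decision logic of listing 5 can also be tried out without deploying anything. The sketch below, in plain JavaScript, isolates the create/delete/update conditions in a pure function and adds the capacity check that motivated the counter. The names countDelta and hasFreeSeat are hypothetical, and the session object is assumed to carry the capacity attribute from listing 1 plus the count attribute maintained by the cloud function.

```javascript
// Hypothetical pure helper mirroring the conditions in listing 5.
function countDelta(existedBefore, existsNow) {
  if (!existedBefore && existsNow) return 1;  // created: new registration
  if (existedBefore && !existsNow) return -1; // deleted: user unregistered
  return 0;                                   // modified: count unchanged
}

// Assumes each session has "capacity" and the "count" attribute that the
// cloud function keeps up to date.
function hasFreeSeat(session) {
  return (session.count || 0) < session.capacity;
}

let count = 0;
count += countDelta(false, true); // first user registers
count += countDelta(false, true); // second user registers
count += countDelta(true, true);  // second user edits their data
count += countDelta(true, false); // first user unregisters
console.log(count);
console.log(hasFreeSeat({ capacity: 100, count }));
```

A client can then decide whether to offer the registration button by reading only the session node, without counting the registrations itself.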

Conclusions

Firebase Database, like other near-realtime databases, works great. However, it is not completely free, and it has certain limitations that you have to keep in mind. Besides that, you should think about how you are going to display your data before creating a structure for your database: the structure of the database should match your views. Furthermore, you have to avoid deep (nested) structures and balance duplicated data with consistency. Finally, even though you can implement a lot of your logic in every client (e.g. Android, iOS), sometimes it is better to implement it as a cloud function to avoid code repetition, or to move an intensive procedure out of a mobile application.

Note

We found a problem while using client transactions in Android. Whenever we delete a value from the database using a transaction, a cloud function listening on that path (either onWrite or onDelete) is not called. That does not happen if we use multi-path updates. This may be a bug that will be fixed in a future release since, as we mentioned before, cloud functions are still in beta.
