From local storage To-do list to Fractal, Peer-to-Peer Habit Tracker
In part 2 I described how I experimented with a single-page application for persisting and visualising personal habit information. In this post I will bring you up to date with the state of the project.
How much did SOLID help to salvage?
I spent some time learning about the distributed p2p architecture as implemented by Holochain (including starting to learn the Rust programming language). Although hApps (Holochain apps) should ultimately be writable in any language that compiles down to WebAssembly, Rust is currently the only option. In any case, my web API would be useless with the new p2p architecture.
I would have to rebuild the backend completely. Since the frontend was written in Mithril, using JSX, I could reuse much of the UI. I first decided to address the state management issues by using Redux and typesafe-actions to allow easy management of loading and error states, while aiding organisation in the reducer. Having started to learn Rust, moving to TypeScript on the frontend also made sense, as I can then match the interfaces on both sides.
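To make the loading and error handling concrete, here is a minimal hand-rolled sketch of the pattern in plain TypeScript. It does not use the typesafe-actions API itself, and every name in it (`Todo`, `TodoState`, `TodoAction`, `todoReducer`, the action type strings) is an assumption made for illustration rather than code from the project.

```typescript
// Hypothetical shapes, for illustration only.
interface Todo {
  id: string;
  title: string;
  done: boolean;
}

interface TodoState {
  items: Todo[];
  loading: boolean;
  error: string | null;
}

// A discriminated union: the `type` field tells the compiler which payload to expect.
type TodoAction =
  | { type: "todos/fetchRequest" }
  | { type: "todos/fetchSuccess"; payload: Todo[] }
  | { type: "todos/fetchFailure"; payload: string };

const initialState: TodoState = { items: [], loading: false, error: null };

// Each branch returns a new state object; the previous state is never mutated.
function todoReducer(state: TodoState = initialState, action: TodoAction): TodoState {
  switch (action.type) {
    case "todos/fetchRequest":
      return { ...state, loading: true, error: null };
    case "todos/fetchSuccess":
      return { items: action.payload, loading: false, error: null };
    case "todos/fetchFailure":
      return { ...state, loading: false, error: action.payload };
    default:
      return state;
  }
}
```

Because the action union is closed, a misspelt action type or a missing payload shows up as a compile error instead of a runtime bug in the UI.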
I feel a List coming on
Since the bottom of each fractal habit structure could potentially be a list (which I omitted from the last version), I thought it would be a good place to start… yes… another Todo list.
This time the whole process was driven by tests (following the canonical TodoMVC examples), using Jest as a test runner and React Testing Library. One of the main lessons I learnt from the previous version was that I never want to experience not testing again! Too much time was spent chasing bugs in the UI.
After setting up local models on the frontend and testing everything up to the ‘Save’ button (i.e. I was ready to persist once more), I reached this error:
It was time to learn about Distributed Hash Tables, anchors, and all things Holochain.
Persisting on Holochain
First, a little background:
- A zome (short for chromosome) is like a microservice that makes up part of a hApp. User profiles, for example.
- An hApp’s code is made up of zomes, compiled into WebAssembly, packaged into a DNA. DNA is like “the rules of the game” when it comes to app logic and validation.
- A DNA instance runs on a ‘conductor’, a process that runs on your machine or on Holoports (which act as distributed hosts for hApps, proxying for other user agents on the network and allowing p2p connections even from behind firewalls/routers).
- A conductor conducts, both in the sense that it orchestrates the installation/running/validating of different hApp DNA instances, and that it allows calls to public zome functions to pass through it via the conductor API.
- Each hApp instance runs in a Cell on the conductor, and creates a private network for all other peers on Holochain that are using the exact same DNA (the same set of rules).
- Data for each agent on the private network is stored privately on their ‘source chain’ — a hashed sequence of ‘elements’ containing a header and possibly an entry (the content).
- Public/private key encryption is used to sign the elements and verify the authenticity of the data to other peers.
- Data can also be made public and shared with other agents on the network.
- Public data is stored in a Validating DHT (Distributed Hash Table).
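The source chain from the list above can be pictured as a hash-linked sequence. The sketch below is a toy model under loud assumptions: `toyHash` is a 32-bit FNV-1a stand-in (not a cryptographic hash), signing is left out entirely, and the names `ChainHeader`, `ChainElement`, `appendElement` and `isChainValid` are hypothetical, not Holochain's API.

```typescript
// Stand-in hash (32-bit FNV-1a) so the sketch has no dependencies.
// It is NOT cryptographically secure.
function toyHash(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Hypothetical, simplified shapes: a header always exists, an entry is optional.
interface ChainHeader {
  seq: number;
  prevHash: string | null;  // hash of the previous element, null for the first
  entryHash: string | null; // hash of the entry content, null if there is none
}

interface ChainElement {
  header: ChainHeader;
  entry: string | null;
  hash: string; // hash of the header
}

function hashHeader(header: ChainHeader): string {
  return toyHash(JSON.stringify(header));
}

// Returns a new chain with one more element linked to the previous one.
function appendElement(chain: ChainElement[], entry: string | null): ChainElement[] {
  const prev = chain.length > 0 ? chain[chain.length - 1] : null;
  const header: ChainHeader = {
    seq: chain.length,
    prevHash: prev ? prev.hash : null,
    entryHash: entry === null ? null : toyHash(entry),
  };
  return [...chain, { header, entry, hash: hashHeader(header) }];
}

// Recomputes every hash; any edit to an entry or header breaks the check.
function isChainValid(chain: ChainElement[]): boolean {
  return chain.every((el, i) => {
    const expectedPrev = i === 0 ? null : chain[i - 1].hash;
    const expectedEntryHash = el.entry === null ? null : toyHash(el.entry);
    return (
      el.header.seq === i &&
      el.header.prevHash === expectedPrev &&
      el.header.entryHash === expectedEntryHash &&
      el.hash === hashHeader(el.header)
    );
  });
}
```

Changing the content of any earlier element changes its hash, so the check fails and the tampering is detectable by anyone who recomputes the chain.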
Why I Care
It’s a pretty different way of doing things. That’s why.
As somebody creating an application where an objective is to share personal data, I would like to empower users to take sovereignty of that data and share it as much or as little as they like, with whomever they like, without having a third party intervene.
What This Means for My Design
This means I have to store my data on a DHT. As mentioned in the Holochain docs, one issue with a DHT is that finding data is trickier than in a relational database.
This creates a chicken-and-egg problem. In order to retrieve a piece of data that matters to you, you need to know its address. But you can only know the address if you can calculate it from the data.
Thankfully, having used Shopify’s GraphQL API in the past, I am not new to the idea of a graph database. It also makes my design decision for how to store hierarchies of habits very straightforward. Path enumeration is the obvious option…
Comparing Hierarchy Storage Models
To access the data in the DHT, I need to create a graph of Anchors (entry points to the graph) and Links (edges between entry nodes). How does this model help to store hierarchies? How does it compare with the model I was using with a relational persistence layer, Nested Sets?
Nested Sets, or “Modified Pre-Order Tree Traversal”
This example illustrates how the first version of my app stored its habit hierarchies. Querying a subtree, e.g. to find all descendants, is as easy as selecting the nodes whose ‘LFT’ and ‘RGT’ attribute values lie between those of the target ancestor node.
The main drawback is that insertions are expensive operations, as you may have to update all of the LFT/RGT values of all tuples (e.g. if you are adding a new parent to the top of the tree).
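A small sketch makes both the cheap query and the expensive insert visible. The tree, the habit names and the helpers `descendantsOf` and `insertFirstChild` are hypothetical, and the in-memory array stands in for a relational table.

```typescript
interface NsNode {
  name: string;
  lft: number;
  rgt: number;
}

// Pre-order numbering of a hypothetical tree:
// Health(1,10) > Exercise(2,7) > Run(3,4), Stretch(5,6); Health > Sleep(8,9)
const nestedTree: NsNode[] = [
  { name: "Health", lft: 1, rgt: 10 },
  { name: "Exercise", lft: 2, rgt: 7 },
  { name: "Run", lft: 3, rgt: 4 },
  { name: "Stretch", lft: 5, rgt: 6 },
  { name: "Sleep", lft: 8, rgt: 9 },
];

// Subtree query: one range comparison per row, no recursion needed.
function descendantsOf(nodes: NsNode[], name: string): NsNode[] {
  const target = nodes.find((n) => n.name === name);
  if (!target) return [];
  return nodes.filter((n) => n.lft > target.lft && n.rgt < target.rgt);
}

// Insert a new leaf as the first child of `parentName`.
// Every lft/rgt value to the right of the insertion point shifts by 2.
function insertFirstChild(
  nodes: NsNode[],
  parentName: string,
  childName: string
): { nodes: NsNode[]; rowsUpdated: number } {
  const parent = nodes.find((n) => n.name === parentName);
  if (!parent) throw new Error("unknown parent: " + parentName);
  const at = parent.lft;
  let rowsUpdated = 0;
  const shifted = nodes.map((n) => {
    const lft = n.lft > at ? n.lft + 2 : n.lft;
    const rgt = n.rgt > at ? n.rgt + 2 : n.rgt;
    if (lft !== n.lft || rgt !== n.rgt) rowsUpdated++;
    return { ...n, lft, rgt };
  });
  return {
    nodes: [...shifted, { name: childName, lft: at + 1, rgt: at + 2 }],
    rowsUpdated,
  };
}
```

In this sketch, inserting a first child under the root touches every existing row, while inserting under a node near the right edge touches only that node and its ancestors. The cost depends on where the insert lands, not on how much data is actually new.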
Path Enumeration, or “Materialized Paths”
This image shows how Holochain allows organisation and retrieval of data from the DHT:
We can pick out a simple hierarchy to demonstrate path enumeration: each genre comes under the umbrella of “all genres”.
Here, an anchor to link to musical genres has been made — a simple string “_all_genres_”, which can be stored as an entry in the DHT. It can easily be remembered and its hash used to connect to other entries.
By creating links between elements of hashed data in the DHT, and giving each link an optional tag (e.g. the string “#folk”), I can filter elements when I come to fetch data.
Since we can access anything tagged folk, hawaiian or ukelele from the “_all_genres_” anchor, by making a link and filtering with a tag, we have an easy path to each genre which can respectively be defined as:
If I were then to add a more specific sub-genre and create a link to it as a child of one of the existing tagged genres — such as baritone ukelele — it would have the path “_all_genres_/#ukelele/baritone_ukelele”.
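The anchor-and-link idea can be sketched with an in-memory toy. This is a simplification, not the Holochain HDK: `ToyDht`, `fakeHash`, `put`, `link` and `linkedEntries` are hypothetical names, the hash is a non-cryptographic stand-in, the entry strings are made up, and the sub-genre is linked straight from the anchor to keep the example short.

```typescript
// Stand-in for a content hash (32-bit FNV-1a, not cryptographic).
function fakeHash(content: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < content.length; i++) {
    h ^= content.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

interface TaggedLink {
  target: string; // address of the entry being linked to
  tag: string;
}

class ToyDht {
  private entries = new Map<string, string>();
  private links = new Map<string, TaggedLink[]>();

  // Content addressing: the address is computed from the data itself.
  put(content: string): string {
    const address = fakeHash(content);
    this.entries.set(address, content);
    return address;
  }

  // Attach a tagged edge from a base address to a target address.
  link(base: string, target: string, tag: string): void {
    const existing = this.links.get(base) || [];
    this.links.set(base, [...existing, { target, tag }]);
  }

  // Follow links from a base address, optionally keeping only one tag.
  linkedEntries(base: string, tag?: string): string[] {
    const found: string[] = [];
    for (const l of this.links.get(base) || []) {
      if (tag !== undefined && l.tag !== tag) continue;
      const content = this.entries.get(l.target);
      if (content !== undefined) found.push(content);
    }
    return found;
  }
}

const dht = new ToyDht();
const anchorAddress = dht.put("_all_genres_");
dht.link(anchorAddress, dht.put("a folk entry"), "#folk");
dht.link(anchorAddress, dht.put("a hawaiian entry"), "#hawaiian");
dht.link(anchorAddress, dht.put("baritone_ukelele"), "#ukelele");

// Another peer only has to remember the anchor string to find the way in.
const rediscovered = fakeHash("_all_genres_");
const ukeleleEntries = dht.linkedEntries(rediscovered, "#ukelele");
```

This is how the anchor sidesteps the chicken-and-egg problem: the anchor's content is a well-known string, so anyone can recompute its address and then walk the tagged links from there.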
One of the benefits of a graph database is that a query is designed to return only the information that is actually needed. Anyone who has used a GraphQL API will know the knock-on effect for constructing a request: instead of asking for whole tables of data, you ask for specific attributes (properties of the nodes you are interested in).
By redesigning the data persistence layer to fit a graph model, I noticed a (now glaring) inefficiency in my original version of HabitFract. When querying my web API, I would ask for a table of “HabitDates” (records in a relation that uses two foreign keys to create a many-to-many relationship between “Habits” and “Dates”), shown as the yellow, orange and purple relations in the entity relationship diagram:
After every completed calendar day, when a user accessed the app, I would trigger a lifecycle hook to insert records into the ‘date’ and ‘habit_date_completed’ relations seen above. The ‘completed’ value in ‘habit_date_completed’ defaulted to ‘false’. It was a naive approach:
Imagine if the user has data stored for 100 different habits, and they wait a month before accessing the app again. The page reload will be waiting on about 31 + 3100 new records to be inserted, just to record the fact that they haven’t done anything!
The important thing to realise is that there is no need to record a false value for an action being completed on a date. That is the obvious default position of all actions before we do them, and so ‘gaps’ in the calendar can be inferred from the ‘completed = true’ days.
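Both halves of this argument fit in a few lines. The sketch below assumes days are stored as ISO “YYYY-MM-DD” strings and uses hypothetical helper names (`eagerInsertCount`, `daysInRange`, `missedDays`); it is an illustration of the idea, not the project's code.

```typescript
// Cost of the naive approach: one 'date' row per day,
// plus one 'habit_date_completed' row per habit per day.
function eagerInsertCount(habitCount: number, dayCount: number): number {
  return dayCount + habitCount * dayCount;
}

// Every calendar day from start to end, inclusive, as "YYYY-MM-DD" (UTC).
function daysInRange(startIso: string, endIso: string): string[] {
  const days: string[] = [];
  const cursor = new Date(startIso + "T00:00:00Z");
  const end = new Date(endIso + "T00:00:00Z");
  while (cursor.getTime() <= end.getTime()) {
    days.push(cursor.toISOString().slice(0, 10));
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return days;
}

// Only completed days are stored; the gaps are derived on demand.
function missedDays(completed: Set<string>, startIso: string, endIso: string): string[] {
  return daysInRange(startIso, endIso).filter((day) => !completed.has(day));
}
```

With 100 habits and a 31-day absence, `eagerInsertCount(100, 31)` gives the 31 + 3100 inserts described above, whereas the inferred approach writes nothing at all until the user actually completes something.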
Simplifying the Schema
The above realisation, combined with the more modular, microservice-based architecture of Holochain, has influenced me to concentrate on the general concept of recording *When* something happened.
There is already another hApp under development called ‘Where’, to do with locating where people are (not just geographically, but, for example, emotionally too). You can see it being trialled and discussed here.
To make my habit tracking logic reusable, I will break it up into a smaller project called ‘When’. See the GitHub repo with the design document, which includes specifics about the implementation of the concepts discussed in this article.
This means I can cut down the schema to 3 basic models: a thing that is done (like a Todo list being completed), a Habit (a record of that action over time), and the specific instances in time when the action was completed (Red Dots).
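As a rough picture of how small that schema is, the three models could be typed along these lines. This is only a guess at their shape: the interface names (`DoneThing`, `RedDot`, `HabitRecord`), every field name, and the helpers are hypothetical, and the design document remains the source of truth.

```typescript
// The thing that is done, e.g. a Todo list being completed.
interface DoneThing {
  id: string;
  description: string;
}

// One specific instance in time when the action was completed.
interface RedDot {
  completedAt: string; // ISO 8601 timestamp (assumed format)
}

// A record of the action over time. Only completions are stored;
// there is no 'completed = false' row anywhere.
interface HabitRecord {
  thing: DoneThing;
  redDots: RedDot[];
}

// Returns a new record with the completion added (idempotent per timestamp).
function addRedDot(habit: HabitRecord, completedAt: string): HabitRecord {
  if (habit.redDots.some((dot) => dot.completedAt === completedAt)) return habit;
  return { ...habit, redDots: [...habit.redDots, { completedAt }] };
}

// True if any red dot falls on the given "YYYY-MM-DD" day.
function wasDoneOn(habit: HabitRecord, isoDay: string): boolean {
  return habit.redDots.some((dot) => dot.completedAt.slice(0, 10) === isoDay);
}
```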
Part 4 of this series will go into more detail about implementing “When”.