Hierarchical data are everywhere, from product catalogs to blog post comments. A classic example is the tree of life, where kingdoms are subdivided into a hierarchy of phylum and class down to genus and species. What if you wish to store this data in a database table, which is inherently flat? Databases do not natively store hierarchies, so you need to work around that.
MPTT, or modified preorder tree traversal, is an efficient way to store hierarchical data in a flat structure. It is an alternative to the adjacency list model, which is inefficient. This post will cover the MPTT approach, examine its tradeoffs, then explore django-mptt, which is a package for adding MPTT to your Django models.
The MPTT approach adds a lft and a rght attribute to a model, which lets you easily determine parent-child relationships. See the example tree with lft and rght values below (GlobalCorp, for example, has a lft value of 1 and a rght value of 20). The dotted line in the image above shows the path taken to calculate child relationships within the tree.
The model attributes allow you to do SQL queries such as the one below to get all USACorp subsidiaries:
SELECT * FROM example_tree WHERE lft BETWEEN 2 AND 10;
I've used MPTT in two of my major projects at Caktus. A couple of model patterns which might suggest using a MPTT are:
- Many class names describing the same basic content type in an arbitrary hierarchy:
- Awkward containment structures where you are creating a bunch of linked models that are basically different words for the same model:
The MPTT approach is beneficial in that, with only a couple of fields you can determine an entire tree structure. Because of this economy, retrieval operations are efficient. You can very quickly query a tree to determine its relationships. The tradeoff is that inserts and moves are slow. If the structure of the tree is constantly changing, MPTT is not a good option because it needs to update many or all of the records. It also performs a whole table lock which prevents any updates to the affected table. This is obviously less than ideal if you have a heavily updated database.
The django-mptt project is a convenient way to incorporate MPTT into Django. It provides a base model, MPTTModel, which implements the following tree fields for you:
- level (indicating how deep in the tree a node is)
- tree_id (to identify tree membership for any given instance)
It also provides the following convenience methods which abstract these tree fields and help you to manage the tree:
- get_ancestors(ascending=False, include_self=False)
- insert_at(target, position='first-child', save=False)
- move_to(target, position='first-child')
Of these methods, get_root() and insert_at() are particularly helpful. Manually modifying lft and rght values is not a good idea, and insert_at() is a safe way to update the tree. I use get_root() all the time to double check or even short circuit child values. For example, if you have five product types, you could have five trees and specify all the product related information at the root of each tree. Then any child node could simply ask for its root’s values:
product = Products.objects.get(id=123) product.get_root().product_color
from mptt.models import MPTTModel, TreeForeignKey class Company(MPTTModel): name = models.CharField(max_length=255) parent = TreeForeignKey('self', related_name='client_parent', blank=True, null=True) is_global_ultimate = models.NullBooleanField() is_domestic_ultimate = models.NullBooleanField()
Using the tree
This example method finds the domestic headquarters for any given company in a tree.
def get_domestic_ultimate(self): """ If current company is flagged as domestic ultimate, return self. Otherwise iterate through ancestors and look for the first domestic ultimate. """ if self.is_domestic_ultimate: return self mytree = self.get_ancestors(ascending=True, include_self=False) for comp in mytree: if comp.is_domestic_ultimate: return comp return None
Setting up a Test Tree
class CompanyTestCase(TestCase): def setUp(self): self.globalcorp = factories.CompanyFactory(name="GlobalCorp", is_global_ultimate=True,) self.usacorp = factories.CompanyFactory(parent=self.globalcorp, is_domestic_ultimate=True, name="USACorp") self.companya = factories.CompanyFactory(parent=self.usacorp, is_headquarters=True, name="Company A") self.companyb = factories.CompanyFactory(parent=self.usacorp, name="Company B")
Testing the Tree
def test_tree_parameters(self): self.assertEqual(self.globalcorp.lft, 1) self.assertEqual(self.globalcorp.rght, 20) self.assertEqual(self.globalcorp.level, 0)) self.assertEqual(self.asiacorp.lft, 10) self.assertEqual(self.asiacorp.rght, 19) self.assertEqual(self.asiacorp.level, 1) self.assertEqual(self.globalcorp.get_descendant_count(), 9) self.assertEqual(self.usacorp.get_descendant_count(), 3) self.assertEqual(self.asiacorp.get_descendant_count(), 4)
One Last Gotcha
There is an incompatibility with django-mptt and GIS in PostGreSQL. If you are using django-mptt and CREATE EXTENSION postgis;, you can't use MPPTMeta attributes in your MPTTModels:
class MPTTMeta: level_attr = 'mptt_level' order_insertion_by=['name']
Adding these meta options won't do anything obviously wrong. It will cheerfully report an incorrect tree structure.
Storing Hierarchical Data in a Database by Gijs Van Tulder
Django treebeard what are differences between AL, NS, MP by Pedro Werneck