- No data will need to be moved to any fragment that is not the new one.
- Only the keys in one of the old fragments have to be checked (and possibly moved to the new fragment).
But there are also two characteristics that you might not want for your application:
- Data is not very well distributed among the table fragments.
- If you are using disc_only_copies and one of your fragments is approaching the 2 GB limit, you may need to add many fragments before that fragment shrinks enough (this is, in a way, a consequence of advantage 2 above).
I have implemented consistent hashing for fragmented tables in a module called mnesia_frag_chash, along with a modified version of Erlang's gb_sets module. The key addition is the geq_iterator that Richard O'Keefe provided in this post; I named the modified module ok_gb_sets.
I create 100 entries for each fragment, calculate a hash value for each entry, and store each one in the circular hash table used for consistent hashing (which is actually a tree: an ok_gb_sets). To find the fragment for a given key, all I have to do is compute its hash value H, find the first entry whose hash value is greater than or equal to H (wrapping around to the first entry if there is none), and pick that entry's fragment (the actual implementation is just a little more complicated, as collisions have to be avoided).
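The lookup can be sketched like this. This is Python rather than Erlang, just to illustrate the algorithm: the sorted list stands in for the ok_gb_sets tree (bisect plays the role of geq_iterator), and the MD5-based hash is a stand-in assumption, not the hash the real module uses.

```python
import bisect
import hashlib

def h64(data):
    """64-bit hash from MD5 -- an illustrative stand-in hash function."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

def build_ring(n_fragments, n_frag_pieces=100):
    """Place n_frag_pieces entries per fragment on the circle.
    Returns a sorted list of (hash, fragment) pairs, playing the role
    of the ok_gb_sets tree."""
    return sorted((h64(("%d:%d" % (f, p)).encode()), f)
                  for f in range(1, n_fragments + 1)
                  for p in range(n_frag_pieces))

def key_to_frag(ring, key):
    """Hash the key, then take the first entry with hash >= H
    (the geq_iterator step), wrapping around the circle."""
    h = h64(repr(key).encode())
    i = bisect.bisect_left(ring, (h, 0))
    if i == len(ring):  # past the last entry: wrap to the first one
        i = 0
    return ring[i][1]
```

With 4 fragments the ring holds 400 entries, and every key deterministically maps to one of fragments 1..4.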
You could use a smaller number of entries per fragment: that would make the tree smaller, and fewer fragments would have to be checked when adding a new one. On the other hand, creating more entries per fragment gives an even better distribution of data among the fragments.
The hash state holds two fields: chash_table, the ok_gb_sets itself, and n_frag_pieces, the number of entries created in the set for each fragment.
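As a rough analogue (the real thing is an Erlang record, and the table is an ok_gb_sets tree, not a list), the state could look like this:

```python
from dataclasses import dataclass, field

@dataclass
class HashState:
    """Python analogue of the mnesia_frag_chash hash state (illustrative)."""
    # Sorted (hash, fragment) entries, standing in for the ok_gb_sets tree.
    chash_table: list = field(default_factory=list)
    # How many entries are placed on the circle for each fragment.
    n_frag_pieces: int = 100
```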
My main concern is the size of chash_table. I need to understand Mnesia's code better to be sure that a hash state this large will not cause performance problems.
Another problem introduced is that, when a new fragment is added, Mnesia has to scan the keys in several fragments (100 in the worst case, though that number can be tuned, as noted above) to find out which ones must be moved to the new fragment. The total amount of data moved is still small, but the number of fragments locked and the number of keys checked is considerably larger, so beware.
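Both halves of that trade-off can be checked with a small simulation. Again a Python sketch with a stand-in MD5 hash, not the actual Erlang code: it maps keys before and after adding a fifth fragment, confirms that every moved key lands in the new fragment, and shows that the moved keys come from several of the old fragments.

```python
import bisect
import hashlib

def h64(data):
    """64-bit hash from MD5 -- an illustrative stand-in hash function."""
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

def build_ring(n_fragments, n_frag_pieces=100):
    """Sorted (hash, fragment) entries, n_frag_pieces per fragment."""
    return sorted((h64(("%d:%d" % (f, p)).encode()), f)
                  for f in range(1, n_fragments + 1)
                  for p in range(n_frag_pieces))

def key_to_frag(ring, key):
    """First entry with hash >= the key's hash, wrapping around."""
    i = bisect.bisect_left(ring, (h64(repr(key).encode()), 0))
    return ring[i % len(ring)][1]

# Map 10,000 keys before and after adding a fifth fragment.
before = build_ring(4)
after = build_ring(5)
moved = [(k, key_to_frag(before, k)) for k in range(10000)
         if key_to_frag(before, k) != key_to_frag(after, k)]

# Every key that moves at all moves TO the new fragment (advantage 1)...
assert all(key_to_frag(after, k) == 5 for k, _ in moved)
# ...but the moved keys come from several old fragments, so several
# fragments have to be scanned (and locked) during the split.
print("moved:", len(moved), "from fragments:", sorted({f for _, f in moved}))
```

With 100 entries per fragment the new fragment's entries are scattered all around the circle, so in practice every old fragment contributes some keys, while the total moved is still only about a fifth of the data.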
If my first concern turns out to be unimportant and the second one is not a big problem for you, consider using mnesia_frag_chash: it solves the two problems discussed at the beginning of this post (I have run some simple experiments, and it seems to distribute the data well).
Full source code (just two files) is here.