Before I start, I'd like to repeat what many people already know: Erlang is an absolutely marvelous language. I'm extremely grateful to Ericsson and the whole Erlang community, and I have no relevant complaints about Erlang and OTP (except for logging, but you can easily find long discussions on that subject, so I won't cover it here). I won't try to give you any advice on the language itself, since there's already great material available here and there to lead you to a successful experience with Erlang.
Mnesia is also a great tool for developing distributed, scalable storage systems. It gives you data partitioning, replication, ACID transactions and many more important characteristics that just work and, you can bet on it, are stable. I'm sure you can find, as I did, lots of good advice on using Mnesia, so I'll try not to repeat too much of what other people have already said. Obviously, I'm also not giving a tutorial on the basics of Mnesia here: I assume you'll study elsewhere if you decide to actually use Mnesia. Here is a good quick overview of table fragmentation (partitioning), for example.
One last warning: some of what I say below may be wrong (you know... I didn't develop Mnesia) or may simply not apply to every Mnesia deployment, since my experience is mostly based on a system with the following characteristics:
- Distributed, running on several servers.
- Employing fragmented tables, distributed across several servers.
- Employing cloned (replicated) fragments.
- Holding high volumes of data that don't fit in RAM.
- Serving a large number of read requests per second (and some write requests too).
Mnesia's table fragmentation uses linear hashing
You decided to create a fragmented table and distribute the data among 3 servers, so that about 1/3 of the data goes to each server. You then created the table with 6 fragments and placed 2 fragments on each of your servers, which gives you uniformly distributed data, right? No!
This is information I could only find in the source code (and by "source code" I don't mean "comments" ;-)): Mnesia's default hashing module (mnesia_frag_hash) uses linear hashing to distribute keys among fragments. That hashing scheme does a perfect job of making the number of fragments cheap to grow and shrink, but it doesn't distribute data evenly when the number of fragments is not a power of 2.
In the example above, with 6 fragments, you are very likely to get a data distribution that resembles these quantities of keys per fragment: (X, X, 2X, 2X, X, X). If you distribute those 6 fragments among 3 servers naively, you may end up with one server holding 100% more data than each of the others!
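The skew comes from how linear hashing folds hash buckets back onto existing fragments. Here is a simplified model of the scheme (an illustrative sketch of linear hashing in general, not the actual mnesia_frag_hash source; the module and function names are mine):

```erlang
%% Simplified model of linear hashing -- an illustration, NOT the real
%% mnesia_frag_hash code. With N fragments, keys are hashed into
%% 2^(L+1) buckets, where 2^L =< N < 2^(L+1); buckets beyond N are
%% folded back onto the lower half, doubling the load of the fragments
%% they land on.
-module(linhash_sketch).
-export([frag/2]).

%% floor(log2(N)) via bit shifting, to avoid floating-point surprises
log2floor(N) when N >= 2 -> 1 + log2floor(N bsr 1);
log2floor(_) -> 0.

%% frag(Key, N) -> fragment number in 1..N
frag(Key, N) ->
    L = log2floor(N),
    H = erlang:phash2(Key, 1 bsl (L + 1)),  % bucket in 0..2^(L+1)-1
    if
        H < N -> H + 1;                     % fragment exists: use it
        true  -> H - (1 bsl L) + 1          % fold back onto the
    end.                                    % not-yet-split fragments
</imports>
```

With N = 6 we get L = 2, so keys hash into 8 buckets; buckets 6 and 7 fold back onto fragments 3 and 4, which therefore receive twice as many keys as the others, giving exactly the (X, X, 2X, 2X, X, X) shape above. With a power of 2 (say N = 8), no folding happens and every fragment gets the same share.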
The best solution is probably to create a larger number of fragments that is a power of 2 and then distribute those fragments as evenly as possible across your servers. If the math still doesn't work for your pool, you can consider implementing another hashing module (Mnesia supports that), possibly with consistent hashing.
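For instance, a fragmented table spread over a 3-server pool could be created with 8 fragments like this (a sketch; the table, attribute and node names are hypothetical, and the replication counts are just an example):

```erlang
%% Hypothetical pool of 3 nodes; 8 fragments (a power of 2) spread
%% over them, each fragment replicated on 2 nodes.
Nodes = [bar@host1, baz@host2, bam@host3],
mnesia:create_table(foo,
    [{attributes, [key, value]},
     {frag_properties,
      [{n_fragments, 8},        % power of 2 => even key distribution
       {node_pool, Nodes},
       {n_disc_copies, 2}]}]).  % 2 disc copies of each fragment
```

A custom hashing scheme would be plugged in through the {hash_module, Module} option in frag_properties, with the module implementing the mnesia_frag_hash callback behaviour.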
Load-balancing read operations: do it yourself
You have created a foo table (or table fragment, for that matter), with copies on servers bar and baz, and you are reading that table's records from a number of different Erlang nodes in the same Mnesia pool. You read data by calling mnesia:activity(..., ..., ..., mnesia_frag), so that Mnesia transparently finds the data on one of the servers that hold it.
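In code, such a read looks something like this (a sketch; foo and the key are placeholders):

```erlang
%% Read a record from the fragmented table foo. Passing mnesia_frag as
%% the access module makes Mnesia hash the key, pick the right fragment
%% and route the read to a node holding a copy of that fragment.
read_foo(Key) ->
    F = fun() -> mnesia:read(foo, Key) end,
    mnesia:activity(transaction, F, [], mnesia_frag).
```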
The example above is totally functional, but Mnesia won't load-balance those read operations: given a steady Mnesia pool, each node will always read the foo table from the same server (always bar or always baz). I've seen cases where all of the servers would read the foo table from the bar node (all except baz itself: local tables are always read locally, it seems). You can check where node bam reads the foo table from by calling mnesia:table_info(foo, where_to_read) on node bam.
If you need the read operations to be well distributed among all clones of each table or fragment, you need to explicitly set the where_to_read property on each node of the Mnesia pool, for each remote table it needs to access. I like to do this by selecting a random node from the list of hosting nodes (and repeating that process often, on each node, for each remote table), but you might prefer some other strategy (for example: you might have 3 clones of a table and decide to have the whole pool always reading from the same 2 clones - who knows?). The important point here is that Mnesia will certainly not do what you want automatically.
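A sketch of that random-replica strategy is below. As far as I know there is no documented API for setting where_to_read; the internal, undocumented function mnesia_lib:set_remote_where_to_read/2 is one way to do it, but verify it against your OTP version before relying on it (the rest of the names are mine):

```erlang
%% Sketch: point this node's reads of Tab at a random replica.
%% mnesia_lib:set_remote_where_to_read/2 is internal and undocumented;
%% check your OTP release before depending on it. Run this periodically,
%% on each node, for each remote table (and for each fragment table:
%% foo, foo_frag2, ..., foo_fragN).
rebalance_reads(Tab) ->
    Copies = mnesia:table_info(Tab, where_to_write),  % nodes holding Tab
    case lists:member(node(), Copies) of
        true  -> ok;   % a local copy is always read locally anyway
        false ->
            Pick = lists:nth(rand:uniform(length(Copies)), Copies),
            mnesia_lib:set_remote_where_to_read(Tab, Pick)
    end.
```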
To be continued...