Nexedi was recently requested advice about which Free Software is currently suitable as Database engine in the context of entreprise applications or is able to replace legacy proprietary relational database. Our advice is: MariaDB, PostgreSQL and Cubrid in random order.
It depends on...
Before giving any advice, one should alway remember that performance of database server depends:
- application;
- numerous configuration options;
- data distribution statistics;
- operating system and especially filesystem type (ext4, XFS, FusionIO, etc.);
- server hardware, especially input/output and latency;
- know how of developers to optimise their software for a specific database engine.
In other words, it is very difficult to compare database server performance as such. If we only consider server hardware, one database may shine a solid state disks (SSD) and deliver poor performance compared to another one on regular disks. Know how is even more important. Engineers trained on one database may need some time (weeks to months) to achieve similar results with another database on which they may lack know how.
We thus prefer to compare database serves using real word success cases that exhibit their strength.
MariaDB
MariaDB has been deployed by the largest cloud companies to power significant portions of the modern Web. It is rumoured that Google and Facebook alone have deployed half a million MariaDB servers each to power their core services. Latest versions of MariaDB are even based on Facebook's MyRocks to provide further scalability.
Taobao - the Chinese equivalent of Amazon - uses MariaDB at its core for e-commerce applications and contributed MariaDB's multi-source replication system. Taobao's example is remarkable in the sense that the company not only uses MariaDB but actively improves it in a way that helps the entire community. And it also demonstrates that MariaDB supports high performance and high availabiliy in real world, a feature that is now officially supported with Galera Cluster.
Nexedi uses MariaDB at the core of ERP5, both as a relational database server and as a NoSQL server with NEO. ERP5 can produce 300.000 complex invoices in about 8 hours on a single, standard generic Xeon based server, leading to operating costs 3 to 4 times lower than traditional billing systems. Thanks to MyRocks, NEO (powered by MariaDB) is able to scale up and handle terabytes of data ingestion per day on low end servers with mechanical disks.
There seem to be very few limits to MariaDB thanks to so-called storage engines:
- with HandleSocket, it does NoSQL faster than most NoSQL databases;
- with Spider, it does sharding natively and can handle one million inserts per second (see also ScaleDB);
- with Connect, it acts as an ETL and integrates data from files (CSV, DBF, XML, etc.) or third party databases (ODBC, JDBC, HADOOP, etc.);
- with ColumnStore, it supports datawarehouse type table scans on big data sets;
- with Cassandra, it integrates directly with the NoSQL database;
- with Mroonga, it provides high performance full text search that also supports asian languages.
One nice feature in MariaDB is the ability to embed it as a library. This means that MariaDB can not only be used for large deployments on clusters but also as a standalone database for embedded systems, without having to allocate a dedicated process.
Another nice feature is the ability of MariaDB to self-repair in most cases of system crash. As one may know, virtual machines or baremetal servers may randomly reboot without prior notice. We have found that such incident happens between 0.1 occurrence per year per server in the best cases (Soyoustart, Online baremetal hosting) and 5 times per year per virtual machine (OpenStack based cloud providers). Some database servers require manual intervention in such cases. MariaDB usually does not.
MariaDB is officially supported by a company: MariaDB corporation. It is possible to sponsor MariaDB development and contribute to its roadmap through the MariaDB foundation (as Nexedi does) that pays the salaries of MariaDB's core developers, most of which are located in Europe;
PostgreSQL
PostgreSQL provides more advanced features than MariaDB.
This was the main reason why Nexedi initially chose MySQL (now MariaDB): we wanted to be sure that ERP5 could run on any database server, at least in theory, and thus designed ERP5 based on a limited SQL dialect. We are quite happy with our initial choice since we have later been able to port ERP5 to different databases in fairly short time (a week for an initial port).
Another reason of our choice was that - 15 years ago - PostgreSQL was quite slow (10 times slower on inserts) and did not support self-repair after system crash. Nowadays, MariaDB's SQL dialect is much more complete and PostgreSQL is much faster. Both PostgreSQL and MariaDB prentend to provide the best performance of the two.
Here is a list of PostgreSQL features that do not exist in MariaDB:
PostgreSQL provides similar features to those listed for MariaDB: clustering, sharding, external engines, etc. The only difference is that there is no obvious single source for support or development. PostgreSQL is a community of developers with a couple of companies such as 2ndQuadrant (Europe) or EnterpriseDB (USA) that provide commercial support and core development. Some view this as an advantage while others view this as an inconvenience. For example, there are at least 12 solutions for clustering in PostgreSQL. One will thus have to study and decide which one is most suitable and better supported.
Overall, Nexedi would consider using PostgreSQL in two cases: to migrate PL/SQL code or to benefit from extensive OpenGIS support. PL/SQL dialect in PostgreSQL seems to be a key factor in its adoption by French governement as a possible replacement of Oracle database servers on existing applications. A company called Dalibo (France) even developped a tool to automate the evaluation of migration costs and used it on several applications at the French Ministry of Finance. "Arrogant Oracle to PostgreSQL" migration seems to be a common story in the United States and now an official policy in Russia.
Cubrid
Even if few people know about it, Cubrid is worth considering. Cubrid was born at Naver (the equivalent of Google search engine in Korea) with the goal to replace Oracle and Microsoft SQL databases.
It was thus designed for the same kinds of high Web loads as those for which MariaDB is rumoured to be used at Google and Facebook, and officially used at Taobao. But Cubrid was designed in a company with strong Oracle background and should includes some features that make its adoption easier for engineers trained on Oracle products (ex. the CONNECT BY syntax). A lot of effort was placed on high availability, high performance, fault-tolerance, hot backup, query plan caching, etc.
The main success story of Cubrid is obviously Naver which replaced a significant share of their 10.000 Oracle servers with it.
Cubrid does not support PL/SQL. Instead, it supports Java Stored Procedures. Migrating Oracle PL/SQL code to Cubrid may thus require to convert PL/SQL code to Java. Some companies provide automated solutions for this but one should be careful and should not expect too much as with any transpiling approach. There is also an open source tool called P2JTRans which we did not try. And some companies provide PL/SQL to Java conversion services.
R&D of Cubrid is based primarily in Korea at Naver. Cubrid is also developed by Arnia (Romania) that provides commercial services on the behalf of Naver.
NoSQL
Any person who believes that NoSQL is faster than SQL databased should read this article first: Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server.
Most NoSQL databases are actually slower than SQL databases. Performance gain only comes from the absence of transaction support. Absence of transaction has been blamed for various incidents on real world systems, as described in NoSQL Meets Bitcoin and Brings Down Two Exchanges: The Story of Flexcoin and Poloniex.
We thus believe that in an enterprise environment, NoSQL is either of very limited use or can be implemented with better performance on top of a relational database. This is actually what we did at Nexedi:
- our NEO distributed transactional object store was implemented on top of MariaDB (NoSQL on top of SQL);
- we use KumoFS for distributed persistent caching, not for accounting transactions (NoSQL for very limited use);
- we use MariaDB directly for everything else (SQL).
In the article Why Postgres Should Be Your Document Database, Jim Fulton also explains how to use a relational database as a document store in a way that renders the use of NoSQL database irrelevant.
In the article MyRocks: A space- and write-optimized MySQL database, Yoshinori Matsunobu explains how MariaDB most recent storage was actually built on top of Facebook's distributed NoSQL storage engine. Nexedi currenlty uses MyRocks to power its big data applications based on NEO and to store large BLOBs. By precompiling SQL queries, we can eliminate the SQL interpretation time and achieve same performance as we would get with NoSQL type queries.
Conclusion
MariaDB is the primary choice of database at Nexedi because we prefer to stay on the same side as Google, Facebook, Taobao, etc. and because ERP5's modern design does not depend on stored procedures or complex SQL queries. We also currently lack resources to duplicate our scalability know how with a second database engine. Yet, we keep in mind the option to use other database engines because we highly value features such as fault-tolerance or hot backup that are supported natively by Cubrid.
We understand that both Cubrid and PostgreSQL are relevant solutions that can surpass MariaDB in numerous cases (just as MariaDB can also surpass Cubrid or PostgreSQL in numerous cases). We also understand that any feature available in MariaDB is also usually available in PostgreSQL. We therefore recommend any company willing to adopt Free Database Software to try to use at least two of the three contenders.
We also recommend not to use dedicated NoSQL databases for enterprise applications. What we mean here is that NoSQL can be usually implemented with better performance on top of the same storage engine that powers most modern SQL databases. In other words, a modern SQL database should be able to achieve both SQL and NoSQL jobs with high performance, without losing transaction support that is essential for enterprise applications.