Ulf Wendel

Fooling around with MySQL JSON full-text indexing

2016/08/19 - by admin - 0 comments

MySQL 5.7/8.0 does not allow creating a full-text index on a JSON column. Although a full-text index would be useful – unstructured data deserves unstructured search. I am unaware o any workaround of this limitation that does not require storing the JSON data twice. Then I started messing around, allowed indexing of JSON columns, saw great execution times but eventually followed MySQLs’ decision to call it a day. Friday afternoon rambling on indexing and easy hacking…

Why full-text: JSON indexing limitations and pitfalls

The current version of the MySQL Document Store is not yet optimized for performance. When we started this effort, we quickly gathered a long list of features that we considered a must have prior to any serious performance tuning. The teams focussed building infrastructure and delivered many yet, not all their ideas. Unlike in the past I’m really relaxed about the future ahead. But we should explain the properties of our todays solutions and be open about their limits. JSON indexing has some limits.

The primary pattern for indexing JSON columns is extracting values into a generated column and indexing the generated column. There’s pitfall I missed to mention in all previous blog posts: charsets. The MySQL JSON SQL data type is using the charset utf8mb4. Now, check my typical example table definitions:

CREATE TABLE products (
  product_id int(11) NOT NULL AUTO_INCREMENT,
  product json DEFAULT NULL, <-- utf8mb4
  facet_title varchar(127) <-- charset?
    GENERATED ALWAYS AS (
      json_unquote(
        json_extract(
          product,
          '$.title
        )
      )
  ) STORED,
  PRIMARY KEY (product_id),
  FULLTEXT KEY idx_ft_title (facet_title)  
) ENGINE=InnoDB 
  AUTO_INCREMENT=50001 
  DEFAULT CHARSET=latin1 <-- latin1


What may be the charset of the facet_title column? Correct, it is latin1:

SELECT CHARSET(facet_title), CHARSET(JSON_UNQUOTE(JSON_EXTRACT(product, "$.title"))) FROM products LIMIT 1;
+----------------------+---------------------------------------------------------+
| CHARSET(facet_title) | CHARSET(JSON_UNQUOTE(JSON_EXTRACT(product, "$.title"))) |
+----------------------+---------------------------------------------------------+
| latin1               | utf8mb4                                                 |
+----------------------+---------------------------------------------------------+


If, for some reason, you need the charset of the generated column to be utf8mb4, as in the JSON, then change the charset for the column or the entire table.

ALTER TABLE 
  products
MODIFY 
  facet_title VARCHAR(127) CHARACTER SET 'utf8mb4' 
  GENERATED ALWAYS AS (
    JSON_UNQUOTE(JSON_EXTRACT(product, "$.title"))
  ) STORED

Stored… why stored?

Generated columns must be materialized (stored) to support full-text indexing. Materializing a value extracted from a JSON document effectively means duplicating data on disk: slower reads, needs more space in the buffers, … There seems to be an issue combining a stored generated column and a virtual generated column – I got wrong query results and filed a bug. The workaround is to materialize all generated columns but that takes away one of their biggest benefits.

And, if you deal with unstructured data how do you know which JSON fields to extract?

Jippie – full-text index on JSON

I turned to my remote friend GCC and together we convinced MySQL to cut one or the other safety belt to have some fun. MySQL agreed that there is no fun without risk, agreed that a backtrace is not the end of the world and gave me a full-text index on a JSON column:

ALTER TABLE products ADD FULLTEXT idx_ft_product(product);
Query OK, 0 rows affected (34,09 sec)
Records: 0  Duplicates: 0  Warnings: 0

The table products now two full-text indicies. One on the generated column facet_title and one on the entire JSON document. It now supports “unstructured string search” with the expexted performance characteristics. Full-text search is quick, and results are correct:

SELECT COUNT(*) 
FROM products2 
WHERE MATCH(product) AGAINST("Beach"); <-- FT on JSON
+----------+
| COUNT(*) |
+----------+
|       48 |
+----------+
1 row in set (0,00 sec)

SELECT COUNT(*) 
FROM products2 
WHERE MATCH(facet_title) AGAINST ("Beach"); <-- FT on generated column
+----------+
| COUNT(*) |
+----------+
|       48 |
+----------+
1 row in set (0,01 sec)

SELECT COUNT(*) 
FROM products2 
WHERE 
  JSON_UNQUOTE(JSON_EXTRACT(product, "$.title")) 
  LIKE "%Beach%"; <-- no index
+----------+
| COUNT(*) |
+----------+
|       48 |
+----------+
1 row in set (3,98 sec)

WOW? No. MySQL is using safety belts for good reasons. I for sure broke a lot without knowing – or bothering much.

One of the things that go wrong here is having the default and built-in full-text parser working on JSON. The InnoDB full-text parser is given a char* requested to make sense of it. The char* holds the binary representation of JSON that MySQL uses for on-disk storage. I have no idea which words ended up in the full-text index.

The MySQL JSON binary presentation

Assuming the InnoDB full-text parsing code can deal with words coming from JSON columns, and I would be surprised if there was much to be done to make this possible, then the task boils down to write a full-text parser plugin for JSON column values. The MySQL manual page on writing a full-text parser plugin is not too long – why not give it a try.

The binary format of JSON columns is documented in WL#8132 (see also related WLs). Couple of type flags, a little “dictionary” for object and array columns, nothing too fancy going on there. But there is, of course, code in the server to do the job. The server has three classes that could come handy: Value from json_binary.h, Json_dom from https://github.com/mysql/mysql-server/blob/5.7/sql/json_dom.h”>json_dom.h and Json_wrapper found in json_dom.cc.

Class Value is about serializing values into the binary JSON reprensentation respectively unserializing. Json_dom provides an API to manipulate Values. Json_wrapper lets you operate on Value or Json_dom without knowing which is being used. And, the “DOM” is what the JSON_* SQL functions use to to perform their work. Both Value and Json_dom are pretty much self-contained.

But, oh – not again: THD. Json_dom needs THD for memory allocations and error/warning logging. THD stands for thread handler. Traditionally, MySQL was using one thread per client connection to perform all the work. THD carries a lot of useful information for doing so. And, lots of code requires passing THD. As long as your code is part of the traditional processing queue, everything is fine.

But plugins are not. Plugins may not have access to THD, and if they figure out a way to access the THD, it is not always a recommended way. My hacked MySQL then told me: enjoy your weekend, I’ll segfault now. I respect that decision.

I’ve learned two things. For one, I think we should allow full-text indexing of JSON columns in some future version. For two, managers should not constantly wipe their development teams to their limits. It is probably no big deal to make the JSON processing classes available to plugins. In fact, Json_binary seems so already only Json_dom is missing. But given the very tight schedules there is often no time to go the extra mile. Good news is that the JSON processing classes should be available in UDFs with no hassle and that there some cool refactoring is done in the server to make it more modular long term.

Maybe, I should stop talking about desired features myself… Whatever. Next for me is either UDFs or the cool new plugin services part of MySQL 5.7 pretty much nobody has talked about because everybody talks about the use case for the end user.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

MySQL Document Store: unstructured data, unstructured search

2016/08/17 - by admin - 0 comments

Storing documents is convenient: no need to define a schema up-front, no downtime for schema changes, no normalization, no slow joins – true or not, you name it. But what about search if you do not know how your data is structured? For example, the famous SFW-construct requires listing the columns to search: SELECT … FROM … WHERE some_column = ‘Jippie’ . Given that JSON data can have no schema how to write a query without knowing any field names, where is SELECT … FROM … WHERE * = ‘Jippie’? JSON_SEARCH() gets you started but there are gaps if you think about it for a minute.

Continue Reading →

Use the index, JSON! Aggregation performance (Elastic, MySQL, MongoDB)

2016/08/12 - by admin - 0 comments

Use the index, Luke!? No: use the index, JSON! When hunting features the X DevAPI should consider I got shocked by some JSON query execution times. Sometimes the MySQL Document Store runs head-on-head with MongoDB, sometimes not. Sometimes the difference to Elasticsearch is huge, sometimes – just – acceptable. Everytime the index rules. Use the index, MySQL Document Store users!

Comparing Elasticsearch, MySQL and MongoDB comes with a touch. Elasticsearch is the most popular enterprise search engine closely followed by Solr. MongoDB is the #1 document store. And, MySQL tries to throw it’s owners flagship database from the RBDMS throne to gain the top position. The only thing they have in common is being able to store and query JSON documents to serve (orthogonal) web developer needs. Some believe the RBMS and Document model can be blended in one store whereas others advocate using specialized systems. Given how different the focus of the three systems is any performance comparison is questionable. However, as the quoted experts noted performance matters when choosing a system: you can often work around data model properties, you cannot overcome performance deficiencies. All those warnings given why not share some observations and basic tipps…

Continue Reading →

Aggregation features, Elasticsearch vs. MySQL (vs. MongoDB)

2016/08/11 - by admin - 0 comments

To make the MySQL Document Store primary programming interface, the X DevAPI, a success we should provide building blocks to solve common web development tasks, for example, faceted search. But is there any chance a relational database bend towards a document store can compete around aggregation (grouping) features? Comparing Elasticsearch, MySQL and MongoDB gives mixed results. GROUP BY on JSON gets you pretty far but you need to watch out for pitfalls…

Scale-out, sharding, schema-less, big data all hiding a giant developer task: giving user access to semi-structured data in near real-time. This can mean providing user driven full text search, facet search or analytics to suggest topics to users, e.g. recommender systems. Elasticsearch is a popular choice here. As much as Elasticsearch differs from both MySQL and MongoDB and comparisons are not “fair” it is interesting to compare the Elasticsearch aggregation framework functionality with the other two. What’s there that MySQL could learn, which feature can be done using SQL, which comparable SQL is so tricky that it deserves to be wrapped in the DevAPI? An overview based on Elasticsearch 2.3.5, MySQL 5.7/8.0 and MongoDB 3.2.

Elasticsearch aggregations overview

The query languages of the three systems are quite different. The technology is different, the concepts differ and the terminology differs. The Elasticsearch aggregation framework speaks of three building blocks: metrics, buckets, pipelining. Because my main focus is on comparing the feature set, I’ll use SQL to describe the terms. The SQL examples help to stay focussed. Actual Elasticseach query language examples will be given after the introduction.

The Metric aggregations compute metrics over a set of documents ;-). Examples of metrics given as MYSQL SQL function names are: MIN(), MAX(), STDDEV(), SUM() and so forth.

   single value metric
           |
           v
SELECT MIN(price) FROM products


Elasticsearch features metric aggregations that produce one metric/value or multiple metrics/values in one go.

       multi-value metric
          |          |
          v          v
SELECT MIN(price), MAX(price) FROM products

Bucket aggregations partition the data set. They describe rules for building different sets of documents into which documents then “fall” into. In its simplest form, think of a SQL GROUP BY criteria in a SQL statement. A statement like SELECT * FROM products GROUP BY size partitions all products by their size and puts rows with different size column values into different “buckets”.

                                   
                                    partition, bucket, ...
                                             |
                                             v
SELECT size COUNT(*) FROM products GROUP BY size 

+----------------------+
| size     |  COUNT(*) |
+----------------------+ 
| S        |   123     | <--- set of rows with size = S
| M        |   456     |
| ...      |  ...      |

Pipeline aggregations operate on the result of an aggregation instead of the original data set. Imagine you have a day-trade online shop and want to know the average price of all products depending on the day they have been stocked. In SQL you could write:

SELECT in_stock_since, AVG(price) FROM products GROUP BY in_stock_since


Additionally, you are interested in the average price across all the results the aggregation has produced. You could either issue a second query or “pipe” the result into a second aggregation. Here’s a SQL statement to illustrate the “pipe into”:

SELECT AVG(_avg_price) FROM (
  SELECT AVG(price AS _avg_price) AS _avg_price
  FROM products
  GROUP BY in_stock_since 
) AS _sub

Beats SQL: many aggregations in one go

A neat thing about Elasticsearch is that it lets you compute the result of both of the above two aggregations using one query. Elasticsearch can return the average price per day and the average over all them in one result set. I am unaware of a MySQL SQL statement that could do the same. Something similar can be achieved in SQL using a UNION ALL. This has two drawbacks. First, EXPLAIN hints that MySQL will scan the underlying data set twice and fail to optimize the query much. Second, the results from the two SELECT statements must be “sequeezed” into one row result set. Because the two SQL statements produce different columns additional columns have to be added to them to make storing their columns in one combined result set possible. On an aside: as long as MySQL lacks JSON aggregation functions similar to GROUP_CONCAT it is impossible to overcome the column issue by returning a nested JSON document (awareness of a problem is the first step towards a solution…).

  SELECT 
    in_stock_since, 
    AVG(price AS _avg_price) AS _avg_price
    NULL AS _avg_price_total
  FROM products
  GROUP BY in_stock_since 
UNION ALL
  SELECT
     NULL AS in_stock_since, 
     NULL AS _avg_price,
     AVG(_avg_price) AS _avg_price_total
  FROM (SELECT 
          in_stock_since, 
          AVG(price AS _avg_price) AS _avg_price
          NULL AS _avg_price_total
        FROM products
        GROUP BY in_stock_since) AS _sub 


Executing multiple aggregations as one query is not limited to pipelined aggregations. Any kind of aggregation can be used in a query. And, any number of them may appear in a single query.

     
(My)SQL: multiple aggregations glued together
      |
      v  
  SELECT ... GROUP BY ...      < --- bucket aggregation 
UNION ALL 
  SELECT MAX(...) ...          <--- metric aggregation 
UNION ALL
  SELECT ... FROM (SELECT...)  <--- pipeline aggregation 
UNION ALL
  ...                          <--- additional aggregations

DevAPI opportunity: go beyond SQL/CRUD

One of the things MySQL should learn from Elasticsearch is the ability to run an arbitrary number of independent aggregations on the same data set (ElasticSearch: index, MySQL: table/collection, MongoDB: collection) using only one query. Use case? Faceted search

Possible approaches to solve the task include optimizing UNION ALL handling, giving users access to lower level aggregation features to write queries that break out of SQL limitations or exapnding commands to multiple SQL statements. All these things can easily be done “behind the scenes” and be hidden from the users. The X Plugin and the X DevAPI are perfect verhicles to shield users from the very details and the actual SQL. Technically, the song I’m singing is the same MySQL has been singing for years: MySQL server plugins make it possible to innovate on the client side independently of server development constraints.

What’s new in MySQL world is the availability of a full stack (AKA MySQL Document Store) to execute on the idea.

Component Options to innovate
Application  
 
X DevAPI Higher level query API
|  
MySQL drivers Pre-parse into command tree, translate command into series of SQL statement and aggregate results…
|  
X Protocol Pipelining, send from server to client, …
X Plugin Rewrite SQL, use lower-level server data access APIs over SQL, …
MySQL Server  

What’s missing in MySQL world are ideas what to do with the new stack – time to be the bad boy talking about features missing in MySQL.

Elasticsearch aggregation query syntax

With some key Elasticsearch terms and concepts explained, first comparisions with SQL made, its time to show the basic syntax of an Elasticsearch aggregation query. Elasticsearch has an HTTP query interface. All queries can be written as JSON documents and be send to the search engine via HTTP. The basic structure of an aggregation query is as follows:

"aggregations" : {
    "" : {
        "" : {
            
        }
        [,"meta" : {  [] } ]?
        [,"aggregations" : { []+ } ]?
    }
    [,"" : { ... } ]*
}


The aggregation part of a search query is using the key “aggregations”, or short “aggs” in the query document:

{ <--- query document
  "aggs": { <--- aggregation part
...
  }
}


A single query can use one or more aggregations. Each aggregation has a user-defined name:

{
  "aggs": {
    "materials": { <-- aggregation using user-defined name
    }
  }
}


It follows the definition of the type of the aggregation and all relevant parameters for it:

{
  "aggs": {
    "materials": {
      "terms": {  <-- aggregation type
        "field": "material" <-- parameters
      }
    }
  }
}


The terms aggregation is a multiple bucket aggregation that matches GROUP BY field. Results will be grouped by unique values found in the field. Queries are issued again an index. Think of an index as the counterpart of a table/collection in MySQL.

# curl -XGET '127.0.0.1:9200/products/_search?pretty' -d@aggs.json

Beats SQL again: aggregation and search results in one go

When the above query is executed, Elasticsearch beats SQL once more. Elasticsearch will return all documents matching the query (no filter set) plus the result of the aggregation. The SQL counterpart requires two queries. Ignoring some pagination details, the SQL equivalent of the above is:

SELECT * FROM products
SELECT material, COUNT(*) FROM products


The need to use two queries has the afore mentioned downsides: potentially two round trips from the client to the server, potentially scanning the same data stream twice. The issue of two round trips could easily be solved in the MySQL Document Store by having the X Plugin execute the two queries locally and combine the results before returning them to the client, or, maybe, using X protocol pipelining. But the issue of having to scan the data stream twice is a SQL limitation and potentially a server matter.

If one is not interested in the search results but only the aggregation results, then one can set the desired search result size to zero in the Elasticsearch JSON query document.

{
  "size": 0,  <--- skip search results
  "aggs": {
    "materials": {
      "terms": {
        "field": "material"
      }
    }
  }
}

# curl -XGET '127.0.0.1:9200/products/_search?pretty' -d@aggs.json
{
  "took": 89,  <-- 89ms / 0.089s runtime
  ...
  "hits": {
    "total": 1000000, <--- 1e6 documents scanned
    ...
    "hits": [  <--- do not return any search results    
    ]
  },
  "aggregations": {
    "materials": { <--- material aggregation 
      ...
      "buckets": [
        {
          "key": "wool",
          "doc_count": 167074
        },
        {
          "key": "cotton",
          "doc_count": 166950
        },
        ...
      ]
    }
  }
}

Metric aggregations compared

Most but not all Elasticsearch metric aggregations have direct counterparts in both MySQL and MongoDB. For example, all systems can compute the average value of an aggegrated numeric field. While the syntax differs there are no groundbreaking differences in the functionality.

Avg Aggregation
Elasticsearch
{
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

MySQL
SELECT AVG(price) FROM products

MongoDB
[
  {
    "$group": {
      "_id": null,
      "val": {
        "$avg": "$price"
      }
    }
  }
]

There are two metric aggregations that are hard to emulate in MySQL: percentiles and top hits per group. Proposals to calculate percentiles in MySQL using SQL include using GROUP_CONCAT or SQL session variables. Either approach results in SQL better not shown. Likely, the problem could be easily solved using an UDF respectively adding a grouping function to stock MySQL.

Aggregation Elasticsearch MySQL MongoDB
Avg Yes Yes Yes
Cardinality Yes (Sample based) Yes (Exact) Yes (Exact)
Extended Stats Yes StdDev bounds missing Variance, StdDev bounds missing in aggrgation framework?
Geo Bounds Yes for future blog post
Geo Centroid Yes for future blog post
Max Yes Yes Yes
Percentiles Yes Complex SQL or UDF Complex
Percentile Ranks Yes Complex SQL or UDF Complex
Scripted Yes No Map-Reduce
Stats Yes Yes Yes
Top Hits Yes Complex Unknown (No)
Value Count Yes Yes Yes

Beats MySQL: Top hits – SQL:99 LATERAL

Calculating the top hits per group is a functionality that should be highly valuable when building a facet navigation. SQL:99 contains a SQL feature called LATERAL which could be used for the task. Unfortunately MySQL does not support it. The LATERAL keyword gives a sub-select access to FROM clauses that appear before it. It is often described as SQL’s foreach loop. It best explained at the example. All examples so far assumed a collection/index of product documents respectively a products table like:

+-------------------+---------------+------+-----+---------+----------------+
| Field             | Type          | Null | Key | Default | Extra          |
+-------------------+---------------+------+-----+---------+----------------+
| product_id        | int(11)       | NO   | PRI | NULL    | auto_increment |
| title             | varchar(127)  | YES  |     | NULL    |                |
| size              | varchar(255)  | YES  |     | NULL    |                |
| material          | varchar(255)  | YES  | MUL | NULL    |                |
| usage             | varchar(255)  | YES  |     | NULL    |                |
| gender            | varchar(255)  | YES  |     | NULL    |                |
| price             | double        | YES  |     | NULL    |                |
| in_stock_since    | datetime      | YES  |     | NULL    |                |
| description       | varchar(1024) | YES  |     | NULL    |                |
+-------------------+---------------+------+-----+---------+----------------+

Let’s try fetch the five most expensive products (fabrics) for each material (cotton, silk, ..) by help of LATERAL. We will build it up of two simple SQL queries. The grouping comes first:

SELECT material
FROM products
GROUP BY material

A query to fetch the five most expensive products made of cotton is straightforward as well:

SELECT product_id 
FROM products
WHERE material = 'cotton'
ORDER BY price DESC
LIMIT 5


The two queries now need to be combined in such a way that for each facet _material found by the first query, the second query is invoked with the material found to compute the top five hits. The second query needs access values produced by the first query. And, this is what LATERAL is about. It gives a subquery access to the results found by a previous query. A subquery does not execute in isolation but can break out of its isolation. The general pattern goes:

SELECT          <--- for each 
  inner.*
FROM  outer     <--- result produced (e.g. distinct material)
  ...
  JOIN          <--- add results 
  LATERAL       <--- allow access to previous results/FROM 
  (
     SELECT ... FROM inner <--- run a subquery to add results
     WHERE 
      inner.field = outer.result <--- based on outer query results
  ) AS inner


Because I have neither Postgres, Oracle or any other database featuring SQL:99 LATERAL installed I haven’t check the following SQL but it should be along the lines of the actual SQL:99 to solve the task. It is a little more complex to satisfy GROUP BY constraints (see also this Postgres example):

SELECT 
  top_hits.*
FROM 
  (SELECT material FROM products 
  GROUP BY products) AS materials
JOIN LATERAL
  (SELECT product_id, price, material
  FROM products WHERE material = materials.material
  ORDER BY price DESC
  LIMIT 5) AS top_hits

If MySQL would feature LATERAL, then the DevAPI could make a top hits feature available using a syntax close to SQL, or a function based variant of the Elasticsearch query document, whatever turns out more powerful and convenient when writing application code. Here’s the Elastic search query to find the top five products grouped by material:

{
  "aggs": {
    "materials": {
      "terms": {
        "field": "material"
      },
      "aggs": {
        "top-tag": {
          "top_hits": {
            "sort": {
              "price": "desc"
            },
            "size": 5
          }
        }
      }
    }
  }
}


On an aside, the Elasticsearch query is also an example of nesting aggregations. The data stream is partioned by material and then further processed using a nested aggregation for the top hits. Judging from the few Elasticsearch queries I examined, nesting can often be mapped to SQL either using derived tables (SELECT nested FROM (SELECT outer)) or using a WHERE-clause.

Bucket Aggregations

Bucket aggregations partition the data stream. Most, if not all, Elasticsearch bucket aggregations do not offer features that cannot be achieved with either MySQL or MongoDB. The Filter, Filters and Global bucket aggreations are tricky to compare with the other systems. As explained above Elasticsearch differs from MySQL (and MongoDB) by returning search results and aggregation results together in one result set as a reply to a single query. For example, one can search all products that have the word Beach in their title line and gather distribution statistics on some product attributes like size and material. The Elasticsearch query is:

{
  "query": {
    "match": {
      "title": "Beach"
    }
  },
  "aggs": {
    "by_size": {
      "terms": {
        "field": "size"
      }
    },
    "by_material": {
      "terms": {
        "field": "material"
      }
    }
  }
}

nixnutz@linux-ouxx:~/src/php-src> curl -XGET '127.0.0.1:9200/products/_search?pretty' -d@aggs.json
{
...
    "hits": [   < --- all matching products
      {
        "_index": "products",
        "_type": "product",
        "_id": "86141",
        "_score": 3.9986782,
        "_source": {
          "size": "L",
          "price": 1.24,
          "title": "L {ukqevpzmht jyorthop Beach",
          "usage": "baby",
          "gender": "female",
          "material": "cotton",
          "description": "",
          "in_stock_since": "2016-08-01 20:08:40"
        }
      }, 
  "aggregations": {
    "by_size": { < -- size distribution statistics
...
        {
          "key": "L",
          "doc_count": 194
        },
...
    "by_material": { <--- material distribution statistics
...


A SQL query that comes somewhat close to it is made of three individial queries glued together with UNION ALL (see above). The Filter bucket aggregation can be used to filter out values from one of the aggregations. Maybe, we only need to count how many of matching products are available in size “M”:

{
  "query": {
    "match": {
      "title": "Beach"
    }
  },
  "aggs": {
    "by_size": {
      "filter": {        <-- filter bucket
        "term": {
          "size": "M"
        }
      }
    },
...


In SQL the filter condition would be added to the WHERE clause of the corresponding query in the UNION ALL statement.

SELECT ... FROM products WHERE MATCH(title) AGAINST('Beach')
  UNION ALL
SELECT ... FROM products WHERE size = 'M' AND ... < -- filter bucket condition 
  UNION ALL
SELECT ... FROM products GROUP BY material WHERE ...

n/a (yes) – apples and oranges…

Albeit the Filter, Filters and Global aggregations can be emulated in SQL, I marked them as “n/a (yes)” in the feature comparison table. In the end, its comparing apples and oranges. The the wonderful, unique Elasticsearch feature of returning aggregation values together with search results has no direct counterpart in the SQL world. The Elasticsearch global aggregation is specifically tailored to this different way of operation. It breaks and aggregation out of the current search context and allows it to perform an aggregation on the entire data set. The below query calculated the average price of those products matching “Beach” in their title and the average price of all products:

{
  "query": {
    "match": {
      "title": "Beach"  <-- search products matching "Beach"
    }
  },
  "aggs": {
    "avg_price_beach": {
      "avg": {
        "field": "price" <-- average price for "Beach" products
      }
    },
    "avg_price_all": {
      "global": {
        
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price" < average price for all products
          }
        }
      }
    }
  }
}


The SQL equivalent is:

SELECT 
  COUNT(*) AS _hits, AVG(price) AS _avg_beach, 
  (SELECT AVG(price) FROM products) AS _avg_all <-- break out search context
FROM products 
  WHERE MATCH(title) AGAINST ("Beach");

Aggregation Elasticsearch MySQL MongoDB
Childen Yes for future blog post
Date Histogram Yes Complex Complex
Date Range Yes Complex Complex
Filter Yes n/a (yes) n/a (yes)
Filters Yes n/a (yes) n/a (yes)
Geo Distance Yes for future blog post
GeoHash grid Yes for future blog post
Global Yes n/a (yes) n/a (yes)
Histogram Yes Complex Complex
IPv4 Range Yes Complex Complex
Missing Yes Yes Yes
Nested Yes for future blog post
Range Yes Complex Complex
Reverse Nested Yes for future blog post
Sampler Yes Complex Yes
Significant Terms Yes No Unknown (no)
Terms Yes Yes Yes

Histogram and Range aggregations

The group of histogram and range aggregation in Elasticsearch could be described as convenience functionality from a SQL database vendor perspective. Take the histogram aggregation as an example:

{
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 10
      }
    }
  }
}


The SQL equivalent is:

SELECT
  COUNT(*) AS _num, 
  IF(
    price % 10 < 0, 
    price - (10 + (price % 10)), 
    price - price % 10
  ) AS _bucket
FROM 
  products 
GROUP BY _bucket

+----------+---------+
| _num     | _bucket |
+----------+---------+
|   168697 |       0 |
|   243999 |      10 |
|   288138 |      20 |
|   116725 |      30 |
|    77041 |      40 |
|    50007 |      50 |
|    34483 |      60 |
|    16487 |      70 |
|     3866 |      80 |
|      557 |      90 |
+----------+---------+


Unquestionable SQL can deliver. Unquestionable using IF is no rocket science. And, nobody should pollute an API with convenience functions. But there is more
than convenience here. MySQL evaluates the IF for every row: it’s slow. How slow? That’s for the next blog posting.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Faceted search, why the DevAPI could matter one day

2016/07/29 - by admin - 0 comments

Faceted search or faceted navigation is a highly praised and widely use search pattern. And, it is a great reply to an off the records sales engineering question. MySQL finally has some document store features built-in. A bit of a yawn in 2016. There is a new X DevAPI available with some Connectors. A bit of a yawn technically. But it is a non-technical change of mind: developer centric counts! Sales, all, technical value could show at non-trivial developer tasks, like faceted search.

Todays X DevAPI does not get you very far

There are great stories to tell about the X DevAPI, see MySQL 5.7.12 – Part 3: More Than “Just” SQL?:

  • Non-blocking API
  • CRUD API for schemaless documents in collections and schemaful rows in tables
  • SQL support
  • Prepared for “we need new APIs” for distributed databases

Most importantly the “look and feel” is similar to comparable offerings from NoSQL vendors. Competitive offerings have been described as easy, natural, developer friendly. We tried to follow these maximes. Albeit a leap step forward the feature set of the first version of the X DevAPI is limited. It works for basic apps like the demo app. But instead of adding a bulk of additional features we broaden our view to achieve the goal of improving developer ease of use on a whole:

Selling point readability

The off the records sales engineers question was: why would one use the X DevAPI, given that it is only a dump SQL wrapper? Let a developer write some code to fetch all products from a collection that have a price higher than 1.2 Euros:

products.find("price > 1.2").exeucte();


Upon execution, the Connectors (drivers) send the query to the X server plugin which translates it to some SQL and executes it. The SQL statement makes use of ANSI/ISO SQL standard JSON features which MySQL 5.7.12+ supports:

SELECT product FROM products WHERE
   JSON_UNQUOTE(JSON_EXTRACT(product, "$.price")) > 1.2;


There is no technical value in this. Schemaless? Works with SQL. JSON processing? Works with SQL, too. The X DevAPI selling point is readability. But none of the available features today and none of feature on the short term roadmap has the potential to add more value to the X DevAPI.

Faceted search: let daddy sew a coat

Faceted search supports exploring large amounts of data by displaying summaries about various partitions of the data and later allowing to narrow the navigation to a specific partition. Let there be a father that wants to sew a rain coat exactly as the one below for his one year old daughter.

Self-made baby rain coat

Daddy opens his web browser in the middle of the night after having completed all baby care and feeding duties and starts to search for fabric in an online retailer. But what to search for, how to quicky find the best fabric out of 100.000 offered by the online store? Many stores will display a search box and a faceted navigation at the left side. Possible facets (partitions, dimensions) are:

  • Material
    • Blends (16647)
    • Cotton (16762)
  • Usage
    • Baby (19913)
    • Dress (20005)
  • Price
    • Less than 10 Euros (13815)
    • 10-20 Euros (16207)

A faceted document search using SQL

Assume the products are stored in the database using (JSON) documents. All documents are in JSON column called “product” of a table called “products”. How to get the totals for all the facets using SQL?

{
  "title": "Beach motive M182-16"
  "description": ...  
  "weight": ...
  "material": "cotton"
  ...
}


The SQL for one facet is pretty much straight-forward:

SELECT 
  JSON_UNQUOTE(JSON_EXTRACT(product, "$.material")) AS facet_value, 
  COUNT(*) AS facet_count 
FROM products 
WHERE JSON_EXTRACT(product, "$.material") IS NOT NULL 
GROUP BY JSON_EXTRACT(product, "$.material")


To get the values for more than one facet using no more than one query call in your application, combine the results for each facet using UNION ALL. Add a column “facet” to mark the orgininating facet in the combined row result.

  SELECT 
    "material" AS facet, 
    JSON_UNQUOTE(JSON_EXTRACT(product, "$.material")) AS facet_value, 
    COUNT(*) AS facet_count 
  FROM products 
  WHERE JSON_EXTRACT(product, "$.material") IS NOT NULL 
  GROUP BY JSON_EXTRACT(product, "$.material") 
UNION ALL 
  SELECT 
   "usage" AS facet, 
   JSON_UNQUOTE(JSON_EXTRACT(product, "$.usage")) AS facet_value, 
   COUNT(*) AS facet_count 
  FROM products 
  WHERE JSON_EXTRACT(product, "$.usage") IS NOT NULL 
  GROUP BY JSON_EXTRACT(product, "$.usage")

Speeding it up and shortening the SQL

There’s quite a bit to come and the length of the SQL statement will grow. Let’s shorten it a bit by adding generated columns (5.7.6) and speed things up using indicies. Long story short, here’s the CREATE TABLE statement I’m using for this blog post:

CREATE TABLE `products` (
  `product_id` int(11) NOT NULL AUTO_INCREMENT,
  `product` json DEFAULT NULL,
  `facet_size` varchar(255) GENERATED ALWAYS AS (json_unquote(json_extract(`product`,'$.size'))) VIRTUAL,
  `facet_material` varchar(255) GENERATED ALWAYS AS (json_unquote(json_extract(`product`,'$.material'))) VIRTUAL,
  `facet_usage` varchar(255) GENERATED ALWAYS AS (json_unquote(json_extract(`product`,'$.usage'))) VIRTUAL,
  `facet_gender` varchar(255) GENERATED ALWAYS AS (json_unquote(json_extract(`product`,'$.gender'))) VIRTUAL,
  `facet_price` double GENERATED ALWAYS AS (json_unquote(json_extract(`product`,'$.price'))) VIRTUAL,
  PRIMARY KEY (`product_id`),
  KEY `idx_facet_size` (`facet_size`),
  KEY `idx_facet_material` (`facet_material`),
  KEY `idx_facet_usage` (`facet_usage`),
  KEY `idx_facet_gender` (`facet_gender`),
  KEY `idx_face_price` (`facet_price`)
)

Using the above the query to gather one facets totals is shortened to:

SELECT 
  facet_material AS facet_value, 
  count(*) AS facet_count 
FROM products 
WHERE facet_material IS NOT NULL 
GROUP BY facet_material

A range based face: price

The price facet in the fabric online shop is range based. In Germany, the home of the example daddy, fabrics are sold per meter. The typical price of a fabric ranges from a few Euros up to 100 Euros with the majority topping around 40 Euros. The show owner wants to display totals for prices from 0..10, 10..20, 20..50, 50+ Euros. Note that the ranges are of different size.

Of course, SQL can deliver! For example, use a subquery and CASE expressions. Stackoverflow will guide a developer within seconds.

SELECT 
  "price" AS facet, 
  fpsub.facet_price_range AS facet_value, 
  COUNT(*) AS facet_count 
FROM 
  (SELECT 
    (CASE 
      WHEN facet_price BETWEEN 0 AND 10 THEN "0-10"
      WHEN facet_price BETWEEN 10.01 AND 20 THEN "10-20" 
      WHEN facet_price BETWEEN 20.01 AND 50 THEN "20-50" 
      ELSE "50+" 
    END) AS facet_price_range 
  FROM products 
  WHERE facet_price IS NOT NULL) AS fpsub
GROUP BY fpsub.facet_price_range


Combine all the individual facet queries using UNION ALL and you have the basic search.

Refinement: user selects a facet value

Example Daddy made up his mind on the material for the inner jacket, selects “cotton” and enters “Beach” into the search box. Likely, the users expectation is that the selection will have no impact on the totals shown for any other material but cotton. In other words the material face query needs to be split in two: one to count the total for “cotton” and “Beach” and another one for the totals of all the other facet values. Note that I ignore the problem of full text search and use LIKE exactly how one should not use it.

  SELECT 
    facet_material, 
    count(*) 
  FROM products 
  WHERE 
    facet_material = "cotton" AND 
    JSON_UNQUOTE(JSON_EXTRACT(product, '$.title')) LIKE "%Beach%"  
  GROUP BY facet_material 
UNION ALL 
  SELECT 
    facet_material, 
    COUNT(*) 
  FROM products 
  WHERE 
   facet_material != "cotton" AND
   facet_material IS NOT NULL   
  GROUP BY facet_material

The grand total

Proof: you can use SQL for the job.

SELECT 
    "material" AS facet,
    facet_material AS facet_value, 
    COUNT(*) AS facet_count
  FROM products 
  WHERE 
    facet_material = "cotton" AND 
    JSON_UNQUOTE(JSON_EXTRACT(product, '$.title')) LIKE "%Beach%"  
  GROUP BY facet_material 
UNION ALL 
  SELECT
    "material" AS facet,
    facet_material AS facet_value, 
    COUNT(*) AS facet_count
  FROM products 
  WHERE 
   facet_material IS NOT NULL AND
   facet_material != "cotton" 
  GROUP BY facet_material
UNION ALL
  SELECT 
    "price" AS facet, 
    fpsub.facet_price_range AS facet_value, 
    COUNT(*) AS facet_count 
  FROM 
    (SELECT 
      (CASE 
        WHEN facet_price BETWEEN 0 AND 10 THEN "0-10"
        WHEN facet_price BETWEEN 10.01 AND 20 THEN "10-20" 
        WHEN facet_price BETWEEN 20.01 AND 50 THEN "20-50" 
        ELSE "50+" 
      END) AS facet_price_range 
    FROM products
    WHERE facet_price IS NOT NULL) AS fpsub
  GROUP BY fpsub.facet_price_range
UNION ALL
  SELECT
   "size" AS facet,
   facet_size AS facet_value,
   COUNT(*) AS facet_count
  FROM products
  WHERE
    facet_size IS NOT NULL
  GROUP BY
    facet_size
UNION ALL
  SELECT 
    "usage" AS facet,
    facet_usage AS facet_value,
    COUNT(*) AS facet_count
  FROM
    products
  WHERE
    facet_usage IS NOT NULL
  GROUP BY 
    facet_usage
UNION ALL
  SELECT    
    "gender" AS facet,
    facet_gender AS facet_value,
    COUNT(*) AS facet_count
  FROM 
    products
  WHERE
    facet_gender IS NOT NULL
  GROUP BY
    facet_gender
ORDER BY 
 facet, 
 facet_value

+----------+------------------+-------------+
| facet    | facet_value      | facet_count |
+----------+------------------+-------------+
| gender   | female           |       33387 |
| gender   | male             |       33327 |
| material | blends           |       16647 |
| material | designer fabrics |       16703 |
| material | knits            |       16739 |
| material | silk             |       16594 |
| material | wool             |       16555 |
| price    | 0-10             |       13815 |
| price    | 10-20            |       16207 |
| price    | 20-50            |       55668 |
| price    | 50+              |       14310 |
| size     | 114cm            |       14464 |
| size     | 140cm            |       14366 |
| size     | L                |       14009 |
| size     | M                |       14303 |
| size     | S                |       14327 |
| size     | XL               |       14211 |
| size     | XXL              |       14320 |
| usage    | baby             |       19913 |
| usage    | dress            |       20005 |
| usage    | inside           |       19929 |
| usage    | outdoor          |       20166 |
+----------+------------------+-------------+
22 rows in set (0,34 sec)


The execution time of 0.34s? Well, this is a source build of MySQL running in a VM on a notebook. No server parameters set. Inacceptable result for interactive search experience.

How the DevAPI could make a difference

A DevAPI counterpart of the above could read very different.

products
  .find("title LIKE :search")
  .facets({
   "usage", 
   "material", 
   "gender", 
   "size",
   "price" {
     "0-10": {"min": 0, "max": 10"}
     ...
   }
  .bind("search", "%Beach%")


As a developer, I could immediately grasp what the higher level DevAPI does. I would have no clue what the SQL is about if the column aliases would not hint me. Note also that the DevAPI and the X plugin are not limited to SQL. They could, if they want, use the lower level MySQL internal data access APIs. The internal APIs can be faster than SQL, way faster. SQL runs on top of them.

Dear Sales, All, the DevAPI is only a tiny bit of the MySQL document store story. You may see it as a dump SQL wrapper of little value. Or, as a chance to offer an alternative query API that solves web developer problems much easier than raw SQL could do.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PECL/mysqlnd_ms needs updates for MySQL Group Replication

2014/10/30 - by admin - 0 comments

‘Synchronous’, multi-master, auto-everything – that’s the new MySQL Group Replication (IPC14 talk/slides) in simple words. After torturing PHP developers for decades with MySQL Replication there is now a new replication option which does not require read-write splitting. A system that does not know about slave lags and reading stale data. In theory, MySQL Group Replication is just about the perfect approach to run a standard PHP application (WordPress, Drupal, …) on a small cluster (3-7 nodes) in LAN settings. In theory, MySQL Group Replication improves both availability and performance.

Distribution Transparency

When designing replication systems there are some desireable goals which contradict each other. In a perfect world, from a developers perspective, a database cluster would behave exactly the same way as a single database. The user should never have to worry where and how data is stored in the cluster. Transactions executed on the cluster would provide the same properties like transactions run on a standalone database. The cluster would never return stale data (synchronous).

Synchronous replication is desired but it requires coordination among cluster nodes. In LAN settings coordination can be reasonably fast. MySQL Group Replication is ‘synchronous’ replication (see slides for details). Deploy it on LAN only. In the internet, in WAN settings, when trying to replicate from Europe to Asia things will be slow. If WAN, then either forget about distribution transparency or performance. If WAN, go for asychronous MySQL Replication.

The extra work different clusters cause for the developer

Synchronous and asynchronous clusters always cause some extra work for the developer. Either approach requires load balancing and failover logic. An asynchronous approach adds: dealing with delays and stale reads. MySQL Replication is not only asynchronous but has only one master (primary). This adds: read-write splitting.

PECL/mysqlnd_ms tries to help with all these tasks and take them over in a semi-transparent way.

PECL/mysqlnd_ms support for synchronous clusters

PECL/mysqlnd_ms is a plugin for mysqlnd. PDO_MySQL and mysqli use mysqlnd as their default library to talk to MySQL. Any of the two APIs works with PECL/mysqlnd_ms, our load balancing and replication plugin.

The plugin monitors many API calls and aims to make using any kind of MySQL clusters easier. No matter what cluster: MySQL Replication, MySQL Cluster, MySQL Group Replication, 3rd party solutions. Example configurations are given in the PHP manual.

MySQL Group Replication usage task 1: load balancing

When moving an application from a single database server to a synchronous cluster there are two additional tasks: load balancing and failover. With PECL/mysqlnd_ms load balancing does no require any code changes. The plugin intercepts your connect calls and tests whether the host you connect to matches the name of a config entry. If so, the plugin loads the config, learns from the config which nodes there are and starts load balancing connection. Should you be too lazy to change the host name in your connects to match a PECL/mysqlnd_ms config entry, then just name the config entry after you current host names, have a config entry for ‘127.0.0.1’ etc.

$link = new mysqli("myapp", ...);

MySQL Group Replication usage task 2: failover

The second task is to handle the failure of a cluster node and connect to the next available one. PECL/mysqlnd_ms does that for you if you want. It picks an alternative from the config and connect you to it.

There’s a small feature gap here. MySQL Group Replication tries to be an auto-everything solution. It automatically detects failed nodes. It also fully automates adding new nodes to the cluster. That’s cool but it means that over time the set of nodes can change and your config needs to be updated.

The PECL/mysqlnd_ms feature gap

There are two options. First, you could deploy the config. Second, after a failover or periodically, we could make PECL/mysqlnd_ms fetch the list of nodes from the cluster and make it reconfigure itself (see also here). That’s finally possible because MySQL Group Replication shows the list of nodes in a performance schema table.

Once we did that, and MySQL Group Replication has reached GA, the auto-everything cluster for MySQL becomes real. All the stuff on the server side is already automatic. PECL/mysqlnd_ms is already GA and already handles all additional tasks – without code changes. A tiny addition is missing and you could even get an auto-deployed PECL/mysqlnd_ms…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

Is the HTTP Plugin for MySQL secure?

2014/10/02 - by admin - 0 comments

The HTTP Plugin for MySQL offers three APIs: REST-like CRUD, REST-like JSON DOCUMENT and SQL. The SQL API lets you run any SQL you want. Including, for example, DROP mysql.users if you mess up your setup. Insecure? It depends on your viewpoint.

It’s more than just another protocol…

On the first look HTTP is just another network protocol for sending requests to MySQL. HTTP is the protocol of the web. Whether you need to integrate MySQL in a larger setup and use web services for data exchange or you want to access MySQL from a JavaScript client that is restricted to HTTP/Websocket. HTTP is the natural choice. CouchDB convinced many when it introduced the idea.

HTTP Client   Standard client
|   |
HTTP Protocol   MySQL C/S Protocol
|   |
MySQL

Standard clients use the properitary binary MySQL Client/Server Protocol to communicate with MySQL. The optional HTTP Plugin makes MySQL also listens to HTTP requests. It is you choice whether you want to make MySQL speak HTTP or not. HTTP is a clear-text protocol. Whether clear-text or binary makes no difference: security by obscurity does not work.

Both the MySQL Client/Server Protocol and the HTTP Protocol can be run over secure channels. This usally means SSL. SSL is available for both protocols, which means it’s 1:1 again. Let’s do a litte cheat-sheet:

  HTTP MySQL C/S Security Rating
Obscurity Clear-text Binary 0 : 0
Encryption Supports SSL Supports SSL 0 : 0

The transport layer

The MySQL Client/Server Protocol either runs over TCP/IP, Unix Domain Sockets or Windows named-pipes. Unix Doman Sockets are an inter-process communication method for processes on one machine. If you deploy the MySQL Server and its clients on the same machine, you can use this method of communication. It ensures that data exchanged never appears on the network. The data stays on the machine, which adds to the overall security.

Although you may use TCP/IP to connect from a client to a MySQL Server on the same machine and the data should stay on the machine, this is no common deployment. Just by using the TCP/IP stack instead of a local IPC method, you loose a tiny bit of security. The TCP/IP stack could send data to the network, the local IPC can’t. It has no means to do so.

I assume most deploy MySQL and its clients on different machines. Then TCP/IP is used. TCP/IP is the only choice for the HTTP Plugin. Although you can use SSL to secure the connection, it just a little worse than MySQL C/S:

  HTTP MySQL C/S Security Rating
Transport TCP/IP TCP/IP, Unix Domain Socket, Windows pipe 0 : 0.5

The access methods: SQL and NoSQL

Todays MySQL speaks three protocols: HTTP, MySQL Client/Server Protocol and Memcache (InnoDB Memcache Plugin, MySQL Cluster support).

HTTP Client   Standard client   Memcache client
|   |   |
HTTP Protocol   MySQL C/S Protocol   Memcache Protocol
|   |   |
MySQL

This is done to offer SQL and Not-Only-SQL access methods. The Not-Only-SQL access methods bypass the SQL processing layers in MySQL. A decade ago already, MySQL Connectors folks estimated that SQL parsing can take 50% of the total execution time of a simple query. Yet, it required innovative MySQL expert users and the NoSQL movement until the popular Memcache protocol creeped into MySQL and killed the 50% overhead. In addition to SQL commands, MySQL also accepts key-value style commands. The latter use lower-level APIs inside the server. Thus, they are faster. The Memcache key-value style APIs don’t know about SQL injection or the like – no SQL used…

The HTTP Plugin development version we show internally maps all HTTP requests to SQL. It has three APIs. REST-like CRUD and REST-like DOCUMENT perfectly qualify for taking the lower-level API route. Whether we will do that is undecided. If so, SQL injection or the like does not matter – no SQL used…

Currently, in the development version, we don’t use the lower-level APIs. That’s no secret, we released the source. In the source one will find the use of SQL and escaping. We could have, had we bothered, used prepared statements. The initial HTTP Plugin development version works exactly like trivial proxy written in a language like PHP, Python or the like. As much as such a proxy can be secured, the current code with its all SQL mapping could be secured.

I won’t put any rating entry on the cheat-sheet as this is an implementation detail of a development version.

The user authentication

Things are now down to one third of the HTTP Plugin – the SQL endpoint – versus a standard MySQL Client.

Both a standard MySQL client and a HTTP Plugin client execute SQL as a certain MySQL user. The actions of the client are restricted by the permissions granted. If you allow your client to run DELETE FROM mysql.user you may find yourself in an uncomfortable situation soon.

If SSL is used, it is valid to accept MySQL user credentials from the HTTP client and log the client in as the MySQL user given. This clear-text user credential exhange over SSL is also done with the MySQL Client/Server Protocol. It is implementation details of the various pluggable authentication methods MySQL offers that restrict HTTP clients to login as a MySQL user of their choice when SSL is used only. For example, some of the authentication methods require a handshake procedure.

  HTTP MySQL C/S Security Rating
MySQL user password security Clear-text over SSL Clear-text over SSL and other methods 0 : 0

Without SSL, the HTTP client defaults to a preconfigured user. We used HTTP Basic Authentication, which is a rather undesireable HTTP authentication method, as a simple and immediately to understand method of getting user credentials from a HTTP client. Of course, there are better one!

Other HTTP authentication methods are also more complicated and distract. Distract from the SSL explanation. Distract from the option that one could and should see the authentication process as two-staged. The first step is the login towards the HTTP Plugin, then a second step maps the HTTP user to a MySQL user. Wouldn’t it be nice if for non-SSL connections HTTP user ‘Ulf’ could be mapped to MySQL user ‘evil’ and HTTP user ‘Johannes’ could be mapped to MySQL ‘caretaker’? Here, the MySQL users password would not be taken from the HTTP world, it would be taken from some plugin config.

First stage (network)
HTTP user -> HTTP Authentication method (Basic Auth, OAuth, …)
Second stage (inside the Plugin)
HTTP user -> Mapping to MySQL user

No rating. I find it hard to compare with MySQL where you always have the password in clear-text at the client.

What people missed, …

The user authentication and mapping alone is still not enough. See slide 100! Slide 100 is what comes after the safe harbour statement. In other words this is pure speculation and idea spreading. No promise we will ever make this happen.

Here’s the illustration from the slide, slightly simplified (it’s 2am – time for shortcuts ;-)):

HTTP Client
|
Improved security, more freedom
|
HTTP Plugin
app/ v8 JavaScript sql/, crud/, docs/ – built-in

The MySQL permission system alone is too coarse to fully match the capabilities of any of todays self-baken HTTP-to-MySQL proxies. You can limit a MySQL user to be allowed to read from a single table only. But, the person can read all rows from the table. You could try to use a view but probably, you would just implement some filtering in your proxy. “Just implement some filtering” becomes a problem when the HTTP Plugin is your proxy. There’s no script language, you can’t do that. Unless, there was a script language built-in to the proxy…

  HTTP MySQL C/S Security Rating
Application based filtering Implementation detail not bound to protocol Implementation detail not bound to protocol 0 : 0.5 (see text)

All in all, a HTTP Plugin can get pretty close to todays solutions. It’s not there yet.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

MySQL 5.7 – HTTP Plugin for MySQL

2014/09/27 - by admin - 13 Comments

It is official: MySQL listens to HTTP and speaks JSON. MySQL got a new plugin that lets HTTP clients and JavaScript users connect to MySQL using HTTP. The development preview brings three APIs: key-document for nested JSON documents, CRUD for JSON mapped SQL tables and plain SQL with JSON replies. More so: MySQL 5.7.4 has SQL functions for modifying JSON, for searching documents and new indexing methods! The download and documentation (also here) is on http://labs.mysql.com/, the slides are below:

What a buzzword bingo! The HTTP Plugin is just a HTTP proxy. One that you can load into MySQL, if you want, to avoid having to write your own one. It simplifies the setup, it is a convenience feature that you should consider if you don’t need the power of scaling the database and the web server independently.

Without the HTTP Plugin   With the HTTP Plugin
HTTP Client   HTTP Client
|   |
Web Service (e.g. PHP script)   |
Web server   |
|   HTTP Plugin
MySQL   MySQL

Did we forget about the developer?

The HTTP Plugin just makes MySQL a bit more web developer friendly. Hello frontend developers, we heard about you ;-). MySQL was born in a world without the web. MySQL grew up with backend developers. MySQL somewhat missed the growing role of frontend development, the developers and their ideas. NoSQL happily listened: be it in terms of on built-in JavaScript, the choice of JSON as a serialization format for document stores, asynchronous APIs or just HTTP interfaces.

The three APIs

There are three APIs: plain SQL over HTTP, CRUD and DOCUMENT. All three return JSON. Technically, JSON is just a different data serialization format for MySQL. If you are a frontend JavaScript developer and life in a world of HTTP and JavaScript, you likely prefer that over some binary format.

To be totally frank: at this point the integration of JSON into the APIs is only so-so. That is on purpose. None of the APIs is finalized. We translated some “what if” questions into code so that one has something play with. Now we fish for input.

Let’s get the fishing started, let’s cause some laughter. Here’s the SQL endpoint executing SELECT 1:

shell> curl ... --url "http://127.0.0.1:8080/sql/db/SELECT+1"
[
{
"meta":[
  {"type":8,"catalog":"def",
  "database":"","table":"",
  "org_table":"","column":"1","org_column":"",
  "charset":63,"length":1,
  "flags":129,"decimals":0}
],
"data":[ 
        ["1"]
],
"status":[{"server_status":2,"warning_count":0}]
}
]

The SQL endpoint replies 1:1 what a standard MySQL client gets a MySQL Connector may hide from the developers eyes. This is all information the MySQL Client/Server Protocol has to offer. We included everything to see if there is anybody who wants to write its own “driver” atop of it. If so, that self-made driver can expose similar features.

Ignoring the chatty reply, the SQL endpoint is needed for rich queries. The CRUD and DOCUMENT endpoint currently support key-document semantics only.

The CRUD endpoint a single row from a MySQL table identified by its primary key value. The row is mapped to JSON in the most simple way. No meta data is included: a very lightweight reply.

shell> curl ...  --url "http://127.0.0.1:8080/crud/db/simple/1"
{"id":"1","col_a":"Ahoy"}

The SQL endpoint supports HTTP GET requests only. CRUD and DOCUMENT endpoints accept GET, PUT and DELETE requests. A better mapping is much desired. To be frank, once more, we had the choice between writing documentation and attempting to look a bit more REST-like. You know REST, what’s there to demo?

The DOCUMENT endpoint takes any valid JSON and stores it. The access pattern is key-document:

shell> # curl -i -X PUT --user basic_auth_user:basic_auth_passwd --url "http://127.0.0.1:8080/doc/db/another_table"
HTTP/1.1 201 Created
...
{"info": "Table created"}
shell> # curl -i -X PUT -d '{"words": ["Hello", "world"]}' --user basic_auth_user:basic_auth_passwd --url "http://127.0.0.1:8080/doc/db/another_table/key"
HTTP/1.1 200 OK
...
{"info": "Document added"}
shell> # curl -X DELETE -i --user basic_auth_user:basic_auth_passwd --url "http://127.0.0.1:8080/doc/db/another_table/"
HTTP/1.1 200 OK
...
{"info": "Table dropped"}

That’s inside

The HTTP Plugin for MySQL works exactly like any of todays client-side proxies/web services. A multi-threaded web server library featuring asynchronous I/O accepts the web request. Then, the plugin translates it to SQL statements to compute an answer. A server plugin alone does not make MySQL NoSQL or a document store over night.

MySQL 5.7.4 brings some improvements that help with the proxy job: updated SQL JSON functions, virtual columns and stay tuned on that [functional indexes] topic!.

Acting exactly like any of todays client-side proxies and storing JSON documents in plain BLOB columns has the advantage of acting fast. Improving the HTTP API alone takes about as much effort as it takes improving a web service! The MySQL Connectors team can act fast and independent of any other team on this.

Starting simple with the CRUD and DOCUMENT endpoint, offering key-document semantics only leaves room for future optimization. There are faster ways than mapping HTTP requests to SQL. For MySQL 5.7 the InnoDB team claims 1.100.000 QPS through the memcache protocol and 625.000 QPS through SQL. But optimizing for speed would be a premature optimization. After all, we don’t have finalized APIs yet.

The other stories

Since 2009, since the time when the equation NoSQL = Not Only SQL appeared, MySQL expert users have attempted to create key-value access methods for MySQL. This ultimately resulted in the development of the Memcache interfaces for MySQL. There is a dual access method for data inside MySQL: fast key/value and rich SQL.

PHP, … any standard client
Memcache driver MySQL Connector
| |
Fast Key/Value Rich SQL
| |
InnoDB Memcache Plugin Standard C/S interface
MySQL

This story has one disadvantage: clients need two drivers. The protocol changes, two libraries are needed that implemenent the network protocol and data serialization. Assuming the SQL CRUD and DOCUMENT endpoints would be optimized for performance, clients could choose between fast access and rich query language by just changing the URL.

How about SQL over Protocol X? The SQL endpoint has given you a quick impression how much information the standard MYSQL Client/Server protocol must carry to handle all possible SQL and every possible client. If you need only a subset of the SQL features offered by MySQL, you could try to write a plugin that has a leaner protocol. There are major users of MySQL that restrict themselves to certain SQL features. Some even developed their own programming languages. To get back to the ground: how about Websocket for JavaScript users?

HTTP Client HTTP Client Your Client
| | |
HTTP (JSON) Websocket (JSON) Your protocol (your serialization)
| | |
HTTP Plugin Websocket Plugin Your Plugin
MySQL

These and many more stories, some of which I hinted towards the end of the presentation, become possible if MySQL had just two plugin services:

HTTP Client Your Client
| |
HTTP (JSON) Your protocol (your serialization)
| |
HTTP Plugin Your Plugin
Plugin SQL Execution Service Plugin Authentication Service
MySQL

But that’s another story for another audience. And, no, the development preview release only contains some proof-of-concept hacks here, no proper plugin services…

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PECL/mysqlnd_ms: summer time, (connection) pool time – Fabric support internals

2014/08/07 - by admin - 0 comments

The MySQL replication and load balancing plugin for PHP, PECL/mysqlnd_ms, aims to make using a cluster of MySQL servers instead of a single server as transparent as possible. Should you ever want to migrate your PHP app from a single MySQL server to any kind of MySQL cluster, install PECL/mysqlnd_ms and see how much breaks. A simple app, might need no code changes at all. Your APIs connect call remains unchanged, your query execution calls remain the same: one connection handle transparently switched to an appropriate cluster node (read-write splitting, weighted load balancing, …).

/* Or, use PDO_MySQL... One user handle for an entire cluster */
$link = new mysqli("mycluster", ...);  
/* Goes automatically to a slave, if read-write splitting is enabled */
$link->query("SELECT ...");
/* All actual connections to master or slave will use UTF8 from now on */
$link->set_charset("utf8");
/* With read-write splitting on, goes to master */
$link->query("INSERT ...");

Tell PECL/mysqlnd_ms that connect("mycluster", ...) means you want to connect to a cluster “mycluster” which has, for example, is a MySQL Replication cluster with master A and slaves B, C. Call $link->select_db("mydb"), $link->autocommit(false), $link->set_charset("utf8"), … and no matter to which node (A, B, C) you actually get automatically connected, the connection has those settings! Unless, you are using the sharding feature of our current, still very limited MySQL Fabric support… A new internal connection pool shall cure this and more. A tour through PECL/mysqlnd_ms internals to follow.

PECL/mysqlnd_ms example configuration for a simple MySQL Replication setup

{
    "myapp": {
        "master": {
            "master_A": {
                "host": "localhost",
                "socket": "/tmp/mysql.sock"
            }
        },
        "slave": {
            "slave_B": {
                "host": "192.168.2.27",
                "port": "3306"
            },
            "slave_C": {
                "host": "192.168.2.28",
                "port": "3306"
            }
        }
    }
}

How a single user handle stands for many actual connections

No matter whether you use PECL/mysqlnd_ms for a simple MySQL Replication cluster or a farm of MySQL servers that shard data one or the other way, it always confronts you with a massive change: a single user conection handle is mapped internally to many actual connections.

PHP script (*.phpt) Inside PHP (*.c)

$link = 
  mysqli_connect("mycluster", ...);

Object representing $link
List of masters Actual connection to master_A
List of slaves Actual connection to slave_B
Actual connection to slave_C
List of filters Load Balancer: Random

Furthermore, to keep the number of actual connections low, the PECL/mysqlnd_ms plugin for mysqlnd, will – by default – use lazy connections. A lazy connection to a node is not established before a query is to be run on the node. Assume you do $link = mysqli_connect("mycluster"); $link->set_charset("UTF8");. What happens inside PECL/mysqlnd_ms is very different from the normal case. Normally, without the plugin, $link would be one actual, established connection to a MySQL server and the server would immediately reply to your set_charset() command.

PHP script (*.phpt) Inside PHP (*.c)

$link->set_charset("utf8")

Loop over all actual master and slaves connections:

  • If lazy: recall setting for later connect()
  • If established: execute on server

With the plugin, we take the command and dispatch it to all masters and slaves. At this point the internal master and slave lists may hold already established or lazy connections. If its an established connection, the command is executed on the server. If its a lazy connection, the setting is remembered but the command execution itself is delayed until the connection is established. This is what I like to call semi-transparent. However, because we delay command execution in case of lazy connections, errors may also be delayed until the connection is established – be aware.

Still, this is about as transparent and comfy as things can get, if there is such a thing as a connection state (note that I’m ignoring the SQL side). An alternative had been to disable features and strip down connections to something without much of a state, however, different story…

MySQL Fabric: the auto-everything clustering vision

Probably a decade after we should have had a solution ready, PECL/mysqlnd_ms was ready. But MySQL Replication was lacking badly behind any of the better NoSQL clustering solutions. Automatic master failover? Considered a user task. Have clients discover the new master automatically? Unthinkable! Sharding for MySQL Replication? Considered a user task. MySQL Proxy? Still alpha. Sure, after all the years their had been third-party solutions for all this but the MySQL offering became less attractive.

MySQL Fabric is an administration tool for farms of MySQL servers. It is for the mass of the “all I need is a backup server…”-users that want nothing but automatic failover for MySQL Replication. And, it is for handful of “we host several thousand MySQL servers”-experts that go crazy and really need sharding.

MySQL Fabric – monitoring, administration -> Any MySQL Replication cluster
| Master (fails)
asks Fabric about nodes, e.g. master Slave Slave
|
^- Clients, e.g. PHP using PECL/mysqlnd_ms

The pure existance of an official administration tool is nice, but without a way for clients to ask about the current structure of the cluster it had been only yet another tool. The communication channel between clients and Fabric is what adds value. For example, should a master fail, Fabric promotes a new one – like any other tool. Then, the clients need to learn about the new master. This can finally be automated, be done out-of-the box: the clients simply ask Fabric for the new master.

The story is similar with sharding. You tell the driver (here: PECL/mysqlnd_ms) which shard key you are going to use, the driver asks Fabric for a list of appropriate shards. Usually, Fabric will reply with not only one shard but return a master and a couple of slaves because you want HA for your shards and use a MySQL Replication cluster to each and every shard. Then, PECL/mysqlnd_ms picks a shard for you.

$link = new mysqli("myfabric", ...);
mysqlnd_ms_fabric_select_shard($link, $table, $shard_key);
$link->query("SELECT ...");
$link->query("INSERT ...");

PECL/mysqlnd_ms matches all Fabric requirements but implementations clash

On the first look, this seemed peanuts to support. Johannes added the code for talking to Fabric and replaced the internal server lists. This has been our all idea. But, simple testing quickly revealed we had been wrong.

PHP script (*.phpt) Inside PHP (*.c)

mysqlnd_ms_fabric_select_shard(...);

Object representing $link
List of masters Load from Fabric
List of slaves Load from Fabric

When we have a sequence like $link->set_charset("utf8"); mysqlnd_ms_fabric_select_shard($link, ...), then the charset setting is forgotten. The charset was set on the actual connections. But, we have loaded a new server list from Fabric and with it we got new connections.

PHP script (*.phpt) Inside PHP (*.c)

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load master list from Fabric
  • Load slave list from Fabric

$link->set_charset("utf8");

  • Set all master connections to utf8
  • Set all slave connections to utf8

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load server lists from Fabric
  • utf8 charset setting lost

Furthermore, there was a problem with our load balancing filters. The load balancing filters (random, round-robin) support assigning a weight to a server. The higher the weight, the more requests we dispatch to the server at average. You can use this to optimize for proximity/latency or to assign load dependending on the size of your machines. For the weightes load balancing, the filters create cached information during startup phase based on the initial server lists. It is assumed that the server lists will not change until the script ends. However, Fabric does exactly that: replace server lists during runtime. Hence, this was broken too. The breakage is of little relevance as you would never combine the classic weighted load balancing with Fabric but still.

Last but not least, there is our new distriuted/XA transaction support. Imagine you need a transaction that spawns two shards. You start it on one shard and PECL/mysqlnd_ms does all the XA related SQL magic for you in the background. Then, you switch to another shard. All the connections get swapped out including the one on which you started the XA transaction. Upon swapping out the connections, they get closed and MySQL aborts the distributed transaction…

PHP script (*.phpt) Inside PHP (*.c)

mysqlnd_ms_xa_begin($link, ...);
$link->query(...);

  • Begin transaction on shard/server A

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load server lists from Fabric
  • Connection to A closed, transaction aborted

SOLVED, well, on the way to be

There is now an internal connection pool abstraction layer that solves all these problems. Connections are added to the pool and removed from it through API calls. There is some reference counting that will prevent a connection from being close while it is still needed for an ongoing distributed transaction.

The pool gets to know all the API commands that have been dispatched to connections to align their state. At the time of writing we monitor change_user(), select_db(), set_charset(), set_server_option(), set_client_option(), set_autocommit and ssl_set(). When you switch from one shard to another using mysqlnd_ms_fabric_select_shard(), the pool can replay the commands. The knowledge about the last charset choosen is no longer bound to the connections. The pool knows the charset and can align the state. When done, the pool can inform interested parties, such as a filter, about the new list of active connections.

PHP script (*.phpt) Inside PHP (*.c)

$link->set_charset("utf8");

  • Inform the connection pool charset=utf8
  • Set all connections to utf8

mysqlnd_ms_fabric_select_shard($link, ...);

  • Load server lists from Fabric
  • Tell the pool to replace active connection list
  • Keep connections ope which are still referenced, e.g. for XA transactions
  • Ask the pool to replay setting, e.g. charset=utf8
  • Tell all listeners, e.g. filter about new connection list

In theory, we should now be able to start supporting Fabric properly in PECL/mysqlnd_ms. But, some details are still open and not finished. For example, the order in which the pool replays state alignment commands when the connection lists are replaced, is somewhat random. This is likely to cause issues when settings depend on each other and order of execution. In the worst case, we may be forced to offer a user API call to allow setting an order and burdening the developer with this task.

Looking forward: reduce connection overhead

A positive side effect of having a pool abstraction is that we may be able to reduce connection overhead. Assume, your application is constantly switching between two shards A and B. We load the server lists for A and B exactly once from Fabric, then (in theory, yet) we cache them. When you query shard A, we open at least one connection. If the shard itself is a MySQL Replication cluster, we may have to open one connection for reading and one for writing. The next minute you switch to shard group B. So far, we have closed the connections to shard group A. Assume you now go back to shard group A. We close the connections to B and reopen some to A… what a waste.

Shard can be MySQL Replication clusters for HA
Shard group A   Shard group B
Master   Standalone/Master
Slave Slave  

With the pool abstraction in place, we put all the connections we ever open in a pool, when you first use a shard group. Consider the case again where your application switches from shard group A to B and back to A. When you switch from one shard to another, we don’t necessarily close the current connections. We mark them inactive. Then, we check whether we already have connections to the new shard open. If so, we mark them active and use them. Otherwise, we open new connections. In a sequence of switching from A to B and back to A, the last switch will – likely, depending on load balancing rules – boil down to marking already open connections as active.

New internal connection pool
All connection list
Active (= currently used for load balancing)
Shard group A server Master connection
Shard group A server Slave connection
Inactive (= currently not used)
Shard group B server Connection to B

No pool management/administration possible yet

The new connection pool does not break any existing functionality. No surprise, because the classic functionality of PECL/mysqlnd_ms does not really need and use the new pool. It is the Fabric support for which we need the pool. And, the Fabric support by PECL/mysqlnd_ms is not much tested. Until this is testing is done, the implementation has been completed and we trust the new pool code, you should consider it internal. Because it is internal, there is no way for you to manage or administrate the pool. Not yet.

There are some new statistics that let you monitor pool activity only.

Pool monitoring statistics
pool_masters_total Number of master servers (connections) in the internal connection pool.
pool_slaves_total Number of slave servers (connections) in the internal connection pool.
pool_masters_active Number of master servers (connections) from the internal connection pool which are currently used for picking a connection.
pool_slaves_active Number of slave servers (connections) from the internal connection pool which are currently used for picking a connection.
pool_updates How often the active connection list has been replaced and a new set of master and slave servers had been installed.
pool_master_reactivated How often a master connection has been reused after being
flushed from the active list.
pool_slave_reactivated How often a slave connection has been reused after being
flushed from the active list.

Happy hacking!

@Ulf_Wendel Follow me on Twitter

PECL/mysqlnd_ms: how failed XA transactions get fixed

2014/07/29 - by admin - 0 comments

XA transactions are an open standard for distributed transactions. A distributed or global transaction can spawn multiple databases. XA transactions come handy, for example, when data sets are sharded over multiple servers and a business transaction affects multiple shards. The free MySQL Fabric tools help to setup and manage a sharded MySQL cluster. The development version of PECL/mysqlnd_ms 1.6 helps with XA transactions. It abstracts SQL details and acts as a transaction manager. The PHP MySQL driver kicks in when things go wrong and XA transactions show their nasty side: blocked servers. Good news: this is a rare case. Bad news: a deep dive below.

Grant all Ulf’s a 10% discount (or none of them = use a transaction)
MySQL shard A: EMEA customers MySQL shard B: US customers MySQL shard C: APAC customers

/* Note: Fabric and XA don't go together - yet... it's coming! */
$link = 
new mysqli("mysql_sharding_cluster", "user", "password");

mysqlnd_ms_xa_begin($link, 1);

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "EMEA");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "US");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_fabric_select_shard($link, "shop.customer", "APAC");
$link->query("UPDATE discount = 10 WHERE first_name = 'Ulf');

mysqlnd_ms_xa_commit($link, 1);

PECL/mysqlnd_ms as a transaction manager

XA transactions use the two-phase commit protocol, which is a blocking protocol. Please, see also my previous blog post on the nature of the protocol and MySQL implementation limitation. If the client that drives the XA transaction, your PHP script, crashes at a certain point, some XA participants (MySQL servers) cannot make any progress. In the worst case, they end up waiting for a decision on the global transactions outcome endlessly. No, there is no timeout. As they wait, they block resources. That can be memory used for the transaction or some lock on some table.

Blocked during the second phase of the 2PC/XA protocol
PHP (coordinator) MySQL (participant) MySQL MySQL
–> Global commit  
Crash  
  Comitted Uncomitted: waiting for global commit or rollback

Any serious user of XA transactions will therefore have to implement some mechanism that ensures progress in case of crashes. After a crash, it must be possible to learn which participant is blocked, connect to the participant and tell it to either commit or roll back the open transaction. This housekeeping job is rather annoying, yet important. PECL/mysqlnd_ms can do it for you, it can act as a transaction manager. (On an aside: the academic world does distinguish between a transaction manager and coordinator. I am using the terms interchangeably here.)

MySQL as a state store to track XA transactions

Upon request, PECL/mysqlnd_ms can record the state of each global transaction in a MySQL database. Should your PHP script (and with it PECL/mysqlnd_ms) crash or be interrupted in another way with an XA transaction being unfinished, then the next PHP script that runs can check the database and “garbage collect” the unfinished global transaction. The “next” PHP script could be run on the same server or another one, as long as all servers use the same MySQL database to track XA transactions, the garbage collection will work flawless.

The illustration below shows a scenario with two PHP servers that use PECL/mysqlnd_ms. A script on one of the servers runs and begins a distributed (XA) transactions that involves two MySQL servers M1, M2. As you can see, a simple sequence of mysqlnd_ms_xa_begin(); ... ; mysqli_query(); mysqli_query() causes a lot of background activity inside PECL/mysqlnd_ms.

Tracking XA transactions for “garbage collection” to avoid blocking situations
PHP server A PHP server B
Your code Inside PECL/mysqlnd_ms
mysqlnd_ms_xa_begin(id = 1)
  • ensure there is no open transaction
  • optional: record new XA trx in state store
… /* select some server M1 */ …
mysqli_query(query)
  • optional: if new participant, record participant in state store
  • inject XA BEGIN on server M1
  • update participant state to XA BEGIN sent
  • optional: record new state in state store
  • execute query on server M1
… /* select some server M2 */ …
mysqli_query(query)
  • optional: if new participant, record participant in state store
  • inject XA BEGIN on server M2
  • update participant state to XA BEGIN sent
  • optional: record new state in state store
  • execute query on server M2

PECL/mysqlnd_ms does all the things that you would have to do in your script manually otherwise. It issues the appropriate SQL commands for XA transactions on the participating MySQL servers. And, optionally, it can record the participants and their state in a state store.

PHP server A PHP server B
Your code Inside PECL/mysqlnd_ms
mysqlnd_ms_xa_commit(1)
  • optional: record intent to commit in state store
  • inject XA END on server M1
  • optional: record new state in state store
  • inject XA PREPARE on server M1
  • optional: record new state in state store
  • inject XA COMMIT on server M1
  • optional: record new state in state store
  • inject XA END on server M2
  • optional: record new state in state store
  • inject XA PREPARE on server M2
  • optional: record new state in state store
  • unexpected crash

In case of an interruption, it is possible to reconstruct the state of the XA transaction based on the records from the state store. Using a state store is optional, however, if you don’t configure PECL/mysqlnd_ms to use one, it is your responsibility to ensure that no MySQL server ever gets blocked. Please, note how often the state store gets queried. Using a state store adds quite some work to the system but it is necessary to play safe.

PHP server A PHP server B
Your code Inside PECL/mysqlnd_ms
Script ends
  • Rollback open XA transactions, if any and rollback_on_close is set (default)
PHP internal shutdown: RSHUTDOWN
  • Decide whether to do background garbage collection based on probability setting
  • Search for up to max_transactions_per_run unfinished XA transactions in state store
  • Do at most max_retries GC runs attempting to fix a certain failed XA transaction

Issues that require no garbage collection

Most errors around XA transactions do not require special garbage collection. Simple cases are handled immediately by PECL/mysqlnd_ms and end up with the transaction being rolled back. Errors during the first phase of the XA two phase commit protocol are uncritical. No matter whether the error is caused by a server crash, a client/PHP crash, the client loosing its connection to one of the MySQL servers participating in the global transaction, or a participant reports a simple SQL error, the XA transaction will be properly rolled back.

Participants that got disconnected or recover from a crash forget about XA transactions that have been in their first phase at the time of the error. Assume, you have started an XA transaction in which three MySQL servers A, B, and C participate. During mysqlnd_ms_xa_commit(), PECL/mysqlnd_ms steps the participating servers through the commit protocol and issues the SQL commands: XA END (XA_ACTIVE state), XA PREPARE (XA_PREPARED state), XA COMMIT (XA_COMMIT state). For each step, the list of participants is iterated.

Uncritical, first phase errors during mysqlnd_ms_commit()
MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms
XA_IDLE mysqli_query(<some query>)

  • Pick appropriate server for some query/code>
  • Inject XA BEGIN on server
  • Run some query on server
XA_IDLE XA_IDLE XA_IDLE Assorted queries run on all participants
XA_ACTIVE XA_ACTIVE XA_IDLE (XA END failed) mysqlnd_ms_xa_commit()

  • On all participants: change participant state from XA_IDLE to XA_ACTIVE (= run XA END)
  • If error, try rollback on all participants, including failed one
  • Reason of the error on participant 3 (or during rollback on other participants…
    • Client crashed or got disconnected: safe to ignore, server has dropped XA transaction
    • Server crashed: safe to ignore, XA transaction has been dropped (2PC: first phase)
    • SQL error: rollback performed

Let there be an error with some participant around XA END. Upon the error, the plugin rolls back the global transactions. Effectively, it continues stepping the servers through the states but this time with the goal to send XA ROLLBACK. No matter what made XA END fail in the first place, the overall issue will most likely be solved. Should PHP have crashed, during XA END, all the servers notice it and drop the XA transaction. Should PHP have lost its connection to a server, it is the same: the server drops the XA transaction. Should one MySQL have gone away, it will drop the XA transaction upon recovery. Because PHP is still working, PECL/mysqlnd_ms tells the other servers to do the same and issues a rollback.

Whatever goes wrong prior to any server reaching XA_PREPARED (= second phase of the two phase protocol), a rollback happens and no garbage collection is required. You are safe even without configuring a state store for garbage collection!

When garbage collection becomes necessary

Any troubles that PECL/mysqlnd_ms has committing a global transaction after any participant entered the XA_PREPARED state, may require garbage collection. But, there are sill cases that can be ignored and require no attention.

Assume the below case where the global transaction has been prepared on servers A and B but XA PREPARE fails on the third participant, server C. Should PHP crash, it’s safe to ignore the case. A and B will see the client disconnect and rollback the transaction, so does C.

Should PHP have lost a connection to any of the servers, the disconnected MySQL server forgets about the XA transaction, even if it is in XA_PREPARED state. And, any participant that can still be reached will receive the rollback command from PECL/mysqlnd_ms.

Second phase errors during mysqlnd_ms_commit()
MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms
XA_PREPARED XA_PREPARED XA_ACTIVE (XA PREPARE failed) mysqlnd_ms_xa_commit()

  • On all participants: change participant state from XA_IDLE to XA_ACTIVE, then to XA_PREPARED
  • If error, try rollback on all participants, including failed one
  • Reason of the error on participant 3 (or during rollback on other participants…)
    • Client crashed or got disconnected: safe to ignore, server has dropped XA transaction
    • SQL error (but no server crash): rollback performed

However, should a participating server crash when it is in XA_PREPARED state or failed to switch to XA_COMMIT after the global transaction has been committed on any of the other participants, PECL/mysqlnd_ms leaves the issue to be solved by the garbage collection. The plugin will neither wait until a crashed server possibly returns nor will it retry a failed XA COMMIT command.

A case during mysqlnd_ms_commit() that needs garbage collection
MySQL A MySQL B MySQL C PHP using PECL/mysqlnd_ms
XA_COMMIT XA_COMMIT XA_PREPARED mysqlnd_ms_xa_commit()

  • On all participants: change state to XA_PREPARED (success)
  • On all participants: change state to XA_COMMIT (fails on C either due to SQL error or server crashed)
  • If error (as its the case on C), wait for garbage collection

Please note, that by default the plugin takes no immediate action. It leaves you with a yet to be completed global transaction. In the example, the failed transaction would already be committed on participants A and B but C is blocked waiting for a decision. Should the MySQL server C be online, it can, of course, accept clients and run transactions but all resources associated with the failed transaction are still held and blocked.

Transactional databases are designed with the assumption that transactions rarely fail. To get into a situation with a blocked server (above example), a rare transaction failure must happen. And, it must happen at a certain point. Means, the example is a rare case. If you are still concerned about this rare case and want to see it solved immediately – if anyhow possible – do not wait for automatic background solution but invoke the garbage collection manually.

[...]
if (!mysqlnd_ms_xa_commit($link, $xa_id)) {
  /* optional: trigger GC immediately */
  mysqlnd_ms_xa_gc($link, $xa_id);
}

A MySQL limitation: replication troubles

MySQL has some XA restrictions but one really hurts. Assume the example with three participants A, B and C. MySQL servers A and B have committed a XA transaction, C crashes in XA_PREPARED state. Upon recovery, the PECL/mysqlnd_ms garbage collection will commit the XA transaction on C because the transaction is already partially committed on A and B.

Replication troubles
MySQL A MySQL B MySQL C
XA_COMMIT XA_COMMIT XA_PREPARED
Changes in binary log Changes in binary log Crash
Recovery
PECL/mysqlnd_ms GC: XA_COMMIT
Changes not in binary log
| | |
Slave of A gets changes Slave of B gets changes Slave of C looses changes

Trouble is, C does not log the changes made by the XA transaction in the binary log. A slave of C will not receive the updates. The master and the slave will go out of sync. With MySQL Fabric and sharding being a prime use case for XA transactions this is a severe server limitation. MySQL Fabric makes heavy use of MySQL Replication to achieve high availability of shards.

To play really safe: use XA only for read only transactions

The lost change set problem leads to the recommendation to use XA transactions for read-only operations only, if you want to be 100% safe. Should you not be using MySQL Replication, write operations are safe too.

Configuring a state store for garbage collection

For PECL/mysqlnd_ms to be able to solve failed XA transactions that cannot be simply rolled back immediately, you must configure a state store. Version 1.6 of the plugin supports only one state store: MySQL. The code has been layed out to allow for other state store, but hey, why look further? MySQL is readily available (otherwise you would not use a MySQL driver…) and it will not loose data during a crash.

{
  "myapp": {
    "xa": {
      "rollback_on_close": 1,
      "state_store": {
        "participant_localhost_ip": "192.168.7.13",
        "mysql": {
          "host": "192.168.7.12",
          "user": "root",
          "password": "",
          "db": "test",
          "port": "3312",
          "socket": null
        }
      }
    },
    [...]
  }
}

Configuring the state store in the plugins configuration file is straight forward, see above. Note the participant_localhost_ip setting. As explained, the state store will hold records of all participants. If a participant is to be recorded that has a host name of 127.0.0.1, then 127.0.0.1 has a special meaning to the host that writes the record. Should, later on, garbage collection be required and the garbage collection be run on a different host than the one who wrote the 127.0.0.1 record, the garbage collection needs to know the actual IP associated with the original 127.0.0.1.

{
  "myapp": {
    "xa": {
      "rollback_on_close": 1,
      "state_store": {
        "participant_localhost_ip": "192.168.7.13",
        "mysql": {
          "host": "192.168.7.12",
          "user": "root",
          "password": "",
          "db": "test",
          "port": "3312",
          "socket": null
        }
      },
      "garbage_collection": {
        "max_retries": 3,
        "probability": 0,
        "max_transactions_per_run": 100
      }
    },
    [...]
  }
}

With a state store configured, you can also change the defaults for the background garbage collection. The probability setting determines how often the garbage collection will be called during RSHUTDOWN. Allowed values range from 0...1000. During RSHUTDOWN, the plugin calcualtes a random value between 1...1000. If the probability setting is higher or equal to the random value, the background garbage collection will start. max_retries and max_transactions_per_run have been explained above and are also listed in the manual.

Testers wanted

The XA related code has been lab tested, however, I do not plan to call it anything more than beta prior to hearing back from actual users. As you saw, the garbage collection is a rather tricky piece of code. It should work, give it a try… and let us know whether it works.

Happy hacking!

@Ulf_Wendel Follow me on Twitter