<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Joomla! - Open Source Content Management" -->
<?xml-stylesheet href="/joomla/plugins/system/jce/css/content.css?badb4208be409b1335b815dde676300e" type="text/css"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>TusaCentral - MySQL Blogs</title>
<description><![CDATA[]]></description>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs</link>
<lastBuildDate>Thu, 21 Nov 2024 09:35:11 +0000</lastBuildDate>
<generator>Joomla! - Open Source Content Management</generator>
<atom:link rel="self" type="application/rss+xml" href="http://www.tusacentral.net/joomla/index.php/mysql-blogs?format=feed&amp;type=rss"/>
<language>en-gb</language>
<item>
<title>How to migrate a production database to Percona Everest (MySQL) using Clone</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/257-how-to-migrate-a-production-database-to-percona-everest-mysql-using-clone</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/257-how-to-migrate-a-production-database-to-percona-everest-mysql-using-clone</guid>
<description><![CDATA[<p><span style="font-weight: 400;">The aim of this long article is to give you the instructions and tools to migrate your production database, from your current environment to a solution based on </span><a href="https://docs.percona.com/everest/index.html"><span style="font-weight: 400;">Percona Everest (MySQL)</span></a><span style="font-weight: 400;">.</span></p>
<p><span style="font-weight: 400;">Nice, you decided to test Percona Everest, and you found that it is the tool you were looking for to manage your private DBaaS. For sure the easiest part will be to run new environments to get better understanding and experience on how the solution works. However, the day when you will look to migrate your existing environments will come. What should you do?</span></p>
<p><span style="font-weight: 400;">Prepare a plan! In which the first step is to </span><b>understand your current environment</b><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;"> When I say understand the current environment, I mean that you need to have a clear understanding of:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">the current dimensions (CPU/Memory/Disk utilization)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">the way it is accessed by the application: what kind of queries you have, whether it is read or write intensive, whether you have pure OLTP or also some analytics, and any ELT processing</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">the way it is used: constant load, or varying by time of day or day of the year? Do you have any peaks, e.g. Black Friday?</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">what the RPO/RTO is, and whether you need a Disaster Recovery site</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">who is accessing your database, and why</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">what MySQL version you are using, and whether it is compatible with the Percona Everest MySQL versions</span></li>
</ul>
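<p><span style="font-weight: 400;">As a starting point for the sizing questions above, a standard INFORMATION_SCHEMA query gives you the on-disk footprint per schema (plain MySQL, nothing specific to this migration):</span></p>
<pre class="lang:mysql decode:true">-- approximate data + index size per schema, in GB
SELECT table_schema,
       ROUND(SUM(data_length + index_length)/1024/1024/1024, 2) AS size_gb
  FROM information_schema.tables
 GROUP BY table_schema
 ORDER BY size_gb DESC;
</pre>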
<p><span style="font-weight: 400;">Once you have all the information, it is time to perform a quick review of whether the solution fits. For this step, given its complexity, I suggest you contact Percona and get help from our experts to make the right decision. </span></p>
<p><span style="font-weight: 400;">From the above process you should come away with a few clear indications, such as:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Needed resources</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Whether the load is more read, more write, or 50/50</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The level of recovery needed</span></li>
</ul>
<p><span style="font-weight: 400;">The first thing to do is to calculate the optimal configuration. For this you can use the </span><a href="https://github.com/Tusamarco/mysqloperatorcalculator"><span style="font-weight: 400;">mysqloperatorcalculator</span></a><span style="font-weight: 400;">. The tool will give you the most relevant variables to set for MySQL, a configuration that you will be able to pass to Percona Everest while creating the new cluster. </span></p>
<p><span style="font-weight: 400;">To install Percona Everest, </span><a href="https://docs.percona.com/everest/install/SetupPrereqs.html"><span style="font-weight: 400;">see the documentation here</span></a><span style="font-weight: 400;">.</span></p>
<h2><span style="font-weight: 400;">Create the new cluster</span></h2>
<p><span style="font-weight: 400;">It is now time to open our Percona Everest console and start the adventure. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest1-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest1-a.jpg" alt="everest1 a" width="1024" height="169" class="alignnone size-large wp-image-98244" /></a></p>
<p><span style="font-weight: 400;">In the basic information step, check the supported versions for the Database Server.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest2-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest2-a.jpg" alt="everest2 a" width="1024" height="478" class="alignnone size-large wp-image-98246" /></a></p>
<p><span style="font-weight: 400;">This version and the source version must match to safely use the CLONE plugin. Note that you cannot clone between MySQL 8.0 and MySQL 8.4, but you can clone within a series, such as between MySQL 8.0.37 and MySQL 8.0.42. Before 8.0.37, the point release number also had to match, so cloning the likes of 8.0.36 to 8.0.42, or vice versa, is not permitted.</span></p>
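<p><span style="font-weight: 400;">To verify the match, run the same query on the source and on the new cluster and compare the results down to the point release:</span></p>
<pre class="lang:mysql decode:true">SELECT @@version, @@version_comment;</pre>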
<p><span style="font-weight: 400;">It is now time to set the resources; their values should come from the analysis previously performed.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest3-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest3-a.jpg" alt="everest3 a" width="1024" height="446" class="alignnone size-large wp-image-98248" /></a></p>
<p><span style="font-weight: 400;">Given that, choose 1 (one) node, then Custom, and fill in the fields as appropriate.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest4-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest4-a.jpg" alt="everest4 a" width="1024" height="481" class="alignnone size-large wp-image-98250" /></a></p>
<p><span style="font-weight: 400;">In the advanced configuration, add the IP(s) you want to allow to access the cluster. You must add the IP of the source, e.g. 18.23.4.12/32. </span></p>
<p><span style="font-weight: 400;">In the database engine parameters section, add the values (for MySQL only) that the mysqloperatorcalculator gives you. Do not forget to include the </span><b>mysqld</b><span style="font-weight: 400;"> section declaration.</span></p>
<p><span style="font-weight: 400;">For example, in our case I need to calculate the values for a MySQL server with 4 CPUs and 8GB of RAM serving an OLTP load. Once you have the mysqloperatorcalculator tool running:</span></p>
<pre class="lang:sh decode:true">$ curl -i -X GET -H "Content-Type: application/json" -d '{"output":"human","dbtype":"pxc", "dimension": {"id": 999, "cpu":4000,"memory":"8G"}, "loadtype": {"id": 3}, "connections": 300,"mysqlversion":{"major":8,"minor":0,"patch":36}}' http://127.0.0.1:8080/calculator</pre>
<p><span style="font-weight: 400;">You will get a set of values that after cleanup looks like:</span></p>
<pre class="lang:sh decode:true">[mysqld]
binlog_cache_size = 262144
binlog_expire_logs_seconds = 604800
binlog_format = ROW
… snip …
loose_wsrep_sync_wait = 3
loose_wsrep_trx_fragment_size = 1048576
loose_wsrep_trx_fragment_unit = bytes
</pre>
<p><span style="font-weight: 400;">Add the text in the TEXTAREA for the database parameters.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest5-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest5-a.jpg" alt="everest5 a" width="1024" height="354" class="alignnone size-large wp-image-98252" /></a></p>
<p><span style="font-weight: 400;">Enable monitoring if you like then click on Create database.</span></p>
<p><span style="font-weight: 400;">Once ready you will have something like this:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest6.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest6.jpg" alt="everest6" width="1024" height="158" class="alignnone size-large wp-image-98253" /></a></p>
<p><span style="font-weight: 400;">Or from the shell:</span></p>
<pre class="lang:sh decode:true">$ kubectl get pxc
NAME ENDPOINT STATUS PXC PROXYSQL HAPROXY AGE
test-prod1 xxx ready 1 1 2m49s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
percona-xtradb-cluster-operator-fb4cf7f9d-97rfs 1/1 Running 0 13d
test-prod1-haproxy-0 3/3 Running 0 106s
test-prod1-pxc-0 2/2 Running 0 69s
</pre>
<p><span style="font-weight: 400;">We are now ready to continue our journey.</span></p>
<h2><span style="font-weight: 400;">Align the system users</span></h2>
<p><span style="font-weight: 400;">This is a very important step. Percona Everest uses the Percona Operator, which creates a </span><a href="https://docs.percona.com/percona-operator-for-mysql/pxc/users.html"><span style="font-weight: 400;">set of system users</span></a><span style="font-weight: 400;"> in the database, and these users must also be present in the source with the same level of GRANTS; otherwise, after the clone phase terminates, the system will not work correctly. </span></p>
<p><span style="font-weight: 400;">Keep in mind that Percona Everest will create the users with generated passwords; these passwords may not fit your company rules, or may simply be too convoluted. Do not worry, you will be able to change them. For now, let's see what the system has generated. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest8.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest8.jpg" alt="everest8" width="1024" height="234" class="alignnone size-large wp-image-98255" /></a></p>
<p><span style="font-weight: 400;">To see how to access the cluster, click on the “</span><b>^</b><span style="font-weight: 400;">” at the top right; it will expand the section. The user is “root”; now unhide the password… OK, I don’t know about you, but I do not like it at all. Let me change it to the password I have already defined for </span><i><span style="font-weight: 400;">root</span></i><span style="font-weight: 400;"> in the source. </span></p>
<p><span style="font-weight: 400;">Percona Everest does not (yet) allow you to modify the system users’ passwords from the GUI, but you can do it from the command line:</span></p>
<pre class="lang:sh decode:true">DB_NAMESPACE='namespace';
DB_NAME='cluster-name';
USER='user';
PASSWORD='new-password';
kubectl patch secret everest-secrets-"$DB_NAME" -p="{\"stringData\":{\"$USER\": \"$PASSWORD\"}}" -n "$DB_NAMESPACE"
</pre>
<p><span style="font-weight: 400;">Before changing it, let us also check the passwords of the other system users. </span></p>
<p><span style="font-weight: 400;">Regarding system users, in the Operator for MySQL (PXC-based) we have the following:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">root</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">operator</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">xtrabackup</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">monitor</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">replication</span></li>
</ul>
<p><span style="font-weight: 400;">To get all of them, use the command line:</span></p>
<pre class="lang:sh decode:true">DB_NAMESPACE='namespace'; DB_NAME='cluster-name'; kubectl get secret everest-secrets-"$DB_NAME" -n "$DB_NAMESPACE" -o go-template='{{range $k,$v := .data}}{{"### "}}{{$k}}{{"| pw: "}}{{$v|base64decode}}{{"\n"}}{{end}}'|grep -E 'operator|replication|monitor|root|xtrabackup'
### monitor| pw: $&4fwdoYroBxFo#kQi
### operator| pw: NNfIUv+iL+J!,.Aqy94
### replication| pw: Rj89Ks)IVNQJH}Rd
### root| pw: f~A)Nws8wD<~%.j[
<span style="font-weight: 400;">### xtrabackup| pw: h)Tb@ij*0=(?,?30</span>
</pre>
<p><span style="font-weight: 400;">Now let me change my </span><i><span style="font-weight: 400;">root</span></i><span style="font-weight: 400;"> user password:</span></p>
<pre class="lang:sh decode:true">$ DB_NAMESPACE='namespace'; DB_NAME='cluster-name'; USER='root'; PASSWORD='root_password'; kubectl patch secret everest-secrets-"$DB_NAME" -p="{\"stringData\":{\"$USER\": \"$PASSWORD\"}}" -n "$DB_NAMESPACE"</pre>
<p><span style="font-weight: 400;">Now, if I collapse and expand the section again (forcing a reload):</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest9.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest9.jpg" alt="everest9" width="853" height="191" class="alignnone size-full wp-image-98256" /></a></p>
<p><span style="font-weight: 400;">My </span><i><span style="font-weight: 400;">root</span></i><span style="font-weight: 400;"> user password is aligned with the one I pushed. </span></p>
<p><span style="font-weight: 400;">As we have seen, we have to decide what to do. The first thing is to check whether our SOURCE already has these users defined. If not, it is easy: we just grab the users from the newly generated cluster and recreate them in the SOURCE.</span></p>
<p><span style="font-weight: 400;">To do so we will query the source database:</span></p>
<pre class="lang:sh decode:true">(root@localhost) [(none)]>select user,host,plugin from mysql.user order by 1,2;
+----------------------------+---------------+-----------------------+
| user | host | plugin |
+----------------------------+---------------+-----------------------+
| app_test | % | mysql_native_password |
| dba | % | mysql_native_password |
| dba | 127.0.0.1 | mysql_native_password |
| mysql.infoschema | localhost | caching_sha2_password |
| mysql.pxc.internal.session | localhost | caching_sha2_password |
| mysql.pxc.sst.role | localhost | caching_sha2_password |
| mysql.session | localhost | caching_sha2_password |
| mysql.sys | localhost | caching_sha2_password |
| operator | % | caching_sha2_password |
| pmm | 127.0.0.1 | caching_sha2_password |
| pmm | localhost | caching_sha2_password |
| replica | 3.120.188.222 | caching_sha2_password |
| root | localhost | caching_sha2_password |
+----------------------------+---------------+-----------------------+
</pre>
<p><span style="font-weight: 400;">We are lucky: there is nothing really conflicting, so we can export the users and create them inside the SOURCE. To do so you can use </span><a href="https://docs.percona.com/percona-toolkit/pt-show-grants.html"><span style="font-weight: 400;">pt-show-grants</span></a><span style="font-weight: 400;">:</span></p>
<pre class="lang:sh decode:true">pt-show-grants --host cluster-end-point --port 3306 --user dba --password dba --only 'monitor'@'%','xtrabackup'@'%',operator@'%',replication@'%',root@'%'</pre>
<p><span style="font-weight: 400;">This will generate an SQL output that you can run on the source. Please review it before running to be sure it will be safe for you to run it.</span></p>
<p><span style="font-weight: 400;">Once applied to source we will have:</span></p>
<pre class="lang:sh decode:true">+----------------------------+---------------+-----------------------+
| user | host | plugin |
+----------------------------+---------------+-----------------------+
| app_test | % | mysql_native_password |
| dba | % | mysql_native_password |
| dba | 127.0.0.1 | mysql_native_password |
| monitor | % | caching_sha2_password |
| mysql.infoschema | localhost | caching_sha2_password |
| mysql.pxc.internal.session | localhost | caching_sha2_password |
| mysql.pxc.sst.role | localhost | caching_sha2_password |
| mysql.session | localhost | caching_sha2_password |
| mysql.sys | localhost | caching_sha2_password |
| operator | % | caching_sha2_password |
| pmm | 127.0.0.1 | caching_sha2_password |
| pmm | localhost | caching_sha2_password |
| replica | 3.120.188.222 | caching_sha2_password |
| replication | % | caching_sha2_password |
| root | % | caching_sha2_password |
| root | localhost | caching_sha2_password |
| xtrabackup | % | caching_sha2_password |
+----------------------------+---------------+-----------------------+
</pre>
<p><span style="font-weight: 400;">The last step regarding the users is to create a specific user for the migration. We will use it to perform the clone, and afterwards we will remove it. </span></p>
<p><span style="font-weight: 400;">On SOURCE:</span></p>
<pre class="lang:mysql decode:true">create user migration@'%' identified by 'migration_password';
grant backup_admin on *.* to migration@'%';
</pre>
<p><span style="font-weight: 400;">On RECEIVER (new cluster):</span></p>
<pre class="lang:mysql decode:true"> create user migration@'%' identified by 'migration_password';
GRANT SYSTEM_USER, REPLICATION SLAVE, CONNECTION_ADMIN, BACKUP_ADMIN, GROUP_REPLICATION_STREAM, CLONE_ADMIN,SHUTDOWN ON *.* to migration@'%';
</pre>
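<p><span style="font-weight: 400;">To double-check that the grants landed as expected, you can run the following on each side (standard MySQL syntax, nothing specific to Everest):</span></p>
<pre class="lang:mysql decode:true">SHOW GRANTS FOR migration@'%';</pre>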
<h2><span style="font-weight: 400;">Let us go CLONING </span></h2>
<p><span style="font-weight: 400;">First, is the CLONE plugin already there?</span></p>
<p><span style="font-weight: 400;">Discover this by querying the two systems:</span></p>
<pre class="lang:mysql decode:true">SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME = 'clone';
SOURCE:
+-------------+---------------+
| PLUGIN_NAME | PLUGIN_STATUS |
+-------------+---------------+
| clone | ACTIVE |
+-------------+---------------+
</pre>
<pre class="lang:mysql decode:true">RECEIVER:
mysql> SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS WHERE PLUGIN_NAME = 'clone';
Empty set (0.42 sec)
</pre>
<p><span style="font-weight: 400;">The RECEIVER doesn’t have the plugin active. Let us activate it:</span></p>
<pre class="lang:mysql decode:true">INSTALL PLUGIN clone SONAME 'mysql_clone.so';</pre>
<p><span style="font-weight: 400;">Warning! </span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">If your source is behind a firewall, you need to allow the RECEIVER to connect. To get the IP of the RECEIVER, just run:</span></p>
<pre class="lang:sh decode:true">kubectl -n namespace exec mysqlpodname -c pxc -- curl -4s ifconfig.me</pre>
<p><span style="font-weight: 400;">This will return an IP; you need to add that IP to the firewall to allow access. Keep this value aside, you will also need it later to set up the asynchronous replication. </span></p>
<p><span style="font-weight: 400;">Are we ready? Not really, there is a caveat here: if we clone with the Galera library active, the cloning will fail. </span></p>
<p><span style="font-weight: 400;">To have it working we must:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">disable the wsrep provider</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">stop the operator probes from monitoring the pod</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">connect directly to the pod to run the operation and to monitor it. </span></li>
</ol>
<p><span style="font-weight: 400;">To do the above, on the receiver, we can:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">add </span><i><span style="font-weight: 400;">wsrep_provider=none</span></i><span style="font-weight: 400;"> to the configuration</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">as soon as the pod is up (monitor the log), issue the following command from the command line:</span><span style="font-weight: 400;"><br /></span>
<pre class="lang:sh decode:true">kubectl -n namespace exec pod-name -c pxc -- touch /var/lib/mysql/sleep-forever</pre>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Connect to the pod using:</span><span style="font-weight: 400;"><br /></span>
<pre class="lang:sh decode:true">kubectl exec --stdin --tty <pod name> -n <namespace> -c pxc -- /bin/bash</pre>
</li>
</ol>
<p><span style="font-weight: 400;">During these operations the cluster will not be accessible from its endpoint, and the HAProxy pod will show as down as well; all of this is OK, don’t worry.</span></p>
<h3><span style="font-weight: 400;">Let us go…</span></h3>
<p><span style="font-weight: 400;">While monitoring the log and pod:</span></p>
<pre class="lang:sh decode:true">kubectl logs pod-name --follow -c pxc
kubectl get pods
</pre>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest10-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest10-a.jpg" alt="everest10" width="1024" height="125" class="alignnone size-large wp-image-98258" /></a></p>
<p><span style="font-weight: 400;">Once you click continue and then edit database, the pod will be restarted.</span></p>
<p><span style="font-weight: 400;">Wait for the message in the log:</span></p>
<pre class="lang:sh decode:true">[MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.36-28.1' socket: '/tmp/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel28, Revision bfb687f, WSREP version 26.1.4.3.
2024-07-29T17:22:11.933714Z 0 [System] [MY-013292] [Server] Admin interface ready for connections, address: '10.1.68.172' port: 33062</pre>
<p>As soon as you see it, run the command to prevent the operator from restarting the pod:</p>
<pre class="lang:sh decode:true">kubectl -n namespace exec pod-name -c pxc -- touch /var/lib/mysql/sleep-forever</pre>
<p><span style="font-weight: 400;">Confirm the file is there:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:sh decode:true">kubectl -n namespace exec pod-name -c pxc -- ls -l /var/lib/mysql|grep sleep</pre>
<p><span style="font-weight: 400;">Checking the status, you will see:</span></p>
<pre class="lang:sh decode:true">NAME READY STATUS RESTARTS AGE
percona-xtradb-cluster-operator-fb4cf7f9d-97rfs 1/1 Running 0 13d
test-prod1-haproxy-0 2/3 Running 0 21h
test-prod1-pxc-0 1/2 Running 0 46s
</pre>
<p><span style="font-weight: 400;">Now you can connect to your pod only locally:</span></p>
<pre class="lang:sh decode:true">kubectl exec --stdin --tty <pod name> -n <namespace> -c pxc -- /bin/bash</pre>
<p><span style="font-weight: 400;">Once there:</span></p>
<pre class="lang:sh decode:true">mysql -uroot -p<root_password></pre>
<p><span style="font-weight: 400;">And you are in.</span></p>
<p><span style="font-weight: 400;">I suggest you open two different bash terminals and in one of them run the monitoring query:</span></p>
<pre class="lang:sh decode:true">while [ 1 == 1 ]; do mysql -uroot -p<root_password> -e "select id,stage,state,BEGIN_TIME,END_TIME,THREADS,((ESTIMATE/1024)/1024) ESTIMATE_MB,format(((data/estimate)*100),2) 'completed%', ((DATA/1024)/1024) DATA_MB,NETWORK,DATA_SPEED,NETWORK_SPEED from performance_schema.clone_progress;";sleep 1;done;</pre>
<p><span style="font-weight: 400;">This command will give you a clear idea of the status of the cloning process.</span></p>
<p><span style="font-weight: 400;">To clone from a SOURCE you need to tell the RECEIVER which source to trust.</span></p>
<p><span style="font-weight: 400;">On the other bash, inside the mysql client:</span></p>
<pre class="lang:mysql decode:true">SET GLOBAL clone_valid_donor_list = 'source_public_ip:port';
CLONE INSTANCE FROM 'migration'@'ip':port IDENTIFIED BY 'XXX';</pre>
<p>While cloning your monitor query will give you the status of the operation:</p>
<pre class="lang:mysql decode:true">+------+-----------+-------------+----------------------------+----------------------------+---------+-----------------+------------+---------------+------------+------------+---------------+
| id | stage | state | BEGIN_TIME | END_TIME | THREADS | ESTIMATE_MB | completed% | DATA_MB | NETWORK | DATA_SPEED | NETWORK_SPEED |
+------+-----------+-------------+----------------------------+----------------------------+---------+-----------------+------------+---------------+------------+------------+---------------+
| 1 | DROP DATA | Completed | 2024-07-30 15:07:17.690966 | 2024-07-30 15:07:17.806309 | 1 | 0.00000000 | NULL | 0.00000000 | 0 | 0 | 0 |
| 1 | FILE COPY | In Progress | 2024-07-30 15:07:17.806384 | NULL | 4 | 130692.40951157 | 3.55 | 4642.11263657 | 4867879397 | 491961485 | 491987808 |
| 1 | PAGE COPY | Not Started | NULL | NULL | 0 | 0.00000000 | NULL | 0.00000000 | 0 | 0 | 0 |
| 1 | REDO COPY | Not Started | NULL | NULL | 0 | 0.00000000 | NULL | 0.00000000 | 0 | 0 | 0 |
| 1 | FILE SYNC | Not Started | NULL | NULL | 0 | 0.00000000 | NULL | 0.00000000 | 0 | 0 | 0 |
| 1 | RESTART | Not Started | NULL | NULL | 0 | 0.00000000 | NULL | 0.00000000 | 0 | 0 | 0 |
| 1 | RECOVERY | Not Started | NULL | NULL | 0 | 0.00000000 | NULL | 0.00000000 | 0 | 0 | 0 |
+------+-----------+-------------+----------------------------+----------------------------+---------+-----------------+------------+---------------+------------+------------+---------------+
</pre>
<p><span style="font-weight: 400;">When the process is completed, the mysqld will shut down.</span></p>
<p><span style="font-weight: 400;">Checking in the log you will see something like this:</span></p>
<pre class="lang:sh decode:true">The /var/lib/mysql/sleep-forever file is detected, node is going to infinity loop
<span style="font-weight: 400;">If you want to exit from infinity loop you need to remove /var/lib/mysql/sleep-forever file</span></pre>
<p><span style="font-weight: 400;">Do not worry, all is good!</span></p>
<p><span style="font-weight: 400;">At this point we want to have MySQL start again and validate the current files:</span></p>
<pre class="lang:sh decode:true">kubectl -n namespace exec podname -c pxc -- mysqld &</pre>
<p><span style="font-weight: 400;">Check the log and, if all is OK, connect to MySQL using the local client:</span></p>
<pre class="lang:sh decode:true">kubectl exec --stdin --tty <pod name> -n <namespace> -c pxc -- /bin/bash
mysql -uroot -p<password></pre>
<p>Issue the <i>shutdown</i> command from inside.</p>
<p><span style="font-weight: 400;">It is time to remove the </span><i><span style="font-weight: 400;">wsrep_provider=none</span></i><span style="font-weight: 400;"> setting and, after that, the </span><i><span style="font-weight: 400;">sleep-forever</span></i><span style="font-weight: 400;"> file.</span></p>
<p><span style="font-weight: 400;">Go to the Percona Everest GUI, remove </span><i><span style="font-weight: 400;">wsrep_provider=none</span></i><span style="font-weight: 400;"> from the Database Parameters, click continue, and then edit database.</span></p>
<p><span style="font-weight: 400;">Final step, remove the file:</span></p>
<pre class="lang:sh decode:true">kubectl -n namespace exec podname -c pxc -- rm -f /var/lib/mysql/sleep-forever</pre>
<p><span style="font-weight: 400;">The cluster will come back (after a few restarts) with the new dataset, positioned at the SOURCE GTID:</span></p>
<pre class="lang:mysql decode:true">mysql> select @@gtid_executed;
+-----------------------------------------------+
| @@gtid_executed |
+-----------------------------------------------+
| aeb22c03-7f13-11ee-9ff6-0224c88bdc4c:1-698687 |
+-----------------------------------------------+
</pre>
<h2><span style="font-weight: 400;">Enable Replication</span></h2>
<p><span style="font-weight: 400;">Now, if you are used to the Percona Operator for MySQL (PXC-based), you probably know that it supports </span><a href="https://www.percona.com/blog/migration-of-a-mysql-database-to-a-kubernetes-cluster-using-asynchronous-replication/"><span style="font-weight: 400;">remote asynchronous replication</span></a><span style="font-weight: 400;">. This feature is available in the operator used by Everest, but it is not exposed yet. </span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">The benefit of using the “native” replication is that it is managed by the operator in case of a pod crash, which allows the cluster to continue to replicate across pods. </span></p>
<p><span style="font-weight: 400;">On the other hand, the method described below, which for the moment (Percona Everest v1.0.1) is the only one applicable, requires manual intervention to restart the replication in case of pod failure. </span></p>
<p><span style="font-weight: 400;">That clarified, here are the steps you need to follow to enable replication between the new environment and your current production. </span></p>
<p><span style="font-weight: 400;">On source:</span></p>
<pre class="lang:mysql decode:true">CREATE USER '<replicauser>'@'<replica_external_ip>' IDENTIFIED BY '<replicapw>';
GRANT REPLICATION SLAVE ON *.* TO '<replicauser>'@'<replica_external_ip>';</pre>
<p>The <i>replica_external_ip</i> is the one I told you to keep aside; for convenience, here is the command to get it again:</p>
<pre class="lang:sh decode:true">kubectl -n namespace exec podname -c pxc -- curl -4s ifconfig.me</pre>
<p><span style="font-weight: 400;">On Receiver, connect to the pod using mysql client and type:</span></p>
<pre class="lang:mysql decode:true">CHANGE REPLICATION SOURCE TO SOURCE_HOST='<source>', SOURCE_USER='<replicauser>', SOURCE_PORT=3306, SOURCE_PASSWORD='<replicapw>', SOURCE_AUTO_POSITION=1;</pre>
<p><span style="font-weight: 400;">Then start replication as usual.</span></p>
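<p><span style="font-weight: 400;">For reference, “as usual” here means the standard MySQL 8.0 commands, run on the RECEIVER:</span></p>
<pre class="lang:mysql decode:true">START REPLICA;
SHOW REPLICA STATUS\G</pre>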
<p><span style="font-weight: 400;">If all was done right, replication will be working and your new database will be replicating from the current production, keeping the two in sync.</span></p>
<pre class="lang:mysql decode:true">mysql> show replica status\G
*************************** 1. row ***************************
Replica_IO_State: Waiting for source to send event
Source_Host: 18.198.187.64
Source_User: replica
Source_Port: 3307
Connect_Retry: 60
Source_Log_File: binlog.000001
Read_Source_Log_Pos: 337467656
Relay_Log_File: test-prod1-pxc-0-relay-bin.000002
Relay_Log_Pos: 411
Relay_Source_Log_File: binlog.000001
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
… snip …
Executed_Gtid_Set: aeb22c03-7f13-11ee-9ff6-0224c88bdc4c:1-698687
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Source_TLS_Version:
Source_public_key_path:
Get_Source_public_key: 0
Network_Namespace:
</pre>
<h2><span style="font-weight: 400;">Final touch</span></h2>
<p><span style="font-weight: 400;">The final touch is to move the cluster from 1 node to 3 nodes.</span></p>
<pre class="lang:sh decode:true">$ kubectl get pods
NAME READY STATUS RESTARTS AGE
percona-xtradb-cluster-operator-fb4cf7f9d-97rfs 1/1 Running 0 14d
test-prod1-haproxy-0 2/2 Running 6 (48m ago) 77m
test-prod1-pxc-0 1/1 Running 0 45m
</pre>
<p><span style="font-weight: 400;">To do so, open the Percona Everest GUI, edit your database and in the </span><b>Resources</b><span style="font-weight: 400;"> tab, choose 3 nodes, then continue till the end and </span><b>edit database.</b><span style="font-weight: 400;"></span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest11-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest11-a.jpg" alt="everest11 a" width="1024" height="416" class="alignnone size-large wp-image-98260" /></a></p>
<p><span style="font-weight: 400;">At the end of the update process, you will have:</span></p>
<pre class="lang:sh decode:true">$ kubectl get pods
NAME READY STATUS RESTARTS AGE
percona-xtradb-cluster-operator-fb4cf7f9d-97rfs 1/1 Running 0 14d
test-prod1-haproxy-0 2/2 Running 6 (151m ago) 3h1m
test-prod1-haproxy-1 2/2 Running 0 103m
test-prod1-haproxy-2 2/2 Running 0 102m
test-prod1-pxc-0 1/1 Running 0 149m
test-prod1-pxc-1 1/1 Running 0 103m
test-prod1-pxc-2 1/1 Running 0 93m
</pre>
<p><span style="font-weight: 400;">At this point you have your new environment ready to go. </span></p>
<h2><span style="font-weight: 400;">Post migration actions</span></h2>
<p><span style="font-weight: 400;">Remember that there are always many other things to do once you have migrated the data:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Validate Data Integrity</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Consistency Check: Use tools like mysqlcheck or Percona’s pt-table-checksum to ensure data integrity and consistency between MySQL 8.0 and Percona Everest.</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Query Testing: Run critical queries and perform load testing to ensure that performance metrics are met and that queries execute correctly.</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Test and Optimize</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Benchmarking: Conduct performance benchmarking to compare MySQL 8.0 and Percona Everest. Use tools like sysbench or MySQL’s EXPLAIN statement to analyze query performance.</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Optimization: Tweak Percona Everest settings based on the benchmark results. Consider features like Percona’s Query Analytics and Performance Schema for deeper insights.</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Enable Backup schedule and Point In time Recovery</span><span style="font-weight: 400;"><br /></span><a href="https://www.percona.com/blog/wp-content/uploads/2024/09/everest12-a.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/how_to_migrate_toPeverest/everest12-a.jpg" alt="everest12 a" width="1024" height="376" class="alignnone size-large wp-image-98262" /></a></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Switch to Production</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Cutover Plan: Develop a cutover plan that includes a maintenance window, final data synchronization, and the switchover to the new database.</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">ALWAYS perform a backup of the platform.</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Monitoring and Support: Set up monitoring with tools like Percona Monitoring and Management (PMM) to keep an eye on performance, queries, and server health.</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Verification and Documentation:</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Data Validation: Conduct thorough testing to confirm that all application functionality works as expected with Percona Everest.</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Documentation: Update your database documentation to reflect the new setup, configurations, and any changes made during the migration.</span></li>
</ul>
</li>
</ul>
<h2><span style="font-weight: 400;">Summary of commands </span></h2>
<table>
<tbody>
<tr>
<td><span style="font-weight: 400;">Use</span></td>
<td><span style="font-weight: 400;">Command</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Get cluster state</span></td>
<td><span style="font-weight: 400;">kubectl get pxc</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Get list of the pods</span></td>
<td><span style="font-weight: 400;">kubectl get pods</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Return password for system users</span></td>
<td><span style="font-weight: 400;">DB_NAMESPACE='namespace'; DB_NAME='cluster-name'; kubectl get secret everest-secrets-"$DB_NAME" -n "$DB_NAMESPACE" -o go-template='{{range $k,$v := .data}}{{"### "}}{{$k}}{{"| pw: "}}{{$v|base64decode}}{{"\n\n"}}{{end}}'|grep -E 'operator|replication|monitor|root|xtrabackup'</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Change password for a given user</span></td>
<td><span style="font-weight: 400;">DB_NAMESPACE='namespace'; DB_NAME='cluster-name'; USER='root'; PASSWORD='root_password'; kubectl patch secret everest-secrets-"$DB_NAME" -p="{\"stringData\":{\"$USER\": \"$PASSWORD\"}}" -n "$DB_NAMESPACE"</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Tail the pod log for a specific container</span></td>
<td><span style="font-weight: 400;">kubectl logs pod-name --follow -c pxc</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Return public IP for that pod</span></td>
<td><span style="font-weight: 400;">kubectl -n namespace exec podname -c pxc -- curl -4s ifconfig.me</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Prevent the operator from restarting the pod</span></td>
<td><span style="font-weight: 400;">kubectl -n namespace exec pod-name -c pxc -- touch /var/lib/mysql/sleep-forever</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Remove the sleep-forever file</span></td>
<td><span style="font-weight: 400;">kubectl -n namespace exec pod-name -c pxc -- rm -f /var/lib/mysql/sleep-forever</span></td>
</tr>
<tr>
<td><span style="font-weight: 400;">Connect to pod bash</span></td>
<td><span style="font-weight: 400;">kubectl exec --stdin --tty <pod name> -n <namespace> -c pxc -- /bin/bash</span></td>
</tr>
</tbody>
</table>
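<p><span style="font-weight: 400;">As a side note on the "return password" command above: Kubernetes stores secret values base64-encoded, and the go-template simply decodes each entry. What that decode step does can be illustrated locally with plain base64 (the value below is a made-up sample, not a real secret):</span></p>

```shell
# Illustration of the decode step performed by the go-template above.
# 'cm9vdF9wYXNzd29yZA==' is base64 for the sample string 'root_password'.
encoded='cm9vdF9wYXNzd29yZA=='
printf '### root | pw: %s\n' "$(printf '%s' "$encoded" | base64 -d)"
# prints: ### root | pw: root_password
```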
<h1><span style="font-weight: 400;">References</span></h1>
<p><a href="https://www.percona.com/blog/understanding-what-kubernetes-is-used-for-the-key-to-cloud-native-efficiency/"><span style="font-weight: 400;">https://www.percona.com/blog/understanding-what-kubernetes-is-used-for-the-key-to-cloud-native-efficiency/</span></a></p>
<p><a href="https://www.percona.com/blog/should-you-deploy-your-databases-on-kubernetes-and-what-makes-statefulset-worthwhile/"><span style="font-weight: 400;">https://www.percona.com/blog/should-you-deploy-your-databases-on-kubernetes-and-what-makes-statefulset-worthwhile/</span></a></p>
<p><a href="https://www.tusacentral.com/joomla/index.php/mysql-blogs/242-compare-percona-distribution-for-mysql-operator-vs-aws-aurora-and-standard-rds"><span style="font-weight: 400;">https://www.tusacentral.com/joomla/index.php/mysql-blogs/242-compare-percona-distribution-for-mysql-operator-vs-aws-aurora-and-standard-rds</span></a></p>
<p><a href="https://www.tusacentral.com/joomla/index.php/mysql-blogs/243-mysql-on-kubernetes-demystified"><span style="font-weight: 400;">https://www.tusacentral.com/joomla/index.php/mysql-blogs/243-mysql-on-kubernetes-demystified</span></a></p>
<p><a href="https://github.com/Tusamarco/mysqloperatorcalculator"><span style="font-weight: 400;">https://github.com/Tusamarco/mysqloperatorcalculator</span></a></p>
<p><a href="https://www.percona.com/blog/migration-of-a-mysql-database-to-a-kubernetes-cluster-using-asynchronous-replication/"><span style="font-weight: 400;">https://www.percona.com/blog/migration-of-a-mysql-database-to-a-kubernetes-cluster-using-asynchronous-replication/</span></a></p>
<p> </p>]]></description>
<category>MySQL</category>
<pubDate>Mon, 02 Sep 2024 13:47:01 +0000</pubDate>
</item>
<item>
<title>Sakila, Where Are You Going?</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/256-sakila-where-are-you-going</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/256-sakila-where-are-you-going</guid>
<description><![CDATA[<p>This article is in large part the same as what I published on the <a href="https://www.percona.com/blog/sakila-where-are-you-going/">Percona blog</a>. However, I am reposting it here given it is the first of several benchmarking exercises that I will probably present here in an extended format, while they may be more concise on other platforms. </p>
<p>In any case, why these tests? </p>
<p>I am curious, and I do not like (at all) what is happening around MySQL and MariaDB. I never liked it, but now I really think it is time to end this negative trend, which is killing not only the community but the products as well. </p>
<h2>The tests</h2>
<h3>Assumptions</h3>
<p><span style="font-weight: 400;">There are many ways to run tests, and we know that results may vary depending on how you play with many factors, like the environment or the MySQL server settings. However, if we compare several versions of the same product on the same platform, it is logical to assume that all the versions will have the same “chance” to behave well or badly unless we change the MySQL server settings. </span></p>
<p><span style="font-weight: 400;">Because of this, I ran the tests ON DEFAULTS, with the clear assumption that if you release your product based on the defaults, you have tested with them and consider them the safest for generic use. </span></p>
<p><span style="font-weight: 400;">I also applied some </span><a href="https://github.com/Tusamarco/blogs/blob/master/sakila_where_are_you_going/config_changes.txt"><span style="font-weight: 400;">modifications </span></a><span style="font-weight: 400;">and ran the tests again to see how optimization would impact performance. </span></p>
<h3><span style="font-weight: 400;">What tests do we run?</span></h3>
<p><span style="font-weight: 400;">High level, we run two sets of tests:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Sysbench</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">TPC-C (<a href="https://www.tpc.org/tpcc/" rel="nofollow">https://www.tpc.org/tpcc/</a>) like </span></li>
</ul>
<p><span style="font-weight: 400;">The full methodology and test details can be found <a href="https://github.com/Tusamarco/benchmarktools/blob/main/docs/plan.md" rel="nofollow">here</a></span><span style="font-weight: 400;">, while actual commands are available:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;"><a href="https://github.com/Tusamarco/benchmarktools/blob/main/software/fill_sysbench_map.sh" rel="nofollow">Sysbench</a></span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;"><a href="https://github.com/Tusamarco/benchmarktools/blob/main/software/fill_tpcc_map.sh" rel="nofollow">TPC-C</a> </span></li>
</ul>
<h2><span style="font-weight: 400;">Results</span></h2>
<p><span style="font-weight: 400;">While I have executed the whole set of tests as indicated on the page, </span><span style="font-weight: 400;">and all the results are visible </span><a href="https://github.com/Tusamarco/blogs/tree/master/sakila_where_are_you_going"><span style="font-weight: 400;">here</span></a><span style="font-weight: 400;">, for brevity and because I want to keep this article at a high level, I will report and cover only the Read-Write tests and the TPC-C. </span></p>
<p><span style="font-weight: 400;">This is because, in my opinion, they offer an immediate and global view of how the server behaves. They also represent the most used scenario, while the other tests are more interesting to dig into problems. </span></p>
<p><span style="font-weight: 400;">The sysbench read/write tests reported below have a lower percentage of writes ~36% and ~64% reads, where reads are point selects and range selects. TPC-C instead has an even distribution of 50/50 % between read and write operations. </span></p>
<h3><span style="font-weight: 400;">Sysbench read and write tests </span></h3>
<p><span style="font-weight: 400;">Tests using default configurations, MySQL only, in different versions. </span></p>
<p><span style="font-weight: 400;">Small dataset:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/mysql_trend_default_rw_small.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/mysql_trend_default_rw_small.jpg" alt="mysql trend default rw small" width="800" height="333" class="aligncenter wp-image-96835 size-large" /></a></p>
<p><span style="font-weight: 400;">Optimized configuration only MySQL:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/mysql_trend_optimized_rw_small_100_range.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/mysql_trend_optimized_rw_small_100_range.jpg" alt="mysql trend optimized rw small 100 range" width="800" height="333" class="aligncenter wp-image-96839 size-large" /></a></p>
<p><span style="font-weight: 400;">Large dataset using defaults:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/mysql_trend_default_rw_large_100_range.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/mysql_trend_default_rw_large_100_range.jpg" alt="mysql trend default rw large 100 range" width="800" height="333" class="aligncenter wp-image-96833 size-large" /></a></p>
<p><span style="font-weight: 400;">Using optimization:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/mysql_trend_optimized_rw_large_100_range.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/mysql_trend_optimized_rw_large_100_range.jpg" alt="mysql trend optimized rw large 100 range" width="800" height="333" class="aligncenter wp-image-96837 size-large" /></a></p>
<p><span style="font-weight: 400;">The first two graphs are interesting for several reasons, but one that jumps out is that we cannot count on DEFAULTS as a starting point. Or, to be correct, we can use them as the base from which we must identify better defaults; this is also corroborated by Oracle's recent decision to modify many defaults in 8.4 (<a href="https://lefred.be/content/mysql-8-4-lts-new-production-ready-defaults-for-innodb/" rel="nofollow">see article</a>). </span></p>
<p><span style="font-weight: 400;">Given that, I will focus on the results obtained with the optimized configs.</span></p>
<p><span style="font-weight: 400;">Now looking at the graphs above, we can see that:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">MySQL 5.7 is performing better in both cases just using defaults.</span></li>
<li style="font-weight: 400;" aria-level="1">Given the bad defaults, MySQL 8.0.36 was not performing well in the first case; just making some adjustments allowed it to outperform 8.4 and get closer to what 5.7 can do.</li>
</ol>
<h3><span style="font-weight: 400;">TPC-C tests</span></h3>
<p>As indicated, TPC-C tests are supposed to be write-intensive, using transactions and more complex queries with join, grouping, and sorting.</p>
<p>I was testing the TPC-C using the most common isolation levels, Repeatable Read and Read Committed.</p>
<p><span style="font-weight: 400;">We experienced several issues during the multiple runs, mainly locking timeouts, but they were not consistent. Given that, while I represent their presence with a blank in the graph, they should not be considered to affect the execution trend; they only represent a saturation limit. </span></p>
<p><span style="font-weight: 400;">Test using optimized configurations, Repeatable Read:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc-RepeatableRead-with-optimized_only_mysql.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_RepeatableRead_with_defaults_only_mysql.jpg" alt="tpcc RepeatableRead with defaults only mysql" width="800" height="333" class="aligncenter wp-image-96870 size-large" /></a></p>
<p><span style="font-weight: 400;">Test using optimized configurations, Read Committed:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc-ReadCommitted-with-optimized_only_mysql.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_ReadCommitted_with_optimized_only_mysql.jpg" alt="tpcc ReadCommitted with optimized only mysql" width="800" height="333" class="aligncenter wp-image-96866 size-large" /></a></p>
<p><span style="font-weight: 400;">In this test, we can observe that MySQL 5.7 performs better than the other MySQL versions. </span></p>
<h3>What if we compare it with Percona Server for MySQL and MariaDB?</h3>
<p><span style="font-weight: 400;">I will present only the optimized tests here for brevity because, as we saw before, defaults are not serving us well. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/mysql_versions_compare_optimized_rw_small_100_range.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/mysql_versions_compare_optimized_rw_small_100_range.jpg" alt="mysql versions compare optimized rw small 100 range" width="800" height="333" class="aligncenter wp-image-96848 size-large" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/mysql_versions_compare_optimized_rw_large_100_range.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/mysql_versions_compare_optimized_rw_large_100_range.jpg" alt="mysql versions compare optimized rw large 100 range" width="800" height="333" class="aligncenter wp-image-96846 size-large" /></a></p>
<p><span style="font-weight: 400;">When comparing the MySQL versions against Percona Server for MySQL 8.0.36 and MariaDB 11.3, we see that MySQL 8.4 does better only in relation to MariaDB; otherwise it remains behind, also compared to MySQL 8.0.36. </span></p>
<h4><i><span style="font-weight: 400;">TPC-C</span></i></h4>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc-RepeatableRead-optimized_all.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_RepeatableRead_optimized_all.jpg" alt="tpcc RepeatableRead optimized all" width="800" height="333" class="aligncenter wp-image-96867 size-large" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc-ReadCommitted-optimized_all.png"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_ReadCommitted_optimized_all.jpg" alt="tpcc ReadCommitted optimized all" width="800" height="333" class="aligncenter wp-image-96854 size-large" /></a></p>
<p><span style="font-weight: 400;">As expected, MySQL 8.4 does not do well here either, and only MariaDB performs worse. </span><span style="font-weight: 400;">Note how Percona Server for MySQL 8.0.36 is the only one able to handle the increased contention. </span></p>
<h2><span style="font-weight: 400;">What are these tests saying to us?</span></h2>
<p><span style="font-weight: 400;">Frankly speaking, what we see here is what most of our users experience firsthand: MySQL performance degrades as the versions increase. </span></p>
<p><span style="font-weight: 400;">For sure, MySQL 8.x comes with interesting additions; however, if you consider performance as the first and most important topic, then MySQL 8.x is not any better. </span></p>
<p><span style="font-weight: 400;">Having said this, probably most of those still using MySQL 5.7 (and we have thousands of them) are right. Why embark on a very risky migration only to discover that you have lost a considerable percentage of performance? </span></p>
<p><span style="font-weight: 400;">Regarding this, if we analyze the data and convert the trends into transactions/sec, we can identify the following scenarios when comparing the tests done using TPC-C:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc_trx_lost_rr.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_trx_pct_lost_rr.jpg" alt="tpcc trx lost rr" width="800" height="581" class="aligncenter wp-image-96871 size-large" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc_trx_lost_rc.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_trx_lost_pct_rc.jpg" alt="tpcc trx lost pct rc" width="800" height="549" class="aligncenter wp-image-96872 size-large" /></a></p>
<p><span style="font-weight: 400;">As we can see, the performance degradation can be significant in both tests, while the benefits (when present) are irrelevant. </span></p>
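<p><span style="font-weight: 400;">The percentages in these graphs come from simple arithmetic on the transactions/sec numbers; as a sketch, with purely illustrative values rather than the actual test results:</span></p>

```shell
# Sketch: percentage of transactions/sec lost moving between two versions.
# The numbers are illustrative placeholders, not results from the tests above.
tps_before=1000   # e.g. the older version
tps_after=780     # e.g. the newer version
awk -v a="$tps_before" -v b="$tps_after" \
    'BEGIN { printf "transactions/sec lost: %.1f%%\n", (a - b) / a * 100 }'
# prints: transactions/sec lost: 22.0%
```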
<p><span style="font-weight: 400;">In absolute numbers:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc_trx_lost_rr-1.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_trx_lost_rr.jpg" alt="tpcc trx lost rr" width="800" height="581" class="aligncenter wp-image-96874 size-large" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/tpcc_trx_lost_rc-1.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/tpcc_trx_lost_rc.jpg" alt="tpcc trx lost rc" width="800" height="549" class="aligncenter wp-image-96873 size-large" /></a></p>
<p><span style="font-weight: 400;"><br /></span>In this scenario, we need to ask ourselves, can my business deal with such a performance drop?</p>
<h3>Considerations</h3>
<p><span style="font-weight: 400;">When MySQL was sold to SUN Microsystems, I was in MySQL AB. I was not happy about that move at all, and when Oracle took over SUN, I was really concerned about Oracle's possible decision to kill MySQL. I also decided to move on and join another company. </span></p>
<p><span style="font-weight: 400;">In the years after, I changed my mind, and I was supporting and promoting the Oracle/MySQL work. In many ways, I still am. </span></p>
<p><span style="font-weight: 400;">They did a great job rationalizing the development, and the code clean-up was significant. However, something did not progress with the rest of the code. The performance decrease we are seeing is the cost of this lack of progress; see also Peter's article <a href="https://www.percona.com/blog/is-oracle-finally-killing-mysql/">Is Oracle Finally Killing MySQL?.</a></span><span style="font-weight: 400;"></span></p>
<p><span style="font-weight: 400;">On the other hand, we need to recognize that Oracle is investing a lot in performance and functionalities when we talk of the OCI/MySQL/Heatwave offer. Only those improvements are not reflected in the MySQL code, no matter if it is Community or Enterprise. </span></p>
<p><span style="font-weight: 400;">Once more, while I consider this extremely sad, I can also understand why. </span></p>
<p><span style="font-weight: 400;">Why should Oracle continue to optimize the MySQL code for free when cloud providers such as Google or AWS use that code, optimize it for their use, make billions, and not even share the code back? </span></p>
<p><span style="font-weight: 400;">We know this has been happening for many years now, and we know this is causing a significant and negative impact on the open source ecosystem. </span></p>
<p><span style="font-weight: 400;">MySQL is just another Lego block in a larger scenario in which cloud companies are cannibalizing the work of others for their own economic return. </span></p>
<p><span style="font-weight: 400;">What can be done? I can only hope we will see a different behavior soon: opening the code and investing in projects that will help communities such as MySQL's to quickly recover the lost ground. </span></p>
<p>Let me add that while it is perfectly normal in our economy to look for profit (in the end, that is what capitalism is for), it is not normal, or better said it is harmful, to pursue profit without keeping in mind that you are burning out the very resources that give you that profit. </p>
<p>That is consumerism: using and abusing without giving the resources you depend on the time, energy, and opportunity to renew and flourish. It is stupid, short-sighted, and suicidal.</p>
<p>Perfectly in line with our times, isn't it? </p>
<p>So let us say that many big names in the cloud should seriously rethink what they are doing, not because they need to be nice, but because they will get a better outcome, and income, by helping the many open source communities instead of abusing them as they do today. </p>
<p><span style="font-weight: 400;">In the meantime, we must acknowledge that many customers/users are on 5.7 for a good reason and that until we are able to fix that, they may decide not to migrate forever or, if they must, to migrate to something else, such as Postgres. </span></p>
<p>Then Sakila will slowly and painfully die, as usual because of human greed; nothing new in a way, yes, but not good.</p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/06/dolphin_heatwave3.jpeg"><img src="http://www.tusacentral.net/joomla/images/stories/sakila_where_are_you_going/dolphin_heatwave3.jpeg" alt="dolphin heatwave3" width="600" height="600" class="aligncenter wp-image-96858 size-medium" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">Happy MySQL to all. </span></p>
<p> </p>]]></description>
<category>MySQL</category>
<pubDate>Tue, 18 Jun 2024 13:22:55 +0000</pubDate>
</item>
<item>
<title>Is MySQL Router 8.2 Any Better?</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/255-is-mysql-router-8-2-any-better</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/255-is-mysql-router-8-2-any-better</guid>
<description><![CDATA[<p>In my previous article, <a href="https://www.percona.com/blog/comparisons-of-proxies-for-mysql/">Comparisons of Proxies for MySQL</a>, I showed how MySQL Router was the lesser performing Proxy in the comparison. From that time to now, we had several MySQL releases and, of course, also some new MySQL Router ones.</p>
<p>Most importantly, we also had MySQL Router going back to being a level 7 proxy capable of redirecting traffic in case of R/W operations (<a href="https://dev.mysql.com/doc/mysql-router/8.2/en/router-read-write-splitting.html" rel="nofollow">see this</a>).</p>
<p>All these bring me hope that we will also have some good improvements in what is a basic functionality in a router: routing.</p>
<p>So with these great expectations, I repeated the exact same tests I did in my previous article, plus, for MySQL Router only, I tested the cost of encapsulating the selects inside a transaction.</p>
<p>Just keep in mind that for all the tests, MySQL Router was configured to use the read/write split option.</p>
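<p>For reference, the read/write split in MySQL Router 8.2 is driven by a routing section in the Router configuration roughly like the one below. This is a sketch of what bootstrap generates when the read-write split port is requested; the cluster name and port are illustrative, so check your actual generated file:</p>

```ini
# Sketch of a MySQL Router 8.2 read/write-splitting routing section
# (illustrative values; bootstrap generates this, verify against your own config).
[routing:bootstrap_rw_split]
bind_address=0.0.0.0
bind_port=6450
destinations=metadata-cache://myCluster/?role=PRIMARY_AND_SECONDARY
routing_strategy=round-robin
protocol=classic
access_mode=auto
connection_sharing=1
```

With access_mode=auto, the Router classifies each statement and sends reads to secondaries and writes to the primary over a shared connection.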
<h2>The results</h2>
<p><span style="font-weight: 400;">Given this is the continuation of the previous blog, all the explanations about the tests and commands used are in the <a href="https://www.percona.com/blog/comparisons-of-proxies-for-mysql/">first article</a>. If you did not read that, do it now, or it will be difficult for you to follow what is explained later.</span></p>
<p><span style="font-weight: 400;">As indicated, I was looking to identify when the first proxy would reach a dimension that would not be manageable. The load is all in creating and serving the connections, while the number of operations is capped at 100.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/events_rate_82.png"><img src="https://www.percona.com/blog/wp-content/uploads/2024/01/events_rate_82-1024x608.png" alt="" width="600" height="356" class="aligncenter wp-image-93339 size-large" /></a></p>
<p><span style="font-weight: 400;">As you can see, MySQL Router was reaching the saturation level and was unable to serve traffic at exactly the same time as the previous test.</span></p>
<h2>Test two</h2>
<p><i><span style="font-weight: 400;">When the going gets tough, the tough get going </span></i><span style="font-weight: 400;">reprise ;) </span></p>
<p><span style="font-weight: 400;">Let’s remove the --rate limitation and see what will happen. </span><span style="font-weight: 400;">First, let us compare MySQL Router versions only:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/router_82_80_comparison_events.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/router_82_80_comparison_events.png" alt="router 82 80 comparison events" width="600" height="376" class="size-large wp-image-93348 aligncenter" /></a></p>
<p><span style="font-weight: 400;">As we can see, MySQL Router version 8.2 does better up to 64 concurrent threads.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/router_82_80_comparison_latency.png"><img src="https://www.percona.com/blog/wp-content/uploads/2024/01/router_82_80_comparison_latency-1024x680.png" alt="" width="600" height="398" class="size-large wp-image-93349 aligncenter" /></a></p>
<p><span style="font-weight: 400;">Latency follows the same trend in old and new cases, and we can see that the new version is acting better, up to 1024 threads.</span></p>
<p><span style="font-weight: 400;">Is this enough to cover the gap with the other proxies? Which, in the end, is what we would like to see. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/events_norate_82.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/events_norate_82.png" alt="events norate 82" width="600" height="326" class="size-large wp-image-93337 aligncenter" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/latency_norate_82.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/latency_norate_82.png" alt="latency norate 82" width="600" height="364" class="size-large wp-image-93342 aligncenter" /></a></p>
<p><span style="font-weight: 400;">Well, I would say not really; we see slightly better performance at low concurrent threads, but it is still not scaling and is definitely lower than the other two.</span></p>
<p><span style="font-weight: 400;">Now let us take a look at the CPU saturation:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/cpu_saturation.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/cpu_saturation.png" alt="cpu saturation" width="600" height="300" class="size-full wp-image-93334 aligncenter" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/cpu.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/cpu.png" alt="cpu" width="600" height="300" class="size-full wp-image-93335 aligncenter" /></a></p>
<p>Here, we can see how MySQL Router hits the top as soon as the rate option is lifted and gets worse with the increase of the running threads.</p>
<h2>Test three</h2>
<p><span style="font-weight: 400;">This simple test was meant to identify the cost of a transaction, or better, what it will cost to include selects inside a transaction.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/read_events_trx.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/read_events_trx.png" alt="read events trx" width="600" height="356" class="size-large wp-image-93344 aligncenter" /></a></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2024/01/latency_events_trx.png"><img src="http://www.tusacentral.net/joomla/images/stories/ismysqlrouter82better/latency_events_trx.png" alt="latency events trx" width="600" height="371" class="size-large wp-image-93340 aligncenter" /></a></p>
<p><span style="font-weight: 400;">As we can clearly see, when handling selects inside a transaction, MySQL Router drops its performance drastically, falling back to version 8.0 levels.</span></p>
<h3>Conclusions</h3>
<p>To the initial question — Is MySQL Router 8.2 any better? — we can answer a small (very small) yes.</p>
<p>However, it is still far, far away from being competitive with ProxySQL (a proxy at the same level) or with HAProxy. The fact that it is unable to serve requests efficiently even with the lower set of concurrent threads is disappointing.</p>
<p>It is even more disappointing because MySQL Router is presented as a critical component of the MySQL InnoDB Cluster solution. How can we use it in our architectures if the product has such limitations?</p>
<p>I know that Oracle suggests scaling out, and I agree with them: when you need to scale with MySQL Router, the only option is to build a forest of router nodes. However, we must keep in mind that each MySQL Router connects to and queries the data nodes constantly and intensively, so adding a forest of router nodes is not without performance impact, given the increasing noise generated on the data nodes.</p>
<p>In any case, even if there is a theoretical option to scale, that is not a good reason to use a poorly performing component.</p>
<p>I would prefer to use ProxySQL with Group Replication and add whatever script is needed in MySQL Shell to manage it, as Oracle is doing for the MySQL InnoDB Cluster solution.</p>
<p>What also left me very unhappy is that MySQL InnoDB Cluster is one of the important components of the OCI offering for MySQL. Is Oracle using MySQL Router there as well? I assume so. Can we trust it? I do not feel like I can.</p>
<p>Finally, what has been done for MySQL Router so far leads me to think that there is no real interest in making it the more robust and performant product that MySQL InnoDB Cluster deserves.</p>
<p>I hope I am wrong and that we will soon see a fully refactored version of MySQL Router. I really hope Oracle will prove me wrong.</p>
<p>Great MySQL to everyone.</p>]]></description>
<category>MySQL</category>
<pubDate>Wed, 10 Jan 2024 18:05:47 +0000</pubDate>
</item>
<item>
<title>Export and import of MySQL passwords using caching_sha2</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/254-export-and-import-of-mysql-passwords-using-caching-sha2</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/254-export-and-import-of-mysql-passwords-using-caching-sha2</guid>
<description><![CDATA[<p><em><span style="font-weight: 400;">Some fun is coming <a href="https://www.percona.com/blog/wp-content/uploads/2023/09/user-migration.jpg"><img src="https://www.percona.com/blog/wp-content/uploads/2023/09/user-migration-300x136.jpg" alt="" width="380" height="172" class="wp-image-91109 alignright" style="float: right;" /></a></span></em></p>
<p><span style="font-weight: 400;">While I was writing the internal guidelines on how to migrate from MariaDB to Percona Server, I had to export the user accounts in a portable way. Given that MariaDB uses some non-standard syntax, this brought me to first test some external tools, such as Fred's </span><a href="https://github.com/lefred/mysqlshell-plugins/wiki/user#getusersgrants"><span style="font-weight: 400;">https://github.com/lefred/mysqlshell-plugins/wiki/user#getusersgrants</span></a><span style="font-weight: 400;"> and our <a href="https://github.com/percona/percona-toolkit/blob/3.x/bin/pt-show-grants" rel="nofollow">PT-SHOW-GRANTS</a> tool. </span></p>
<p><span style="font-weight: 400;">Needless to say, this opened a can of worms: first I had to fix/convert the MariaDB specifics (not in the scope of this blog), then while testing I discovered another nasty issue that currently prevents us from easily exporting the new passwords in MySQL 8 (and PS 8) when caching_sha2 is used. </span></p>
<p> </p>
<p><span style="font-weight: 400;">So what is the problem I am referring to?</span></p>
<p><span style="font-weight: 400;">Well, the point is that when you generate passwords with caching_sha2 (the default in MySQL 8), the generated hash can (and will) contain characters that are not portable, not even between MySQL 8 installations. </span></p>
<p><span style="font-weight: 400;">Let's see a practical example to understand.</span></p>
<p><span style="font-weight: 400;">If I use the old </span><i><span style="font-weight: 400;">mysql_native_password</span></i><span style="font-weight: 400;"> plugin and create a user such as:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">create user dba@'192.168.1.%' identified with mysql_native_password by 'dba';
</pre>
<p><span style="font-weight: 400;">My authentication_string will be: </span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>select user,host,authentication_string,plugin from mysql.user where user ='dba' order by 1,2;
+------+-------------+-------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------+-------------+-------------------------------------------+-----------------------+
| dba | 192.168.1.% | *381AD08BBFA647B14C82AC1094A29AD4D7E4F51D | mysql_native_password |
+------+-------------+-------------------------------------------+-----------------------+
</pre>
<p><span style="font-weight: 400;">At this point if you want to export the user:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>show create user dba@'192.168.1.%'\G
*************************** 1. row ***************************
CREATE USER for dba@192.168.1.%: CREATE USER `dba`@`192.168.1.%` IDENTIFIED WITH 'mysql_native_password' AS '*381AD08BBFA647B14C82AC1094A29AD4D7E4F51D' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK PASSWORD HISTORY DEFAULT PASSWORD REUSE INTERVAL DEFAULT PASSWORD REQUIRE CURRENT DEFAULT
1 row in set (0.01 sec)
</pre>
<p><span style="font-weight: 400;">You just need to use the text after the colon and all will work fine. Remember that when you want to preserve the already hashed password you need to use </span><i><span style="font-weight: 400;">IDENTIFIED … AS <PW></span></i><span style="font-weight: 400;"> and not </span><b>BY</b><span style="font-weight: 400;">, or you will re-hash the password ;).</span></p>
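<p><span style="font-weight: 400;">To make the difference concrete, here is a minimal sketch reusing the hash above (drop the user between the two statements if you try it):</span></p>
<pre class="lang:mysql decode:true">-- BY treats the string as a plain-text password and hashes it again,
-- so you would end up storing a hash of the hash:
CREATE USER dba@'192.168.1.%' IDENTIFIED WITH 'mysql_native_password' BY '*381AD08BBFA647B14C82AC1094A29AD4D7E4F51D';

-- AS stores the hash as-is, preserving the original password:
CREATE USER dba@'192.168.1.%' IDENTIFIED WITH 'mysql_native_password' AS '*381AD08BBFA647B14C82AC1094A29AD4D7E4F51D';
</pre>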
<p><span style="font-weight: 400;">Anyhow, this is simple and what we are all used to. </span></p>
<p><span style="font-weight: 400;">Now, if you instead try to use caching_sha2, things go differently:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>create user dba@'192.168.4.%' identified with caching_sha2_password by 'dba';
Query OK, 0 rows affected (0.02 sec)
(root@localhost) [(none)]>select user,host,authentication_string,plugin from mysql.user where user ='dba' order by 1,2;
+------+-------------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------+-------------+------------------------------------------------------------------------+-----------------------+
| dba | 192.168.1.% | *381AD08BBFA647B14C82AC1094A29AD4D7E4F51D | mysql_native_password |
| dba | 192.168.4.% | $A$005$@&%1H5iNQx|.l{N7T/GosA.Lp4EiO0bxLVQp8Zi0WY2nXLr8TkleQPYjaqVxI7 | caching_sha2_password |
+------+-------------+------------------------------------------------------------------------+-----------------------+
2 rows in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">You will probably not see it here, given that the special characters are replaced when rendered on your screen, but the password contains unprintable characters. </span></p>
<p><span style="font-weight: 400;">If I try to extract the </span><i><span style="font-weight: 400;">Create USER</span></i><span style="font-weight: 400;"> text I will get:</span></p>
<pre class="lang:mysql decode:true"> (root@localhost) [(none)]>show create user dba@'192.168.4.%'\G
*************************** 1. row ***************************
CREATE USER for dba@192.168.4.%: CREATE USER `dba`@`192.168.4.%` IDENTIFIED WITH 'caching_sha2_password' AS '$A$005$@&%1H5iNQx|.l{N7T/GosA.Lp4EiO0bxLVQp8Zi0WY2nXLr8TkleQPYjaqVxI7' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK PASSWORD HISTORY DEFAULT PASSWORD REUSE INTERVAL DEFAULT PASSWORD REQUIRE CURRENT DEFAULT
1 row in set (0.00 sec)</pre>
<p><span style="font-weight: 400;">However, if I try to use this text to recreate the user after dropping it:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>drop user dba@'192.168.4.%';
Query OK, 0 rows affected (0.02 sec)
(root@localhost) [(none)]>create user dba@'192.168.4.%' IDENTIFIED AS 'NQx|.l{N7T/GosA.Lp4EiO0bxLVQp8Zi0WY2nXLr8TkleQPYjaqVxI7';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AS 'NQx|.l{N7T/GosA.Lp4EiO0bxLVQp8Zi0WY2nXLr8TkleQPYjaqVxI7'' at line 1
</pre>
<p><span style="font-weight: 400;">Don’t waste time: there is nothing wrong with the query itself, except for the simple fact that you CANNOT use the text coming from the authentication_string when you have </span><i><span style="font-weight: 400;">caching_sha2</span></i><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;">So? What should we do? </span></p>
<p><span style="font-weight: 400;">The answer is easy: we need to convert the password into its binary (hex) representation and use/store that. </span></p>
<p><span style="font-weight: 400;">Let us try.</span></p>
<p><span style="font-weight: 400;">First create the user again:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>select user,host,authentication_string,plugin from mysql.user where user ='dba' order by 1,2;
+------+-------------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------+-------------+------------------------------------------------------------------------+-----------------------+
| dba | 192.168.1.% | *381AD08BBFA647B14C82AC1094A29AD4D7E4F51D | mysql_native_password |
| dba | 192.168.4.% | $A$005$X>ztS}WfR"k~aH3Hs0hBbF3WmM2FXubKumr/CId182pl2Lj/gEtxLvV0 | caching_sha2_password |
+------+-------------+------------------------------------------------------------------------+-----------------------+
2 rows in set (0.00 sec)
(root@localhost) [(none)]>exit
Bye
[root@master3 ps80]# ./mysql-3307 -udba -pdba -h192.168.4.57 -P3307
...
(dba@192.168.4.57) [(none)]>
</pre>
<p><span style="font-weight: 400;">OK, as you can see, I created the user and can connect, but as we know the password hash is not portable.</span></p>
<p><span style="font-weight: 400;">Let us convert it and create the user:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>select user,host,convert(authentication_string using binary),plugin from mysql.user where user ='dba' and host='192.168.4.%' order by 1,2;
+------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+
| user | host | convert(authentication_string using binary) | plugin |
+------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+
| dba | 192.168.4.% | 0x2441243030352458193E107A74537D0157055C66527F226B7E5C6148334873306842624633576D4D32465875624B756D722F434964313832706C324C6A2F674574784C765630 | caching_sha2_password |
+------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+
</pre>
<p><span style="font-weight: 400;">So the password is: </span></p>
<pre class="lang:mysql decode:true">0x2441243030352458193E107A74537D0157055C66527F226B7E5C6148334873306842624633576D4D32465875624B756D722F434964313832706C324C6A2F674574784C7656
</pre>
<p>Let us use it:</p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>drop user dba@'192.168.4.%';
Query OK, 0 rows affected (0.02 sec)
(root@localhost) [(none)]>create user dba@'192.168.4.%' IDENTIFIED with 'caching_sha2_password' AS 0x2441243030352458193E107A74537D0157055C66527F226B7E5C6148334873306842624633576D4D32465875624B756D722F434964313832706C324C6A2F674574784C765630;
Query OK, 0 rows affected (0.03 sec)
</pre>
<p><span style="font-weight: 400;">Let us check the user now:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>select user,host, authentication_string,plugin from mysql.user where user ='dba' and host= '192.168.4.%' order by 1,2;
+------+-------------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+------+-------------+------------------------------------------------------------------------+-----------------------+
| dba | 192.168.4.% | $A$005$X>ztS}WfR"k~aH3Hs0hBbF3WmM2FXubKumr/CId182pl2Lj/gEtxLvV0 | caching_sha2_password |
+------+-------------+------------------------------------------------------------------------+-----------------------+
1 row in set (0.00 sec)
[root@master3 ps80]# ./mysql-3307 -udba -pdba -h192.168.4.57 -P3307
(dba@192.168.4.57) [(none)]>select current_user();
+-----------------+
| current_user() |
+-----------------+
| dba@192.168.4.% |
+-----------------+
1 row in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">As you can see, the user has been created correctly and the password is again stored in its hashed format. </span></p>
<p><span style="font-weight: 400;">In short, what you need to do to export users from MySQL/PS 8 is:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Read the user information</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Convert the password to hex format when the plugin is caching_sha2</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Write the converted AS <password> clause to a file, or handle it in whatever way you are used to</span></li>
</ol>
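<p><span style="font-weight: 400;">As a minimal sketch, assuming you can query mysql.user directly, the three steps can be condensed into a single query that emits a portable CREATE USER statement (HEX() produces the same 0x literal as the CONVERT shown above):</span></p>
<pre class="lang:mysql decode:true">SELECT CONCAT('CREATE USER `', user, '`@`', host,
              '` IDENTIFIED WITH ''', plugin,
              ''' AS 0x', HEX(authentication_string), ';') AS ddl
FROM mysql.user
WHERE user = 'dba';
</pre>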
<p> </p>
<p>Another possible solution is to set, at the session level, the parameter <em>print_identified_with_as_hex</em>. If enabled, it causes SHOW CREATE USER to display such hash values as hexadecimal strings rather than as regular string literals. Hash values that contain no unprintable characters still display as regular string literals, even with this variable enabled.</p>
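<p><span style="font-weight: 400;">A quick sketch of its use:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>SET SESSION print_identified_with_as_hex = ON;
(root@localhost) [(none)]>show create user dba@'192.168.4.%'\G
-- the hash is now printed as a 0x... hexadecimal literal and can be replayed as-is
</pre>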
<p><span style="font-weight: 400;">This at the end is exactly what Fred and I have done for our tools:</span></p>
<p><span style="font-weight: 400;">See:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Fred: </span><a href="https://github.com/lefred/mysqlshell-plugins/commit/aa5c6bbe9b9aa689bf7266f5a19a35d0091f6568"><span style="font-weight: 400;">https://github.com/lefred/mysqlshell-plugins/commit/aa5c6bbe9b9aa689bf7266f5a19a35d0091f6568</span></a></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Pt-show-grants: <a href="https://github.com/percona/percona-toolkit/blob/4a812d4a79c0973bf176105b0d138ad0a2a46b2f/bin/pt-show-grants#L2058" rel="nofollow">https://github.com/percona/percona-toolkit/blob/4a812d4a79c0973bf176105b0d138ad0a2a46b2f/bin/pt-show-grants#L2058</a></span></li>
</ul>
<h1><span style="font-weight: 400;">Conclusions</span></h1>
<p><span style="font-weight: 400;">MySQL 8 and Percona Server come with a more secure hashing mechanism, <i>caching_sha2_password</i>, which is also the default. However, if you need to migrate users and you use your own tools to export and import the passwords, you must update them as indicated, or use the Percona Toolkit tools that we keep up to date for you.</span></p>
<p> </p>
<p><span style="font-weight: 400;">Have fun with MySQL!!</span></p>]]></description>
<category>MySQL</category>
<pubDate>Sun, 24 Sep 2023 15:39:08 +0000</pubDate>
</item>
<item>
<title>Proof of Concept: Horizontal Write Scaling for MySQL with Kubernetes Operator</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/253-horizontal-write-scaling-for-mysql-with-operator-poc</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/253-horizontal-write-scaling-for-mysql-with-operator-poc</guid>
<description><![CDATA[<p><span style="font-weight: 400;">Historically MySQL is great in horizontal READ scale. The scaling in that case is offered by the different number of Replica nodes, no matter if using standard asynchronous replication or synchronous replication. </span></p>
<p><span style="font-weight: 400;">However, those solutions do not offer the same level of scaling for write operations. </span></p>
<p><span style="font-weight: 400;">Why? Because these solutions still rely on writing to one single node that acts as Primary. Even in the multi-Primary case, the writes are only distributed by transaction. In both cases, when using virtually-synchronous replication, the process requires certification from each node and a local (per node) write; as such, the writes are NOT distributed across multiple nodes but duplicated on each of them. </span></p>
<p><span style="font-weight: 400;">The main reason behind this is that MySQL is a relational database management system (RDBMS), and any data that is going to be written into it must respect the RDBMS rules (</span><a href="https://en.wikipedia.org/wiki/Relational_database"><span style="font-weight: 400;">https://en.wikipedia.org/wiki/Relational_database</span></a><span style="font-weight: 400;">). In short, any data that is written must be consistent with the data already present. To achieve that, the new data needs to be checked against the existing data through the defined relations and constraints. This action can affect very large datasets and be very expensive. Think about updating a table with millions of rows that refers to another table with another million rows. </span></p>
<p><span style="font-weight: 400;">An image may help:</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/schema-db.gif"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/schema-db.gif" alt="schema db" width="478" height="600" class="alignnone size-full wp-image-87678" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">Every time I insert an order, I must be sure that all the related elements are in place and consistent. </span></p>
<p><span style="font-weight: 400;">This operation is quite expensive, but our database is able to run it in a few milliseconds or less, thanks to several optimizations that allow the node to execute most of it in memory, with little or no access to mass storage. </span></p>
<p><span style="font-weight: 400;">The key factor is that the whole data structure resides in the same location (node) facilitating the operations.</span></p>
<p><span style="font-weight: 400;">Once we have understood that, it also becomes clear why we cannot split relational data across multiple nodes and distribute writes by table. If I have a node that manages only the items, another the orders, another the payments, my solution will need to deal with distributed transactions, each of which needs to certify and verify data on the other nodes. </span></p>
<p><span style="font-weight: 400;">This level of distribution seriously affects the efficiency of the operation and increases the response time significantly. That is it: nothing is impossible, but performance will be so impacted that each operation may take seconds instead of milliseconds, or fractions thereof, unless we lift some of the rules and break the relational model.</span></p>
<p><span style="font-weight: 400;">MySQL, as well as other RDBMSs, is designed to work respecting the model and cannot scale in any way by fragmenting and distributing a schema, so what can be done to scale?</span></p>
<p> </p>
<p><span style="font-weight: 400;">The alternative is to split a consistent set of data into fragments. What is a consistent set of data? It all depends on the kind of information we are dealing with. Keeping in mind the example above, where we have an online shop serving multiple customers, we need to identify the most effective way to split the data.</span></p>
<p><span style="font-weight: 400;">For instance, if we try to split the data by product type (books, CD/DVD, etc.) we will have a huge duplication of the data related to customers/orders/shipments and so on. All this data is also quite dynamic, given that customers will constantly be ordering things. </span></p>
<p> </p>
<p><span style="font-weight: 400;">Why duplicate the data? Because if I do not duplicate it, I will not know whether a customer has already bought a specific item, or I will have to ask again for the shipping address, and so on. It also means that any time a customer buys something or puts something in the wish list, I have to reconcile the data across all my nodes/clusters.</span></p>
<p> </p>
<p><span style="font-weight: 400;">On the other hand, if I choose to split my data by the customer’s country of residence, the only data I will have to duplicate and keep in sync is the product data, of which the most dynamic element will be the number of items in stock. This, of course, unless I can organize my products by country as well, which is a bit unusual nowadays, but not impossible. </span></p>
<p> </p>
<p><span style="font-weight: 400;">Another possible case is if I am a health organization and I manage several hospitals. As in the example above, it will be easier to split my data by hospital, given that most of the data related to patients is bound to the hospital itself, as are treatments and any other element of hospital management, while it would make no sense to split by the patient's country of residence.</span></p>
<p> </p>
<p><span style="font-weight: 400;">This technique of splitting the data into smaller pieces is called </span><b>sharding</b><span style="font-weight: 400;"> and at the moment it is the only way we have to scale RDBMSs horizontally. </span></p>
<p> </p>
<p><span style="font-weight: 400;">In the MySQL open source ecosystem we have only two consolidated ways to perform sharding: Vitess and ProxySQL. The first is a complete solution that takes ownership of your database and manages almost every aspect of its operations in a sharded environment; this includes a lot of specific features for DBAs to deal with daily operations like table modifications, backup and more. </span></p>
<p> </p>
<p><span style="font-weight: 400;">While this may look great, it also comes with some strings attached, including complexity and a proprietary environment. That makes Vitess a good fit for “complex” sharding scenarios where other solutions may not be enough.</span></p>
<p> </p>
<p><span style="font-weight: 400;">ProxySQL does not have a sharding mechanism “per se”, but the way it works and the features it has allow us to build simple sharding solutions. </span></p>
<p><span style="font-weight: 400;">It is important to note that most of the DBA operations will still fall on the DBA, to be executed with increased complexity given the sharding environment. </span></p>
<p> </p>
<p><span style="font-weight: 400;">There is a third option, which is application-aware sharding. </span></p>
<p><span style="font-weight: 400;">In this solution the application is aware of the need to split the data into smaller fragments and internally points the data to different “connectors” that are connected to multiple data sources. </span></p>
<p><span style="font-weight: 400;">In this case the application is aware of each customer's country and redirects all the operations related to that customer to the data source responsible for the specific fragment.</span></p>
<p><span style="font-weight: 400;">Normally this solution requires a full code redesign and can be quite difficult to achieve when it is injected after the initial code architecture has been defined. </span></p>
<p><span style="font-weight: 400;">On the other hand, if done at design time it is probably the best solution, because it allows the application to define the sharding rules and can also optimize the different data sources, using different technologies for different uses.</span></p>
<p><span style="font-weight: 400;">One example could be the use of an RDBMS for most of the </span><b>online transaction processing (</b><span style="font-weight: 400;">OLTP) data, sharded by country, while keeping the products in a distributed memory cache built on a different technology. At the same time, all the data related to orders, payments and customer history can be consolidated in a data warehouse used to generate reporting. </span></p>
<p> </p>
<p><span style="font-weight: 400;">As said, the last one is probably the most powerful and scalable, but also the most difficult to design, and unfortunately it probably represents less than 5% of the solutions currently deployed. </span></p>
<p><span style="font-weight: 400;">Likewise, very few cases really need a full system/solution to provide scalability with sharding. </span></p>
<p> </p>
<p><span style="font-weight: 400;">From experience, most of the needs for horizontal scaling fall into the simple scenario, where there is the need to achieve sharding and data separation, very often with a shared-nothing architecture. In shared-nothing, each shard can live in a totally separate logical schema instance / physical database server / data center / continent. There is no ongoing need to retain shared access (between shards) to the unpartitioned tables in other shards.</span></p>
<p> </p>
<h2><span style="font-weight: 400;">The POC</span></h2>
<h4><span style="font-weight: 400;">Why this POC?</span></h4>
<p><span style="font-weight: 400;">Over the years I have met a lot of customers who were talking about scaling their database solution and looking at very complex sharding systems such as Vitess as the first and only way to go. </span></p>
<p><span style="font-weight: 400;">This without even considering whether their needs were really driving them there. </span></p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/kiss.png" alt="kiss" width="216" height="188" class="wp-image-87680 alignleft" style="margin-right: 10px; float: left;" /></p>
<p><span style="font-weight: 400;">In my experience, and talking with several colleagues (I am not alone), when analyzing the real needs and after discussing with all the impacted parties, only a very small percentage of customers really needed complex solutions. Most of the others were just trying to avoid a project that would implement a simple shared-nothing solution. Why? Because apparently it is simpler to migrate data to a platform that </span><i><span style="font-weight: 400;">does it all for you</span></i><span style="font-weight: 400;"> than to accept a bit of additional work and challenge at the beginning while keeping a simple approach. Also, going for the latest shiny thing always has its magic.</span></p>
<p><span style="font-weight: 400;">On top of that, with the rise of Kubernetes and MySQL Operators, a lot of confusion has started to circulate, most of it generated by the total lack of understanding that a database and a relational database are two separate things. That failure to understand the difference, and the real problems attached to an RDBMS, has brought some to talk about horizontal scaling for databases with a concerning superficiality, without clarifying whether they were talking about an RDBMS or not. As such, some clarification is long overdue, as well as putting the KISS principle back as the main focus. </span></p>
<p> </p>
<p><span style="font-weight: 400;">Given that, I thought that refreshing how ProxySQL can help in building a simple sharding solution might help to clarify the issues, reset expectations and show how we can do things in a simpler way (see my old post </span><a href="https://www.percona.com/blog/mysql-sharding-with-proxysql/"><span style="font-weight: 400;">https://www.percona.com/blog/mysql-sharding-with-proxysql/</span></a><span style="font-weight: 400;">). </span></p>
<p> </p>
<p><span style="font-weight: 400;">To do so, I built a simple POC that illustrates how you can use Percona Operator for MySQL (POM) and ProxySQL to build a sharded environment with a good level of automation for some standard operations like backup/restore, software upgrade and resource scaling. </span></p>
<p> </p>
<h4><span style="font-weight: 400;">Why ProxySQL?</span></h4>
<p><span style="font-weight: 400;">In the following example we mimic a case where we need </span><i><span style="font-weight: 400;">a simple sharding solution</span></i><span style="font-weight: 400;">, which means we just need to redirect the data to different data containers, keeping the database maintenance operations on us. In this common case we do not need to implement a full sharding system such as Vitess. </span></p>
<p><span style="font-weight: 400;"></span></p>
<p><span style="font-weight: 400;">As illustrated above, ProxySQL allows us to set up a common entry point for the application and then redirect the traffic on the basis of identified sharding keys. It will also allow us to redirect read/write traffic to the primary and read-only traffic to all secondaries. </span></p>
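<p><span style="font-weight: 400;">As a sketch of the idea only (the hostgroup numbers and match patterns here are hypothetical and depend on how the application exposes the sharding key in its queries):</span></p>
<pre class="lang:mysql decode:true">-- on the ProxySQL admin interface: one hostgroup per shard
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (10, 1, 'continent.*Europe', 10, 1),
       (20, 1, 'continent.*Asia', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
</pre>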
<p> </p>
<p><span style="font-weight: 400;">The other interesting thing is that we can have ProxySQL as part of the application pod, or as an independent service. Best practices indicate that having ProxySQL closer to the application will be more efficient especially if we decide to activate the caching feature. </span></p>
<p> </p>
<h4><span style="font-weight: 400;">Why POM?</span></h4>
<p><span style="font-weight: 400;">Percona Operator for MySQL comes with three main solutions: Percona Operator for PXC, Percona Operator for MySQL Group Replication, and Percona Operator for Percona Server. The first two are based on virtually-synchronous replication and allow the cluster to keep the data state consistent across all pods, which guarantees that the service will always offer consistent data. In the K8s context we can see POM as a single service with native horizontal scalability for reads, while for writes we will adopt the mentioned sharding approach. </span></p>
<p> </p>
<p><span style="font-weight: 400;">The other important aspect of using a POM-based solution is the automation it comes with. Deploying POM, you will be able to set up automation for backups, software updates, monitoring (using PMM) and, last but not least, the possibility to scale UP or DOWN just by changing the needed resources. </span></p>
<h3><span style="font-weight: 400;">The elements used</span></h3>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/horizontal_scaling.jpg"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/horizontal_scaling.jpg" alt="horizontal scaling" width="450" height="422" class="alignnone wp-image-87688" /></a></p>
<p><span style="font-weight: 400;">In our POC I will use a modified version of sysbench (https://github.com/Tusamarco/sysbench) that has an additional field, </span><i><span style="font-weight: 400;">continent</span></i><span style="font-weight: 400;">, which I will use as the sharding key. At the moment, and for the purpose of this simple POC, I will only have 2 shards.</span></p>
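<p><span style="font-weight: 400;">For reference, a sketch of what the modified sysbench table could look like (the other column names follow standard sysbench; check the repository for the exact definition):</span></p>
<pre class="lang:mysql decode:true">CREATE TABLE sbtest1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  k INT UNSIGNED NOT NULL DEFAULT 0,
  c CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  continent CHAR(3) NOT NULL DEFAULT '',  -- the additional sharding key
  PRIMARY KEY (id),
  KEY k_1 (k)
);
</pre>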
<p> </p>
<p><span style="font-weight: 400;">As the diagram above shows, we have a simple deployment, but one good enough to illustrate the sharding approach.</span></p>
<p><span style="font-weight: 400;">We have:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The application node(s). It is really up to you whether to test with one application node or more; nothing will change. The same goes for the ProxySQL nodes, just keep in mind that if you use more ProxySQL nodes it is better to activate the internal cluster support or to use Consul to synchronize them. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Shard 1 is based on POM with PXC, it has:</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Load balancer for service entry point</span>
<ul>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">Entry point for r/w</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">Entry point for read only</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">3 Pods for HAProxy</span>
<ul>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">HAProxy container</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">PMM agent container</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">3 Pods with data nodes (PXC)</span>
<ul>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">PXC cluster node container</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">Log streaming</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">PMM container</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Backup/restore service</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Shard 2 is based on POM for Percona Server and Group Replication (technical preview)</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Load balancer for service entry point</span>
<ul>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">Entry point for r/w</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">Entry point for read only</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">3 Pods for MySQL Router (testing)</span>
<ul>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">MySQL Router container</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">3 Pods with data nodes (PS with GR)</span>
<ul>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">PS-GR cluster node container</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">Log streaming</span></li>
<li style="font-weight: 400;" aria-level="3"><span style="font-weight: 400;">PMM container</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Backup/restore on scheduler</span></li>
</ul>
</li>
</ul>
<p> </p>
<p><span style="font-weight: 400;">Now you may have noticed that the nodes are represented with different sizes; this is not a drawing mistake. It indicates that I allocated more resources (CPU and memory) to shard1 than to shard2. Why? Because I can, and because I am simulating a situation where shard2 gets less traffic, at least temporarily; as such I do not want to give it the same resources as shard1. I will increase them if I see the need. </span></p>
<h2><span style="font-weight: 400;">The settings</span></h2>
<h3><span style="font-weight: 400;">Data layer</span></h3>
<p><span style="font-weight: 400;">Let us start with the easy one, the data layer configuration. Configuring the environment correctly is key, and to do so I am using a tool I wrote specifically to calculate the needed configuration in a K8s POM environment; you can find it here (</span><a href="https://github.com/Tusamarco/mysqloperatorcalculator"><span style="font-weight: 400;">https://github.com/Tusamarco/mysqloperatorcalculator</span></a><span style="font-weight: 400;">). </span></p>
<p><span style="font-weight: 400;">Once you have compiled and run it, you can simply ask which “dimensions” are supported, or you can define a custom level of resources; in either case you will still need to indicate the level of expected load. Please refer to the README in the repository, which has all the instructions.</span></p>
<p><span style="font-weight: 400;">The full cr.yaml for PXC shard1 is </span><a href="https://github.com/Tusamarco/blogs/blob/master/mysql_horizontal_scaling/cr_pxc.yml"><span style="font-weight: 400;">here</span></a><span style="font-weight: 400;">, while the one for PS-GR is </span><a href="https://github.com/Tusamarco/blogs/blob/master/mysql_horizontal_scaling/cr_ps_gr.yml"><span style="font-weight: 400;">here</span></a><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;">For Shard 1: I asked for resources to cover traffic of type 2 (Light OLTP), configuration type 5 (2XLarge), 1000 connections.</span></p>
<p><span style="font-weight: 400;">For Shard 2: I asked for resources to cover traffic of type 2 (Light OLTP), configuration type 2 (Small), 100 connections. </span></p>
<p><span style="font-weight: 400;">Once you have the CRs defined you can follow the official guidelines to set the environment up: PXC (</span><a href="https://docs.percona.com/percona-operator-for-mysql/pxc/index.html"><span style="font-weight: 400;">https://docs.percona.com/percona-operator-for-mysql/pxc/index.html</span></a><span style="font-weight: 400;">), PS (</span><a href="https://docs.percona.com/percona-operator-for-mysql/ps/index.html"><span style="font-weight: 400;">https://docs.percona.com/percona-operator-for-mysql/ps/index.html</span></a><span style="font-weight: 400;">).</span></p>
<p> </p>
<p><span style="font-weight: 400;">It is now time to look at the ProxySQL settings.</span></p>
<h3><span style="font-weight: 400;">ProxySQL and Sharding rules</span></h3>
<p><span style="font-weight: 400;">As mentioned before, we are going to test sharding the load by continent, and as also mentioned, we know that ProxySQL does not provide any additional help to automatically manage the sharded environment. </span></p>
<p><span style="font-weight: 400;">Given that, one way to handle it is to create a DBA account per shard, or to inject shard information in the commands while executing them. </span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">I will use the less comfortable option, the different DBA accounts, just to prove it works. </span></p>
<p> </p>
<p><span style="font-weight: 400;">We will have 2 shards, the sharding key is the continent field, and the continents will be grouped as follows:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Shard 1:</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Asia</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Africa</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Antarctica</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Europe</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">North America</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Shard 2:</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Oceania</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">South America</span></li>
</ul>
</li>
</ul>
<p> </p>
<p><span style="font-weight: 400;">The DBAs users:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">dba_g1</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">dba_g2</span></li>
</ul>
<p> </p>
<p><span style="font-weight: 400;">The application user:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">app_test</span></li>
</ul>
<p> </p>
<p><span style="font-weight: 400;">The host groups will be:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Shard 1</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">100 Read and Write</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">101 Read only</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Shard 2</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">200 Read and Write</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">201 Read only</span></li>
</ul>
</li>
</ul>
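<p><span style="font-weight: 400;">For reference, this is roughly how those hostgroups map to entries in the ProxySQL admin interface. This is only a sketch: the hostnames are placeholders for the load balancer endpoints of each shard, and the weights are illustrative:</span></p>
<pre class="lang:mysql decode:true">INSERT INTO mysql_servers (hostgroup_id,hostname,port,weight) VALUES
(100,'shard1-rw-endpoint',3306,10000), -- Shard 1 read and write
(101,'shard1-ro-endpoint',3306,100),   -- Shard 1 read only
(200,'shard2-rw-endpoint',3306,10000), -- Shard 2 read and write
(201,'shard2-ro-endpoint',6447,10000); -- Shard 2 read only
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
</pre>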
<p> </p>
<p><span style="font-weight: 400;">Once that is defined, we need to identify which query rules will serve us and how.</span></p>
<p><span style="font-weight: 400;">What we want is to redirect all the incoming queries for:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Asia, Africa, Antarctica, Europe and North America to shard 1</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Oceania and South America to shard 2</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Split the queries into R/W and read only</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Prevent the execution of any query that does not have a shard key</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Back up data at regular intervals and store it in a safe place</span></li>
</ul>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/query_rukes_sharding.png"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/query_rukes_sharding.png" alt="query rukes sharding" width="800" height="280" class="alignnone size-large wp-image-87681" /></a></p>
<p><span style="font-weight: 400;">Given the above, we first define the rules for the DBA accounts:</span></p>
<p><span style="font-weight: 400;">We set the hostgroup for each DBA, and then if the query matches a sharding rule we redirect it to the proper shard; otherwise the HG remains as set.</span></p>
<p><span style="font-weight: 400;">This allows us to execute queries such as CREATE/DROP TABLE on our own shard without problems, while still sending data where needed. </span></p>
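<p><span style="font-weight: 400;">To make this concrete, below is a sketch of the relevant admin commands. The rule ids and flagIN/flagOUT values mirror the rule stats reported later in this post, but the passwords are placeholders and the rule list is simplified; check the full command list linked at the end of this section:</span></p>
<pre class="lang:mysql decode:true">-- Each DBA account lands by default on its own shard writer hostgroup
INSERT INTO mysql_users (username,password,default_hostgroup) VALUES
('dba_g1','xxx',100),
('dba_g2','xxx',200);

-- Tag the DBA traffic (flagOUT) so it enters the sharding rule chain
INSERT INTO mysql_query_rules (rule_id,active,username,flagOUT,apply) VALUES
(20,1,'dba_g1',500,0),
(21,1,'dba_g2',600,0);

-- Redirect by continent; apply=1 stops rule parsing once a shard is found
INSERT INTO mysql_query_rules (rule_id,active,flagIN,match_pattern,destination_hostgroup,apply) VALUES
(31,1,500,'\scontinent\s*(=|like)\s*''*(Asia|Africa|Antarctica|Europe|North America)''*',100,1),
(32,1,500,'\scontinent\s*(=|like)\s*''*(Oceania|South America)''*',200,1);
LOAD MYSQL USERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;
</pre>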
<p><span style="font-weight: 400;">For instance the one below is the output of the queries that sysbench will run.</span></p>
<p><b>Prepare:</b></p>
<pre class="lang:mysql decode:true">INSERT INTO windmills_test1 /* continent=Asia */ (uuid,millid,kwatts_s,date,location,continent,active,strrecordtype) VALUES(UUID(), 79, 3949999,NOW(),'mr18n2L9K88eMlGn7CcctT9RwKSB1FebW397','Asia',0,'quq')
</pre>
<p><span style="font-weight: 400;">In this case I have the application simply injecting a comment into the INSERT SQL declaring the shard key. Given I am using the account dba_g1 to create/prepare the schemas, rules 31/32 will be used and, given I have set apply=1, ProxySQL will exit the query rules parsing and send the command to the relevant hostgroup.</span></p>
<p><span style="font-weight: 400;">Run:</span></p>
<pre class="lang:mysql decode:true">SELECT id, millid, date,continent,active,kwatts_s FROM windmills_test1 WHERE id BETWEEN ? AND ? AND continent='South America'
SELECT SUM(kwatts_s) FROM windmills_test1 WHERE id BETWEEN ? AND ? and active=1 AND continent='Asia'
SELECT id, millid, date,continent,active,kwatts_s FROM windmills_test1 WHERE id BETWEEN ? AND ? AND continent='Oceania' ORDER BY millid
SELECT DISTINCT millid,continent,active,kwatts_s FROM windmills_test1 WHERE id BETWEEN ? AND ? AND active =1 AND continent='Oceania' ORDER BY millid
UPDATE windmills_test1 SET active=? WHERE id=? AND continent='Asia'
UPDATE windmills_test1 SET strrecordtype=? WHERE id=? AND continent='North America'
DELETE FROM windmills_test1 WHERE id=? AND continent='Antarctica'
INSERT INTO windmills_test1 /* continent=Antarctica */ (id,uuid,millid,kwatts_s,date,location,continent,active,strrecordtype) VALUES (?, UUID(), ?, ?, NOW(), ?, ?, ?,?) ON DUPLICATE KEY UPDATE kwatts_s=kwatts_s+1
</pre>
<p><span style="font-weight: 400;">The above are executed during the tests. </span></p>
<p><span style="font-weight: 400;">In all of them the sharding key is present, either in the WHERE clause OR as comment. </span></p>
<p><span style="font-weight: 400;">Of course if I execute one of them without the sharding key, the firewall rule will stop the query execution, e.g.:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">mysql> SELECT id, millid, date,continent,active,kwatts_s FROM windmills_test1 WHERE id BETWEEN ? AND ?;
ERROR 1148 (42000): It is impossible to redirect this command to a defined shard. Please be sure you Have the Continent definition in your query, or that you use a defined DBA account (dba_g{1/2})
</pre>
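<p><span style="font-weight: 400;">The firewall itself is nothing more than a catch-all rule with an error_msg, placed after the sharding rules so it fires only when no shard was identified. A sketch, with the message matching the output above:</span></p>
<pre class="lang:mysql decode:true">INSERT INTO mysql_query_rules (rule_id,active,match_digest,error_msg,apply) VALUES
(2000,1,'.','It is impossible to redirect this command to a defined shard. Please be sure you Have the Continent definition in your query, or that you use a defined DBA account (dba_g{1/2})',1);
LOAD MYSQL QUERY RULES TO RUNTIME;
</pre>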
<p><span style="font-weight: 400;">Check </span><a href="https://github.com/Tusamarco/blogs/blob/master/mysql_horizontal_scaling/operator_sharding_with_proxysql_public.txt"><span style="font-weight: 400;">here </span></a><span style="font-weight: 400;">for the full command list.</span></p>
<h3><span style="font-weight: 400;">Setting up the dataset</span></h3>
<p><span style="font-weight: 400;">Once we have the rules set it is time to set up the schemas and the data using sysbench (</span><a href="https://github.com/Tusamarco/sysbench"><span style="font-weight: 400;">https://github.com/Tusamarco/sysbench</span></a><span style="font-weight: 400;">); remember to use the windmills_sharding tests. </span></p>
<p><span style="font-weight: 400;">The first operation is to build the schema on SHARD2 without filling it with data. This is a DBA action; as such we will execute it using the dba_g2 account:</span></p>
<pre class="lang:sh decode:true">sysbench ./src/lua/windmills_sharding/oltp_read.lua --mysql-host=10.0.1.96 --mysql-port=6033 --mysql-user=dba_g2 --mysql-password=xxx --mysql-db=windmills_large --mysql_storage_engine=innodb --db-driver=mysql --tables=4 --table_size=0 --table_name=windmills --mysql-ignore-errors=all --threads=1 prepare
</pre>
<p><span style="font-weight: 400;">Setting table_size to 0 and pointing to the ProxySQL IP/port will do, and I will have:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">mysql> select current_user(), @@hostname;
+----------------+-------------------+
| current_user() | @@hostname |
+----------------+-------------------+
| dba_g2@% | ps-mysql1-mysql-0 |
+----------------+-------------------+
1 row in set (0.01 sec)
mysql> use windmills_large;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_windmills_large |
+---------------------------+
| windmills1 |
| windmills2 |
| windmills3 |
| windmills4 |
+---------------------------+
4 rows in set (0.01 sec)
mysql> select count(*) from windmills1;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.09 sec)
</pre>
<p><span style="font-weight: 400;">All set but empty.</span></p>
<p><span style="font-weight: 400;">Now let us do the same but with the other DBA user:</span></p>
<pre class="lang:sh decode:true">sysbench ./src/lua/windmills_sharding/oltp_read.lua --mysql-host=10.0.1.96 --mysql-port=6033 --mysql-user=dba_g1 --mysql-password=xxx --mysql-db=windmills_large --mysql_storage_engine=innodb --db-driver=mysql --tables=4 --table_size=400 --table_name=windmills --mysql-ignore-errors=all --threads=1 prepare
</pre>
<p><span style="font-weight: 400;">If I now run the select above with user dba_g2:</span></p>
<pre class="lang:mysql decode:true">mysql> select current_user(), @@hostname;select count(*) from windmills1;
+----------------+-------------------+
| current_user() | @@hostname |
+----------------+-------------------+
| dba_g2@% | ps-mysql1-mysql-0 |
+----------------+-------------------+
1 row in set (0.00 sec)
+----------+
| count(*) |
+----------+
| 113 |
+----------+
1 row in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">While if I reconnect and use dba_g1:</span></p>
<pre class="lang:mysql decode:true">mysql> select current_user(), @@hostname;select count(*) from windmills1;
+----------------+--------------------+
| current_user() | @@hostname |
+----------------+--------------------+
| dba_g1@% | mt-cluster-1-pxc-0 |
+----------------+--------------------+
1 row in set (0.00 sec)
+----------+
| count(*) |
+----------+
| 287 |
+----------+
1 row in set (0.01 sec)
</pre>
<p><span style="font-weight: 400;">I can also check on ProxySQL to see which rules were utilized:</span></p>
<pre class="lang:mysql decode:true">select active,hits,destination_hostgroup, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply,flagIn,flagOUT FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+------+-----------------------+---------+---------------------+----------------------------------------------------------------------------+-------+--------+---------+
| hits | destination_hostgroup | rule_id | match_digest        | match_pattern                                                              | apply | flagIN | flagOUT |
+------+-----------------------+---------+---------------------+----------------------------------------------------------------------------+-------+--------+---------+
| 3261 | 100                   | 20      | NULL                | NULL                                                                       | 0     | 0      | 500     |
| 51   | 200                   | 21      | NULL                | NULL                                                                       | 0     | 0      | 600     |
| 2320 | 100                   | 31      | NULL                | \scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'* | 1     | 500    | 0       |
| 880  | 200                   | 32      | NULL                | \scontinent\s*(=|like)\s*'*(Oceania|South America)'*                       | 1     | 500    | 0       |
| 0    | 100                   | 34      | NULL                | \scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'* | 1     | 600    | 0       |
| 0    | 200                   | 35      | NULL                | \scontinent\s*(=|like)\s*'*(Oceania|South America)'*                       | 1     | 600    | 0       |
| 2    | 100                   | 51      | NULL                | \scontinent\s*(=|like)\s*'*(Asia|Africa|Antarctica|Europe|North America)'* | 0     | 0      | 1001    |
| 0    | 200                   | 54      | NULL                | \scontinent\s*(=|like)\s*'*(Oceania|South America)'*                       | 0     | 0      | 1002    |
| 0    | 100                   | 60      | NULL                | NULL                                                                       | 0     | 50     | 1001    |
| 0    | 200                   | 62      | NULL                | NULL                                                                       | 0     | 60     | 1002    |
| 7    | NULL                  | 2000    | .                   | NULL                                                                       | 1     | 0      | NULL    |
| 0    | 100                   | 2040    | ^SELECT.*FOR UPDATE | NULL                                                                       | 1     | 1001   | NULL    |
| 2    | 101                   | 2041    | ^SELECT.*$          | NULL                                                                       | 1     | 1001   | NULL    |
| 0    | 200                   | 2050    | ^SELECT.*FOR UPDATE | NULL                                                                       | 1     | 1002   | NULL    |
| 0    | 201                   | 2051    | ^SELECT.*$          | NULL                                                                       | 1     | 1002   | NULL    |
+------+-----------------------+---------+---------------------+----------------------------------------------------------------------------+-------+--------+---------+
</pre>
<h3><span style="font-weight: 400;">Running the application</span></h3>
<p><span style="font-weight: 400;">Now that the data load test was successful, let us do the real load following the same indications as above, but using 80 tables and a few more records, like 20,000; nothing huge. </span></p>
<p> </p>
<p><span style="font-weight: 400;">Once the data is loaded we will have the 2 shards with different numbers of records; if all went well, shard2 should have roughly 2/7 of the total (two of the seven continents) and shard1 the remaining 5/7.</span></p>
<p> </p>
<p><span style="font-weight: 400;">When load is over I have as expected:</span></p>
<pre class="lang:mysql decode:true">mysql> select current_user(), @@hostname;select count(*) as shard1 from windmills_large.windmills80;select /* continent=shard2 */ count(*) as shard2 from windmills_large.windmills80;
+----------------+--------------------+
| current_user() | @@hostname |
+----------------+--------------------+
| dba_g1@% | mt-cluster-1-pxc-0 |
+----------------+--------------------+
1 row in set (0.00 sec)
+--------+
| shard1 |
+--------+
| 14272 | ← Table windmills80 in SHARD1
+--------+
+--------+
| shard2 |
+--------+
| 5728 | ← Table windmills80 in SHARD2
+--------+
</pre>
<p><span style="font-weight: 400;">As you may have already noticed, I used a trick to query the other shard with the dba_g1 user: I just passed the shard2 definition in the query as a comment. That is all we need.</span></p>
<p><span style="font-weight: 400;">Let us execute the </span><i><span style="font-weight: 400;">run</span></i><span style="font-weight: 400;"> command for writes in sysbench and see what happens.</span></p>
<p><span style="font-weight: 400;">The first thing we can notice while doing writes is the query distribution:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">+--------+-----------+----------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries |
+--------+-----------+----------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+
| 10000 | 100 | ac966f7d46c04400fb92a3603f0e2634-193113472.eu-central-1.elb.amazonaws.com | 3306 | ONLINE | 24 | 0 | 138 | 66 | 25 | 1309353 |
| 100 | 101 | a5c8836b7c05b41928ca84f2beb48aee-1936458168.eu-central-1.elb.amazonaws.com | 3306 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 |
| 10000 | 200 | a039ab70e9f564f5e879d5e1374d9ffa-1769267689.eu-central-1.elb.amazonaws.com | 3306 | ONLINE | 24 | 1 | 129 | 66 | 25 | 516407 |
| 10000 | 201 | a039ab70e9f564f5e879d5e1374d9ffa-1769267689.eu-central-1.elb.amazonaws.com | 6447 | ONLINE | 0 | 0 | 0 | 0 | 0 | 0 |
+--------+-----------+----------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+
</pre>
<p><span style="font-weight: 400;">Here we can see that the connections are evenly distributed, while the query load mainly goes to shard1, as expected given our sharding is unbalanced by design.</span></p>
<p> </p>
<p><span style="font-weight: 400;">At the MySQL level we had:</span></p>
<p> </p>
<p><span style="font-weight: 400;">Questions</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/pxc_ps_write_questions.png"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/pxc_ps_write_questions.png" alt="pxc ps write questions" width="1024" height="187" class="alignnone size-large wp-image-87684" /></a></p>
<p><span style="font-weight: 400;">Com Type</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/pxc_ps_write_com.png"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/pxc_ps_write_com.png" alt="pxc ps write com" width="1024" height="179" class="alignnone size-large wp-image-87683" /></a></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/number_operation_writes.png"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/number_operation_writes.png" alt="number operation writes" width="473" height="133" class="alignnone size-full wp-image-87687" /></a></p>
<p><span style="font-weight: 400;">The final question is: what do we gain using this sharding approach?</span></p>
<p><span style="font-weight: 400;">We still need to consider that we are testing on a very small dataset; however, if we can already identify some benefit here, that is an interesting result. </span></p>
<p> </p>
<p><span style="font-weight: 400;">Let us see the write operations with 24 and 64 threads:</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/writes_write.png"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/writes_write.png" alt="writes write" width="450" height="395" class="alignnone wp-image-87693" /></a> <a href="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/latency_write.png"><img src="http://www.tusacentral.net/joomla/images/stories/mysql_horizontal_scaling/latency_write.png" alt="latency write" width="450" height="395" class="alignnone wp-image-87686" /></a></p>
<p><span style="font-weight: 400;">We get a gain of ~33% just by sharding, and latency does not pay a price; on the contrary, even with a small load increase we can see the sharded solution performing better. Of course we are still talking about a low number of rows and running threads, but the gain is there. </span></p>
<p> </p>
<h3><span style="font-weight: 400;">Backup </span></h3>
<p><span style="font-weight: 400;">The backup and restore operations when using POM are completely managed by the operator (see the instructions in the POM documentation </span><a href="https://docs.percona.com/percona-operator-for-mysql/pxc/backups.html"><span style="font-weight: 400;">https://docs.percona.com/percona-operator-for-mysql/pxc/backups.html</span></a><span style="font-weight: 400;"> and </span><a href="https://docs.percona.com/percona-operator-for-mysql/ps/backups.html"><span style="font-weight: 400;">https://docs.percona.com/percona-operator-for-mysql/ps/backups.html</span></a><span style="font-weight: 400;">). </span></p>
<p><span style="font-weight: 400;">The interesting part is that we can have multiple kinds of backup, like:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">On demand</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Scheduled </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Full Point in time recovery with log streaming</span></li>
</ul>
<p><span style="font-weight: 400;">Automation allows us to set a schedule as simple as this:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:yaml decode:true"> schedule:
- name: "sat-night-backup"
schedule: "0 0 * * 6"
keep: 3
storageName: s3-eu-west
- name: "daily-backup"
schedule: "0 3 * * *"
keep: 7
storageName: s3-eu-west
</pre>
<p><span style="font-weight: 400;">Or, if you want to run an on-demand backup:</span></p>
<pre class="lang:sh decode:true">kubectl apply -f backup.yaml</pre>
<p><span style="font-weight: 400;">Where the backup.yaml file contains very simple information:</span></p>
<pre class="lang:yaml decode:true">apiVersion: ps.percona.com/v1alpha1
kind: PerconaServerMySQLBackup
metadata:
name: ps-gr-sharding-test-2nd-of-may
# finalizers:
# - delete-backup
spec:
clusterName: ps-mysql1
storageName: s3-ondemand
</pre>
<p><span style="font-weight: 400;">Using both methods we will soon have a good set of backups, like:</span></p>
<p><span style="font-weight: 400;">POM (PXC)</span></p>
<pre class="lang:sh decode:true">cron-mt-cluster-1-s3-eu-west-20234293010-3vsve mt-cluster-1 s3-eu-west s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-04-29-03:00:10-full Succeeded 3d9h 3d9h
cron-mt-cluster-1-s3-eu-west-20234303010-3vsve mt-cluster-1 s3-eu-west s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-04-30-03:00:10-full Succeeded 2d9h 2d9h
cron-mt-cluster-1-s3-eu-west-2023513010-3vsve mt-cluster-1 s3-eu-west s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-05-01-03:00:10-full Succeeded 33h 33h
cron-mt-cluster-1-s3-eu-west-2023523010-3vsve mt-cluster-1 s3-eu-west s3://mt-bucket-backup-tl/scheduled/mt-cluster-1-2023-05-02-03:00:10-full Succeeded 9h 9h
</pre>
<p><span style="font-weight: 400;">POM (PS) *</span></p>
<pre class="lang:sh decode:true">NAME STORAGE DESTINATION STATE COMPLETED AGE
ps-gr-sharding-test s3-ondemand s3://mt-bucket-backup-tl/ondemand/ondemand/ps-mysql1-2023-05-01-15:10:04-full Succeeded 21h 21h
ps-gr-sharding-test-2nd-of-may s3-ondemand s3://mt-bucket-backup-tl/ondemand/ondemand/ps-mysql1-2023-05-02-12:22:24-full Succeeded 27m 27m
</pre>
<p><span style="font-weight: 400;">Note that as DBAs we still need to validate the backups with a restore procedure; that part is not automated (yet). </span></p>
<p><i><span style="font-weight: 400;">*Note that backup for POM (PS) is available only on demand, given the solution is still in technical preview.</span></i></p>
<h3><span style="font-weight: 400;">When will this solution fit in?</span></h3>
<p><span style="font-weight: 400;">As mentioned multiple times, this solution can cover simple cases of sharding, ideally shared-nothing ones. </span></p>
<p><span style="font-weight: 400;">It also requires work from the DBA side in case of DDL operations or resharding. </span></p>
<p><span style="font-weight: 400;">You also need to be able to change some SQL code, to be sure the sharding key/information is present in any SQL executed.</span></p>
<p> </p>
<h3><span style="font-weight: 400;">When will this solution not fit in?</span></h3>
<p><span style="font-weight: 400;">There are several things that could prevent you from using this solution; the most common ones are:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">You need to query multiple shards at the same time. This is not possible with ProxySQL.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">You do not have a DBA to perform administrative work and need to rely on automated systems.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">You need distributed transactions across shards.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">No access to SQL code.</span></li>
</ul>
<h2><span style="font-weight: 400;">Conclusions</span></h2>
<p><span style="font-weight: 400;">There is no Hamlet-like dilemma here about whether to shard or not to shard. </span></p>
<p><span style="font-weight: 400;">When using a RDBMS like MySQL, if you need horizontal scalability, you need to shard. </span></p>
<p><span style="font-weight: 400;">The point is there is no magic wand or solution; moving to sharding is an expensive and impactful operation. If you choose it at the beginning, before doing any application development, the effort will be significantly smaller. </span></p>
<p><span style="font-weight: 400;">Doing it sooner also allows you to test proper solutions, where </span><i><span style="font-weight: 400;">proper</span></i><span style="font-weight: 400;"> means a KISS solution: always go for the less complex thing, and in two years you will be very happy about your decision. </span></p>
<p><span style="font-weight: 400;">If instead you must convert a current solution, then prepare for a bloodbath, or at least for a long journey. </span></p>
<p><span style="font-weight: 400;">In any case we need to keep in mind a few key points:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Do not believe most of the articles on the internet that promise you infinite scalability for your database. If an article makes no distinction between a simple data store and an RDBMS, run away. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Do not go for the latest shiny thing just because it shines. Test it and evaluate IF it makes sense for you. Better to spend a quarter now testing a few solutions than to fight for years with something you do not fully comprehend. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Containers/operators/Kubernetes do not scale </span><i><span style="font-weight: 400;">per se</span></i><span style="font-weight: 400;">; you must find a mechanism to make the solution scale, and in this respect there is absolutely NO difference from on-premises. What you may get is a good level of automation; however, that comes with a good level of complexity, and it is up to you to evaluate if it makes sense or not. </span></li>
</ul>
<p> </p>
<p><span style="font-weight: 400;">As said at the beginning, for MySQL the choice is limited. Vitess is the most complete solution, with a lot of code behind it, providing a full platform to deal with your scaling needs.</span></p>
<p><span style="font-weight: 400;">However, do not be so quick to exclude ProxySQL as a possible solution. Many out there are already using it for sharding as well. </span></p>
<p><span style="font-weight: 400;">This small POC used a synthetic case, but it also shows that with just four rules you can achieve a decent solution. A real scenario could be a bit more complex … or not. </span></p>
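<p><span style="font-weight: 400;">To give an idea of how compact such a rule set can be, this kind of sharding is defined through the ProxySQL admin interface. The following is only an illustrative sketch (hostgroup numbers, schema names and patterns are hypothetical, not the ones used in the POC):</span></p>
<pre class="lang:sh decode:true"># Connect to the ProxySQL admin interface (default port 6032) and
# route queries to a hostgroup based on the schema they touch.
mysql -h 127.0.0.1 -P 6032 -u admin -padmin -e "
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (10, 1, 'FROM shard1\..*', 10, 1),
       (20, 1, 'FROM shard2\..*', 20, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;"
</pre>
<p><span style="font-weight: 400;">Anything not matched by a rule falls back to the user's default hostgroup, which is why so few rules can be enough.</span></p>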
<p> </p>
<h2><span style="font-weight: 400;">References</span></h2>
<p><span style="font-weight: 400;">Vitess (</span><a href="https://vitess.io/docs/"><span style="font-weight: 400;">https://vitess.io/docs/</span></a><span style="font-weight: 400;">)</span></p>
<p><span style="font-weight: 400;">ProxySQL (</span><a href="https://proxysql.com/documentation/"><span style="font-weight: 400;">https://proxysql.com/documentation/</span></a><span style="font-weight: 400;">)</span></p>
<p><span style="font-weight: 400;">Firewalling with ProxySQL (</span><a href="https://www.tusacentral.com/joomla/index.php/mysql-blogs/197-proxysql-firewalling"><span style="font-weight: 400;">https://www.tusacentral.com/joomla/index.php/mysql-blogs/197-proxysql-firewalling</span></a><span style="font-weight: 400;">)</span></p>
<p><span style="font-weight: 400;">Sharding:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><a href="https://www.percona.com/blog/mysql-sharding-with-proxysql/"><span style="font-weight: 400;">https://www.percona.com/blog/mysql-sharding-with-proxysql/</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://www.percona.com/blog/horizontal-scaling-in-mysql-sharding-followup/"><span style="font-weight: 400;">https://www.percona.com/blog/horizontal-scaling-in-mysql-sharding-followup/</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://medium.com/pinterest-engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f"><span style="font-weight: 400;">https://medium.com/pinterest-engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f</span></a></li>
<li style="font-weight: 400;" aria-level="1"><a href="https://quoraengineering.quora.com/MySQL-sharding-at-Quora"><span style="font-weight: 400;">https://quoraengineering.quora.com/MySQL-sharding-at-Quora</span></a></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">https://skywalking.apache.org/blog/skywalkings-new-storage-feature-based-on-shardingsphere-proxy-mysql-sharding/</span></li>
</ul>
<p> </p>]]></description>
<category>MySQL</category>
<pubDate>Thu, 11 May 2023 12:20:20 +0000</pubDate>
</item>
<item>
<title>Which is the best Proxy for Percona MySQL Operator?</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/252-comparisons-of-proxies-for-mysql</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/252-comparisons-of-proxies-for-mysql</guid>
<description><![CDATA[<h2>Overview</h2>
<p>HAProxy, ProxySQL, MySQL Router (AKA MySQL Proxy): in the last few years I have had to answer many times the question of which proxy to use and in which scenario. When designing an architecture, there are many components that need to be considered before deciding on the best solution.</p>
<p>When deciding what to pick, there are many things to consider, such as where the proxy needs to sit, whether it “just” needs to redirect connections or whether more features need to be included, like caching, filtering, or integration with some MySQL embedded automation.</p>
<p>Given that, there has never been a single straight answer; instead, an analysis needs to be done. Only after a better understanding of the environment, the needs, and the evolution the platform needs to achieve is it possible to decide what the better choice will be.</p>
<p>However, recently we have seen an increase in the usage of MySQL on Kubernetes, especially with the adoption of Percona Operator for MySQL.<br />In this case we have a quite well-defined scenario that resembles the image below:</p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/proxys_gr_comparison-default.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/proxys_gr_comparison-default.png" alt="proxys gr comparison default" width="536" height="614" class="aligncenter wp-image-86219 size-full" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">In this scenario the proxies need to sit inside Pods balancing the incoming traffic from the Service LoadBalancer connecting with the active data nodes.</span></p>
<p><span style="font-weight: 400;">Their role is merely to make sure that any incoming connection is redirected to nodes that are able to serve it, which includes keeping a separation between Read/Write and Read Only traffic, a separation that can be achieved, at the service level, with automatic recognition or with two separate entry points. </span></p>
<p><span style="font-weight: 400;">In this scenario it is also crucial to be efficient in resource utilization and to scale with frugality. In this context, features like filtering, firewalling or caching are redundant and may consume resources that could be allocated to scaling. Those features also work better outside the K8s/Operator cluster: the closer to the application they are located, the better they serve. </span></p>
<p><span style="font-weight: 400;">On that subject, we must always remember that each K8s/Operator cluster needs to be seen as a single service, not as a real cluster. In short, each cluster is in reality a single database with High Availability and other functionalities built in. </span></p>
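<p><span style="font-weight: 400;">To make the two-entry-point option concrete: in a Percona Operator deployment it maps to two distinct Services in front of the proxy Pods. As a sketch, assuming the default naming for a cluster called </span><i><span style="font-weight: 400;">cluster1</span></i><span style="font-weight: 400;"> in the pxc namespace:</span></p>
<pre class="lang:sh decode:true">kubectl get svc -n pxc
# NAME                        TYPE        PORT(S)
# cluster1-haproxy            ClusterIP   3306/TCP ...   <-- Read/Write entry point
# cluster1-haproxy-replicas   ClusterIP   3306/TCP ...   <-- Read Only entry point
</pre>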
<p><span style="font-weight: 400;">Anyhow, we are here to talk about proxies. With that one clear mandate in mind, we need to identify which product allows our K8s/Operator solution to:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Scale at the maximum the number of incoming connections</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Serve requests with the highest efficiency</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Consume as few resources as possible</span></li>
</ul>
<h2><span style="font-weight: 400;">The Environment</span></h2>
<p><span style="font-weight: 400;">To identify the above points I have simulated a possible K8s/Operator environment, creating:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">One powerful application node, where I ran sysbench read-only tests, scaling from 2 to 4096 threads (type c5.4xlarge)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Three mid-size data nodes with several gigabytes of data, running MySQL with Group Replication (type m5.xlarge)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">One proxy node running on a resource-limited box (type t2.micro)</span></li>
</ul>
<h2><span style="font-weight: 400;">The Tests </span></h2>
<p><span style="font-weight: 400;">We will run very simple test cases. The first one has the scope of defining the baseline, identifying the moment when we hit the first level of saturation due to the number of connections. In this case we will increase the number of connections while keeping a low number of operations. </span></p>
<p><span style="font-weight: 400;">The second test will define how well the increasing load is served inside the range we had previously identified. </span></p>
<p><span style="font-weight: 400;">For documentation the sysbench commands are:</span></p>
<p><span style="font-weight: 400;">Test1</span></p>
<pre class="lang:sh decode:true">sysbench ./src/lua/windmills/oltp_read.lua --db-driver=mysql --tables=200 --table_size=1000000 \
--rand-type=zipfian --rand-zipfian-exp=0 --skip_trx=true --report-interval=1 --mysql-ignore-errors=all \
--mysql_storage_engine=innodb --auto_inc=off --histogram --stats_format=csv --db-ps-mode=disable --point-selects=50 \
--reconnect=10 --range-selects=true --rate=100 --threads=<#Threads from 2 to 4096> --time=1200 run
</pre>
<p><span style="font-weight: 400;">Test2</span></p>
<pre class="lang:sh decode:true">sysbench ./src/lua/windmills/oltp_read.lua --mysql-host=<host> --mysql-port=<port> --mysql-user=<user> \
--mysql-password=<pw> --mysql-db=<schema> --db-driver=mysql --tables=200 --table_size=1000000 --rand-type=zipfian \
--rand-zipfian-exp=0 --skip_trx=true --report-interval=1 --mysql-ignore-errors=all --mysql_storage_engine=innodb \
--auto_inc=off --histogram --table_name=<tablename> --stats_format=csv --db-ps-mode=disable --point-selects=50 \
--reconnect=10 --range-selects=true --threads=<#Threads from 2 to 4096> --time=1200 run
</pre>
<h2><span style="font-weight: 400;">Results</span></h2>
<h3><span style="font-weight: 400;">Test 1</span></h3>
<p><span style="font-weight: 400;">As indicated, here I was looking to identify when the first of the proxies would reach a dimension that was no longer manageable. The load is all in creating and serving the connections, while the number of operations is capped at 100. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/events_rate2.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/events_rate2.png" alt="events rate2" width="600" height="401" class="alignnone size-large wp-image-86207" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">As you can see, and as I was expecting, the three proxies behaved more or less the same, serving the same number of operations (they were capped, so why not), until they did not.</span></p>
<p><span style="font-weight: 400;">MySQL Router, after 2048 connections, was not able to serve anything more.</span></p>
<p><b>NOTE: MySQL Router actually stopped working at 1024 threads, but using version 8.0.32 I enabled the feature </b><b><i>connection_sharing</i></b><b>, which allows it to go a bit further</b><span style="font-weight: 400;">. </span></p>
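<p><span style="font-weight: 400;">For reference, connection sharing is enabled per routing section in mysqlrouter.conf; the fragment below is only illustrative (section name, port and destinations are hypothetical):</span></p>
<pre class="lang:sh decode:true"># mysqlrouter.conf (fragment) - requires MySQL Router >= 8.0.32
[routing:my_rw]
bind_address=0.0.0.0
bind_port=6446
destinations=metadata-cache://mycluster/?role=PRIMARY
routing_strategy=first-available
protocol=classic
connection_sharing=1
connection_sharing_delay=1
</pre>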
<p><span style="font-weight: 400;">Let us also take a look at the latency:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/latency95_rate.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/latency95_rate.png" alt="latency95 rate" width="600" height="424" class="alignnone size-large wp-image-86209" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">Here the situation starts to be a little bit more complicated. MySQL Router is the one that has the higher latency no matter what. However HAProxy and ProxySQL have an interesting behavior. HAProxy is performing better with a low number of connections, while ProxySQL is doing better when a high number of connections is in place. </span></p>
<p><span style="font-weight: 400;">This is due to the multiplexing and the very efficient way ProxySQL uses to deal with high load.</span></p>
<p><span style="font-weight: 400;">Everything has a cost:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/node-summary-cpu.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/node-summary-cpu.png" alt="node summary cpu" width="600" height="241" class="alignnone size-large wp-image-86215" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">HAProxy is definitely using less user’s CPU resources than ProxySQL or MySQL Router …</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/node-summary-cpu-saturation.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/node-summary-cpu-saturation.png" alt="node summary cpu saturation" width="600" height="238" class="alignnone size-large wp-image-86214" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">… we can also notice that HAProxy barely reaches a 1.5 CPU load on average, while ProxySQL is at 2.5 and MySQL Router around 2. </span></p>
<p><span style="font-weight: 400;">To be honest, I was expecting something like this, given ProxySQL's need to handle the connections plus the other basic routing. What was instead a surprise was MySQL Router: why does it have a higher load?</span></p>
<h4><span style="font-weight: 400;">Brief summary</span></h4>
<p><span style="font-weight: 400;">This test highlights that HAProxy and ProxySQL are able to reach a number of connections higher than the slowest runner in the game (MySQL Router). It is also clear that traffic is better served under a high number of connections by ProxySQL, but it requires more resources. </span></p>
<h2><span style="font-weight: 400;">Test 2</span></h2>
<p><i><span style="font-weight: 400;">When the going gets tough, the tough get going</span></i></p>
<p><span style="font-weight: 400;">Well, let us remove the </span><i><span style="font-weight: 400;">--rate</span></i><span style="font-weight: 400;"> limitation and see what happens. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/events_perf.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/events_perf.png" alt="events perf" width="800" height="475" class="alignnone size-large wp-image-86205" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">The scenario under load changes drastically. We can see how HAProxy is able to serve the connections and sustain a higher number of operations for the whole test. ProxySQL is immediately after it and behaves quite well up to 128 threads, then it just collapses. </span></p>
<p><span style="font-weight: 400;">MySQL Router never takes off; it always stays below 1k reads/second, while HAProxy was able to serve 8.2k and ProxySQL 6.6k.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/latency95_perf.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/latency95_perf.png" alt="latency95 perf" width="800" height="531" class="alignnone size-large wp-image-86208" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">Taking a look at the latency, we can see that HAProxy had a gradual increase, as expected, while ProxySQL and MySQL Router just shot up from 256 threads on. </span></p>
<p><span style="font-weight: 400;">Note also that both ProxySQL and MySQL Router were not able to complete the tests with 4096 threads.</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/node-summary-cpu-perf.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/node-summary-cpu-perf.png" alt="node summary cpu perf" width="800" height="318" class="alignnone size-large wp-image-86212" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">Why? HAProxy always stays below 50% CPU, no matter the increasing number of threads/connections, scaling the load very efficiently. MySQL Router almost immediately reached the saturation point, being affected not only by the number of threads/connections but also by the number of operations, which was not expected given that we do not have a level 7 capability in MySQL Router.</span></p>
<p><span style="font-weight: 400;">Finally ProxySQL, which worked fine up to a certain limit, then reached its saturation point and was not able to serve the load. I say load because ProxySQL is a level 7 proxy and is aware of the content of the load, so on top of multiplexing, additional resource consumption was expected. </span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/node-summary-cpu-saturation-perf.png"><img src="http://www.tusacentral.net/joomla/images/stories/proxies_comparisons_march_2023/node-summary-cpu-saturation-perf.png" alt="node summary cpu saturation perf" width="800" height="319" class="alignnone size-large wp-image-86213" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">Here we just have a clear confirmation of what was already said above, with 100% CPU utilization reached by MySQL Router with just 16 threads, and ProxySQL well after, at 256 threads.</span></p>
<h2><span style="font-weight: 400;">Brief Summary</span></h2>
<p><span style="font-weight: 400;">HAProxy comes out as the champion of this test; there is no doubt that it was able to scale with the increasing connection load without being significantly affected by the load generated by the requests. Its lower resource consumption also indicates possible room for even more scaling.</span></p>
<p><span style="font-weight: 400;">ProxySQL was penalized by the limited resources, but that was the game: we had to get the most out of the little available. This test indicates that it is not optimal to use ProxySQL inside the Operator; it is actually a wrong choice if low resource usage and scalability are a must. </span></p>
<p><span style="font-weight: 400;">MySQL Router was never in the game. Short of a serious refactoring, MySQL Router is designed for very limited scalability; as such, the only way to adopt it is to run many of them at the application-node level. Using it close to the data nodes in a centralized position is a mistake. </span></p>
<h2><span style="font-weight: 400;">Conclusions</span></h2>
<p><span style="font-weight: 400;">I started by showing an image of how the MySQL service is organized, and I want to close by showing the variation that, for me, should be considered the default approach:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2023/03/proxys_gr_comparison-haproxy.jpg"><img src="https://www.percona.com/blog/wp-content/uploads/2023/03/proxys_gr_comparison-haproxy.jpg" alt="" width="536" height="651" class="size-full wp-image-86220 aligncenter" style="display: block; margin-left: auto; margin-right: auto;" /></a></p>
<p><span style="font-weight: 400;">This is to highlight that we always need to choose the right tool for the job. </span></p>
<p><span style="font-weight: 400;">The proxy, in architectures involving MySQL/PS/PXC, is a crucial element for the scalability of the cluster, no matter whether K8s is used or not. It is important to choose the one that serves us best, which in some cases can be ProxySQL over HAProxy. </span></p>
<p><span style="font-weight: 400;">However, when talking about K8s and Operators, we must recognize the need to optimize resource usage for the specific service. In that context there is no discussion about it: HAProxy is the best solution and the one we should go with. </span></p>
<p><span style="font-weight: 400;">My final observation is about MySQL Router (aka MySQL Proxy). </span></p>
<p><span style="font-weight: 400;">Short of a significant refactoring of the product, at the moment it is not even close to what the other two can do. From the tests done so far, it needs a complete reshaping, starting from identifying why it is so affected by the load coming from the queries, more than by the load coming from the connections. </span></p>
<p><span style="font-weight: 400;">Great MySQL to everyone. </span></p>
<h2>References</h2>
<p><a href="https://www.percona.com/blog/boosting-percona-distribution-for-mysql-operator-efficiency/">https://www.percona.com/blog/boosting-percona-distribution-for-mysql-operator-efficiency/</a></p>
<p><a href="https://www.slideshare.net/marcotusa/my-sql-on-kubernetes-demystified">https://www.slideshare.net/marcotusa/my-sql-on-kubernetes-demystified</a></p>
<p><a href="https://docs.haproxy.org/2.7/configuration.html">https://docs.haproxy.org/2.7/configuration.html</a></p>
<p><a href="https://proxysql.com/documentation/">https://proxysql.com/documentation/</a></p>
<p><a href="https://dev.mysql.com/doc/mysql-router/8.0/en/">https://dev.mysql.com/doc/mysql-router/8.0/en/</a></p>
<p> </p>
<p> </p>]]></description>
<category>MySQL</category>
<pubDate>Sun, 19 Mar 2023 17:57:59 +0000</pubDate>
</item>
<item>
<title>Help! I am out of disk space!</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/251-help-i-am-out-of-disk-space</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/251-help-i-am-out-of-disk-space</guid>
<description><![CDATA[<p><em><span style="font-weight: 400;">How we could fix a nasty out of space issue leveraging the flexibility of Percona MySQL operator (PMO) <img src="http://www.tusacentral.net/joomla/images/stories/DiskSpaceFull.png" alt="DiskSpaceFull" width="350" height="208" style="float: right;" /></span></em></p>
<p><span style="font-weight: 400;">When planning a database deployment, one of the most challenging factors to consider is the amount of space we need to dedicate for Data on disk.</span></p>
<p><span style="font-weight: 400;">This is even more cumbersome when working on bare metal, given that it is definitely more difficult to add space with that kind of solution than in the cloud. </span></p>
<p><span style="font-weight: 400;">That is, when using cloud storage like EBS or similar, it is normally easy(er) to extend volumes, which gives us the luxury of planning the space to allocate for data with a fair degree of relaxation. </span></p>
<p><span style="font-weight: 400;">Is this also true when using a solution based on Kubernetes, like Percona Operator for MySQL? Well, it depends on where you run it; however, if the platform you choose supports the option to extend volumes, K8s per se gives you the possibility to do so as well.</span></p>
<p><span style="font-weight: 400;">However, if it can go wrong it will, and ending up with a fully filled device with MySQL is not a fun experience. </span></p>
<p><span style="font-weight: 400;">As you know, in normal deployments, when MySQL has no space left on the device it simply stops working, ergo it causes a production-down event, which of course is an unfortunate event that we want to avoid at any cost. </span></p>
<p><span style="font-weight: 400;">This blog is the story of what happened, what was supposed to happen and why. </span></p>
<h2><span style="font-weight: 400;">The story </span></h2>
<p><span style="font-weight: 400;">The case was on AWS using EKS.</span></p>
<p><span style="font-weight: 400;">Given all the above, I was quite surprised when we had a case in which a deployed solution based on PMO went out of space. So we started to dig in and review what was going on and why.</span></p>
<p><span style="font-weight: 400;">The first thing we did was quickly investigate what was really taking up space. That could have been an easy win if most of the space had been taken by some log, but unfortunately this was not the case: data was really taking all the available space. </span></p>
<p><span style="font-weight: 400;">The next step was to check what storage class was used for the PVC</span></p>
<pre class="lang:sh decode:true">k get pvc
NAME VOLUME CAPACITY ACCESS MODES STORAGECLASS
datadir-mt-cluster-1-pxc-0 pvc-<snip> 233Gi RWO io1
datadir-mt-cluster-1-pxc-1 pvc-<snip> 233Gi RWO io1
datadir-mt-cluster-1-pxc-2 pvc-<snip> 233Gi RWO io1
</pre>
<p><span style="font-weight: 400;">OK, we use the io1 SC; it is now time to check whether the SC supports volume expansion:</span></p>
<pre class="lang:sh decode:true">kubectl describe sc io1
Name: io1
IsDefaultClass: No
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"},"name":"io1"},"parameters":{"fsType":"ext4","iopsPerGB":"12","type":"io1"},"provisioner":"kubernetes.io/aws-ebs"}
,storageclass.kubernetes.io/is-default-class=false
Provisioner: kubernetes.io/aws-ebs
Parameters: fsType=ext4,iopsPerGB=12,type=io1
AllowVolumeExpansion: <unset> <------------
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
</pre>
<p><span style="font-weight: 400;">And no, it is not enabled; in this case we cannot just go and expand the volume, we must change the storage class settings first. </span><span style="font-weight: 400;"><br /></span><span style="font-weight: 400;">To enable volume expansion, you need to delete the storage class and recreate it with the option enabled. </span></p>
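<p><span style="font-weight: 400;">The intended fix was a simple delete-and-apply, since the StorageClass object is small and holds no data; something like the following (the file name is illustrative):</span></p>
<pre class="lang:sh decode:true">kubectl delete sc io1
kubectl apply -f io1-storageclass.yaml   # same definition plus allowVolumeExpansion: true
</pre>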
<p><span style="font-weight: 400;">Unfortunately, we were unsuccessful in that operation, because the storage class kept showing ALLOWVOLUMEEXPANSION as unset. </span></p>
<p><span style="font-weight: 400;">As said, this was a production-down event, so we could not invest too much time in digging into why the mode was not being changed correctly; we had to act quickly. </span></p>
<p><span style="font-weight: 400;">The only option we had to fix it was:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Expand the io1 volumes from the AWS console (or AWS client)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Resize the file system </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Patch the PVC to allow K8s to correctly see the new volume dimension </span></li>
</ul>
<p><span style="font-weight: 400;">Expanding EBS volumes from the console is trivial: just go to Volumes, select the volume you want to modify, choose Modify, change its size to the desired one, done. </span></p>
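<p><span style="font-weight: 400;">The same expansion can be done with the AWS client; a sketch, with a hypothetical volume ID:</span></p>
<pre class="lang:sh decode:true">aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 350
# check the progress of the modification
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0
</pre>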
<p><span style="font-weight: 400;">Once that is done, connect to the node hosting the pod which has the volume mounted:</span></p>
<pre class="lang:sh decode:true">k get pods -o wide|grep pxc-0
NAME READY STATUS RESTARTS AGE IP NODE
cluster-1-pxc-0 2/2 Running 1 11d 10.1.76.189 <mynode>.eu-central-1.compute.internal</pre>
<p><span style="font-weight: 400;">Then we need to get the ID of the PVC to identify it on the node:</span></p>
<pre class="lang:sh decode:true">k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS
datadir-cluster-1-pxc-0 Bound pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df 233Gi RWO io1
</pre>
<p><span style="font-weight: 400;">One note: when doing this kind of recovery with a PXC-based solution, always recover node-0 first, then the others. </span></p>
<p><span style="font-weight: 400;">So we connect to <mynode> and identify the volume: </span></p>
<pre class="lang:sh decode:true">lsblk |grep pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df
nvme1n1 259:4 0 350G 0 disk /var/lib/kubelet/pods/9724a0f6-fb79-4e6b-be8d-b797062bf716/volumes/kubernetes.io~aws-ebs/pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df <-----
</pre>
<p><span style="font-weight: 400;">At this point we can resize it:</span></p>
<pre class="lang:sh decode:true">root@ip-<snip>:/# resize2fs /dev/nvme1n1
resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/nvme1n1 is mounted on /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-central-1a/vol-0ab0db8ecf0293b2f; on-line resizing required
old_desc_blocks = 30, new_desc_blocks = 44
The filesystem on /dev/nvme1n1 is now 91750400 (4k) blocks long.
</pre>
<p><span style="font-weight: 400;">The good thing is that as soon as you do that, the MySQL daemon sees the space and restarts; however, this happens only on the current pod, and K8s will still see the old dimension:</span></p>
<pre class="lang:sh decode:true">k get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON
pvc-1678c7ee-3e50-4329-a5d8-25cd188dc0df 333Gi RWO Delete Bound pxc/datadir-cluster-1-pxc-0 io1
</pre>
<p><span style="font-weight: 400;">To align K8s with the real dimension we must patch the stored information; the command is the following:</span></p>
<pre class="lang:sh decode:true">kubectl patch pvc <pvc-name> -n <pvc-namespace> -p '{ "spec": { "resources": { "requests": { "storage": "NEW STORAGE VALUE" }}}}'
i.e.:
kubectl patch pvc datadir-cluster-1-pxc-0 -n pxc -p '{ "spec": { "resources": { "requests": { "storage": "350G" }}}}'
</pre>
<p><span style="font-weight: 400;">Remember to use as pvc-name the NAME reported by </span></p>
<pre><span style="font-weight: 400;">kubectl get pvc</span></pre>
<p><span style="font-weight: 400;">Once this is done, K8s will see the new volume dimension correctly.</span></p>
<p><span style="font-weight: 400;">Just repeat the process for node-1 and node-2 and … done, the cluster is up again.</span></p>
<p><span style="font-weight: 400;">Finally, do not forget to modify your custom resource file (cr.yaml) to match the new volume size, i.e.:</span></p>
<pre class="lang:sh decode:true">volumeSpec:
  persistentVolumeClaim:
    storageClassName: "io1"
    resources:
      requests:
        storage: 350G
</pre>
<p><span style="font-weight: 400;">The whole process took just a few minutes. It was now time to investigate why the incident happened and why the storage class was not allowing extension in the first place. </span></p>
<p> </p>
<h2><span style="font-weight: 400;">Why it happened</span></h2>
<p><span style="font-weight: 400;">Well, first and foremost, the platform was not correctly monitored. As such, there was a lack of visibility about space utilization, and no alert about disk space. </span></p>
<p><span style="font-weight: 400;">This was easy to solve by enabling the PMM feature in the cluster cr and setting the alerts in PMM once the nodes joined it (see </span><a href="https://docs.percona.com/percona-monitoring-and-management/get-started/alerting.html"><span style="font-weight: 400;">https://docs.percona.com/percona-monitoring-and-management/get-started/alerting.html</span></a><span style="font-weight: 400;"> for details on how to).</span></p>
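<p><span style="font-weight: 400;">For reference, enabling PMM is a small change in cr.yaml; the fragment below is only a sketch, and the server host name is hypothetical:</span></p>
<pre class="lang:sh decode:true">pmm:
  enabled: true
  image: percona/pmm-client:2
  serverHost: monitoring-service
</pre>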
<p><span style="font-weight: 400;">The second issue was the problem with the storage class. Once we had the time to carefully review the configuration files, we identified an additional tab in the SC definition, which was causing K8s to ignore the directive. </span></p>
<p><span style="font-weight: 400;">It was supposed to be:</span></p>
<pre class="lang:sh decode:true">kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: io1
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "12"
  fsType: ext4
allowVolumeExpansion: true <----------

It was:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: io1
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "12"
  fsType: ext4
allowVolumeExpansion: true. <---------</pre>
<p><span style="font-weight: 400;">What was concerning was the lack of an error returned by the Kubernetes API: in theory the configuration was accepted, but it was not really validated. </span></p>
<p><span style="font-weight: 400;">In any case, once we had fixed the typo and recreated the SC, the setting for volume expansion was correctly accepted:</span></p>
<pre class="lang:sh decode:true">kubectl describe sc io1
Name: io1
IsDefaultClass: No
Annotations: kubectl.kubernetes.io/last-applied-configuration={"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"},"name":"io1"},"parameters":{"fsType":"ext4","iopsPerGB":"12","type":"io1"},"provisioner":"kubernetes.io/aws-ebs"}
,storageclass.kubernetes.io/is-default-class=false
Provisioner: kubernetes.io/aws-ebs
Parameters: fsType=ext4,iopsPerGB=12,type=io1
AllowVolumeExpansion: True
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: Immediate
Events: <none>
</pre>
<h2><span style="font-weight: 400;">What should have happened instead?</span></h2>
<p><span style="font-weight: 400;">If proper monitoring and alerting had been in place, the administrators would have had time to act and extend the volumes without downtime. </span></p>
<p><span style="font-weight: 400;">However, the procedure for extending volumes on Kubernetes is not complex, but also not as straightforward as you may think. My colleague Natalia Marukovich wrote a blog post (</span><a href="https://www.percona.com/blog/percona-operator-volume-expansion-without-downtime/"><span style="font-weight: 400;">https://www.percona.com/blog/percona-operator-volume-expansion-without-downtime/</span></a><span style="font-weight: 400;">) that gives you step-by-step instructions on how to extend the volumes without downtime. </span></p>
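<p><span style="font-weight: 400;">As a minimal sketch of the core expansion step (the PVC name and target size below are hypothetical; the post above covers the full operator-aware procedure), growing a volume is a patch of the PVC request, which only works when the storage class has allowVolumeExpansion set to true:</span></p>
<pre class="lang:sh decode:true"># Build the JSON patch that bumps the PVC storage request to the new size
NEW_SIZE="20Gi"   # hypothetical target size
PATCH=$(printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$NEW_SIZE")
echo "$PATCH"
# Then apply it to the volume of each node, for example:
# kubectl patch pvc datadir-cluster1-pxc-0 -p "$PATCH"
</pre>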
<h2><span style="font-weight: 400;">Conclusions</span></h2>
<p><span style="font-weight: 400;">Using the cloud, containers, automation, or more complex orchestrators like Kubernetes does not solve everything, does not prevent mistakes from happening, and, more importantly, does not make the right decisions for you. </span></p>
<p><span style="font-weight: 400;">You must set up a proper architecture that includes backup, monitoring and alerting. You must set the right alerts and act on them in time. </span></p>
<p><span style="font-weight: 400;">Finally, automation is cool; however, the devil is in the details, and typos are his day-to-day joy. Be careful and check what you put online; do not rush it. Validate, validate, validate… </span></p>
<p><span style="font-weight: 400;">Great stateful MySQL to all. </span></p>]]></description>
<category>MySQL</category>
<pubDate>Thu, 19 Jan 2023 14:11:24 +0000</pubDate>
</item>
<item>
<title>MySQL Dual password how to manage them programmatically</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/250-mysql-dual-password-how-to-manage-them-programmatically</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/250-mysql-dual-password-how-to-manage-them-programmatically</guid>
<description><![CDATA[<p><span style="font-weight: 400;">What Dual Password in MySQL is and how it works was already covered by my colleague Brian Sumpter here (</span><a href="https://www.percona.com/blog/using-mysql-8-dual-passwords/"><span style="font-weight: 400;">https://www.percona.com/blog/using-mysql-8-dual-passwords/</span></a><span style="font-weight: 400;">). <img src="http://www.tusacentral.net/joomla/images/stories/rpa-cognitive-blog-img.jpg" alt="rpa cognitive blog img" width="400" height="215" style="float: right;" /></span></p>
<p><span style="font-weight: 400;">However, let me do a brief recap here.</span></p>
<p><span style="font-weight: 400;">Dual Password is the MySQL mechanism that allows you to keep two passwords active at the same time. This feature is part of a broader set of password-management features implemented in MySQL 8 to enforce better security and secrets management, such as:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Internal Versus External Credentials Storage</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Password Expiration Policy</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Password Reuse Policy</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Password Verification-Required Policy</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Dual Password Support</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Random Password Generation</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Failed-Login Tracking and Temporary Account Locking</span></li>
</ul>
<p><span style="font-weight: 400;">The most important and requested features are the password expiration and verification policies. The problem in implementing them is the complexity of replacing passwords for accounts on very large platforms, such as those with thousands of applications and hundreds of MySQL servers. </span></p>
<p><span style="font-weight: 400;">In fact, while for a single user it is not so complex to change his own password when requested at login, for an application using thousands of sub-services it may require some time. The problem in performing the password change is that, while executing the modification, some services will have the updated password while others will still use the old one. Without Dual Password, a segment of nodes would receive connection errors, creating a service disruption. </span></p>
<p><span style="font-weight: 400;">Now let us cover this blog's topic. </span></p>
<p><span style="font-weight: 400;">With Dual Password it is instead possible to declare a new password while keeping the old one active until the whole rollout has been completed. </span></p>
<p><span style="font-weight: 400;">This highlights two very important aspects:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">When automating the password update, it is better to not use a password expiration policy, but base the expiration on the completion of the new password deployment.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We need to be sure that the account whose password we are changing keeps the old password active until we need it, and that the old password is correctly removed when done. </span></li>
</ul>
<p><span style="font-weight: 400;">As you can see, I am focusing on the cases where we have automation, not on the single interactive user update. </span></p>
<h2><span style="font-weight: 400;">How Dual Password works</span></h2>
<p><span style="font-weight: 400;">Let us assume we have created a user like:</span></p>
<pre class="lang:mysql decode:true">create user dualtest@'192.168.4.%' identified by 'password1';
grant all on test.* to dualtest@'192.168.4.%';
</pre>
<p><span style="font-weight: 400;">This will generate an entry in MySQL mysql.user table as:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>select user,host, plugin, authentication_string, password_last_changed,User_attributes from mysql.user where user = 'dualtest'\G
*************************** 1. row ***************************
user: dualtest
host: 192.168.4.%
plugin: mysql_native_password
authentication_string: *668425423DB5193AF921380129F465A6425216D0
password_last_changed: 2022-11-17 08:31:37
User_attributes: NULL
1 row in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">At this point our user will be able to connect from any application located in the correct network and act on the </span><i><span style="font-weight: 400;">test</span></i><span style="font-weight: 400;"> schema. </span></p>
<p><span style="font-weight: 400;">After some time, you, as the application owner, will be notified by your DBA team that the user </span><i><span style="font-weight: 400;">dualtest</span></i><span style="font-weight: 400;"> is required to change its password in order to respect the security constraints.</span></p>
<p><span style="font-weight: 400;">At this point there are two options:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">You have privileges to use Dual Password (the required dynamic privilege is </span><b>APPLICATION_PASSWORD_ADMIN</b><span style="font-weight: 400;">).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">You do not have the right privileges.</span></li>
</ol>
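<p><span style="font-weight: 400;">For reference, in case 1 the DBA team would have previously granted you something like the following (user and host as in the example above):</span></p>
<pre class="lang:mysql decode:true">GRANT APPLICATION_PASSWORD_ADMIN ON *.* TO 'dualtest'@'192.168.4.%';</pre>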
<p><span style="font-weight: 400;">In case 2 your DBA team must perform the change for you, and then they will let you know the new password.</span></p>
<p><span style="font-weight: 400;">In case 1 you can do the operation yourself:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true crayon-selected">ALTER USER 'dualtest'@'192.168.4.%' IDENTIFIED BY 'password2' RETAIN CURRENT PASSWORD;
</pre>
<p><span style="font-weight: 400;">Then check it is done properly:</span></p>
<pre class="lang:mysql decode:true">select user,host, plugin, authentication_string, password_last_changed,User_attributes from mysql.user where user ='dualtest' order by 1,2\G
*************************** 1. row ***************************
user: dualtest
host: 192.168.4.%
plugin: mysql_native_password
authentication_string: *DC52755F3C09F5923046BD42AFA76BD1D80DF2E9
password_last_changed: 2022-11-17 08:46:28
User_attributes: {"additional_password": "*668425423DB5193AF921380129F465A6425216D0"}
1 row in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">As you can see here, the OLD password has been moved to the </span><i><span style="font-weight: 400;">User_attributes</span></i><span style="font-weight: 400;"> JSON field, which is used in MySQL 8 to store several values. </span></p>
<p><span style="font-weight: 400;">At this point you can safely roll out the password change. The rollout can take an hour or a week with no production impact, given that the applications will be able to use either password. </span></p>
<p><span style="font-weight: 400;">Once the process is complete, you can ask your DBA team to remove the OLD password, or do it yourself:</span></p>
<pre class="lang:mysql decode:true">ALTER USER 'dualtest'@'192.168.4.%' DISCARD OLD PASSWORD;</pre>
<p><span style="font-weight: 400;">Then check that the password has been removed properly:</span></p>
<pre class="lang:mysql decode:true">(root@localhost) [(none)]>select user,host, plugin, authentication_string, password_last_changed,User_attributes from mysql.user where user ='dualtest' order by 1,2\G
*************************** 1. row ***************************
user: dualtest
host: 192.168.4.%
plugin: mysql_native_password
authentication_string: *DC52755F3C09F5923046BD42AFA76BD1D80DF2E9
password_last_changed: 2022-11-17 08:46:28
User_attributes: NULL
1 row in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">If all is clean the process can be considered complete. </span></p>
<p><span style="font-weight: 400;">Of course, all this should be automated and executed by code, not by hand. At a high level, the process should be more or less like this:</span></p>
<pre class="lang:sh decode:true">Input: new password
- Check for additional_password in User_attributes in mysql.user
  -> If no value is present you can proceed, otherwise exit (another change is in place)
- Read and store authentication_string for the user whose password you need to change
- Change the current password with: ALTER USER ... RETAIN CURRENT PASSWORD
- Check for additional_password in User_attributes in mysql.user
  -> If a value is present and matches the stored password you can proceed, otherwise exit,
     given there is an error in Dual Password or the passwords are different
- Run the update on all application nodes, and verify the new password on each application node
  -> At regular intervals check the number of completed changes, and check additional_password
     in User_attributes in mysql.user to be sure it is still there
- When all application nodes are up to date and verification is 100% successful:
  - Remove the OLD password with: ALTER USER ... DISCARD OLD PASSWORD
  - Check for additional_password in User_attributes in mysql.user
  -> If no value is present close with OK, otherwise report an error for the password not removed
- Complete
</pre>
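<p><span style="font-weight: 400;">The verification step above can be sketched in shell. This is a minimal illustration (the function names are mine, not part of any tool), assuming you fetch User_attributes with the SELECT shown earlier and pass it in as a string:</span></p>
<pre class="lang:sh decode:true"># Extract the value of "additional_password" from the User_attributes JSON
# (empty output if the attribute is not present, e.g. User_attributes is NULL)
extract_additional_password() {
  echo "$1" | sed -n 's/.*"additional_password": *"\([^"]*\)".*/\1/p'
}

# Succeed only when the retained old password matches the hash we stored
# before running ALTER USER ... RETAIN CURRENT PASSWORD
can_proceed_with_rollout() {
  saved_hash="$2"
  [ "$(extract_additional_password "$1")" = "$saved_hash" ]
}
</pre>
<p><span style="font-weight: 400;">can_proceed_with_rollout returns success only while the old password is still retained, which is exactly the guard you want before rolling the change out to the application nodes.</span></p>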
<h2><span style="font-weight: 400;">Conclusion</span></h2>
<p><span style="font-weight: 400;">As Brian also mentioned, these are the small things that can make a difference in large deployments and enterprise environments. Security is a topic that is very often underestimated in small companies or start-ups, but that is a mistake: security operations like password rotation are crucial for your safety. </span></p>
<p><span style="font-weight: 400;">It is nice to see that MySQL is finally adopting simple but effective steps to help DBAs to implement proper procedures without causing production impact and without the need to become too creative. </span></p>
<h2><span style="font-weight: 400;">References </span></h2>
<p><a href="https://www.percona.com/blog/using-mysql-8-dual-passwords/"><span style="font-weight: 400;">https://www.percona.com/blog/using-mysql-8-dual-passwords/</span></a></p>
<p><a href="https://dev.mysql.com/doc/refman/8.0/en/password-management.html#dual-passwords"><span style="font-weight: 400;">https://dev.mysql.com/doc/refman/8.0/en/password-management.html#dual-passwords</span></a></p>]]></description>
<category>MySQL</category>
<pubDate>Thu, 17 Nov 2022 16:26:45 +0000</pubDate>
</item>
<item>
<title>ProxySQL support for MySQL caching_sha2_password</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/249-proxysql-support-for-mysql-caching-sha2-password</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/249-proxysql-support-for-mysql-caching-sha2-password</guid>
<description><![CDATA[<p><span style="font-weight: 400;">In our time, every day we use dozens if not hundreds of applications connecting to some kind of data repository. This simple step is normally executed over the network and, as such, it is subject to possible sniffing, with all the related consequences.<img src="http://www.tusacentral.net/joomla/images/stories/proxysql/brokenlock.png" alt="brokenlock" style="float: right;" /><br style="clear: left;" /></span></p>
<p><span style="font-weight: 400;">Given that, it is normally better to protect your connection using data encryption such as SSL or, at a minimum, to make the information you pass to connect less easy to intercept. </span></p>
<p><span style="font-weight: 400;">At the same time, it is best practice not to store connection credentials in clear text, not even inside a table in your database. Doing that is the equivalent of writing your password on a sticky note on your desk. Not a good idea.</span></p>
<p><span style="font-weight: 400;">The main options are instead either transforming the passwords to make them less identifiable, such as by hashing, or storing the information in an external centralized vault. </span></p>
<p><span style="font-weight: 400;">In MySQL the passwords are transformed in order to not be clear text, and several different plugins are used to authenticate the user. From version 8 MySQL uses </span><i><span style="font-weight: 400;">caching_sha2_password</span></i><span style="font-weight: 400;"> as default authentication plugin. The </span><i><span style="font-weight: 400;">caching_sha2_password</span></i><span style="font-weight: 400;"> and </span><i><span style="font-weight: 400;">sha256_password</span></i><span style="font-weight: 400;"> authentication plugins provide more secure password encryption than the </span><i><span style="font-weight: 400;">mysql_native_password</span></i><span style="font-weight: 400;"> plugin, and </span><i><span style="font-weight: 400;">caching_sha2_password</span></i><span style="font-weight: 400;"> provides better performance than </span><i><span style="font-weight: 400;">sha256_password</span></i><span style="font-weight: 400;">. Due to these superior security and performance characteristics of </span><i><span style="font-weight: 400;">caching_sha2_password</span></i><span style="font-weight: 400;">, it is as of MySQL 8.0 the preferred authentication plugin, and is also the default authentication plugin rather than mysql_native_password.</span></p>
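<p><span style="font-weight: 400;">If you are unsure which plugin your MySQL 8.0 server or a given account is using, you can check it directly with standard statements:</span></p>
<pre class="lang:mysql decode:true">SHOW VARIABLES LIKE 'default_authentication_plugin';
SELECT user, host, plugin FROM mysql.user ORDER BY user;</pre>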
<p><span style="font-weight: 400;">In this regard, I recently got the same question again: “Can we use ProxySQL with the MySQL 8 authentication mechanism?”, and I decided it was time to write this short blog post.</span></p>
<p><span style="font-weight: 400;">The short answer is “Yes you can”, however do not expect to have full </span><i><span style="font-weight: 400;">caching_sha2_password</span></i><span style="font-weight: 400;"> support.</span></p>
<p><span style="font-weight: 400;">This is because ProxySQL does not fully support the </span><i><span style="font-weight: 400;">caching_sha2_password </span></i><span style="font-weight: 400;">mechanism internally and, given that, a “trick” must be used. </span></p>
<p><span style="font-weight: 400;">So, what should we do when using MySQL 8 and ProxySQL? </span></p>
<p><span style="font-weight: 400;">In the text below we will see what can be done to continue to use ProxySQL with MySQL and Percona Server 8. </span></p>
<p><span style="font-weight: 400;">Note that I have used the Percona </span><i><span style="font-weight: 400;">proxysql_admin</span></i><span style="font-weight: 400;"> tool to manage the users, except in the last case. <br />The Percona </span><i><span style="font-weight: 400;">proxysql_admin</span></i><span style="font-weight: 400;"> tool is a nice tool that helps you manage ProxySQL; in regard to users, it also manages and synchronizes them from your Percona Server or MySQL instance. </span></p>
<p><span style="font-weight: 400;">In the following examples:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">ProxySQL is on 192.168.4.191</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">User name/password is msandbox/msandbox</span></li>
</ul>
<h2><span style="font-weight: 400;">Using hashing.</span></h2>
<p><span style="font-weight: 400;">By default MySQL comes with </span><i><span style="font-weight: 400;">caching_sha2_password</span></i><span style="font-weight: 400;">, so if I create a user named </span><i><span style="font-weight: 400;">msandbox</span></i><span style="font-weight: 400;"> I will have:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">DC1-1(root@localhost) [(none)]>select user,host, authentication_string,plugin from mysql.user order by 1,2;
+----------------------------+--------------------+------------------------------------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+----------------------------+--------------------+------------------------------------------------------------------------+-----------------------+
| msandbox | % | $A$005$Z[z@l'O%[Q5t^ EKJDgxjWXJjDpDEUv91oL7Hoh/0NydTeCzpV.aI06C9. | caching_sha2_password | <---- this user
+----------------------------+--------------------+------------------------------------------------------------------------+-----------------------+
</pre>
<p><span style="font-weight: 400;">Then I use percona-scheduler-admin to sync the users:</span></p>
<pre class="lang:mysql decode:true">./percona-scheduler-admin --config-file=config.toml --syncusers
Syncing user accounts from PXC(192.168.4.205:3306) to ProxySQL
Removing existing user from ProxySQL: msandbox
Adding user to ProxySQL: msandbox
Synced PXC users to the ProxySQL database!
mysql> select * from mysql_users ;
+------------+------------------------------------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+-----------------------------+
| username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | attributes | comment |
+------------+------------------------------------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+-----------------------------+
| msandbox | $A$005$Z[z@l'O%[Q5t^ EKJDgxjWXJjDpDEUv91oL7Hoh/0NydTeCzpV.aI06C9 | 1 | 0 | 100 | NULL | 0 | 1 | 0 | 1 | 1 | 10000 | | |
+------------+------------------------------------------------------------------------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+-----------------------------+</pre>
<p><span style="font-weight: 400;">And set the query rules:</span></p>
<pre class="lang:mysql decode:true">insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1048,6033,'msandbox',100,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1050,6033,'msandbox',101,1,3,'^SELECT.*$',1);
load mysql query rules to run;save mysql query rules to disk;
</pre>
<p><span style="font-weight: 400;">Now I try to connect passing by ProxySQL:</span></p>
<pre class="lang:sh decode:true"># mysql -h 192.168.4.191 -P6033 -umsandbox -pmsandbox
ERROR 1045 (28000): ProxySQL Error: Access denied for user 'msandbox'@'192.168.4.191' (using password: YES)
</pre>
<p><span style="font-weight: 400;">My account fails to connect because of failed authentication.</span></p>
<p><span style="font-weight: 400;">To fix this I need to drop the user and recreate it with a different authentication plugin in my MySQL server:</span></p>
<pre class="lang:mysql decode:true">drop user msandbox@'%';
create user 'msandbox'@'%' identified with mysql_native_password BY 'msandbox';
grant select on *.* to 'msandbox'@'%';
select user,host, authentication_string,plugin from mysql.user order by 1,2;
+----------+--------------------+-------------------------------------------+-----------------------+
| user | host | authentication_string | plugin |
+----------+--------------------+-------------------------------------------+-----------------------+
| msandbox | % | *6C387FC3893DBA1E3BA155E74754DA6682D04747 | mysql_native_password |
+----------+--------------------+-------------------------------------------+-----------------------+
</pre>
<p><span style="font-weight: 400;">At this point I can re-run:</span></p>
<pre class="lang:sh decode:true">./percona-scheduler-admin --config-file=config.toml --syncusers</pre>
<p><span style="font-weight: 400;">If I try to connect again:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true"># mysql -h 192.168.4.191 -P6033 -umsandbox -pmsandbox
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 6708563
Server version: 8.0.28 (ProxySQL). <---------------------------- Connecting to proxysql
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show global variables like 'version%';
+-------------------------+------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------+------------------------------------------------------------------------------------+
| version | 8.0.25-15.1 <--- Percona/MySQL version |
| version_comment | Percona XtraDB Cluster binary (GPL) 8.0.25, Revision 8638bb0, WSREP version 26.4.3 |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
| version_compile_zlib | 1.2.11 |
| version_suffix | .1 |
+-------------------------+------------------------------------------------------------------------------------+
6 rows in set (0.02 sec)</pre>
<p><span style="font-weight: 400;">This is the only way to keep the password hashed in MySQL and in ProxySQL.</span></p>
<h2><span style="font-weight: 400;">Not using Hashing</span></h2>
<p><span style="font-weight: 400;">What if you cannot use mysql_native_password for the password in your MySQL server?</span></p>
<p><span style="font-weight: 400;">There is a way to still connect; however, I do not recommend it, given that for me it is highly insecure, but for completeness I am going to illustrate it.</span></p>
<p><span style="font-weight: 400;">First of all, disable password hashing in ProxySQL:</span></p>
<pre class="lang:mysql decode:true">update global_variables set Variable_Value='false' where Variable_name='admin-hash_passwords'; </pre>
<p><span style="font-weight: 400;">At this point, instead of syncing the users, you can create the user locally:</span></p>
<pre class="lang:mysql decode:true">insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent,comment) values ('msandbox','msandbox',1,100,'mysql',1,'generic test for security');
mysql> select * from runtime_mysql_users where username ='msandbox';
+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+---------------------------+
| username | password | active | use_ssl | default_hostgroup | default_schema | schema_locked | transaction_persistent | fast_forward | backend | frontend | max_connections | attributes | comment |
+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+---------------------------+
| msandbox | msandbox | 1 | 0 | 100 | mysql | 0 | 1 | 0 | 1 | 0 | 10000 | | generic test for security |
| msandbox | msandbox | 1 | 0 | 100 | mysql | 0 | 1 | 0 | 0 | 1 | 10000 | | generic test for security |
+----------+----------+--------+---------+-------------------+----------------+---------------+------------------------+--------------+---------+----------+-----------------+------------+---------------------------+</pre>
<p><span style="font-weight: 400;">As you can see, doing that prevents the password from being hashed; instead, it is stored in clear text.</span></p>
<p><span style="font-weight: 400;">At this point you will be able to connect to MySQL 8 using the caching_sha2_password, but the password is visible in ProxySQL.</span></p>
<p><span style="font-weight: 400;">Let me repeat, I DO NOT recommend using it this way, because for me it is highly insecure. </span></p>
<p> </p>
<h1><span style="font-weight: 400;">Conclusion</span></h1>
<p><span style="font-weight: 400;">While it is still possible to configure your user in MySQL to connect using ProxySQL, it is obvious that we have a gap in the way ProxySQL supports security. </span></p>
<p><span style="font-weight: 400;">The hope is that this gap will be filled soon by the ProxySQL development team, although, looking at past issues, this seems to have been pending for years now. </span></p>
<h1><span style="font-weight: 400;">References</span></h1>
<p class="p1"><span class="s1"><a href="https://proxysql.com/documentation/mysql-8-0/">https://proxysql.com/documentation/mysql-8-0/</a></span></p>
<p class="p1"><span class="s1"><a href="https://github.com/sysown/proxysql/issues/2580">https://github.com/sysown/proxysql/issues/2580</a></span></p>
<p class="p2"><span class="s2"><a href="https://www.percona.com/blog/upgrade-your-libraries-authentication-plugin-caching_sha2_password-cannot-be-loaded/">https://www.percona.com/blog/upgrade-your-libraries-authentication-plugin-caching_sha2_password-cannot-be-loaded/</a></span></p>]]></description>
<category>MySQL</category>
<pubDate>Thu, 03 Nov 2022 13:47:19 +0000</pubDate>
</item>
<item>
<title>Zero impact on index creation with Aurora 3</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/248-zero-impact-on-index-creation-with-aurora-3</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/248-zero-impact-on-index-creation-with-aurora-3</guid>
<description><![CDATA[<p><span style="font-weight: 400;"><span style="font-weight: 400;"><img src="http://www.tusacentral.net/joomla/images/stories/ddl/aurora_ddl_notes.jpeg" alt="aurora ddl notes" style="float: right;" /></span>In the last quarter of 2021, AWS released Aurora version 3. This new version aligns Aurora with the latest MySQL 8, porting many of the advantages MySQL 8 has over previous versions.</span></p>
<p><span style="font-weight: 400;">While this brings a lot of new interesting features to Aurora, what we are going to cover here is how DDLs behave when using the ONLINE option, with a quick comparison with what happens in standard MySQL 8 and with Group Replication. </span></p>
<h2><span style="font-weight: 400;">Tests</span></h2>
<p><span style="font-weight: 400;">All tests were run on an Aurora instance r6g.large with a secondary availability zone.<br /></span><span style="font-weight: 400;">The test was composed of:</span></p>
<p><span style="font-weight: 400;"> 4 connections</span></p>
<ul>
<li style="list-style-type: none;">
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">#1 to perform ddl</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">#2 to perform insert data in the table I am altering</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">#3 to perform insert data on a different table </span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">#4 checking the other node operations</span></li>
</ul>
</li>
</ul>
<p><span style="font-weight: 400;">In the Aurora instance, a sysbench schema with 10 tables and 5 million rows was created, just to get a bit of traffic. The test table, also with 5 million rows, was:</span><span style="font-weight: 400;"><br /></span></p>
<pre class="lang:mysql decode:true">CREATE TABLE `windmills_test` (
`id` bigint NOT NULL AUTO_INCREMENT,
`uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`millid` smallint NOT NULL,
`kwatts_s` int NOT NULL,
`date` date NOT NULL,
`location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`active` tinyint NOT NULL DEFAULT '1',
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_millid` (`millid`,`active`),
KEY `IDX_active` (`id`,`active`),
KEY `kuuid_x` (`uuid`),
KEY `millid_x` (`millid`),
KEY `active_x` (`active`),
KEY `idx_1` (`uuid`,`active`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC
</pre>
<p><span style="font-weight: 400;">The executed commands:</span></p>
<pre class="lang:sh decode:true">Connection 1:
ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
Connection 2:
while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
Connection 3:
while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills3 select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
Connections 4 - 5:
while [ 1 = 1 ];do echo "$(date +'%T.%3N')";mysql --defaults-file=./my.cnf -h <secondary aurora instance> -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done
</pre>
<p>Operations:<br />1) start inserts from connections<br />2) start commands in connections 4 - 5 on the other nodes<br />3) execute: <span style="background-color: #f4f4f4; font-family: 'Courier 10 Pitch', Courier, monospace; font-size: 12.8px;">DC1-1(root@localhost) [windmills_large]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;</span></p>
<p><span style="font-weight: 400;">With this, what I wanted to capture was the impact of a common operation such as creating an index. My expectation was to see no impact when performing operations declared “ONLINE”, such as creating an index, as well as data consistency between the nodes. </span></p>
<p><span style="font-weight: 400;">Let us see what happened…</span></p>
<h2><span style="font-weight: 400;">Results</span></h2>
<p><span style="font-weight: 400;">While running the insert in the same table, performing the alter:</span></p>
<pre class="lang:mysql decode:true">mysql> ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
Query OK, 0 rows affected (16.51 sec)
Records: 0 Duplicates: 0 Warnings: 0
</pre>
<p><span style="font-weight: 400;">The ALTER did NOT stop the operations on the same table, nor on any other table in the Aurora instance.</span></p>
<p><span style="font-weight: 400;">We can only identify a minimal performance impact:</span></p>
<pre class="lang:sh decode:true">[root@ip-10-0-0-11 tmp]# while [ 1 = 1 ];do da=$(date +'%s.%3N');mysql --defaults-file=./my.cnf -D windmills_large -e "insert into windmills_test select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills4 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
.347
.283
.278
.297
.291
.317
.686 ← start
<Snip>
.512 ← end
.278
.284
.279
</pre>
<p><span style="font-weight: 400;">The secondary node is not affected at all, because Aurora manages data replication at the storage level: there is no apply-from-relay-log step, as there is in standard MySQL asynchronous replication or with Group Replication. </span></p>
<p><span style="font-weight: 400;">The result is that in Aurora 3 we can have zero-impact index creation (or any other ONLINE/INSTANT operation), and this includes the data replicated to the other instances for High Availability. </span></p>
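<p><span style="font-weight: 400;">As a side note, one way to watch the progress of such an online ALTER is through the InnoDB stage instruments of performance_schema. This is a minimal sketch, assuming performance_schema is enabled on the instance (managed services like Aurora may restrict some of it):</span></p>
<pre class="lang:mysql decode:true">-- Enable the instruments and consumers that track ALTER TABLE progress
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'stage/innodb/alter table%';
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_stages_%';

-- While the ALTER runs, poll the current stage and its progress counters
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED
  FROM performance_schema.events_stages_current
 WHERE EVENT_NAME LIKE 'stage/innodb/alter table%';
</pre>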
<p><span style="font-weight: 400;">If we compare this with Group replication (see <a href="https://www.percona.com/blog/online-ddl-with-group-replication-in-mysql-8-0-27/">blog</a>):</span></p>
<pre class="lang:sh decode:true"> GR Aurora 3
Time on hold for insert for altering table ~0.217 sec ~0.523 sec
Time on hold for insert for another table ~0.211 sec ~0.205 sec
</pre>
<p><span style="font-weight: 400;">However, keep in mind that MySQL with Group Replication will still need to apply the data on the Secondaries. This means that if your alter was taking 10 hours to build the index, the Secondary nodes will be misaligned with the Source for approximately another 10 hours. </span></p>
<p><span style="font-weight: 400;">With Aurora 3, or with PXC, the changes are present on all nodes as soon as the Source completes the operation. </span></p>
<p><span style="font-weight: 400;">What about Percona XtraDB Cluster (PXC)? Well, with PXC we have a different scenario:</span></p>
<pre class="lang:sh decode:true"> PXC(NBO) Aurora 3
Time on hold for insert for altering table ~120 sec ~0.523 sec
Time on hold for insert for another table ~25 sec ~0.205 sec</pre>
<p><span style="font-weight: 400;">We will have a higher impact while doing the Alter operation, but the data will be on all nodes at the same time maintaining a high level of consistency in the cluster. </span></p>
<h2><span style="font-weight: 400;">Conclusion</span></h2>
<p><span style="font-weight: 400;">Aurora is not for every use case, nor for every budget; however, it has some very good aspects, like the one we have just seen. The difference between standard MySQL and Aurora is not in the holding/locking time (aka operation impact), but in the HA aspects. If my data/structure is on all my Secondaries at the same time as on the Source, I feel much more comfortable than having to wait an additional T time. </span></p>
<p><span style="font-weight: 400;">This is why PXC is, in that case, the better alternative if you can afford the locking time. If not, Aurora 3 is your solution; just do your math properly and be conservative with the instance resources. </span></p>
<p> </p>]]></description>
<category>MySQL</category>
<pubDate>Wed, 20 Apr 2022 12:05:56 +0000</pubDate>
</item>
<item>
<title>A face to face with semi-synchronous replication</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/247-a-face-to-face-with-semi-synchronous-replication</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/247-a-face-to-face-with-semi-synchronous-replication</guid>
<description><![CDATA[<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/face_to_face.jpeg"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/face_to_face.jpeg" alt="" width="300" height="212" class="size-medium wp-image-80671 alignright" style="float: right;" /></a></p>
<p><span style="font-weight: 400;">Last month I performed a review of the Percona Operator for MySQL Server (<a href="https://www.percona.com/doc/kubernetes-operator-for-mysql/ps/index.html">https://www.percona.com/doc/kubernetes-operator-for-mysql/ps/index.html</a>) which is still Alpha. That operator is based on Percona Server and uses standard asynchronous replication, with the option to activate semi-synchronous replication to gain higher levels of data consistency between nodes. </span></p>
<p><span style="font-weight: 400;">The whole solution is composed as:</span></p>
<p><a href="https://www.percona.com/blog/wp-content/uploads/2022/04/operator.svg"><img src="https://www.percona.com/blog/wp-content/uploads/2022/04/operator.svg" alt="" width="499" height="409" class="alignnone wp-image-80663" /></a></p>
<p><span style="font-weight: 400;">Additionally, Orchestrator (</span><a href="https://github.com/openark/orchestrator"><span style="font-weight: 400;">https://github.com/openark/orchestrator</span></a><span style="font-weight: 400;">) is used to manage the topology and, if required, to enable the semi-synchronous flag on the replica nodes.<br /></span><span style="font-weight: 400;">While there is not much to say about standard asynchronous replication, I want to spend a few words on the needs and expectations of the semi-synchronous (semi-sync) solution. </span></p>
<h2><span style="font-weight: 400;">A look into semi-synchronous</span></h2>
<p><span style="font-weight: 400;">Difference between Async and Semi-sync.<br /></span>Asynchronous:</p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/async-replication-diagram.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/async-replication-diagram.png" alt="" width="725" height="304" class="alignnone size-full wp-image-80662" /></a></p>
<p><span style="font-weight: 400;">The above diagram represents standard asynchronous replication. By design, this method allows transactions committed on the Source that are not yet present on the Replicas. The Replica is supposed to </span><i><span style="font-weight: 400;">catch up</span></i><span style="font-weight: 400;"> when possible. </span></p>
<p><span style="font-weight: 400;">It is also important to understand that there are two steps in replication:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Data copy, which is normally very fast. The Data is copied from the binlog of the Source to the relay log on the Replica (IO_Thread).</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Data apply, where the data is read from the relay log on the Replica node and written inside the database itself (SQL_Thread). This step is normally the bottleneck and while there are some parameters to tune, the efficiency to apply transactions depends on many factors including schema design. </span></li>
</ul>
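<p><span style="font-weight: 400;">Both steps are visible directly from the server. As a quick sketch (MySQL 8.0.22+ syntax), on a Replica you can check the state of the two threads and of the apply step like this:</span></p>
<pre class="lang:mysql decode:true">-- IO thread (data copy) and SQL thread (data apply) state:
-- look at Replica_IO_Running, Replica_SQL_Running, Seconds_Behind_Source
SHOW REPLICA STATUS\G

-- Per-worker detail of the apply step
SELECT WORKER_ID, LAST_APPLIED_TRANSACTION, APPLYING_TRANSACTION
  FROM performance_schema.replication_applier_status_by_worker;
</pre>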
<p><span style="font-weight: 400;">Production deployments that utilize the asynchronous solution are typically designed to manage the possible inconsistency, given that data on the Source is not guaranteed to be on the Replica at commit time. At the same time, the level of High Availability assigned to this solution is lower than what we normally obtain with (virtually) synchronous replication, given that we may need to wait for the Replica to catch up the gap accumulated in the relay logs before performing the fail-over.</span></p>
<p>Semi-sync:</p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/semisync-replication-diagram.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/semisync-replication-diagram.png" alt="" width="725" height="396" class="alignnone size-full wp-image-80665" /></a></p>
<p><span style="font-weight: 400;">The above diagram represents the semi-sync replication method. The introduction of semi-sync adds a checking step on the Source before it returns the acknowledgement to the client.<br /></span><span style="font-weight: 400;">This step happens at the moment of the data copy, i.e. when the data is copied from the binary log on the Source to the relay log on the Replica. </span></p>
<p><span style="font-weight: 400;">This is important: there is NO mechanism that makes data replication more resilient or efficient. There is only an additional step that tells the Source to wait a given amount of time for an answer from N replicas, and then either return the acknowledgement or time out and return to the client regardless. </span></p>
<p><span style="font-weight: 400;">This mechanism is introducing a </span><i><span style="font-weight: 400;">possible</span></i><span style="font-weight: 400;"> significant delay in the service, without giving the 100% guarantee of data consistency. </span></p>
<p><span style="font-weight: 400;">In terms of </span><i><span style="font-weight: 400;">availability</span></i><span style="font-weight: 400;"> of the service, under high load this method may lead the Source to stop serving requests while waiting for acknowledgements, significantly reducing the availability of the service itself. </span></p>
<p><span style="font-weight: 400;">At the same time, the only acceptable setting for rpl_semi_sync_source_wait_point is AFTER_SYNC (the default), because: </span><i><span style="font-weight: 400;">In the event of source failure, all transactions committed on the source have been replicated to the replica (saved to its relay log). An unexpected exit of the source server and failover to the replica is lossless because the replica is up to date.</span></i></p>
<p><span style="font-weight: 400;">All clear? No? Let me simplify the thing. </span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">In standard replication you have two moments (I am simplifying)</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Copy data from Source to Replica</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Apply data in the Replica node</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">There is no certification on the data applied about its consistency with the Source</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">With asynchronous the Source task is to write data in the binlog and forget</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">With semi-sync the Source writes the data on binlog and waits </span><b><i>T</i></b><span style="font-weight: 400;"> seconds to receive </span><i><span style="font-weight: 400;">acknowledgement</span></i><span style="font-weight: 400;"> from </span><b><i>N</i></b><span style="font-weight: 400;"> servers about them having received the data.</span></li>
</ul>
<p><span style="font-weight: 400;">To enable semi-sync you follow these steps: </span><a href="https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html"><span style="font-weight: 400;">https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html</span></a><span style="font-weight: 400;"></span></p>
<p><span style="font-weight: 400;">In short:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Register the plugins</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Enable Source rpl_semi_sync_source_enabled=1</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Enable Replica rpl_semi_sync_replica_enabled = 1</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">If replication is already running STOP/START REPLICA IO_THREAD</span></li>
</ul>
</li>
</ul>
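<p><span style="font-weight: 400;">In MySQL 8.0.26+ terminology, the steps above translate into something like the following sketch (older versions use the equivalent master/slave plugin and variable names):</span></p>
<pre class="lang:mysql decode:true">-- On the Source
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = 1;
SET GLOBAL rpl_semi_sync_source_timeout = 10000;           -- milliseconds (the default)
SET GLOBAL rpl_semi_sync_source_wait_point = 'AFTER_SYNC'; -- the default, and the only sane choice

-- On each Replica that should acknowledge
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = 1;
STOP REPLICA IO_THREAD;
START REPLICA IO_THREAD;
</pre>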
<p><span style="font-weight: 400;">And here starts the fun, be ready for many “</span><i><span style="font-weight: 400;">wait whaaat?”</span></i><span style="font-weight: 400;">. </span></p>
<p><span style="font-weight: 400;">What is the T and N I have just mentioned above?</span></p>
<p><span style="font-weight: 400;">Well the T is a timeout that you can set to avoid having the source wait forever for the Replica acknowledgement. The default is 10 seconds. What happens if the Source waits for more than the timeout? <i><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/wait_whaat.jpeg" alt="" width="115" height="79" class="wp-image-80666 alignright" style="float: right;" /></i><br /></span><b><i>rpl_semi_sync_source_timeout</i></b><i><span style="font-weight: 400;"> controls how long the source waits on a commit for acknowledgment from a replica before timing out and </span></i><b><i>reverting to asynchronous replication</i></b><i><span style="font-weight: 400;">.</span></i></p>
<p><span style="font-weight: 400;">Be careful of the wording here! The manual says </span><b>SOURCE</b><span style="font-weight: 400;">: it is not that MySQL reverts to asynchronous per transaction or per connection, </span><span style="font-weight: 400;">it reverts for the <strong>whole server</strong>.</span></p>
<p><span style="font-weight: 400;">Now analyzing the work-log (see </span><a href="https://dev.mysql.com/worklog/task/?id=1720"><span style="font-weight: 400;">https://dev.mysql.com/worklog/task/?id=1720</span></a><span style="font-weight: 400;"> and more in the references) the Source should revert to semi-synchronous as soon as all involved replicas are aligned again. </span></p>
<p><span style="font-weight: 400;">However, checking the code (see </span><a href="https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L844"><span style="font-weight: 400;">https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L844</span></a><span style="font-weight: 400;"> and following), we can see that we do not have a 100% guarantee that the Source will be able to switch back. </span></p>
<p><span style="font-weight: 400;">Also in the code:</span><span style="font-weight: 400;"><br /></span><i><span style="font-weight: 400;">But, it is not that easy to detect that the replica has caught up. This is caused by the fact that MySQL's replication protocol is asynchronous, meaning that if the source does not use the semi-sync protocol, the replica would not send anything to the source.</span></i></p>
<p><span style="font-weight: 400;">In all the tests I ran, the Source was not able to switch back. In short, the Source moved out of semi-sync and stayed out </span><span style="font-weight: 400;">forever</span><span style="font-weight: 400;">; no rollback. Keep that in mind as we go ahead.</span></p>
<p><span style="font-weight: 400;">What is the N I mentioned above? It represents the number of Replicas that must provide the acknowledgement back. </span></p>
<p><span style="font-weight: 400;">If you have a cluster of 10 nodes you may need only 2 of them involved in the semi-sync; there is no need to include them all. But if you have a cluster of 3 nodes where 1 is the Source, relying on only 1 Replica is not really safe. What I mean here is that if you choose semi-synchronous to ensure the data replicates, having it enabled for one single node is not enough: if that node crashes, you are doomed. As such, you need at least 2 nodes with semi-sync.</span></p>
<p><span style="font-weight: 400;">Anyhow, the point is that if one of the Replicas takes more than T to reply, the whole mechanism stops working, probably forever. </span></p>
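<p><span style="font-weight: 400;">The N is controlled by a global variable as well. For the 3-node example above, requiring acknowledgement from both Replicas would look like this (8.0.26+ name; older versions use rpl_semi_sync_master_wait_for_slave_count):</span></p>
<pre class="lang:mysql decode:true">-- Wait for 2 replicas to acknowledge before returning to the client
SET GLOBAL rpl_semi_sync_source_wait_for_replica_count = 2;
</pre>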
<p><span style="font-weight: 400;">As we have seen above, to enable semi-sync on Source we manipulate the value of the GLOBAL variable </span><i><span style="font-weight: 400;">rpl_semi_sync_source_enabled</span></i><span style="font-weight: 400;">.</span></p>
<p><span style="font-weight: 400;">However, if I check the value of </span><i><span style="font-weight: 400;">rpl_semi_sync_source_enabled</span></i><span style="font-weight: 400;"> when the Source has shifted to plain asynchronous replication because of a timeout:</span></p>
<pre class="lang:mysql decode:true">select @@rpl_semi_sync_source_enabled;
+--------------------------------+
| @@rpl_semi_sync_source_enabled |
+--------------------------------+
| 1 |
+--------------------------------+
</pre>
<p><span style="font-weight: 400;">As you can see, the global variable reports a value of 1, meaning that semi-sync appears to be active even when it is not.</span></p>
<p><span style="font-weight: 400;">In the documentation it is reported that to monitor the semi-sync activity we should check for Rpl_semi_sync_source_status. Which means that you can have <a href="https://www.percona.com/blog/wp-content/uploads/2022/04/wait_whaat.jpeg"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/wait_whaat.jpeg" alt="" width="113" height="77" class="wp-image-80666 alignright" style="float: right;" /></a> Rpl_semi_sync_source_status = 0 and rpl_semi_sync_source_enabled =1 at the same time.</span></p>
<p><span style="font-weight: 400;">Is this a bug? Well, according to the documentation:</span><span style="font-weight: 400;"><br /></span><i><span style="font-weight: 400;">When the source switches between asynchronous or semisynchronous replication due to commit-blocking timeout or a replica catching up, it sets the value of the Rpl_semi_sync_master_status or Rpl_semi_sync_source_status status variable appropriately. Automatic fallback from semisynchronous to asynchronous replication on the source means that it is possible for the rpl_semi_sync_master_enabled or rpl_semi_sync_source_enabled system variable to have a value of 1 on the source side even when semisynchronous replication is in fact not operational at the moment. You can monitor the Rpl_semi_sync_master_status or Rpl_semi_sync_source_status status variable to determine whether the source currently is using asynchronous or semisynchronous replication.</span></i></p>
<p><b>It is not a bug</b><span style="font-weight: 400;">. However, documenting it does not change the fact that this is a weird/unfriendly/counterintuitive behavior, one that opens the door to many, many possible issues. Especially given that you know the Source may fail to switch semi-sync back on. </span></p>
<p><span style="font-weight: 400;">Just to close this part, we can summarize as follows:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">You activate semi-sync setting a global variable</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Server/Source can disable it (silently) without changing that variable </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Server will never restore semi-sync automatically</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The way to check if semi-sync works is to use the Status variable</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">When Rpl_semi_sync_source_status = 0 and rpl_semi_sync_source_enabled =1 you had a Timeout and Source is now working in asynchronous replication</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The way to reactivate semi-sync is to set rpl_semi_sync_source_enabled to OFF first then rpl_semi_sync_source_enabled = ON. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Replicas can be set with semi-sync ON/OFF, but unless you STOP/START the replica IO_THREAD, the state of the variable can be inconsistent with the state of the server.</span></li>
</ul>
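<p><span style="font-weight: 400;">Putting the summary above into practice, a minimal check-and-reactivate sketch on the Source would be:</span></p>
<pre class="lang:mysql decode:true">-- Detect the silent fallback: enabled = 1 but status = OFF
SHOW GLOBAL VARIABLES LIKE 'rpl_semi_sync_source_enabled';
SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_source_status';

-- Reactivate semi-sync by toggling the variable OFF, then ON again
SET GLOBAL rpl_semi_sync_source_enabled = OFF;
SET GLOBAL rpl_semi_sync_source_enabled = ON;
</pre>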
<p><span style="font-weight: 400;">What can go wrong?</span></p>
<h2><span style="font-weight: 400;">Semi-synchronous is not seriously affecting the performance</span></h2>
<p><span style="font-weight: 400;">Others have already discussed semi-sync performance in better detail. However, I want to add some color given the recent experience with our operator testing.<br /></span><span style="font-weight: 400;">In the next graphs I will show you the behavior of writes/reads using asynchronous replication, and the same load with semi-synchronous.<br /></span><span style="font-weight: 400;">For the record, the test was a simple sysbench-tpcc run using 20 tables, 20 warehouses and 256 threads for 600 seconds. </span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/2.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/2.png" alt="" width="900" height="401" class="alignnone wp-image-80652 size-large" /></a> <a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/3.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/3.png" alt="" width="900" height="421" class="alignnone size-large wp-image-80653" /></a></p>
<p><span style="font-weight: 400;">The set above shows a nice and consistent r/w load with minimal fluctuations. This is what we like to have. </span></p>
<p><span style="font-weight: 400;">The graphs below represent the exact same load on the exact same environment, but with semi-sync activated and no timeout. </span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/9.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/9.png" alt="" width="900" height="421" class="alignnone wp-image-80657 size-large" /></a> <a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/10.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/10.png" alt="" width="900" height="420" class="alignnone wp-image-80658 size-large" /></a></p>
<p><span style="font-weight: 400;">Aside from the performance loss (we went from 10k transactions/s to 3k/s), the constant stop/go imposed by the semi-sync mechanism has a very bad effect on the application behavior when you have many concurrent threads and high load. I challenge any serious production system to work in this way. </span></p>
<p><span style="font-weight: 400;">Of course, the results are in line with this yo-yo game:</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/1c.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/1c.png" alt="" width="900" height="356" class="alignnone size-large wp-image-80651" /></a></p>
<p><span style="font-weight: 400;">In the best case, when all was working as expected and nothing crazy was happening, I had something around a 60% loss. I am not inclined to see this as a minor performance drop. </span></p>
<h2><span style="font-weight: 400;">But at least your data is safe</span></h2>
<p><span style="font-weight: 400;">As already stated at the beginning, the purpose of semi-synchronous replication is to guarantee that the data on server A reaches server B before returning the OK to the application. </span></p>
<p><span style="font-weight: 400;">In short, over a period of 1 second we should have minimal transactions in flight and a limited number of transactions in the apply queue, while with standard (asynchronous) replication we may have … thousands. </span></p>
<p><span style="font-weight: 400;">In the graphs below we can see two lines:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The yellow line represents the number of GTIDs “in flight” from Source to destination (Y2 axis). In case of a Source crash, those transactions are lost and we will have data loss.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">The blue line represents the number of GTIDs already copied over from Source to Replica but not yet applied in the database (Y1 axis). In case of a Source crash we must wait for the Replica to process these entries before making the node write-active, or we will have data inconsistency.</span></li>
</ul>
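<p><span style="font-weight: 400;">The blue line (copied over but not yet applied) can be measured on the Replica itself with the GTID functions. A sketch, assuming the default (unnamed) replication channel:</span></p>
<pre class="lang:mysql decode:true">-- GTIDs received in the relay log but not yet applied to the database
SELECT GTID_SUBTRACT(
         (SELECT RECEIVED_TRANSACTION_SET
            FROM performance_schema.replication_connection_status
           WHERE CHANNEL_NAME = ''),
         @@GLOBAL.gtid_executed) AS not_yet_applied;
</pre>
<p><span style="font-weight: 400;">The yellow line (in flight) instead requires comparing the Source's gtid_executed with the Replica's received set, so it needs a query against both nodes.</span></p>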
<p><span style="font-weight: 400;">Asynchronous replication:</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/4.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/4.png" alt="" width="900" height="526" class="alignnone wp-image-80654 size-large" /></a></p>
<p><span style="font-weight: 400;">As expected, we can see a huge queue of transactions to apply from the relay log, and some spikes of transactions in flight. </span></p>
<p><span style="font-weight: 400;">Using Semi-synchronous replication:</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/8.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/8.png" alt="" width="900" height="421" class="alignnone size-large wp-image-80656" /></a></p>
<p><span style="font-weight: 400;">Yes, apparently we have reduced the queue, and with no spikes there is no data loss.</span></p>
<p><span style="font-weight: 400;">But this happens when all goes as expected, and we know that in production this is not the norm.<br /></span><span style="font-weight: 400;">What if we need to enforce semi-sync but at the same time we cannot set the timeout to ridiculous values like 1 week? </span></p>
<p><span style="font-weight: 400;">Simple: we need a check that re-enables semi-sync as soon as it is silently disabled (for any reason).<br /></span><span style="font-weight: 400;">However, doing this without waiting for the Replicas to cover the replication gap causes the following interesting effects:</span></p>
<p><a href="http://www.tusacentral.net/joomla/images/stories/semi-sync/5.png"><img src="http://www.tusacentral.net/joomla/images/stories/semi-sync/5.png" alt="" width="900" height="525" class="alignnone size-large wp-image-80655" /></a></p>
<p><span style="font-weight: 400;">Thousands of transactions queued and shipped, with the result of significantly increasing the possible data loss, while still leaving a huge amount of data to apply from the relay log. </span></p>
<p><span style="font-weight: 400;">So the only possible alternative is to set the timeout to a crazy value. However, this can cause a full production stop in case a Replica hangs or, for any reason, disables semi-sync locally. </span></p>
<p> </p>
<h2><span style="font-weight: 400;">Conclusion</span></h2>
<p><span style="font-weight: 400;">First of all, I want to say that the tests on our operator using asynchronous replication show behavior consistent with standard deployments in the cloud or on premises. It has the same benefits, like better performance, and the same possible issues, such as a longer failover time when it needs to wait for a Replica to apply the relay-log queue. </span></p>
<p><span style="font-weight: 400;">The semi-synchronous flag in the operator is disabled, and the tests I have done bring me to say “keep it like that!”. At least unless you know very well what you are doing and are able to deal with a semi-sync timeout of days.</span></p>
<p>I was happy to have the chance to perform these tests, because they gave me the opportunity and the reason to investigate the semi-synchronous feature more deeply.<br /><span style="font-weight: 400;">Personally, I was not convinced by semi-synchronous replication when it came out, and I am not now. I have never seen a less consistent and less trustworthy feature in MySQL than semi-sync. </span></p>
<p><span style="font-weight: 400;">If you need to have a higher level of synchronicity in your database just go for Group Replication, or Percona XtraDB Cluster and stay away from semi-sync. </span></p>
<p><span style="font-weight: 400;">Otherwise, stay on Asynchronous replication, which is not perfect but it is predictable. </span></p>
<h2><span style="font-weight: 400;">References</span></h2>
<p><a href="https://www.percona.com/blog/2012/01/19/how-does-semisynchronous-mysql-replication-work/"><span style="font-weight: 400;">https://www.percona.com/blog/2012/01/19/how-does-semisynchronous-mysql-replication-work/</span></a></p>
<p><a href="https://www.percona.com/blog/percona-monitoring-and-management-mysql-semi-sync-summary-dashboard/"><span style="font-weight: 400;">https://www.percona.com/blog/percona-monitoring-and-management-mysql-semi-sync-summary-dashboard/</span></a></p>
<p><a href="https://www.percona.com/blog/2012/06/14/comparing-percona-xtradb-cluster-with-semi-sync-replication-cross-wan/">https://www.percona.com/blog/2012/06/14/comparing-percona-xtradb-cluster-with-semi-sync-replication-cross-wan/</a></p>
<p><a href="https://datto.engineering/post/lossless-mysql-semi-sync-replication-and-automated-failover"><span style="font-weight: 400;">https://datto.engineering/post/lossless-mysql-semi-sync-replication-and-automated-failover</span></a></p>
<p><a href="https://planetscale.com/blog/mysql-semi-sync-replication-durability-consistency-and-split-brains"><span style="font-weight: 400;">https://planetscale.com/blog/mysql-semi-sync-replication-durability-consistency-and-split-brains</span></a></p>
<p><a href="https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/"><span style="font-weight: 400;">https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/</span></a></p>
<p><a href="https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html"><span style="font-weight: 400;">https://dev.mysql.com/doc/refman/8.0/en/replication-semisync-installation.html</span></a></p>
<p><a href="https://dev.mysql.com/worklog/task/?id=1720"><span style="font-weight: 400;">https://dev.mysql.com/worklog/task/?id=1720</span></a></p>
<p><a href="https://dev.mysql.com/worklog/task/?id=6630"><span style="font-weight: 400;">https://dev.mysql.com/worklog/task/?id=6630</span></a></p>
<p><a href="https://dev.mysql.com/worklog/task/?id=4398"><span style="font-weight: 400;">https://dev.mysql.com/worklog/task/?id=4398</span></a></p>
<p><a href="https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L844"><span style="font-weight: 400;">https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L844</span></a></p>
<p><a href="https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L881"><span style="font-weight: 400;">https://github.com/mysql/mysql-server/blob/beb865a960b9a8a16cf999c323e46c5b0c67f21f/plugin/semisync/semisync_source.cc#L881</span></a></p>]]></description>
<category>MySQL</category>
<pubDate>Tue, 12 Apr 2022 10:00:01 +0000</pubDate>
</item>
<item>
<title>Online DDL with Group Replication In MySQL 8.0.27</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/246-online-ddl-with-group-replication-in-mysql-8-0-27</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/246-online-ddl-with-group-replication-in-mysql-8-0-27</guid>
<description><![CDATA[<p><span style="font-weight: 400;">Last April 2021 I wrote an <a href="https://www.percona.com/blog/2021/04/15/online-ddl-with-group-replication-in-percona-server-for-mysql-8-0-22/">article</a> about Online DDL and Group Replication. At that time we were dealing with MySQL 8.0.23 </span><span style="font-weight: 400;">and also opened a <a href="https://bugs.mysql.com/bug.php?id=103421">bug</a> report </span><span style="font-weight: 400;">which did not have the right answer to the case presented. </span></p>
<p><span style="font-weight: 400;">Anyhow, in that article I showed how an online DDL was de facto locking the whole cluster for a very long time, even when using the consistency level set to EVENTUAL.</span></p>
<p><span style="font-weight: 400;">This article gives credit to the work done by the MySQL/Oracle engineers to correct that annoying inconvenience. </span></p>
<p><span style="font-weight: 400;">Before going ahead, let us remember how an Online DDL was propagated in a group replication cluster, and identify the differences with what happens now, all with the consistency level set to EVENTUAL (<a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication-configuring-consistency-guarantees.html">see</a></span><span style="font-weight: 400;">).</span></p>
<p><span style="font-weight: 400;">In MySQL 8.0.23 we had:</span></p>
<table>
<tbody>
<tr>
<td><a href="https://www.percona.com/blog/wp-content/uploads/2022/01/1-gr-ddl.png"><img src="http://www.tusacentral.net/joomla/images/stories/gr_ddl_8_0_27/1-gr-ddl.png" alt="1 gr ddl" width="201" height="222" class="alignnone size-full wp-image-79683" /></a></td>
<td><a href="https://www.percona.com/blog/wp-content/uploads/2022/01/gr-ddl-2-old.png"><img src="http://www.tusacentral.net/joomla/images/stories/gr_ddl_8_0_27/gr-ddl-2-old.png" alt="gr ddl 2 old" width="201" height="222" class="alignnone size-full wp-image-79689" /></a></td>
<td><a href="https://www.percona.com/blog/wp-content/uploads/2022/01/gr-ddl-3-old.png"><img src="http://www.tusacentral.net/joomla/images/stories/gr_ddl_8_0_27/gr-ddl-3-old.png" alt="gr ddl 3 old" width="211" height="242" class="alignnone size-full wp-image-79684" /></a></td>
</tr>
</tbody>
</table>
<p><span style="font-weight: 400;">While in MySQL 8.0.27 we have:</span></p>
<table>
<tbody>
<tr>
<td><a href="https://www.percona.com/blog/wp-content/uploads/2022/01/1-gr-ddl.png"><img src="http://www.tusacentral.net/joomla/images/stories/gr_ddl_8_0_27/1-gr-ddl-new.png" alt="1 gr ddl new" width="201" height="222" class="alignnone size-full wp-image-79683" /></a></td>
<td><a href="https://www.percona.com/blog/wp-content/uploads/2022/01/gr-ddl-2-new.png"><img src="http://www.tusacentral.net/joomla/images/stories/gr_ddl_8_0_27/gr-ddl-2-new.png" alt="gr ddl 2 new" width="231" height="257" class="alignnone size-full wp-image-79686" /></a></td>
<td><a href="https://www.percona.com/blog/wp-content/uploads/2022/01/gr-ddl-3-new.png"><img src="http://www.tusacentral.net/joomla/images/stories/gr_ddl_8_0_27/gr-ddl-3-new.png" alt="gr ddl 3 new" width="201" height="212" class="alignnone size-full wp-image-79687" /></a></td>
</tr>
</tbody>
</table>
<p> </p>
<p><span style="font-weight: 400;">As you can see from the images, we have 3 different phases. Phase 1 is the same in version 8.0.23 and version 8.0.27. </span></p>
<p><span style="font-weight: 400;">Phases 2 and 3, instead, are quite different. In MySQL 8.0.23, after the DDL was applied on the primary it was propagated to the other nodes, but a metalock was also acquired and control was NOT returned. The result was that not only was the session executing the DDL kept on hold, but so were all the other sessions performing modifications. </span></p>
<p><span style="font-weight: 400;">Only when the operation was over on all secondaries was the DDL pushed to the binlog and disseminated for asynchronous replication; the lock was then released and operations could restart.</span></p>
<p><span style="font-weight: 400;">Instead, in MySQL 8.0.27, once the operation is over on the primary, the DDL is pushed to the binlog, disseminated to the secondaries, and control is returned. The result is that write operations on the primary suffer no interruption whatsoever, and the DDL is distributed to the secondaries and to asynchronous replication at the same time. </span></p>
<p><span style="font-weight: 400;">This is a fantastic improvement, available only with consistency level EVENTUAL, but still, fantastic.</span></p>
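<p><span style="font-weight: 400;">As a reference, this is how the consistency level could be checked and set for the session before issuing the DDL (a minimal sketch based on the MySQL manual linked above, not part of the original test run):</span></p>
<pre class="lang:mysql decode:true">-- check the consistency level currently in effect
SELECT @@group_replication_consistency;

-- set it explicitly for this session, then run the DDL
SET SESSION group_replication_consistency = 'EVENTUAL';
ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
</pre>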
<h3><span style="font-weight: 400;">Let's see some numbers.</span></h3>
<p><span style="font-weight: 400;">To test the operation, I have used the same approach used in the previous tests in the article mentioned above.</span></p>
<pre class="lang:sh decode:true">Connection 1:
ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=NONE;
ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
Connection 2:
while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmills_test select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
Connection 3:
while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "insert into windmill8 select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmill7 limit 1;" -e "select count(*) from windmills_large.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done
Connections 4-5:
while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/mysql-8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_large -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_large)"|grep -i -v localhost;sleep 1;done
</pre>
<p><span style="font-weight: 400;">Modifying a table with ~5 million rows:</span></p>
<pre class="lang:mysql decode:true">node1-DC1 (root@localhost) [windmills_large]>select count(*) from windmills_test;
+----------+
| count(*) |
+----------+
| 5002909 |
+----------+
</pre>
<p><span style="font-weight: 400;">The numbers below represent the time (seconds.milliseconds) each operation took to complete. While I was also capturing the state of the ALTER on the other nodes, I am not reporting it here, given it is not relevant. </span></p>
<pre class="lang:sh decode:true">EVENTUAL (on the primary only)
-------------------
Node 1 same table:
.184
.186 <--- no locking during alter on the same node
.184
<snip>
.184
.217 <--- moment of commit
.186
.186
.186
.185
Node 1 another table :
.189
.198 <--- no locking during alter on the same node
.188
<snip>
.191
.211 <--- moment of commit
.194
</pre>
<p><span style="font-weight: 400;">As you can see, there is just a very small delay at the moment of commit, but no other impact.</span></p>
<p><span style="font-weight: 400;">Now let us compare this with the recent tests I did for the PXC Non Blocking Operation (see <a href="https://www.percona.com/blog/percona-xtradb-cluster-non-blocking-operation-for-online-schema-upgrade/">here</a></span><span style="font-weight: 400;">), with the same number of rows and the same kind of table/data:</span></p>
<table border="1">
<thead>
<tr><th>Action</th><th align="right">Group Replication</th><th align="right">PXC (NBO)</th></tr>
</thead>
<tbody>
<tr>
<td>Time on hold for insert in altering table</td>
<td align="right">~ 0.217 sec</td>
<td align="right">~ 120 sec</td>
</tr>
<tr>
<td>Time on hold for insert in another table</td>
<td align="right">~ 0.211 sec</td>
<td align="right">~ 25 sec</td>
</tr>
</tbody>
</table>
<p> </p>
<p><span style="font-weight: 400;"><strong>However</strong>, yes there is a <strong>however</strong>: PXC was maintaining consistency between the different nodes during the DDL execution, while MySQL 8.0.27 with Group Replication was postponing consistency on the secondaries; thus primary and secondaries were not in sync until the DDL was fully finalized on the secondaries.</span></p>
<h1><span style="font-weight: 400;">Conclusions</span></h1>
<p><span style="font-weight: 400;">MySQL 8.0.27 comes with this nice fix that significantly reduces the impact of an online DDL operation on a busy server. But we can still observe a significant misalignment of the data between the nodes when a DDL is executing. </span></p>
<p><span style="font-weight: 400;">On the other hand PXC with NBO is a bit more “expensive” in time, but nodes remain aligned all the time.</span></p>
<p><span style="font-weight: 400;">In the end, the choice between one solution and the other depends on what matters more to you: consistency vs. operational impact.</span></p>
<p><span style="font-weight: 400;">Great MySQL to all.</span></p>]]></description>
<category>MySQL</category>
<pubDate>Tue, 11 Jan 2022 13:04:00 +0000</pubDate>
</item>
<item>
<title>A look into Percona XtraDB Cluster Non Blocking Operation for Online Schema Upgrade</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/245-a-look-into-percona-xtradb-cluster-non-blocking-operation-for-online-schema-upgrade</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/245-a-look-into-percona-xtradb-cluster-non-blocking-operation-for-online-schema-upgrade</guid>
<description><![CDATA[<p><span style="font-weight: 400;">Percona XtraDB Cluster 8.0.25 has introduced a new option to perform online schema modifications: NBO (<a href="https://www.percona.com/doc/percona-xtradb-cluster/LATEST/features/nbo.html#nbo">Non Blocking Operation</a>).</span></p>
<p><span style="font-weight: 400;">When using PXC, the cluster relies on the </span><b>wsrep_OSU_method</b><span style="font-weight: 400;"> parameter to define the Online Schema Upgrade (OSU) method the node uses to replicate DDL statements. <img src="http://www.tusacentral.net/joomla/images/stories/NBO/breaking_bariers.jpg" alt="breaking barriers" width="400" height="239" style="float: right;" /></span></p>
<p><span style="font-weight: 400;">Until now we normally had 3 options:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Use Total Order Isolation (TOI, the default)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Use Rolling Schema Upgrade (RSU)</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Use Percona’s online schema change tool (TOI + <a href="https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html">PTOSC</a>)</span></li>
</ul>
<p><span style="font-weight: 400;">Each method has some positive and negative aspects. TOI will lock the whole cluster from being able to accept data modifications for the entire time it takes to perform the DDL operation. RSU will misalign the schema definition between the nodes, and in any case the node performing the DDL operation is still locked. Finally TOI+PTOSC will rely on creating triggers and copying data, so in some cases this can be very impactful. </span></p>
<p><span style="font-weight: 400;">The new Non Blocking Operation (NBO) method helps to reduce the impact on the cluster and makes it easier to perform some DDL operations.</span></p>
<p><span style="font-weight: 400;">At the moment NBO supports only a limited set of operations:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">ALTER INDEX</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">CREATE INDEX</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">DROP INDEX</span></li>
</ul>
<p><span style="font-weight: 400;">Any other command will result in an ER_NOT_SUPPORTED_YET error message.</span></p>
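<p><span style="font-weight: 400;">For example (a hypothetical session, not from the original tests), an ALTER that is not an index operation is rejected while NBO is active:</span></p>
<pre class="lang:mysql decode:true">SET SESSION wsrep_OSU_method=NBO;
-- supported: index creation/drop
ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
-- not supported: anything else, e.g. adding a column,
-- fails with ER_NOT_SUPPORTED_YET (error 1235)
ALTER TABLE windmills_test ADD COLUMN notes varchar(100);
</pre>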
<p><span style="font-weight: 400;">But let us see how it works and what the impact is, while also comparing it with the default method, TOI.</span></p>
<p><span style="font-weight: 400;">We will work with 5 connections:</span></p>
<p style="padding-left: 40px;"><span style="font-weight: 400;">1 to perform the DDL<br /></span><span style="font-weight: 400;">2 to insert data into the table being altered<br /></span><span style="font-weight: 400;">3 to insert data into a different table<br /></span><span style="font-weight: 400;">4-5 to check operations on the other two nodes</span></p>
<p><span style="font-weight: 400;">PXC must be at least version 8.0.25-15.1.</span></p>
<p><span style="font-weight: 400;">The table we will modify is:</span></p>
<pre class="lang:mysql decode:true">DC1-1(root@localhost) [windmills_s]>show create table windmills_test\G
*************************** 1. row ***************************
Table: windmills_test
Create Table: CREATE TABLE `windmills_test` (
`id` bigint NOT NULL AUTO_INCREMENT,
`uuid` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`millid` smallint NOT NULL,
`kwatts_s` int NOT NULL,
`date` date NOT NULL,
`location` varchar(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`active` tinyint NOT NULL DEFAULT '1',
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`strrecordtype` char(3) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_millid` (`millid`,`active`),
KEY `IDX_active` (`id`,`active`),
KEY `kuuid_x` (`uuid`),
KEY `millid_x` (`millid`),
KEY `active_x` (`active`)
) ENGINE=InnoDB AUTO_INCREMENT=8199260 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_bin ROW_FORMAT=DYNAMIC
1 row in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">And contains ~5 million rows.</span></p>
<pre class="lang:sh decode:true">DC1-1(root@localhost) [windmills_s]>select count(*) from windmills_test;
+----------+
| count(*) |
+----------+
| 5002909 |
+----------+
1 row in set (0.44 sec)
</pre>
<p><span style="font-weight: 400;">The commands.<br /></span><span style="font-weight: 400;">Connection 1:</span></p>
<pre class="lang:sh decode:true"> ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
ALTER TABLE windmills_test drop INDEX idx_1, ALGORITHM=INPLACE;
</pre>
<p> </p>
<p><span style="font-weight: 400;">Connection 2:</span></p>
<pre class="lang:sh decode:true">while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills_test select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done</pre>
<p> </p>
<p><span style="font-weight: 400;">Connection 3:</span></p>
<pre class="lang:sh decode:true"> while [ 1 = 1 ];do da=$(date +'%s.%3N');/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "insert into windmills8 select null,uuid,millid,kwatts_s,date,location,active,time,strrecordtype from windmills7 limit 1;" -e "select count(*) from windmills_s.windmills_test;" > /dev/null;db=$(date +'%s.%3N'); echo "$(echo "($db - $da)"|bc)";sleep 1;done</pre>
<p> </p>
<p><span style="font-weight: 400;">Connections 4-5:</span></p>
<pre class="lang:sh decode:true">while [ 1 = 1 ];do echo "$(date +'%T.%3N')";/opt/mysql_templates/PXC8P/bin/mysql --defaults-file=./my.cnf -uroot -D windmills_s -e "show full processlist;"|egrep -i -e "(windmills_test|windmills_s)"|grep -i -v localhost;sleep 1;done</pre>
<p><span style="font-weight: 400;">Operations:</span></p>
<ul>
<li><span style="font-weight: 400;">start the inserts from connections 2 and 3</span></li>
<li><span style="font-weight: 400;">start commands in connections 4 - 5 on the other nodes</span></li>
<li><span style="font-weight: 400;">execute: </span>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">for TOI </span>
<ul>
<li aria-level="2">
<pre class="">DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=TOI;</pre>
</li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">for NBO</span>
<ul>
<li aria-level="2">
<pre class="">DC1-1(root@localhost) [windmills_s]>SET SESSION wsrep_OSU_method=NBO;</pre>
</li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">For both</span>
<ul>
<li>
<pre class="">DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE, LOCK=shared;</pre>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<h1> </h1>
<h1 style="padding-left: 40px;"><span style="font-weight: 400;">Let us run it</span></h1>
<h2><span style="font-weight: 400;">Altering a table with TOI.</span></h2>
<pre class="lang:sh decode:true">DC1-1(root@localhost) [windmills_s]>ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE;
Query OK, 0 rows affected (1 min 4.74 sec)
Records: 0 Duplicates: 0 Warnings: 0
</pre>
<p> </p>
<p><span style="font-weight: 400;">Inserts in the altering table (connection 2):</span></p>
<pre class="lang:sh decode:true">.450
.492
64.993 <--- Alter blocks all inserts on the table we are altering
.788
.609
</pre>
<p> </p>
<p><span style="font-weight: 400;">Inserts on the other table (connection 3):</span></p>
<pre class="lang:sh decode:true">.455
.461
64.161 <--- Alter blocks all inserts on all the other tables as well
.641
.483
</pre>
<p> </p>
<p><span style="font-weight: 400;">On the other nodes, at the same time as the ALTER, we can see:</span></p>
<pre class="lang:sh decode:true">Id User db Command Time State Info Time_ms Rows_sent Rows_examined
15 system user windmills_s Query 102 altering table ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE 102238 0 0 <--- time from start
</pre>
<p><span style="font-weight: 400;">So, in short, we had the whole cluster locked for ~64 seconds. During this period of time, all operations modifying data or structure were on hold. </span></p>
<h2> </h2>
<h2><span style="font-weight: 400;">Let us now try with NBO</span></h2>
<p><span style="font-weight: 400;">Inserts in the altering table:</span></p>
<pre class="lang:sh decode:true">.437
.487
120.758 <---- Execution time increase
.617
.510
</pre>
<p> </p>
<p><span style="font-weight: 400;">Inserts on the other table:</span></p>
<pre class="lang:sh decode:true">.468
.485
25.061 <---- still a metalock, but not locking the other tables for the whole duration
.494
.471
</pre>
<p> </p>
<p><span style="font-weight: 400;">On the other nodes, at the same time as the ALTER, we can see:</span></p>
<pre class="lang:sh decode:true">Id User db Command Time State Info Time_ms Rows_sent Rows_examined
110068 system user windmills_s Connect 86 altering table ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE 120420 0 0
</pre>
<p> </p>
<p><span style="font-weight: 400;">In this case what is also interesting to note is that:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">We have a moment of metalock:</span><ol>
<li aria-level="2">
<pre class="">110174 pmm 127.0.0.1:42728 NULL Query 2 Waiting for table metadata lock SELECT x FROM information_schema.tables WHERE TABLE_SCHEMA = 'windmills_s' 1486 10 0</pre>
</li>
<li aria-level="2">
<pre class="">110068 system user connecting host windmills_s Connect 111 closing tables ALTER TABLE windmills_test ADD INDEX idx_1 (`uuid`,`active`), ALGORITHM=INPLACE 111000 0 0</pre>
</li>
</ol></li>
<li><span style="font-weight: 400;">The execution time is longer </span></li>
</ol>
<p><span style="font-weight: 400;">Summarizing:</span></p>
<table border="1">
<thead>
<tr><th>Action</th><th align="right">TOI</th><th align="right">NBO</th></tr>
</thead>
<tbody>
<tr>
<td>Time on hold for insert in altering table</td>
<td align="right">~64 sec</td>
<td align="right">~120 sec</td>
</tr>
<tr>
<td>Time on hold for insert in another table</td>
<td align="right">~64 sec</td>
<td align="right">~25 sec</td>
</tr>
<tr>
<td>Metalock</td>
<td align="right">whole time</td>
<td align="right">only at the end</td>
</tr>
</tbody>
</table>
<h2> </h2>
<h2><span style="font-weight: 400;">What is happening, what are the differences, and why does it take longer with NBO?</span></h2>
<p><span style="font-weight: 400;">Let us see at a very high level how the two work:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">TOI: when you issue a DDL like ADD INDEX a metadata lock is taken on the table and it will be released only at the end of the operation. During this time, you cannot: </span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Perform DMLs on any cluster node</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">Alter another table in the cluster</span></li>
</ul>
</li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">NBO: the metadata lock is taken at the start and at the end for a very brief period of time. The ADD INDEX operation will then work on each node independently. The lock taken at the end is to have all the nodes agree on the operation and commit or roll back (using cluster error voting). This final phase costs a bit more in time and is what adds a few seconds to the operation execution. But during the operation:</span>
<ul>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">You can alter another table (using NBO)</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">You can continue to insert data, except in the table(s) you are altering.</span></li>
<li style="font-weight: 400;" aria-level="2"><span style="font-weight: 400;">On node crash the operation will continue on the other nodes, and if successful it will persist. </span></li>
</ul>
</li>
</ul>
<p><span style="font-weight: 400;">In short, the cluster's behavior changes significantly when using NBO, offering significant flexibility compared to TOI. The cost in time should not increase linearly with the size of the table, but rather reflect the efficiency of the single node in performing the ALTER operation. </span></p>
<h1><span style="font-weight: 400;">Conclusion</span></h1>
<p><span style="font-weight: 400;">NBO can be significantly helpful to reduce the impact of DDL on the cluster, for now limited to the widely used creation/modification/drop of an index. But in the future … we may expand it. </span></p>
<p><span style="font-weight: 400;">The feature is still a technology preview, so do not trust it in production yet, but test it and let us know what you think. </span></p>
<p><span style="font-weight: 400;">Final comment. Another distribution has introduced NBO, but only if you buy the enterprise version.</span></p>
<p><span style="font-weight: 400;">Percona, which is truly open source with facts not just words, has implemented NBO in standard PXC, and the code is fully open source. This is not the first one, but just another of the many features Percona is offering for free, while others ask you to buy the enterprise version.</span></p>
<p><span style="font-weight: 400;">Enjoy the product and let us have your feedback!</span></p>
<p><span style="font-weight: 400;">Great MySQL to all! </span></p>
<p> </p>]]></description>
<category>MySQL</category>
<pubDate>Fri, 10 Dec 2021 10:00:29 +0000</pubDate>
</item>
<item>
<title>What if … MySQL’s repeatable reads cause you to lose money?</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/244-what-if-mysql-s-repeatable-reads-cause-you-to-lose-money</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/244-what-if-mysql-s-repeatable-reads-cause-you-to-lose-money</guid>
<description><![CDATA[<p><span style="font-weight: 400;">Well, let me say if that happens because there is a logic mistake in your application. But you need to know and understand what happens in MySQL to be able to avoid the problem. </span></p>
<p><span style="font-weight: 400;">In short the WHY of this article is to inform you about possible pitfalls and how to prevent that to cause you damage. <img src="http://www.tusacentral.net/joomla/images/stories/repeatable_read/pitfalls1.jpg" alt="pitfalls1" style="float: right;" /></span></p>
<p><span style="font-weight: 400;">Let us start with a short introduction to what repeatable reads are about. Given that I am extremely lazy, I am going to reuse (a lot of) existing documentation from the <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html">MySQL documentation.</a></span></p>
<p><i><span style="font-weight: 400;">Transaction isolation is one of the foundations of database processing. Isolation is the I in the acronym ACID; the isolation level is the setting that fine-tunes the balance between performance and reliability, consistency, and reproducibility of results when multiple transactions are making changes and performing queries at the same time.</span></i></p>
<p><i><span style="font-weight: 400;">InnoDB offers all four transaction isolation levels described by the SQL:1992 standard: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. The default isolation level for InnoDB is REPEATABLE READ.</span></i></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><i><span style="font-weight: 400;">REPEATABLE READ</span></i><i><span style="font-weight: 400;"><br /></span></i><i><span style="font-weight: 400;">This is the default isolation level for InnoDB. Consistent reads within the same transaction read the snapshot established by the first read. This means that if you issue several plain (nonlocking) SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other.</span></i></li>
</ul>
<ul>
<li style="font-weight: 400;" aria-level="1"><i><span style="font-weight: 400;">READ COMMITTED</span></i><i><span style="font-weight: 400;"><br /></span></i><i><span style="font-weight: 400;">Each consistent read, even within the same transaction, sets and reads its own fresh snapshot.</span></i></li>
</ul>
<p><span style="font-weight: 400;">And about Consistent Non blocking reads:</span><span style="font-weight: 400;"><br /></span><i><span style="font-weight: 400;">A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point in time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: If you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.</span></i></p>
<p><span style="font-weight: 400;">Ok, but what does all this mean in practice?</span></p>
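<p><span style="font-weight: 400;">Before simulating the full scenario, here is a minimal two-session sketch of the behavior described above (assuming a simple test table <i>t</i>, not the sakila schema used below): plain SELECTs keep reading the transaction's snapshot, while an UPDATE acts on the latest committed data.</span></p>
<pre class="lang:mysql decode:true">-- Session A
SET SESSION transaction_isolation = 'REPEATABLE-READ';
START TRANSACTION;
SELECT active FROM t WHERE id = 1;  -- snapshot established here; say it returns 0

-- Session B (autocommit)
UPDATE t SET active = 1 WHERE id = 1;  -- committed immediately

-- Session A, still in the same transaction
SELECT active FROM t WHERE id = 1;  -- still returns 0: same snapshot
UPDATE t SET bonus = 1 WHERE id = 1 AND active = 1;  -- matches! the UPDATE reads the latest committed row
COMMIT;
</pre>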
<p><span style="font-weight: 400;">To understand, let us simulate this scenario:</span></p>
<p><span style="font-weight: 400;">I have a shop and I decide to grant a bonus discount to a selected number of customers that:</span></p>
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Have an active account to my shop</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Match my personal criteria to access the bonus</span></li>
</ul>
<p><span style="font-weight: 400;">My application is set to perform batch operations unattended, at a moment of low traffic. </span></p>
<p><span style="font-weight: 400;">This includes reactivating dormant accounts that customers have asked to reactivate. </span></p>
<p><span style="font-weight: 400;">We will see what happens by default, then what we can do to avoid the pitfalls.</span></p>
<h1><span style="font-weight: 400;">The scenario</span></h1>
<p><span style="font-weight: 400;">I will use 3 different connections to the same MySQL 8.0.27 instance. The only relevant setting I have modified is the InnoDB lock wait timeout, which I set to 50 seconds. </span></p>
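<p><span style="font-weight: 400;">For reference, assuming the setting meant here is the standard <i>innodb_lock_wait_timeout</i> variable, it would be applied like this:</span></p>
<pre class="lang:mysql decode:true">-- how long a DML statement waits for a row lock before erroring out
SET GLOBAL innodb_lock_wait_timeout = 50;
SELECT @@global.innodb_lock_wait_timeout;
</pre>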
<ul>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Session 1 will simulate a process that should activate the bonus feature for the selected customers.</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Session 2 is an independent process that reactivates the given list of customers</span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Session 3 is used to collect lock information.</span></li>
</ul>
<p><span style="font-weight: 400;">For this simple test I will use the customer table in the sakila schema modified as below:</span></p>
<pre class="lang:mysql decode:true">CREATE TABLE `customer` (
`customer_id` smallint unsigned NOT NULL AUTO_INCREMENT,
`store_id` tinyint unsigned NOT NULL,
`first_name` varchar(45) NOT NULL,
`last_name` varchar(45) NOT NULL,
`email` varchar(50) DEFAULT NULL,
`address_id` smallint unsigned NOT NULL,
`active` tinyint(1) NOT NULL DEFAULT '1',
`create_date` datetime NOT NULL,
`last_update` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`bonus` int NOT NULL DEFAULT '0',
`activate_bonus` varchar(45) NOT NULL DEFAULT '0',
PRIMARY KEY (`customer_id`),
KEY `idx_fk_store_id` (`store_id`),
KEY `idx_fk_address_id` (`address_id`),
KEY `idx_last_name` (`last_name`),
KEY `idx_bonus` (`bonus`),
CONSTRAINT `fk_customer_address` FOREIGN KEY (`address_id`) REFERENCES `address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE,
CONSTRAINT `fk_customer_store` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=600 DEFAULT CHARSET=utf8mb4 COLLATE=ut
</pre>
<p><span style="font-weight: 400;">As you can see I have added the </span><i><span style="font-weight: 400;">bonus</span></i><span style="font-weight: 400;"> and </span><i><span style="font-weight: 400;">activate_bonus</span></i><span style="font-weight: 400;"> fields plus the </span><i><span style="font-weight: 400;">idx_bonus</span></i><span style="font-weight: 400;"> index.</span></p>
<p><span style="font-weight: 400;">To be able to trace the locks, these are the thread ids by session:</span></p>
<pre class="lang:sh decode:true">session 1 17439
session 2 17430
session 3 17443
</pre>
<p><span style="font-weight: 400;">To collect the lock information:</span></p>
<pre class="lang:mysql decode:true">SELECT
index_name, lock_type, lock_mode, lock_data, thread_id
FROM
performance_schema.data_locks
WHERE
object_schema = 'sakila'
AND object_name = 'customer'
AND lock_type = 'RECORD'
AND thread_id IN (17439 , 17430)
ORDER BY index_name , lock_data DESC;
</pre>
<p><span style="font-weight: 400;">OK, ready? Let us start!</span></p>
<h1><span style="font-weight: 400;">The run…</span></h1>
<p><span style="font-weight: 400;">While the following steps could be done in a more </span><i><span style="font-weight: 400;">compressed</span></i><span style="font-weight: 400;"> way, I prefer to be verbose, to make them more human readable.</span></p>
<p><span style="font-weight: 400;">First let us set the environment:</span></p>
<pre class="lang:mysql decode:true">session1 >set transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.07 sec)
session1 >Start Transaction;
Query OK, 0 rows affected (0.07 sec)
</pre>
<p><span style="font-weight: 400;">Then let us see the list of the customers we will modify:</span></p>
<pre class="lang:mysql decode:true">session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 383 | 1 | MARTIN | BALES | 1 | 0 |
| 539 | 1 | MATHEW | BOLIN | 1 | 0 |
| 441 | 1 | MARIO | CHEATHAM | 1 | 0 |
| 482 | 1 | MAURICE | CRAWLEY | 1 | 0 |
| 293 | 2 | MAE | FLETCHER | 1 | 0 |
| 38 | 1 | MARTHA | GONZALEZ | 1 | 0 |
| 444 | 2 | MARCUS | HIDALGO | 1 | 0 |
| 252 | 2 | MATTIE | HOFFMAN | 1 | 0 |
| 256 | 2 | MABEL | HOLLAND | 1 | 0 |
| 226 | 2 | MAUREEN | LITTLE | 1 | 0 |
| 588 | 1 | MARION | OCAMPO | 1 | 0 |
| 499 | 2 | MARC | OUTLAW | 1 | 0 |
| 553 | 1 | MAX | PITT | 1 | 0 |
| 312 | 2 | MARK | RINEHART | 1 | 0 |
| 80 | 1 | MARILYN | ROSS | 1 | 0 |
| 583 | 1 | MARSHALL | THORN | 1 | 0 |
| 128 | 1 | MARJORIE | TUCKER | 1 | 0 |
| 44 | 1 | MARIE | TURNER | 1 | 0 |
| 267 | 1 | MARGIE | WADE | 1 | 0 |
| 240 | 1 | MARLENE | WELCH | 1 | 0 |
| 413 | 2 | MARVIN | YEE | 1 | 0 |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)
</pre>
<p><span style="font-weight: 400;">As you can see, we have 21 customers matching our criteria.<br /></span><span style="font-weight: 400;">How much money is involved in this exercise?</span></p>
<pre class="lang:mysql decode:true">session1 >SELECT
-> SUM(amount) income,
-> SUM(amount) * 0.90 income_with_bonus,
-> (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus
-> FROM
-> sakila.customer AS c
-> JOIN
-> sakila.payment AS p ON c.customer_id = p.customer_id
-> where active = 1 and bonus =1 ;
+---------+-------------------+--------------------+
| income | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 2416.23 | 2174.6070 | 241.6230 |
+---------+-------------------+--------------------+
</pre>
<p><span style="font-weight: 400;">This exercise is going to cost me </span><b>~242</b><span style="font-weight: 400;"> dollars. Keep this number in mind.<br /></span><span style="font-weight: 400;">What locks do I have at this point?</span></p>
<pre class="lang:mysql decode:true">session3 >select index_name, lock_type, lock_mode,lock_data from performance_schema.data_locks where object_schema = 'sakila' and object_name = 'customer' and lock_type = 'RECORD' and
thread_id in (17439,17430) order by index_name, lock_data desc;
Empty set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">The answer is: none. </span></p>
<p><span style="font-weight: 400;">Meanwhile we have the other process that needs to reactivate the customers:</span></p>
<pre class="lang:mysql decode:true">session2 >set transaction_isolation = 'REPEATABLE-READ';
Query OK, 0 rows affected (0.00 sec)
session2 >Start Transaction;
Query OK, 0 rows affected (0.00 sec)
session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 1 | 1 | MARY | SMITH | 1 | 0 |
| 7 | 1 | MARIA | MILLER | 1 | 0 |
| 9 | 2 | MARGARET | MOORE | 1 | 0 |
| 178 | 2 | MARION | SNYDER | 1 | 0 |
| 236 | 1 | MARCIA | DEAN | 1 | 0 |
| 246 | 1 | MARIAN | MENDOZA | 1 | 0 |
| 254 | 2 | MAXINE | SILVA | 1 | 0 |
| 257 | 2 | MARSHA | DOUGLAS | 1 | 0 |
| 323 | 2 | MATTHEW | MAHAN | 1 | 0 |
| 408 | 1 | MANUEL | MURRELL | 1 | 0 |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">In this case the process needs to reactivate 10 users.</span></p>
<pre class="lang:mysql decode:true">session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10 Changed: 10 Warnings: 0
session2 >commit;
Query OK, 0 rows affected (0.00 sec)
</pre>
<p><span style="font-weight: 400;">All good, right? But before going ahead, let us double-check in session 1:</span></p>
<pre class="lang:mysql decode:true">session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 383 | 1 | MARTIN | BALES | 1 | 0 |
| 539 | 1 | MATHEW | BOLIN | 1 | 0 |
| 441 | 1 | MARIO | CHEATHAM | 1 | 0 |
| 482 | 1 | MAURICE | CRAWLEY | 1 | 0 |
| 293 | 2 | MAE | FLETCHER | 1 | 0 |
| 38 | 1 | MARTHA | GONZALEZ | 1 | 0 |
| 444 | 2 | MARCUS | HIDALGO | 1 | 0 |
| 252 | 2 | MATTIE | HOFFMAN | 1 | 0 |
| 256 | 2 | MABEL | HOLLAND | 1 | 0 |
| 226 | 2 | MAUREEN | LITTLE | 1 | 0 |
| 588 | 1 | MARION | OCAMPO | 1 | 0 |
| 499 | 2 | MARC | OUTLAW | 1 | 0 |
| 553 | 1 | MAX | PITT | 1 | 0 |
| 312 | 2 | MARK | RINEHART | 1 | 0 |
| 80 | 1 | MARILYN | ROSS | 1 | 0 |
| 583 | 1 | MARSHALL | THORN | 1 | 0 |
| 128 | 1 | MARJORIE | TUCKER | 1 | 0 |
| 44 | 1 | MARIE | TURNER | 1 | 0 |
| 267 | 1 | MARGIE | WADE | 1 | 0 |
| 240 | 1 | MARLENE | WELCH | 1 | 0 |
| 413 | 2 | MARVIN | YEE | 1 | 0 |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)
</pre>
<p><span style="font-weight: 400;">Perfect! My Repeatable Read still sees the same snapshot. Let me apply the changes:</span></p>
<pre class="lang:mysql decode:true">session1 >update sakila.customer set activate_bonus=1 where bonus = 1 and active =1 ;
<strong>Query OK, 31 rows affected (0.06 sec)</strong>
Rows matched: 31 Changed: 31 Warnings: 0
</pre>
<p><span style="font-weight: 400;">Wait, what? My list reports 21 entries, but I have modified 31! And if I check the cost:</span></p>
<pre class="lang:mysql decode:true">session1 >SELECT
-> SUM(amount) income,
-> SUM(amount) * 0.90 income_with_bonus,
-> (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus
-> FROM
-> sakila.customer AS c
-> JOIN
-> sakila.payment AS p ON c.customer_id = p.customer_id
-> where active = 1 and bonus =1 ;
+---------+-------------------+--------------------+
| income | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 3754.01 | 3378.6090 | 375.4010 |
+---------+-------------------+--------------------+
</pre>
<p><span style="font-weight: 400;">Well, now the cost of this operation is</span><b> 375</b><span style="font-weight: 400;"> dollars, not </span><b>242</b><span style="font-weight: 400;">. In this case we are talking about peanuts, but imagine the impact if we did something similar on thousands of users. </span></p>
<p><span style="font-weight: 400;">Anyhow let us first:</span></p>
<pre class="lang:mysql decode:true">session1 >rollback;
Query OK, 0 rows affected (0.08 sec)
</pre>
<p><span style="font-weight: 400;">And cancel the operation.</span></p>
<p><span style="font-weight: 400;">So, what happened? Is this a bug?</span></p>
<p><span style="font-weight: 400;">No, it is not! It is how REPEATABLE-READ works in MySQL. The consistent snapshot applies only to plain (non-locking) read operations. Because the reads took no locks, another session was free to write, and the subsequent update, which always performs a current read against the latest committed data, touched every row in the table matching the conditions. </span></p>
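<p><span style="font-weight: 400;">The mechanics can be condensed into a small sketch (only a sketch, reusing the table and columns from the exercise above; the row counts are the ones observed in this run):</span></p>
<pre class="lang:mysql decode:true">-- Session 1 (REPEATABLE-READ): plain SELECTs read from the transaction snapshot
SET transaction_isolation = 'REPEATABLE-READ';
START TRANSACTION;
SELECT COUNT(*) FROM sakila.customer WHERE bonus = 1 AND active = 1; -- snapshot: 21

-- Session 2 commits in the meantime, activating 10 more customers:
-- UPDATE sakila.customer SET active = 1 WHERE bonus = 1 AND active = 0; COMMIT;

-- Back in session 1: the UPDATE performs a current read on the latest committed data
UPDATE sakila.customer SET activate_bonus = 1 WHERE bonus = 1 AND active = 1;
-- Rows matched: 31, not the 21 the snapshot showed
</pre>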
<p><span style="font-weight: 400;">As shown above this can be very dangerous. But only if you don’t do the right things in the code. </span></p>
<h2><span style="font-weight: 400;">How can I prevent this from happening?</span></h2>
<p><span style="font-weight: 400;">When coding, hope for the best, plan for the worst, always! Especially when dealing with databases. That approach may save you from spending nights trying to fix the impossible. </span></p>
<p><span style="font-weight: 400;">So how can this be prevented? You have three options, all simple, but each with positive and negative consequences.</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Add this simple clause to the select statement: </span><i><span style="font-weight: 400;">for share</span></i></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Add the other simple clause to the select statement: </span><i><span style="font-weight: 400;">for update</span></i></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Use READ-COMMITTED</span></li>
</ol>
<p><span style="font-weight: 400;">Solution 1 is easy and clean, with no other change in the code. BUT it creates locks, and if your application is lock sensitive this may be an issue for you.</span></p>
<p><span style="font-weight: 400;">Solution 2 is also easy, but it is more restrictive and locks more aggressively. </span></p>
<p><span style="font-weight: 400;">On the other hand, Solution 3 does not add locks, but it requires modifications in the code and still leaves some room for problems.</span></p>
<p><span style="font-weight: 400;">Let us see them in detail.</span></p>
<h3><span style="font-weight: 400;">Solution 1</span></h3>
<p><span style="font-weight: 400;">Let us repeat the same steps</span></p>
<pre class="lang:mysql decode:true">session1 >set transaction_isolation = 'REPEATABLE-READ';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name for share;
</pre>
<p><span style="font-weight: 400;">If we now check the locks (for brevity I have cut some entries and kept a few as a sample):</span></p>
<pre class="lang:mysql decode:true">session3 >select index_name, lock_type, lock_mode,lock_data,thread_id from performance_schema.data_locks where
-> object_schema = 'sakila'
-> and object_name = 'customer'
-> and lock_type = 'RECORD'
-> and thread_id = 17439
-> order by index_name, lock_data desc;
+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode | lock_data | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus | RECORD | S | supremum pseudo-record | 17439 |
| idx_bonus | RECORD | S | 1, 9 | 17439 |
<snip>
| idx_bonus | RECORD | S | 1, 128 | 17439 |
| idx_bonus | RECORD | S | 1, 1 | 17439 |
| PRIMARY | RECORD | S,REC_NOT_GAP | 9 | 17439 |
<snip>
| PRIMARY | RECORD | S,REC_NOT_GAP | 1 | 17439 |
+------------+-----------+---------------+------------------------+-----------+
63 rows in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">This time we can see that the select has raised a few locks, all in S (shared) mode. </span></p>
<p><span style="font-weight: 400;">For brevity I am not repeating the steps that produce exactly the same results as in the first exercise.</span></p>
<p><span style="font-weight: 400;">If we go ahead and try with session 2:</span></p>
<pre class="lang:mysql decode:true">session2 >set transaction_isolation = 'READ-COMMITTED';
session2 >Start Transaction;
session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
</pre>
<p><span style="font-weight: 400;">Here we go! The attempt to change the values in the customer table is now on hold, waiting to acquire the lock. If the wait exceeds innodb_lock_wait_timeout, the execution is interrupted and the application must have a </span><i><span style="font-weight: 400;">try-catch</span></i><span style="font-weight: 400;"> mechanism to retry the operation. This is a best practice that you should already have in place in your code. If not, well, add it now!</span></p>
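<p><span style="font-weight: 400;">The wait time before ERROR 1205 is raised is controlled per session by innodb_lock_wait_timeout (50 seconds by default). A minimal sketch of failing fast and leaving the retry to the application:</span></p>
<pre class="lang:mysql decode:true">-- Check the current value, then shorten it for this session only
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
SET SESSION innodb_lock_wait_timeout = 10;
-- Any lock wait longer than 10 seconds now fails with ERROR 1205,
-- and the application's try-catch logic can retry the transaction.
</pre>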
<p><span style="font-weight: 400;">At this point session 1 can proceed and complete the operations. After that and before the final </span><i><span style="font-weight: 400;">commit</span></i><span style="font-weight: 400;">, we will be able to observe:</span></p>
<pre class="lang:mysql decode:true">+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode | lock_data | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus | RECORD | S | supremum pseudo-record | 17439 |
| idx_bonus | RECORD | X | supremum pseudo-record | 17439 |
| idx_bonus | RECORD | S | 1, 9 | 17439 |
| idx_bonus | RECORD | X | 1, 9 | 17439 |
<snip>
| idx_bonus | RECORD | X | 1, 1 | 17439 |
| idx_bonus | RECORD | S | 1, 1 | 17439 |
| PRIMARY | RECORD | X,REC_NOT_GAP | 9 | 17439 |
| PRIMARY | RECORD | S,REC_NOT_GAP | 9 | 17439 |
<snip>|
| PRIMARY | RECORD | X,REC_NOT_GAP | 1 | 17439 |
| PRIMARY | RECORD | S,REC_NOT_GAP | 1 | 17439 |
+------------+-----------+---------------+------------------------+-----------+
126 rows in set (0.00 sec)
</pre>
<p><span style="font-weight: 400;">As you can see, we now have two different lock modes: the shared (S) lock from the select, and the exclusive (X) lock from the update.</span></p>
<h3><span style="font-weight: 400;">Solution 2</span></h3>
<p><span style="font-weight: 400;">In this case we just change the kind of lock we set with the select. In Solution 1 we opted for a shared lock, which allows other sessions to acquire another shared lock on the same rows. Here we go for an exclusive lock, which puts all the other operations on hold. </span></p>
<pre class="lang:mysql decode:true">session1 >set transaction_isolation = 'REPEATABLE-READ';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name for UPDATE;
</pre>
<p><span style="font-weight: 400;">If we now check the locks (for brevity I have cut some entries and kept a few as a sample):</span></p>
<pre class="lang:mysql decode:true">session3 >select index_name, lock_type, lock_mode,lock_data,thread_id from performance_schema.data_locks where
-> object_schema = 'sakila'
-> and object_name = 'customer'
-> and lock_type = 'RECORD'
-> and thread_id = 17439
-> order by index_name, lock_data desc;
+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode | lock_data | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus | RECORD | X | supremum pseudo-record | 17439 |
| idx_bonus | RECORD | X | 1, 9 | 17439 |
<snip>
| idx_bonus | RECORD | X | 1, 128 | 17439 |
| idx_bonus | RECORD | X | 1, 1 | 17439 |
| PRIMARY | RECORD | X,REC_NOT_GAP | 9 | 17439 |
| PRIMARY | RECORD | X,REC_NOT_GAP | 80 | 17439 |
<snip>
| PRIMARY | RECORD | X,REC_NOT_GAP | 128 | 17439 |
| PRIMARY | RECORD | X,REC_NOT_GAP | 1 | 17439 |
+------------+-----------+---------------+------------------------+-----------+
63 rows in set (0.09 sec)
</pre>
<p><span style="font-weight: 400;">In this case the lock mode is X, as in exclusive, so the other operations requesting a lock must wait for this transaction to complete.</span></p>
<pre class="lang:mysql decode:true">session2 >#set transaction_isolation = 'READ-COMMITTED';
session2 >
session2 >Start Transaction;
Query OK, 0 rows affected (0.06 sec)
session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 for update;
</pre>
<p><span style="font-weight: 400;">And it will wait for N seconds, where N is either innodb_lock_wait_timeout or the time of the commit from session 1.</span></p>
<p><span style="font-weight: 400;">Note the lock request on hold:</span></p>
<pre class="lang:mysql decode:true crayon-selected">+------------+-----------+---------------+------------------------+-----------+
| index_name | lock_type | lock_mode | lock_data | thread_id |
+------------+-----------+---------------+------------------------+-----------+
| idx_bonus | RECORD | X | supremum pseudo-record | 17439 |
<snip>
| idx_bonus | RECORD | X | 1, 128 | 17439 |
| idx_bonus | RECORD | X | 1, 1 | 17430 |<--
| idx_bonus | RECORD | X | 1, 1 | 17439 |
| PRIMARY | RECORD | X,REC_NOT_GAP | 9 | 17439 |
<snip>
| PRIMARY | RECORD | X,REC_NOT_GAP | 1 | 17439 |
+------------+-----------+---------------+------------------------+-----------+
64 rows in set (0.09 sec)
</pre>
<p><span style="font-weight: 400;">Session 2 is trying to acquire an exclusive lock on idx_bonus but cannot proceed.</span></p>
<p><span style="font-weight: 400;">Once session 1 goes ahead we have:</span></p>
<pre class="lang:mysql decode:true">session1 >update sakila.customer set activate_bonus=1 where bonus = 1 and active =1 ;
Query OK, 0 rows affected (0.06 sec)
Rows matched: 21 Changed: 0 Warnings: 0
session1 >
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 38 | 1 | MARTHA | GONZALEZ | 1 | 1 |
| 44 | 1 | MARIE | TURNER | 1 | 1 |
| 80 | 1 | MARILYN | ROSS | 1 | 1 |
| 128 | 1 | MARJORIE | TUCKER | 1 | 1 |
| 226 | 2 | MAUREEN | LITTLE | 1 | 1 |
| 240 | 1 | MARLENE | WELCH | 1 | 1 |
| 252 | 2 | MATTIE | HOFFMAN | 1 | 1 |
| 256 | 2 | MABEL | HOLLAND | 1 | 1 |
| 267 | 1 | MARGIE | WADE | 1 | 1 |
| 293 | 2 | MAE | FLETCHER | 1 | 1 |
| 312 | 2 | MARK | RINEHART | 1 | 1 |
| 383 | 1 | MARTIN | BALES | 1 | 1 |
| 413 | 2 | MARVIN | YEE | 1 | 1 |
| 441 | 1 | MARIO | CHEATHAM | 1 | 1 |
| 444 | 2 | MARCUS | HIDALGO | 1 | 1 |
| 482 | 1 | MAURICE | CRAWLEY | 1 | 1 |
| 499 | 2 | MARC | OUTLAW | 1 | 1 |
| 539 | 1 | MATHEW | BOLIN | 1 | 1 |
| 553 | 1 | MAX | PITT | 1 | 1 |
| 583 | 1 | MARSHALL | THORN | 1 | 1 |
| 588 | 1 | MARION | OCAMPO | 1 | 1 |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)
session1 >SELECT SUM(amount) income, SUM(amount) * 0.90 income_with_bonus, (SUM(amount) - (SUM(amount) * 0.90)) loss_because_bonus FROM sakila.customer AS c JOIN sakila.payment AS p ON c.customer_id = p.customer_id where active = 1 and bonus =1;
+---------+-------------------+--------------------+
| income | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 2416.23 | 2174.6070 | 241.6230 |
+---------+-------------------+--------------------+
1 row in set (0.06 sec)
</pre>
<p><span style="font-weight: 400;">Now my update matches the expectations and the other operations are on hold.</span></p>
<p><span style="font-weight: 400;">After session 1 commits:</span></p>
<pre class="lang:mysql decode:true">session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 for update;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 1 | 1 | MARY | SMITH | 1 | 1 |
| 7 | 1 | MARIA | MILLER | 1 | 1 |
| 9 | 2 | MARGARET | MOORE | 1 | 1 |
| 178 | 2 | MARION | SNYDER | 1 | 1 |
| 236 | 1 | MARCIA | DEAN | 1 | 1 |
| 246 | 1 | MARIAN | MENDOZA | 1 | 1 |
| 254 | 2 | MAXINE | SILVA | 1 | 1 |
| 257 | 2 | MARSHA | DOUGLAS | 1 | 1 |
| 323 | 2 | MATTHEW | MAHAN | 1 | 1 |
| 408 | 1 | MANUEL | MURRELL | 1 | 1 |
+-------------+----------+------------+-----------+-------+----------------+
<span style="text-decoration: underline;"><strong>10 rows in set (52.72 sec) <-- Note the time!</strong></span>
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.06 sec)
Rows matched: 10 Changed: 10 Warnings: 0
</pre>
<p><span style="font-weight: 400;">Session 2 is able to complete, BUT it was on hold for 52 seconds waiting for session 1.<br /></span><span style="font-weight: 400;">As said, this solution is a good one only if you can afford the locks and the waiting time.</span></p>
<h3><span style="font-weight: 400;">Solution 3</span></h3>
<p><span style="font-weight: 400;">In this case we will use a different isolation model that will allow session 1 to </span><i><span style="font-weight: 400;">see</span></i><span style="font-weight: 400;"> what session 2 has modified.</span></p>
<pre class="lang:mysql decode:true">session1 >set transaction_isolation = 'READ-COMMITTED';
session1 >Start Transaction;
session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 383 | 1 | MARTIN | BALES | 1 | 0 |
| 539 | 1 | MATHEW | BOLIN | 1 | 0 |
| 441 | 1 | MARIO | CHEATHAM | 1 | 0 |
| 482 | 1 | MAURICE | CRAWLEY | 1 | 0 |
| 293 | 2 | MAE | FLETCHER | 1 | 0 |
| 38 | 1 | MARTHA | GONZALEZ | 1 | 0 |
| 444 | 2 | MARCUS | HIDALGO | 1 | 0 |
| 252 | 2 | MATTIE | HOFFMAN | 1 | 0 |
| 256 | 2 | MABEL | HOLLAND | 1 | 0 |
| 226 | 2 | MAUREEN | LITTLE | 1 | 0 |
| 588 | 1 | MARION | OCAMPO | 1 | 0 |
| 499 | 2 | MARC | OUTLAW | 1 | 0 |
| 553 | 1 | MAX | PITT | 1 | 0 |
| 312 | 2 | MARK | RINEHART | 1 | 0 |
| 80 | 1 | MARILYN | ROSS | 1 | 0 |
| 583 | 1 | MARSHALL | THORN | 1 | 0 |
| 128 | 1 | MARJORIE | TUCKER | 1 | 0 |
| 44 | 1 | MARIE | TURNER | 1 | 0 |
| 267 | 1 | MARGIE | WADE | 1 | 0 |
| 240 | 1 | MARLENE | WELCH | 1 | 0 |
| 413 | 2 | MARVIN | YEE | 1 | 0 |
+-------------+----------+------------+-----------+-------+----------------+
21 rows in set (0.08 sec)
</pre>
<p><span style="font-weight: 400;">Same result as before. Now let us execute commands in session2:</span></p>
<pre class="lang:mysql decode:true">session2 >set transaction_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected (0.00 sec)
session2 >
session2 >Start Transaction;
Query OK, 0 rows affected (0.00 sec)
session2 >SELECT * FROM sakila.customer where bonus = 1 and active =0 ;
+-------------+----------+------------+-----------+-------+----------------+
| customer_id | store_id | first_name | last_name | bonus | activate_bonus |
+-------------+----------+------------+-----------+-------+----------------+
| 1 | 1 | MARY | SMITH | 1 | 0 |
| 7 | 1 | MARIA | MILLER | 1 | 0 |
| 9 | 2 | MARGARET | MOORE | 1 | 0 |
| 178 | 2 | MARION | SNYDER | 1 | 0 |
| 236 | 1 | MARCIA | DEAN | 1 | 0 |
| 246 | 1 | MARIAN | MENDOZA | 1 | 0 |
| 254 | 2 | MAXINE | SILVA | 1 | 0 |
| 257 | 2 | MARSHA | DOUGLAS | 1 | 0 |
| 323 | 2 | MATTHEW | MAHAN | 1 | 0 |
| 408 | 1 | MANUEL | MURRELL | 1 | 0 |
+-------------+----------+------------+-----------+-------+----------------+
10 rows in set (0.00 sec)
session2 >update sakila.customer set active = 1 where bonus = 1 and active =0 ;
Query OK, 10 rows affected (0.00 sec)
Rows matched: 10 Changed: 10 Warnings: 0
session2 >commit;
Query OK, 0 rows affected (0.00 sec)
</pre>
<p><span style="font-weight: 400;">All seems the same, but if we check again in session 1:</span></p>
<pre class="lang:mysql decode:true">session1 >SELECT * FROM sakila.customer where bonus = 1 and active =1 order by last_name;
+-------------+----------+------------+-----------+------+----------------+
| customer_id | store_id | first_name | last_name |bonus | activate_bonus |
+-------------+----------+------------+-----------+------+----------------+
| 383 | 1 | MARTIN | BALES | 1 | 1 |
| 539 | 1 | MATHEW | BOLIN | 1 | 1 |
| 441 | 1 | MARIO | CHEATHAM | 1 | 1 |
| 482 | 1 | MAURICE | CRAWLEY | 1 | 1 |
| 236 | 1 | MARCIA | DEAN | 1 | 0 |
| 257 | 2 | MARSHA | DOUGLAS | 1 | 0 |
| 293 | 2 | MAE | FLETCHER | 1 | 1 |
| 38 | 1 | MARTHA | GONZALEZ | 1 | 1 |
| 444 | 2 | MARCUS | HIDALGO | 1 | 1 |
| 252 | 2 | MATTIE | HOFFMAN | 1 | 1 |
| 256 | 2 | MABEL | HOLLAND | 1 | 1 |
| 226 | 2 | MAUREEN | LITTLE | 1 | 1 |
| 323 | 2 | MATTHEW | MAHAN | 1 | 0 |
| 246 | 1 | MARIAN | MENDOZA | 1 | 0 |
| 7 | 1 | MARIA | MILLER | 1 | 0 |
| 9 | 2 | MARGARET | MOORE | 1 | 0 |
| 408 | 1 | MANUEL | MURRELL | 1 | 0 |
| 588 | 1 | MARION | OCAMPO | 1 | 1 |
| 499 | 2 | MARC | OUTLAW | 1 | 1 |
| 553 | 1 | MAX | PITT | 1 | 1 |
| 312 | 2 | MARK | RINEHART | 1 | 1 |
| 80 | 1 | MARILYN | ROSS | 1 | 1 |
| 254 | 2 | MAXINE | SILVA | 1 | 0 |
| 1 | 1 | MARY | SMITH | 1 | 0 |
| 178 | 2 | MARION | SNYDER | 1 | 0 |
| 583 | 1 | MARSHALL | THORN | 1 | 1 |
| 128 | 1 | MARJORIE | TUCKER | 1 | 1 |
| 44 | 1 | MARIE | TURNER | 1 | 1 |
| 267 | 1 | MARGIE | WADE | 1 | 1 |
| 240 | 1 | MARLENE | WELCH | 1 | 1 |
| 413 | 2 | MARVIN | YEE | 1 | 1 |
+-------------+----------+------------+-----------+------+----------------+
31 rows in set (0.09 sec)
</pre>
<p><span style="font-weight: 400;">We can now see all 31 customers, and the value calculation:</span></p>
<pre class="lang:mysql decode:true">+---------+-------------------+--------------------+
| income | income_with_bonus | loss_because_bonus |
+---------+-------------------+--------------------+
| 3754.01 | 3378.6090 | 375.4010 |
+---------+-------------------+--------------------+
</pre>
<p><span style="font-weight: 400;">It now reports the right values.</span></p>
<p><span style="font-weight: 400;">At this point we can program some logic into the process to check against a tolerance, and have the process either complete the update operations or stop and exit.</span></p>
<p><span style="font-weight: 400;">As mentioned, this requires additional coding and more logic.</span></p>
<p><span style="font-weight: 400;">To be fully honest, this solution still leaves room for some possible interference, but at least it allows the application to be informed about what is happening. Still, it should be used only if you cannot afford a higher level of locking, but that is another story/article.</span></p>
<h1><span style="font-weight: 400;">Conclusions</span></h1>
<p><span style="font-weight: 400;">When writing applications that interact with an RDBMS, you must be very careful about what you do and HOW you do it. While using data facilitation layers like Object Relational Mapping (ORM) seems to make your life easier, in reality you may lose control over a few crucial aspects of the application’s interaction. So be very careful when choosing how to access your data.</span></p>
<p><span style="font-weight: 400;">About the case reported above, we can summarize a few pitfalls:</span></p>
<ol>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">First and most important, never have processes that may interfere with one another while running at the same time. Be very careful when you design them, and even more careful when planning their execution. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Use options such as </span><i><span style="font-weight: 400;">for share</span></i><span style="font-weight: 400;"> or </span><i><span style="font-weight: 400;">for update </span></i><span style="font-weight: 400;">in your select statements. Use them carefully to avoid unnecessary locks, but use them. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">If you have a long process that modifies data, need to check whether other processes have altered the status of the data, and at the same time cannot afford long lock waits, then use a mix: set READ-COMMITTED as the isolation level to allow your application to check, but also add </span><i><span style="font-weight: 400;">for share </span></i><span style="font-weight: 400;">or </span><i><span style="font-weight: 400;">for update</span></i><span style="font-weight: 400;"> to the select statement immediately before the DML and the commit. That allows you to prevent writes while you are finalizing, and also to significantly reduce the locking time. </span></li>
<li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Keep in mind </span><b>that long running processes and long running transactions </b><span style="font-weight: 400;">can be</span><b> the source of a lot of pain</b><span style="font-weight: 400;">, especially when using REPEATABLE-READ. Whenever you can, split the operations into small chunks. </span></li>
</ol>
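<p><span style="font-weight: 400;">The mix described in point 3 could look like the following sketch (reusing the statements from the exercise above):</span></p>
<pre class="lang:mysql decode:true">-- Long process under READ-COMMITTED: keep the reads non-locking as long as possible
SET transaction_isolation = 'READ-COMMITTED';
START TRANSACTION;
-- ... long non-locking reads and application-side checks here ...
-- Lock the rows only immediately before the DML:
SELECT customer_id FROM sakila.customer WHERE bonus = 1 AND active = 1 FOR UPDATE;
UPDATE sakila.customer SET activate_bonus = 1 WHERE bonus = 1 AND active = 1;
COMMIT; -- locks were held only for this short final window
</pre>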
<p><span style="font-weight: 400;">Finally, when developing, remember that DBAs are friends and are there to help you do things in the best way. It may seem they are giving you a hard time, but that is because their point of view is different, focusing on data consistency, availability and durability. They can save you a lot of time after the code is released, especially when you try to identify why something has gone wrong. </span></p>
<p><span style="font-weight: 400;">So involve them early in the design process and let them be part of it. </span></p>
<h1><span style="font-weight: 400;">References</span></h1>
<p><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html"><span style="font-weight: 400;">https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-isolation-levels.html</span></a></p>
<p><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html"><span style="font-weight: 400;">https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html</span></a></p>
<p><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-model.html"><span style="font-weight: 400;">https://dev.mysql.com/doc/refman/8.0/en/innodb-transaction-model.html</span></a></p>]]></description>
<category>MySQL</category>
<pubDate>Tue, 09 Nov 2021 10:11:31 +0000</pubDate>
</item>
<item>
<title>MySQL on Kubernetes demystified</title>
<link>http://www.tusacentral.net/joomla/index.php/mysql-blogs/243-mysql-on-kubernetes-demystified</link>
<guid isPermaLink="true">http://www.tusacentral.net/joomla/index.php/mysql-blogs/243-mysql-on-kubernetes-demystified</guid>
<description><![CDATA[<h1>Why<img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/stop-befool.jpg" alt="stop befool" style="float: right;" /></h1>
<p>Marco, why did you write this long article?</p>
<p>Yes, it is long, and I know most people will not read it in full, but my hope is that at least someone will, and I count on them to start the wave of sanity. </p>
<p>Why I wrote it is simple. We write articles to share something we discovered, to share new approaches, or, as in this case, to try to demystify and put in the right perspective the “last shining thing” that will save the world. </p>
<p>The “last shining thing” is the use of containers for relational databases management systems (RDBMS) and all the attached solutions like Kubernetes or similar. </p>
<p>Why is this a problem? The use of containers for RDBMS is not really a problem per se, but it has become a problem because it is not correctly contextualized and, even more important, because the model that should be used to properly design the solutions was not reviewed and modified with respect to the classic one.</p>
<p>One example for all is this image:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/db_on_kube_1.png" alt="db on kube 1" width="671" height="386" /></p>
<p>Source (<a href="https://www.cockroachlabs.com/blog/kubernetes-trends/">https://www.cockroachlabs.com/blog/kubernetes-trends/</a>) </p>
<p>In this report we find the term Database used multiple times, with references to how easy it is to adopt and scale a Database on Kubernetes. But the problem is … which Database? Not all Databases are the same, and not all can be adopted so easily; some use a more restrictive design to be a real RDBMS, others less so. </p>
<p>Some are designed to scale horizontally, others are not. The other missing part is that, to get something, you need to give something. If I design a system to scale horizontally, I probably have to pay a price somewhere else. It could be the lack of referential integrity or a weaker isolation level; it doesn’t matter, the point is that you do not get anything for free. </p>
<p>Generalizing without mentioning the difference is misleading.</p>
<p>What happens instead is that we have a lot of people, presenting solutions that are so generic that are unusable. The most hilarious thing is that they present that as an inevitable evolution, the step into the future that will solve every problem. But when doing this, they do not clarify what is the “Database” in use, what you get, what you lose. This may lead to misunderstanding and future frustration. </p>
<p>For instance we constantly see presentations that illustrate how easy it will be to manage hundreds, thousands of pods containing RDBMS, without even understanding the concept of RDBMS, the data they may host and its dimension. </p>
<p>The more we go ahead the more dangerous this disinformation is becoming, because we can say that on 100 companies currently using RDBMS to manage their crucial data, only 10 have a good team of experts that understand how to really use containers, and probably only 1 or 2 of them have real data experts that are able to redesign the data layer to be correctly utilized in containers or able to see to what proper Database to migrate to achieve the expected results.</p>
<p>Another common expectation is to move to container/kubernetes to reduce the costs, no matter if related to the iron (physical servers) or of the management of them. </p>
<p>Indeed you can optimise that part, but you need to understand the limitations of your future solution. You must take into account that you may not have the same level of service in a way or another. Honestly I haven't seen that addressed in current blogging or reports. </p>
<p>This is why this article. What I want to do is to open a door to a discussion, that will lead us to review the original model used to design data inside RDBMS, that will allow anyone to safely approach a different model to use for modern database design.</p>
<h1>When</h1>
<p class="btn-inverse">(if you know how we got here, you can skip this)</p>
<p>To fully understand what we are talking about, we need to take a jump back in time, because without history we are nothing, and without memory we are lost. Just note that I am going to touch on things at a very high level and only as far as they concern us; otherwise this would be a book, not an article.</p>
<p>A long time ago, in a world far far away where the internet was not, there was: the client-server approach.</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/ClientServerArchitecture1.png" alt="ClientServerArchitecture1" width="340" height="223" /> </p>
<p>At that time we had many clients connecting to a server that provided access to whatever was needed. Most of the clients performed information rendering and local data validation, then sent the information back to the application server, which processed it and stored it… where? Well, there was a wide variety of containers. Some were just proprietary files with custom formats, others connected to an external database. What is relevant here is that we had a very limited number of clients (often fewer than 1,000) and, in the end, the amount of data in transit and then stored was very small (a database with more than 1GB of data in total was considered a monster). </p>
<p> </p>
<p>Then came the internet … and many things started to change. </p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/clientserver-web.gif" alt="clientserver web" width="431" height="205" /></p>
<p>We still have a client on our PC, but now it is called a web browser; information rendering is now standardized using SGML standards (HTML tagging); and the server is no longer a simple application server, but a web server connected to one or many applications through the Common Gateway Interface (CGI). Each application then handled data in/out independently, some using databases, some not. </p>
<p>This new approach added many challenges to the previous model, such as:</p>
<ul>
<li aria-level="1">Anyone from anywhere can access the web server and any application hosted on it.</li>
<li aria-level="1">The number of requests and connected clients jumped from well-predictable numbers to something impossible to define.</li>
<li aria-level="1">The connection to a web server follows a different approach, based on a request issued by a browser and a set of information sent back as an answer from the server (later we got more interactive/active protocols, but let us stay high level, ok?). </li>
<li aria-level="1">Many applications were duplicating the same functions, but with different approaches.</li>
<li aria-level="1">Data received and sent needs to be consistent not only locally on the server, but also between requests. </li>
</ul>
<p>So, on top of the problems generated by the number of possible connected clients and the number of requests per second a web server needed to process, the initial model was wasting resources both in development and at runtime. There was a need to optimise the interactions between applications and their functions; to cover that, the Service Oriented Architecture (SOA) model was widely adopted.</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/1920px-SOA_Metamodelsvg.png" alt="1920px SOA Metamodelsvg" width="500" height="413" /></p>
<p>The data problem was partially solved by identifying the RDBMS as the best tool to guarantee the needed level of data consistency, while also making it possible to organize data in containers (tables) with validated interactions (foreign keys and constraints), solving the online transaction processing (OLTP) problem. For business analysis and reporting, the online analytical processing (OLAP) model was chosen instead. The OLAP model is based on facts and dimensions, the elements of the data hypercube, which is the base model used to first transform and then access the data. </p>
<p>During this time, data volumes started to increase significantly, and most systems, when well designed, started to implement the concepts of archiving, partitioning, and sharding. </p>
<p>We are around 2005-2008.</p>
<p> </p>
<p>Then social media and online shopping exploded... </p>
<p>Well, we know that, right? We are still in this huge big bang of nonsense. What changed is not only that now anyone can access whatever sits on a web server as if it were certified information, but that we can chat with anyone, buy almost anything, sell anything, take photos and share them, post videos and ask anyone from anywhere to comment, and say whatever we want, and that crap will remain around forever. </p>
<p>And guess what? To do that you need a very, very, very powerful and scalable platform, and it should be resilient, for real and not as a joke. When we look for something today, we open our phones and expect to get it now, no matter what. If we do not get it, we may look somewhere else: another shop, another chat system, whatever. To achieve this, you not only need some specialized blocks; you need to split the load into many small parts, have redundancy for each one of them, and allow any functional block to connect to another functional block. Moreover, if one of the blocks goes down, I must be able to retry the operation using another functional block of the same kind, and be sure I will get the same results. </p>
<p>To achieve this, the SOA concept evolved into the microservice concept. But microservices also bring new challenges, one above all: how to manage these thousands of different functional blocks, and how to efficiently deploy/remove/modify them. </p>
<p>Interestingly, the term DevOps also started to gain relevance, and automation became a crucial part of ANY environment. </p>
<p>In short, modern application architectures moved from monolithic or SOA to microservices, as shown here:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/microservice-draft-3-03-db.png" alt="microservice draft 3 03 db" width="500" height="263" /></p>
<p>Of course, in reality each microservice block is redundant and scalable, given that for each microservice kind we can have from one to a virtually infinite number of instances. </p>
<p>It is in this context that Kubernetes comes to help. Kubernetes is designed to manage resources as microservices: each element (pod) is seen as a function of the service provided, and the service scales up and down by adding/removing pods, keeping the service resilient. </p>
<p>Expecting to deal with a structure like this without proper tooling/automation is just not possible:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/amazon-death-star.png" alt="amazon death star" width="500" height="332" /></p>
<p>This picture represents the Amazon microservice infrastructure in 2008: <a href="https://twitter.com/Werner/status/741673514567143424">https://twitter.com/Werner/status/741673514567143424</a></p>
<h2>In conclusion</h2>
<p>In short, we can say that the complex topologies we deal with today are the result of a needed evolution from the monolithic application approach to one distributed in services/microservices. Their proliferation made it absolutely necessary to create and use tooling that facilitates management, reducing complexity, operation time, and costs. </p>
<h1>Cool eh? What about the database? </h1>
<p>The first thing to say is that RDBMSs such as Oracle, PostgreSQL and MySQL did not evolve in the same direction. They remain centralized structures. We have clusters, but mostly to answer high availability needs, not scaling. </p>
<p>This is the part of the story the current evangelists of the “last shining thing” are not telling you. There is no coverage for RDBMSs like Oracle, PostgreSQL or MySQL in the microservice approach. None, zero, nich, nada!</p>
<p>To bypass that problem, many microservice developers implemented complex caching mechanisms or turned to noSQL approaches. <br />Using caches to speed up processing is absolutely ok; trying to use them instead of an ACID solution is not. The same goes for noSQL: if it makes sense because the service requires it, ok; otherwise, no.</p>
<p>But what is the problem in seeing an RDBMS as a microservice?<br />Let us start with a positive note: if you design it as a microservice and reset your expectations, then it is ok.</p>
<h2>What benefits (pros)? </h2>
<p>The first thought is: why should I move to Kubernetes/containers at all? What am I expecting to get that I do not already have?</p>
<p>Well, let us look at the few basic points a microservice solution should help with in the case of an application:</p>
<ul>
<li aria-level="1">Modularity:
<ul>
<li aria-level="2">This makes the application easier to understand, develop, test, and become more resilient to architecture erosion.</li>
</ul>
</li>
<li aria-level="1">Scalability:
<ul>
<li aria-level="2">Since microservices are implemented and deployed independently of each other, i.e. they run within independent processes, they can be monitored and scaled independently.</li>
</ul>
</li>
<li aria-level="1">Better Fault Isolation for More Resilient Applications.
<ul>
<li aria-level="2">With a microservices architecture, the failure of one service is less likely to negatively impact other parts of the application because each microservice runs autonomously from the others. </li>
<li aria-level="2">Nevertheless, large distributed microservices architectures tend to have many dependencies, so developers need to protect the application from a dependency failure related shut down.</li>
</ul>
</li>
<li aria-level="1">Integration of heterogeneous and legacy systems:
<ul>
<li aria-level="2">Microservices are considered a viable means for modernizing existing monolithic software applications.</li>
<li aria-level="2">There are experience reports of several companies who have successfully replaced (parts of) their existing software with microservices, or are in the process of doing so.</li>
<li aria-level="2">The software modernization of legacy applications is done using an incremental approach.</li>
</ul>
</li>
<li aria-level="1">Faster Time to Market and “Future-Proofing”
<ul>
<li aria-level="2">The pluggability of a microservices application architecture allows for easier and faster application development and upgrades. Developers can quickly build or change a microservice, then plug it into the architecture with less risk of coding conflicts and service outages. Moreover, due to the independence of each microservice, teams don’t have to worry about coding conflicts, and they don’t have to wait for slower-moving projects before launching their part of the application.</li>
<li aria-level="2">Microservices also support the CI/CD/CD (Continuous Integration, Continuous Delivery, Continuous Deployment) development philosophy. This means that you quickly deploy the core microservices of an application – as a minimum viable product (MVP) – and continuously update the product over time as you complete additional microservices. It also means that you can respond to security threats and add new technologies as they appear. The result for the user is a “future proof” product that’s constantly evolving with the times.</li>
</ul>
</li>
</ul>
<p>Of course, this is for applications; let us try to map similar things for a database, if it makes any sense. </p>
<ul>
<li aria-level="1">Modularity. We can have two different levels of modularity:
<ul>
<li aria-level="2">By service. In this case we need data modules that serve the microservices using that segment of data, i.e. login needs data about the user login only, not information about customer orders.</li>
<li aria-level="2">By context. Here we need to group as much as possible the data relevant to a specific topic. In a multi-tenant database we will need to separate the data so that each microservice scales by accessing only the data relevant to one tenant, instead of accessing everything and then filtering. I.e., in our example above, we will have the whole schema in our service, but only for one single tenant. </li>
</ul>
</li>
<li aria-level="1">Scalability. To gain infinite scalability as we do for applications, we should be able to choose between two options:
<ul>
<li aria-level="2">Either the database service is able to perform non-impacting distributed locking. What I mean here is that, independently of the type of modularity used, if we want to scale the number of requests/sec by duplicating the service answering the requests, we must prevent two different pods from overwriting each other, with no or minimal impact on performance. As of today, in the MySQL area we have two solutions that could cover that: PXC with Galera, and MySQL/PS with Group Replication. But both solutions have a performance impact when using multiple writers. Not only that: in these solutions data is not distributed by partitions, but duplicated in full on each pod. This means that unless you have a very good archiving strategy, your space utilization will always be S x N (where S is the total space on one pod and N the number of pods).<br />Because of this, if we choose this approach, we need to keep our dataset to the minimum to reduce any locking impact and space utilization. </li>
<li aria-level="2">Or we shard the data by pods, which means we need a controller able to distribute the data. Of course we also need high availability, so we must consider at least a copy of each node, or distributed partitioning. As far as I know, MySQL and PostgreSQL do not have this functionality; about MongoDB I am not sure.</li>
</ul>
</li>
</ul>
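<p>The S x N rule above is worth doing the arithmetic on. The sketch below is a toy calculation (all sizes are made-up illustration values): with PXC or Group Replication every pod keeps a full copy of the dataset, so total disk usage grows linearly with the number of pods; sharding does not reduce the total, but it does shrink the per-pod dataset.</p>

```python
# Toy disk-footprint calculation for the S x N rule: every pod in a
# replicated cluster stores the full dataset (or the full slice of its
# shard). All figures are hypothetical illustration values.

def cluster_disk_usage(dataset_gb: float, nodes_per_shard: int, shards: int = 1) -> dict:
    """Per-pod and total disk usage for a replicated, optionally sharded, cluster."""
    per_pod_gb = dataset_gb / shards                   # each shard holds only its slice
    total_gb = per_pod_gb * nodes_per_shard * shards   # every pod replicates its slice
    return {"per_pod_gb": per_pod_gb, "total_gb": total_gb}

# 900 GB on a single 3-node PXC cluster: every pod stores the full 900 GB.
print(cluster_disk_usage(900, 3))            # {'per_pod_gb': 900.0, 'total_gb': 2700.0}
# The same data split into 3 shards of 3 pods each: same total, smaller pods.
print(cluster_disk_usage(900, 3, shards=3))  # {'per_pod_gb': 300.0, 'total_gb': 2700.0}
```

<p>Note that the total is identical in both cases: only archiving actually shrinks the footprint, which is exactly the point made above.</p>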
<p>In the end, an RDBMS microservice will not be able to scale the way an application microservice does. </p>
<ul>
<li aria-level="1">Better Fault Isolation for More Resilient Applications. <br />Well, it goes without saying: if the model used is by context, isolating the data service in smaller entities will help the resilience of the whole system. But the relationships that exist inside a schema can make the by-service model more fragile, given that when one segment is down it will not be possible to validate the references. </li>
<li aria-level="1">Integration of heterogeneous and legacy systems. To be honest, I can see this as possible only if modularity by service is used. Otherwise we will still have to deal with the whole structure and multiple services. In the latter case I do not see any benefit, and we will still have to perform operations like modifying a table definition, which can be as fast as renaming a field, or very impactful.</li>
<li aria-level="1">Faster Time to Market and “Future-Proofing”.<br />Indeed, the divide and conquer approach, especially when using the by-service model, will make each service more open to change. But if by context is used, we may have a negative effect, because an application microservice may read data from N data microservices. If we change one data microservice (i.e. adding an index to a table), this change must be applied to all of the N by-context microservices, or their behaviour will diverge from the one with the change.</li>
<li aria-level="1">Distributed development. Also in this case, it is easier to achieve if modularity by service is used. Otherwise I do not really see how it will be possible to obtain any of the benefits mentioned for applications. </li>
</ul>
<h2>What concerns (cons)?</h2>
<p>Now let us discuss the possible concerns.</p>
<ul>
<li aria-level="1">Services form information barriers. Well, it goes without saying. </li>
<li aria-level="1">Inter-service calls over a network have a higher cost in terms of network latency and message processing time than in-process calls within a monolithic service process. Just think about some standard maintenance work that needs to be done to consolidate the data from each service to a single location for business intelligence purposes. And keep in mind that all must be encrypted for security reasons. </li>
<li aria-level="1">Testing and deployment are more complicated. While acting on a single module will be easier, the deployment of the full solution and its testing will become extremely complicated. </li>
<li aria-level="1">Viewing the size of services as the primary structuring mechanism can lead to too many services when the alternative of internal modularization may lead to a simpler design. This is true when using the by service modularity, we may end up having too many fragments of data, with serious difficulties in consolidating it. And of course, we still need to know and keep an eye on what is happening on the rest of the schema used by other services.</li>
<li aria-level="1">Two-phase commits are regarded as an anti-pattern in microservices-based architectures, as they result in a tighter coupling of all the participants within the transaction. However, the lack of this technology causes awkward dances which have to be implemented by all the transaction participants in order to maintain data consistency.</li>
<li aria-level="1">Development and support of many services is more challenging if they are built with different tools and technologies. It is not uncommon for complex applications to use different data sources, like an RDBMS and a document store. While the use of microservices can help simplify how data is collected and handled from these different sources, the additional fragmentation leads to a significant increase in the complexity of handling the platform. To be honest, solutions such as Kubernetes can help a lot here, but it is still an issue. </li>
<li aria-level="1">The protocol typically used with microservices (HTTP) was designed for public-facing services, and as such is unsuitable for internal microservices that often must be impeccably reliable. This issue applies to databases in a different way, because we have two different aspects. The first is the application microservice connecting and reading/writing to the data microservice; these operations must use the proprietary client protocol of the RDBMS. The second is the application microservice sharing/serving the data to others. Say I am posting a payment and a network glitch happens between application microservices. I must be able to:
<ul>
<li aria-level="2">Retry the data sharing</li>
<li aria-level="2">Rollback the operation on the database if failing</li>
</ul>
</li>
</ul>
<p>The HTTP protocol, with its post/get approach, is not the best or most performant way to do that. </p>
<h2>Let me give an example to clarify this. </h2>
<p>We are an organization that does multiple activities. We have shops all over the globe, and a huge segment of my sales is online. We also organize travel and host web stores. To deal with the traffic and load, I have moved a large part of the application architecture to microservices. To provide better services to my customers, I also have to differentiate my catalog by geographic region, splitting my market into North America, Europe, Middle East, Asia and Pacific. I have data centers in each geo-zone. Each geo-zone also hosts a local data repository focused on the specifics of the area. Finally, my sales are consolidated in two steps: first locally by geo-area, then centrally at the North America HQ. We have hundreds of database services for each segment of business, and thousands of application pods.</p>
<p>One of my online shops that uses microservices looks, at a high level, like this for each geo-zone:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/db-as-service1.png" alt="db as service1" width="349" height="221" /></p>
<p>Where Service-1 can be the login of a user and their authentication, Service-2 the display/retrieval of the items on special sale, Service-3 the handling of existing order information… and so on. </p>
<p>So I have specialized microservices that may serve multiple applications, like sign-on, but each microservice at the moment needs to retrieve and send data from/to the same repository. </p>
<p>Each database contains data relevant to a specific business activity. For instance, our main store data is separate from the travel data, and so on. Because of this, the “My DB” above just represents one of the many databases we have. Its detailed design may resemble this:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/schema-db.gif" alt="schema db" width="693" height="870" /></p>
<p>All credits for this diagram go to Barry Williams</p>
<h2>Let us play the game: what do I need to do to convert this to a microservice?</h2>
<p>Let us now try to apply what we have discussed so far about the expectations (pros/cons) and see if we can make this fit. Given that I am a MySQL expert, I am going to cover the next steps in the MySQL context.</p>
<h3>Case one: modularity by context</h3>
<p>The first step is to choose the kind of modularity we want/can use. Let us start with the one that requires fewer changes: the module by context. </p>
<p>With that approach our schema remains the same; what we do, in the case of a multi-tenant database, is make it single-tenant. Then we need to see how to cover the other aspects. </p>
<p> </p>
<p>The next step is scalability, and here we start to have problems. As already mentioned, there is no way to achieve the same level of scalability we have at the application layer. So what can we really do? </p>
<p>As mentioned, in the MySQL ecosystem we have two major solutions: one is Percona XtraDB Cluster (PXC), using Galera virtually synchronous replication; the other is MySQL or Percona Server with Group Replication (GR). There is another player that may actually fit much better, but at the time of writing there is no container solution for it: I am talking about MySQL NDB Cluster.</p>
<p>Anyhow, both PXC and GR allow a single Primary (one writer) or multiple Primaries (multiple writers). But the use of multiple Primaries is discouraged because of the performance loss caused by write conflict resolution. What this means is that at each commit the node receiving the writes must certify the commit against the whole cluster. The more nodes write at the same time, the higher the chance of conflicts. Each conflict ends up in a conflict resolution and a possible rollback of the operation. All of this, instead of adding scalability, actually impacts it and significantly reduces performance.</p>
<p>On the other hand, reads can easily scale using multiple nodes. But keep in mind that neither solution can really scale to an infinite number of nodes. GR has a hard limit of nine nodes, while PXC has no hard limit, but more nodes mean more latency at each write commit and a higher cost in maintaining the cluster; in short, when you reach a cluster of seven nodes you already start to experience significant slowdown and latency. </p>
<p>So what level of scalability do we have here? The answer is easy: none! </p>
<p>Yes, none. As it is, you can and will have only one node (pod) writing at any given time, so your level of scalability in writing is the capacity of one node, period. </p>
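<p>To make the node-count limits above concrete, here is a small sketch using the figures from the text: GR hard-caps membership at nine nodes, while PXC has no hard limit but degrades noticeably past roughly seven. The function name and the way the check is expressed are mine, for illustration only.</p>

```python
# Sketch of the practical cluster-size limits discussed above.
# Thresholds come from the text; the helper itself is illustrative.
GR_HARD_LIMIT = 9   # Group Replication refuses membership beyond 9 nodes
PXC_SOFT_LIMIT = 7  # beyond this, certification latency becomes significant

def can_add_node(technology: str, current_nodes: int) -> tuple[bool, str]:
    """Say whether one more node can (or should) be added to the cluster."""
    if technology == "group_replication" and current_nodes >= GR_HARD_LIMIT:
        return False, "hard limit: GR tops out at 9 nodes"
    if technology == "pxc" and current_nodes >= PXC_SOFT_LIMIT:
        return True, "possible, but expect higher commit latency per write"
    return True, "ok"

print(can_add_node("group_replication", 9))  # (False, 'hard limit: GR tops out at 9 nodes')
print(can_add_node("pxc", 8))                # (True, 'possible, but expect higher commit latency per write')
```

<p>Either way, none of this adds write scalability: it only caps how far the read layer can stretch.</p>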
<p> </p>
<p>Unless you adopt sharding. In that case you may have multiple microservices, each covering one shard with a fixed number of nodes (pods). So the scaling is not internal to the microservice, but in its duplication. </p>
<p>At this point the application has three ways to use this approach:</p>
<ul>
<li aria-level="1">Be shard aware and keep a list of the microservices to use when in need (a shard catalog).</li>
<li aria-level="1">Be shard aware but refer to a generalized catalog service to access the data.</li>
<li aria-level="1">Not shard aware but pass information to the generalized catalog service to identify the shard.</li>
</ul>
<p> The first option is not really optimal, because any time we add a shard we need to “inform/modify” the catalog in every relevant application microservice. </p>
<p>The second and the third assume the presence of another service dedicated to resolving the sharding. </p>
<p>I am not going into the details here, but this can be achieved by adding a ProxySQL microservice, such that the scenario will be:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/db-as-service-proxy-microservice-sharding.png" alt="db as service proxy microservice sharding" width="439" height="402" /></p>
<p>Where each data service shard is a Percona Distribution for MySQL Operator (PDMO).</p>
<p>Scalability is then guaranteed by adding a PDMO shard, plus just an entry in the ProxySQL mysql servers and query rules. </p>
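<p>Conceptually, what the ProxySQL layer does is map a shard key (here a tenant id) to the PDMO cluster that owns it, so the application microservices stay shard-unaware. A minimal sketch, with invented hostnames and a simplistic modulo hash; in ProxySQL the same mapping lives in its mysql servers and query rules tables, not in application code:</p>

```python
# Hypothetical shard-catalog sketch of what the ProxySQL microservice
# resolves for the application. Hostnames and the hashing scheme are
# invented for illustration.

SHARD_CATALOG = {
    0: "pdmo-shard-0.db.svc.cluster.local",
    1: "pdmo-shard-1.db.svc.cluster.local",
    2: "pdmo-shard-2.db.svc.cluster.local",
}

def route(tenant_id: int) -> str:
    """Pick the data service endpoint owning this tenant's data."""
    return SHARD_CATALOG[tenant_id % len(SHARD_CATALOG)]

# Adding capacity means adding a PDMO shard plus one catalog entry,
# mirroring the single entry added to ProxySQL above.
print(route(42))  # pdmo-shard-0.db.svc.cluster.local
```

<p>Note that plain modulo hashing forces data movement whenever the shard count changes; a real catalog would map tenants or ranges explicitly, which is another reason to keep the mapping in a dedicated service rather than in each application.</p>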
<p> </p>
<p>Better Fault Isolation for More Resilient Applications. Adopting the above model will make us more resilient not only if we migrate from a multi-tenant approach, but also if we do not. No matter what, if the application is correctly written, when a shard goes down only that segment will be affected. </p>
<p> </p>
<p>Integration and changes will become more cumbersome given the scale of the services. This is inherent in the model and will represent a risk in case of mistakes or improper data definition modifications. But it will be easier to roll back by shard than against a monolithic data service. </p>
<p>Distributed development. There is no real advantage here: as a developer or DBA I still need to consider the whole data definition scenario, not just a small segment. So while a developer can focus on only one aspect (i.e. login), a DBA still needs to take care of the whole schema and of the impact of operations (like locking) on each single table (i.e. users) for that service. </p>
<p> </p>
<p>Dimension of the dataset. Here we have another important concern, at least for now. To be so agile, each single service (or microservice) needs to be able to be rebuilt quickly. This means that when a pod inside a data microservice crashes, the cluster needs to rebuild it as fast as possible, or the QOS (quality of service) will be impacted. As of now, both PXC and GR, when they need to rebuild a node, must actually copy the data over to the new pod. This can be very fast for small datasets, or take days for very large ones. Given that, when we plan to use a data microservice we also need to plan/design a very efficient archiving mechanism that keeps in our microservices only the data that is really needed at the moment. All data that can be considered not current should be moved to an OLAP system, with a dedicated service to retrieve it. </p>
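<p>A back-of-the-envelope estimate shows why dataset size dominates rebuild time: the new pod must receive a full copy of the data, so recovery scales linearly with what it has to stream. The sustained network throughput below is an assumed figure, purely for illustration.</p>

```python
# Rough rebuild-time estimate: a new pod must receive the full dataset,
# so recovery time scales with dataset size. The 2.5 Gbit/s sustained
# throughput is an assumption for illustration.

def rebuild_hours(dataset_gb: float, net_gbit_per_s: float = 2.5) -> float:
    """Hours needed to copy dataset_gb at a sustained network rate."""
    seconds = dataset_gb * 8 / net_gbit_per_s  # GB -> gigabits, divided by rate
    return seconds / 3600

print(round(rebuild_hours(50), 2))    # 0.04 -> a 50 GB pod is back in minutes
print(round(rebuild_hours(5000), 1))  # 4.4  -> 5 TB keeps the cluster degraded for hours
```

<p>And this ignores the donor node being busy serving the copy and the cache warm-up afterwards, which is exactly why aggressive archiving is part of the design, not an afterthought.</p>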
<p> </p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/db-as-service-etl.png" alt="db as service etl" width="594" height="632" /></p>
<p> </p>
<h3>Case two: modularity by service</h3>
<p>What does modularity by service mean? It means I can split the data schema definition into small segments that correspond to the services, so my data segment will be functional to the use made by the application microservices. As an example, I may have microservices dealing with: login; shipments; invoicing… and so on. </p>
<p>This means that, in the case of invoicing, I should fragment the schema as follows:</p>
<p><img src="http://www.tusacentral.net/joomla/images/stories/pdmo_stop_crazy/schema-invoice-order.png" alt="schema invoice order" width="500" height="357" /></p>
<p>But these are the same tables I also need to serve shipments, so I must use the same data microservice for both. This is ok… but… the question is: can we? Can we split into small segments a schema that is built on relationships? Orders are linked to customers and to products. How do we manage these referential integrity links if we start to split the schema? </p>
<p>Again, a simple answer: we cannot. If you are using an RDBMS, the R is for relational; you cannot break the model and hope to keep some data integrity. </p>
<p>If you need that approach, then please use something else, like noSQL Cassandra or similar, and then consolidate the data in the OLAP system. But do not use an RDBMS.</p>
<p>Given the above, the by-service approach, which was the closest to giving us what we need in the microservice area, is not usable when talking about an RDBMS. Period. </p>
<h2>I want to move to containers/Kubernetes to rationalize the resource utilizations and lower management cost</h2>
<p>So far we have talked about moving to containers/Kubernetes as an answer to application microservices, but there is another possible need that may lead you to see containers/Kubernetes as “the solution”: the need to optimize resources.</p>
<p> </p>
<p>It goes without saying that if you have hundreds or more small instances of an RDBMS, sparse over many different physical servers or some virtualization, the move may help you. Current operators like PDMO offer a lot of automation: not only the installation of the main environment, but also the ability to easily automate and manage:</p>
<ul>
<li aria-level="1">Installation</li>
<li aria-level="1">Geographic distribution</li>
<li aria-level="1">Backup/restore</li>
<li aria-level="1">Point in time recovery</li>
</ul>
<p>All this with simple commands through kubernetes. </p>
<p> </p>
<p>As also indicated in my blog <a href="http://www.tusacentral.net/joomla/index.php/mysql-blogs/242-compare-percona-distribution-for-mysql-operator-vs-aws-aurora-and-standard-rds">here</a>, you will also be able to rationalize the resource utilization reducing the costs. </p>
<p> </p>
<p>So yes, of course this can be helpful and probably you should seriously look into migrating your environment to it. But it is not for all. </p>
<p>Again, keep in mind that Kubernetes/containers are mainly designed to work with distributed services, and when talking about resiliency, the resilient unit for Kubernetes is the service, not the single pod.</p>
<p>What this means is that a pod may have issues, be destroyed, and be rebuilt. If the pod contains a stateless application, this takes seconds and has no impact on the cluster. </p>
<p>But if you are talking about the node of a cluster holding several terabytes of data, that rebuild will impact the cluster and the service. </p>
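<p>A quick back-of-envelope calculation shows why: rebuilding a node means a full copy of the dataset over the network, so the rebuild window is bounded by the effective transfer throughput. The numbers below are hypothetical, just to show the order of magnitude:</p>

```python
# Back-of-envelope: how long a full data copy (node rebuild) takes.
# Numbers are hypothetical, for order-of-magnitude illustration only.
def rebuild_hours(dataset_tib: float, throughput_mib_s: float) -> float:
    dataset_mib = dataset_tib * 1024 * 1024  # TiB -> MiB
    return dataset_mib / throughput_mib_s / 3600

# 2 TiB copied at an effective 100 MiB/s:
print(f"{rebuild_hours(2, 100):.1f} hours")  # roughly 5.8 hours
```

<p>And that is just the raw copy time; for the whole window the node serving the copy is also busy acting as donor, so the cluster runs degraded.</p>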
<p> </p>
<p>This is the point: when dealing with large clusters and Kubernetes, you need to take into account factors that may trigger actions, like dropping a pod, which instead require more flexibility. </p>
<p>A good example is when a pod starts to have temporary network glitches. In the case of PDMO with PXC, the cluster will try to heal itself at the PXC level. That operation may temporarily exclude a node from the cluster, and then let it join again. It has a cost in performance and efficiency for the cluster itself, but it normally resolves in a few seconds.</p>
<p>The same network glitches can be seen by Kubernetes as a pod failure, and action may be taken directly on the pod, removing it from the cluster and rebuilding a new node. That operation may not only take hours or days, but will also impact the cluster performance in a significant way until all the data is copied over from one data node to another.</p>
<p>Adding Kubernetes also adds the need for a deeper understanding of how to effectively tune the service to prevent wrong management actions like the one described. That tuning needs to be done with respect to the traffic served and the size of the dataset. </p>
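<p>As a concrete example of that kind of tuning, the health-probe thresholds on the database pods are standard Kubernetes settings that decide how quickly a slow-but-alive node is declared dead. The snippet below uses stock Kubernetes probe fields; where exactly they are exposed in a given operator's custom resource varies, so treat it as a sketch under that assumption:</p>

```yaml
# Standard Kubernetes liveness-probe fields on a database container.
# A large dataset may need far more tolerant values than the defaults.
livenessProbe:
  exec:
    command: ["/usr/bin/mysqladmin", "ping"]
  initialDelaySeconds: 300   # a big instance can take minutes to start
  periodSeconds: 10
  timeoutSeconds: 30         # do not declare death on one slow check
  failureThreshold: 6        # tolerate transient network glitches
```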
<p> </p>
<p>The advice here is to start small, get the help of experts in setting up the environment and tuning it properly, gain a good understanding, and then start to increase the challenge. And never trust the default settings in the configuration files.</p>
<h1>Wrapping up … </h1>
<p>Given all the above:</p>
<p>If we are looking at containers/Kubernetes for our databases as an addition to an existing or new environment based on microservices, then the question is: <em>Are we following evolution and really providing a benefit by adopting the Kubernetes/microservice approach for an RDBMS?</em></p>
<p>As we have seen, the move to microservices was an answer to provide better flexibility, scalability, and many other *bilities to the application layer. </p>
<p>An RDBMS is, on the other hand, a system designed to answer to the ACID paradigm, which is the core of an RDBMS, and that model should never be broken. </p>
<p>As we have seen, it is possible to create fragments and scale, but this needs to happen while keeping consistent sets of data. <br />Sharding is one option, but it requires a lot of investment in re-designing the data layer and adapting the application. It will also open up a lot of new problems, including data archiving and possible/probable data redundancy. All of these are problems that require deep analysis and proper design before moving. Not impossible, but not something you should do without preparation, care, and an investment of time and money. </p>
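<p>To make the application-side cost concrete, here is a minimal sketch of hash-based shard routing. The shard names and the key are hypothetical; the point is that every data access must now go through a routing layer like this, and changing the shard count means re-mapping data:</p>

```python
import hashlib

# Minimal sketch of hash-based shard routing (hypothetical shard names).
# Every read/write for a given key must be routed to the shard owning it.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(customer_id: str) -> str:
    # A stable hash, so the same key always lands on the same shard.
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("customer-42"))
```

<p>Note that any query spanning many customers (reports, joins, aggregates) can no longer be answered by a single shard, which is exactly why a consolidated OLAP copy of the data becomes mandatory.</p>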
<p>If instead the scope is to reduce management overhead and optimize resources, the task is easier. The scaling factor is not there, and you are not binding your data layer to a microservice concept. But you still need to be careful and properly analyze the move, considering the side effects not only today, but in the long run. Questions like “how big will my dataset be in one year?” are pivotal. </p>
<p>Blindly moving your database stack to containers/Kubernetes or similar, without raising the right questions, will only be a source of pain and despair for the next few years. It will not be your saviour but your executioner. </p>
<p>At the same time, there are cases where it may fit, as shown previously, but it is not a blanket solution. </p>
<p>Keep in mind this brief list:</p>
<ul>
<li aria-level="1">An RDBMS like MySQL does not scale writes inside a single service (at the moment of writing)</li>
<li aria-level="1">The bigger the dataset, the longer the possible downtime you will suffer (consider reviewing your RTO/RPO)</li>
<li aria-level="1">You cannot have a 1:1 relation with an application microservice, unless your RDBMS contains only non-relational schemas.</li>
<li aria-level="1">In case you go for microservices (and shard), you MUST have an OLAP system to serve consolidated data.</li>
<li aria-level="1">In case you are looking to reduce management cost and optimize resources, start small, and keep going like that. </li>
</ul>
<p>That said, if instead of moving foolishly you stop, ask DBA and data experts for help, and design a proper system, then the use of an RDBMS on containers/Kubernetes could become a nice surprise. </p>
<p>One last thing: you may discover that you do not need a strict RDBMS, and that something else, like Cassandra or CockroachDB, could serve your needs better, in which case you need to migrate to it as the real solution. In that case your need for data experts is even more urgent, and having them work closely with your developers is a mandatory step to success. </p>
<p> </p>
<h1>Disclaimer </h1>
<p>My blog, my opinions. </p>
<p>I am not, in any way or aspects, presenting here the official line of thoughts of the company I am working for.</p>
<h1>Comments</h1>
<p>Comments are welcome, but given the topic and the fact that this will more probably (hopefully) be an open discussion, I advise you to comment on <a href="https://www.linkedin.com/posts/marcotusa_mysql-on-kubernetes-demystified-activity-6854020804426915840-45mW">this LinkedIn post</a>. </p>
<category>MySQL</category>
<pubDate>Wed, 13 Oct 2021 07:00:09 +0000</pubDate>
</item>
</channel>
</rss>