Databases News
| Bug.mysql.com and Contributions!
Oracle enhanced the bugs.mysql.com site to provide a better experience for users to submit contributions !A new 'Contributions' tab has been added to the bugs.mysql.com user interface. This tab will allow users to have a defined space for their contributions. An Oracle Contributor Agreement (OCA) will still be required for all contributions. If needed, the OCA FAQ is posted here. Please take advantage of this new feature when you help support and enhance MySQL !
sqlhjalp.blogspot.com |
5/17/12 5:35 PM
MySQL Handshake and Encryption
Interestingly, I have given the presentation on MySQL and Security at least 4 times in the past 6 weeks* and it was only last night, with the sharp minds at Baron’s Central Virginia MySQL Meetup Group (sadly Baron was not there!), that someone asked about when encryption happens in the MySQL handshake.
We had been talking about how MySQL authenticates users, and how if there are no ACL’s set for a given host, MySQL will reject connections from that host – even “telnet host 3306″ will be refused – and that’s when a clever audience member asked where in the handshake process encryption started. Is it before the username is sent? Before the password is sent? Does it encrypt all traffic, even the handshake traffic?
I think that’s an excellent question, and I know there’s a few sharp minds out there who probably know the answer….otherwise I will research the answer this weekend, when I’m back home in Boston.
* Effective MySQL User Group, as part of a tutorial for Percona Live: MySQL Conference and Expo , at the Professional IT Community Conference last week, and last night at the Central Virginia MySQL Meetup Group
www.sheeri.com |
5/17/12 3:50 PM
Meet the MySQL Experts Podcast: MySQL Replication Global Transaction Identifiers & HA Utilities
In the latest episode
of our “Meet The MySQL Experts” podcast, Luis Soares,
Engineering Manager of MySQL Replication discusses the new Global Transaction
Identifiers (GTIDs) that are part of the latest MySQL 5.6 Development Release. We are also joined by Chuck Bell who
discusses how the new MySQL HA utilities use GTIDs to create a self-healing
replication topology.
In the podcast, we cover how GTIDs and the HA utilities are implemented, how they are
configured and considerations for their use.
You can also learn
more from Luis’ blog on GTIDs in MySQL 5.6 and Chuck’s blog on the HA utilities.
Of course, GTIDs are
just one of the major new features of MySQL replication. For a complete
overview, take a look at our DevZone article: MySQL 5.6 Replication - Enabling the Next Generation of Web & Cloud
Services.
You can try out MySQL
5.6 and GTIDs by downloading the Development Release (select Development Release tab)
Enjoy the GTID podcast and let us know what topics you would like covered in future podcasts! Also check out the library of Meet the MySQL Experts podcasts
blogs.oracle.com |
5/17/12 12:29 PM
Can MySQL use primary key values from a secondary index?
In the article about the role of a primary key, I mentioned that a secondary index in an InnoDB table consists not only of the values of its member columns, but also values of the table’s primary key are concatenated to the index. I.e. the primary key contents is part of every other index.
Assuming the following table structure:
CREATE TABLE `bets` (
`id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`game_id` int(10) unsigned NOT NULL,
...
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB
Here is the visualization:
If MySQL could use in queries these implicitly added values, it would maybe allow to save some space on listing the primary key columns at the end of an index explicitly. Let’s check various cases.
Row filtering
mysql> EXPLAIN
-> SELECT *
-> FROM bets
-> JOIN games
-> ON games.id = bets.id
-> WHERE bets.user_id = 111
-> AND bets.id > 3476\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: games
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: game.bets.id
rows: 1
Extra:
Both key_len and ref fields indicate that only one four bytes long column is used from the user_id index. MySQL cannot use the primary key values in a secondary index for filtering in WHERE clause.
Sorting with ORDER BY
mysql> EXPLAIN
-> SELECT *
-> FROM bets
-> JOIN games
-> ON games.id = bets.game_id
-> WHERE bets.user_id = 111
-> ORDER BY bets.id DESC\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: games
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: game.bets.game_id
rows: 1
Extra:
Extra only returns Using where, but there is no Using filesort. It means ORDER BY will be optimized using the hidden primary key data from the secondary index.
Aggregating with GROUP BY
mysql> EXPLAIN
-> SELECT *
-> FROM bets
-> JOIN games
-> ON games.id = bets.game_id
-> WHERE bets.user_id = 111
-> GROUP BY bets.id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: games
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: game.bets.game_id
rows: 1
Extra:
Also in this case Extra neither shows Using filesort nor Using temporary, which would indicate no index is used for grouping. Therefore MySQL can optimize GROUP BY on the concatenated primary key values.
Covering index
mysql> EXPLAIN
-> SELECT bets.id
-> FROM bets
-> WHERE bets.user_id = 111\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: bets
type: ref
possible_keys: user_id
key: user_id
key_len: 4
ref: const
rows: 22
Extra: Using index
The query execution plan confirms through Using index that it will only need index contents to return result. MySQL can read and return the hidden primary key values to avoid the additional data lookup.
Summary
In InnoDB tables each entry of a secondary index always contains the copy of the corresponding primary key row. These values may in some cases be used to the benefit of query execution plan:
for ORDER BY on the primary key column(s)
for GROUP BY on the primary key column(s)
when returning the primary key column(s) values in the SELECT list
MySQL cannot use them, however, to optimize filtering in WHERE.
www.dbasquare.com |
5/17/12 10:46 AM
Joins
In SQL, join is a means for combining records from two tables into a single set which can be either returned as is or used in another join. In order to perform the operation a join has to define the relationship between records in either table, as well as the way it will evaluate the relationship. The relationship itself is created through a set of conditions that are part of the join and usually are put inside ON clause. The rest is determined through a join type, which can either be an inner join or an outer join.
The SQL clauses that set the respective join type in a query are [INNER] JOIN and {LEFT | RIGHT} [OUTER] JOIN. As you can see the actual keywords INNER and OUTER are optional and can be omitted, however outer joins require specifying the direction – either left or right.
Examples of queries:
SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.name = 1;
SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1;
Inner join, outer join
The primary difference between the two basic types (each has several subtypes) is in making the decision whether joining of two rows was successful or not, which essentially determines whether the combined row can be returned or not.
Inner joins require that a row from the first table has a match in the second table based on the join conditions. In means that the first query from the example above will only return any rows if files table contains at least one record where owner_id is 1 (has to be equal to users.id by the join conditions and users.id is filtered in WHERE to accept only that one value). Otherwise it will return no rows at all, even if users contains a valid user record. Assuming there are two users, but only one has any files:
mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 1;
+----+--------------+---------+----+----------+------------------+
| id | name | enabled | id | owner_id | filename |
+----+--------------+---------+----+----------+------------------+
| 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc |
+----+--------------+---------+----+----------+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 2;
Empty set (0.01 sec)
Outer joins, on the other hand, consider a join successful even if no records from the second table meet the join conditions (i.e. whether there are any matches or not). In such case outer join sets all values in the missing columns to NULL. The second query from the example will return rows whenever there are matches in users and regardless of the contents of files table.
mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1;
+----+--------------+---------+------+----------+------------------+
| id | name | enabled | id | owner_id | filename |
+----+--------------+---------+------+----------+------------------+
| 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc |
+----+--------------+---------+------+----------+------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 2;
+----+-------------------+---------+------+----------+----------+
| id | name | enabled | id | owner_id | filename |
+----+-------------------+---------+------+----------+----------+
| 2 | Nadzieja Surowiec | 1 | NULL | NULL | NULL |
+----+-------------------+---------+------+----------+----------+
1 row in set (0.00 sec)
Left join, right join
Unlike inner joins, outer joins require that the join direction is specified. Inner join is a symmetrical and bi-directional relationship, which means A JOIN B produces the same result as B JOIN A. That is not true for outer joins, because they accept when for a record in A there is no matching record in B and in such case the reverse operation is impossible (it would have to being with a non-existing record in B). This is the reason why setting the direction is necessary. A LEFT JOIN B finds matches for rows from table A in table B, while A RIGHT JOIN B finds matches for records from B in A.
In practice there is very little or even no real purpose for using RIGHT JOIN and in majority of cases everyone just sticks to using LEFT JOIN if they ever need outer join.
When does the join type matter?
Choosing the appropriate type depends on the logic you are trying to implement.
You have to use inner join when mandatory pieces of information are located in both tables and partial information is considered incomplete or even useless. The case of this could be listing user's files based on the earlier example:
mysql> SELECT *
-> FROM users
-> JOIN files
-> ON files.owner_id = users.id
-> WHERE users.name = 'Nadzieja Surowiec'
-> AND users.enabled = 1;
Empty set (0.00 sec)
The query finds the user's record by in users table and verifies that they are allowed to use the service through the value of users.enabled column and then searches for their files in files table. If there are no matches in either table, the query does not return any result, which is the correct behavior. If outer join was used in this case, a useless partial result could be returned or even incorrect result:
mysql> SELECT *
-> FROM users
-> LEFT JOIN files
-> ON files.owner_id = users.id
-> WHERE users.name = 'Nadzieja Surowiec'
-> AND users.enabled = 1;
+----+-------------------+---------+------+----------+----------+
| id | name | enabled | id | owner_id | filename |
+----+-------------------+---------+------+----------+----------+
| 2 | Nadzieja Surowiec | 1 | NULL | NULL | NULL |
+----+-------------------+---------+------+----------+----------+
1 row in set (0.00 sec)
mysql> SELECT COUNT(1)
-> FROM users
-> LEFT JOIN files
-> ON files.owner_id = users.id
-> WHERE users.name = 'Nadzieja Surowiec'
-> AND users.enabled = 1;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
The application relying on such queries not only would not be able to make anything out of such file information where file data is all set to NULL values, but also it would have to include additional and in fact redundant logic to filter out such results. The row count in this case is correct, as the query returned a single row, but it does not represent how many files the user has, so it is not a valid information that the application could use.
Outer join must be used to perform a join with a table, which holds information that is only optional for the result.
In our example we are working with a query that lists user's files and we already established that the join between users and files has to be inner join. But let's give our users the opportunity to choose a custom icon for any file if they want to. The information could be kept in a separate table called file_icon.
Now, for each listed file we also want to see if user has set a custom icon for that file and return the icon name if they have set it. The icon information is entirely optional, so we want the query to return rows regardless of whether there is an entry for the given file in file_icon or not. Therefore we have to use outer join for this particular task.
mysql> SELECT *
-> FROM users
-> JOIN files
-> ON files.owner_id = users.id
-> LEFT JOIN file_icon
-> ON file_icon.file_id = files.id
-> WHERE users.name = 'Albin Kolano'
-> AND users.enabled = 1;
+----+--------------+---------+----+----------+------------------+---------+------------------+
| id | name | enabled | id | owner_id | filename | file_id | icon_image |
+----+--------------+---------+----+----------+------------------+---------+------------------+
| 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc | 1 | MS-Word-Icon.png |
| 1 | Albin Kolano | 1 | 2 | 1 | stats-201104.xls | NULL | NULL |
+----+--------------+---------+----+----------+------------------+---------+------------------+
The outer join allowed us to grab the complete list of user's files and along with their icons if any were set. If we used inner join instead, the result would be missing the record of stats-201104.xls file.
www.dbasquare.com |
5/17/12 12:47 AM
Training in London next week
I’m going to deliver MySQL Training next week (May 21-24) in London. This is a rare opportunity as I do not personally deliver a lot of Training, especially outside of US. There are still some places left if you want to sign up. You will also get a signed copy of High Performance MySQL 3rd [...]
www.mysqlperformanceblog.com |
5/16/12 7:24 PM
MySQL Workbench 5.2.40 GA Released
The MySQL Developer Tools team is announcing the next maintenance release of it’s flagship product, MySQL Workbench, version 5.2.40.
This version contains more than 28 bug fixes applied over version 5.2.39.
MySQL Workbench 5.2 GA
• Data Modeling
• Query
• Administration
Please get your copy from our Download site.
Sources and binary packages are available for several platforms, including Windows, Mac OS X and Linux.
http://dev.mysql.com/downloads/workbench/
Workbench Documentation can be found here.
http://dev.mysql.com/doc/workbench/en/index.html
Utilities Documentation can be found here.http://dev.mysql.com/doc/workbench/en/mysql-utilities.html
In addition to the new Query/SQL Development and Administration
modules, version 5.2 features improved stability and performance –
especially in Windows, where OpenGL support has been enhanced and the UI
was optimized to offer better responsiveness.
This release also includes improvements to the scripting capabilities of the SQL Editor. You can read more about it in
http://wb.mysql.com/workbench/doc/
For a detailed list of resolved issues, see the change log.
http://dev.mysql.com/doc/workbench/en/wb-change-history.html
If you need any additional info or help please get in touch with us.
Post in our forums or leave comments on our blog pages.
- The MySQL Workbench Team
blogs.oracle.com |
5/16/12 7:10 PM
MySQL Backup & Recovery Essentials
Download PDF Presentation
A hardware, software or human failure can occur at any time. Are you prepared?
Many organizations take a risk of serious data loss and system downtime with inadequate procedures in place to support a disaster recovery. This presentation covers the essentials of MySQL backup and recovery options, identifying the necessary tools for an effective strategy to support data resilience and business continuity for your organization. MySQL has no one single unbreakable backup solution, so it is important to understand the impact of MySQL replication, storage engines, configuration options for durability, hardware configuration and the impact on locking and uptime for the various hot/warm/cold options available.
Short Url: http://j.mp/EM-BandR
Presenter: Ronald Bradford
Schedule: RMOUG QEW – May 2012- Denver, Colorado
effectiveMySQL.com |
5/16/12 7:02 PM
Benchmarking single-row insert performance on Amazon EC2
I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, [...]
www.mysqlperformanceblog.com |
5/16/12 6:55 PM
Register to our live webinar and learn how to protect your sensitive information using Real-Time Dynamic Data Masking
Dynamic Data Masking is an emerging technology that provides real-time data masking in changing environments, typically in production databases.
GreenSQL Dynamic Data Masking enables you to mask or randomize any sensitive information stored on MS SQL Server, MySQL and PostgreSQL databases.
Register Now!
When? Wednesday, May 23, 2012 (9:00 am PDT; 12:00 pm EST; 16:00 pm GMT; 19:00 pm GMT+3:00)
In this webinar, David Maman, GreenSQL Founder and CTO, will explain:
What Real-Time Dynamic Data Masking is?
How to dramatically reduce the risk of a data breach?
How to better comply with regulations?
How to enforce real-time dynamic data masking?
How to provide a proactive security layer around applications, reports and tools?
How to selectively apply masking rules based on end-user identity and access rights?
How to achieve Row Level Security or a Virtual Private Database with Dynamic Data Masking?
Register Now!
www.greensql.com |
5/16/12 5:09 PM
SwRI Chooses TokuDB to Tackle Machine Data for an 800M+ Record Database
Tackling machine data on the ground to ensure successful operations for NASA in space
Issues addressed:
Scaling MySQL to multi-terabytes
Insertion rates as InnoDB hit a performance wall
Schema flexibility to handle an evolving data model
The Company: Southwest Research Institute (SwRI) is an independent, nonprofit applied research and development organization. The staff of more than 3,000 specializes in the creation and transfer of technology in engineering and the physical sciences. Currently, SwRI is part of an international team working on the NASA Magnetospheric Multiscale (MMS) mission. MMS is a Solar Terrestrial Probes mission comprising four identically instrumented spacecraft that will use Earth’s magnetosphere as a laboratory to study the microphysics of three fundamental plasma processes: magnetic reconnection, energetic particle acceleration, and turbulence.
The Challenge: SwRI is responsible for archiving an enormous quantity of data generated by the Hot Plasma Composition Analyzer (HPCA). The device is used to count hydrogen, helium, and oxygen ions in space at different energy levels. These instruments require extensive calibration data and each one is a customized, high precision device that is built, tested, and integrated by hand. SwRI must capture and store all the test and calibration data during the 2-3 week bursts activity that are required for each of the 4 devices.
“During each of these calibration runs, there are several data sources flowing into the server, each one leading to an index in the database,” said Greg Dunn, a Senior Research Engineer at SWRI. “Each packet that arrives gets a timestamp, message type, file name and location associated with it. A second process goes through that data and parses it out – information such as voltage, temperature, pressure, current, ion energy, particle counts, and instrument health must be inserted into the database for every record. This can load the database with up to 400 or 500 inserts per second.”
“Being able to monitor the performance of the instrument and judge the success of the tests and calibrations in near real time is critical to the project,” noted Dunn. “There are limited windows to do testing cycles and make adjustments for any issues that arise. Any significant slip in the testing could cost tens of thousands of dollars and jeopardize the timing of the satellite launch.”
“We started seeing red flags with InnoDB early in the ramp-up phase of the project, as our initial data set hit 400GB,” said Dunn. “Size was the first issue. Each test run was generating around 94 million inserts or around 90GB of data, quickly exceeding the capacity allocated for the program. In addition, as our database grew to 800M records, we saw InnoDB insertion performance drop off to a trickle. Even with modest data streams at 100 records per second, InnoDB was topping out at 45 insertions per second. Being able to monitor these crucial calibration activities in a timely fashion and in a cost effective manner was at risk.”
To keep up with the workload and data set, SwRI considered several options, but they failed to meet program performance and price goals. These included:
Partitioning / Separate Databases – “We considered partitioning, but this can be a challenge to set up and it introduces additional complexity,” said Dunn. “We also looked at putting each calibration into its own database, but that would have made it much more difficult to correlate across different databases.”
Additional RAM – “Increasing the available RAM from 12 GB up to 100 GB was not enough by itself,” claimed Dunn. “We briefly considered keeping everything in RAM, but that was not a realistic or efficient way to address a data set size that was promising to grow to several terabytes by the end of the program.”
The Solution: Once TokuDB was installed, SwRI’s big data management headache quickly subsided. “The impact to our required storage was dramatic,” noted Dunn. “We benefited from over 9x compression. In our comparison benchmarks, we went from 452GB with InnoDB to 49GB with TokuDB.”
There was also a dramatic improvement in performance. “Suddenly, we no longer had to struggle to keep up with hundreds of insertions per second,” stated Dunn. “Our research staff could immediately see whether or not the experiment was running correctly and whether the test chamber was being used effectively. We didn’t have to worry that insufficient data analysis horsepower might lead to downstream schedule delays.”
The Benefits:
Cost Savings: “The hardware savings were impressive,” noted Dunn. “With InnoDB, going to larger servers, adding 100s of GBs of additional RAM along with many additional drives would have easily cost $20,000 or more, and still would not have addressed all our needs. TokuDB was by far both a cheaper and simpler solution.”
Hot Column Addition: “As we continue to build out the system and retool the experiments, flexibility in schema remains important,” stated Dunn. “TokuDB’s capability to quickly add columns of data is a good match for our environment, where our facility is still evolving and sometimes has new sensors or monitors installed that need to be added to existing large tables.”
Fast Loader: “The open source toolset that Tokutek designed to parallelize the loading of the database was very helpful,” said Dunn. “We were able to bring down the load of the database from MySQL dump backup from 30 hours to 7 hours.”
www.tokutek.com |
5/16/12 4:58 PM
MySQL Cluster 7.2 -- Unlimited Possibilities
We've recently seen some great announcements of MySQL Cluster delivering amazing results for both selects and updates. The posts (see related articles below) are full of juicy technical details and proofs, but today I'd like to change the perspective a bit. Let's compare those figures with real-world data and imagine what could be done. Please note that I'm not using any scientific method here, just dreaming about the unlimited opportunities offered by MySQL Cluster today.
MySQL Cluster 7.2.7 -- 1B+ Writes per Minute
Cluster can deliver 1B+ selects per minute with 8 nodes and 1B+ updates per minute with 30 nodes.
Our planet is getting quite populated and interconnected. World population is 7B+ and 2B+ of us are using internet. Let's assume that, due to time-zones, only 1/3 of the total internet population is online at a given time (700M+) and that a single action generates one update and one select on the database.
What kind of services can we offer then?
With such scalability and performance, MySQL Cluster offers endless opportunities to develop something new that can support the exponential growth of the web and offer always-on services to everyone, for example:
Hellos from the world -- a website where everyone can say hello to the world, whenever they want. MySQL Cluster can handle the entire online population in less than 1 minute;
Let's shop together -- a global eCommerce website selling everything with 100% market share. If everyone would buy an item per minute, MySQL Cluster could easily fulfill the needs of the entire internet population with 30 nodes;
Like everything you like -- a like button that can be attached to everything in order to collect statistics on users' favorite things. MySQL Cluster could easily sustain the total online world assuming they'd like 1 thing per minute;
Furthermore, MySQL Cluster could handle updates from all of Zynga's 60M active daily users in 3 seconds or all of Facebook's 900M+ active users in less than a minute. All of that giving you ACID compliance and
synchronous replication to ensure no data loss.
The Oracle MySQL engineering team did a great job with Cluster: let's build the next big thing with it!
Related articles
Where would I use MySQL Cluster?
MySQL Cluster 7.2.7 achieves 1BN update transactions per minute
1 Billion Queries Per Minute - MySQL Cluster 7.2 is GA!
Challenges in reaching 1BN reads and updates per minute for MySQL Cluster 7.2
feedproxy.google.com |
5/16/12 3:59 PM
Percona Server 5.5.23-25.3 released!
Percona is glad to announce the release of Percona Server 5.5.23-25.3 on May 16, 2012 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.5.23, including all the bug fixes in it, Percona Server 5.5.23-25.3 is now the current stable release in the 5.5 series. All of Percona‘s software is open-source and free, all the details of the release can [...]
www.mysqlperformanceblog.com |
5/16/12 3:09 PM
MySQL Cluster 7.2.7 achieves 1BN update transactions per minute
In MySQL Cluster there is a limiting factor in the receive threads that limits our update performance to about 0.5M update transactions per node group per second (usually 2 nodes per node group). In MySQL Cluster 7.2.7 we have removed most of this bottleneck and can now achieve 3x as many update transactions. We're reaching about 1.5M updates per node group per second. On a 30-node configuration we achieved 19.5M update transactions per second which corresponds to 1.17BN updates per minute. This means we achieve almost linear increase of update performance all the way to 30 data nodes.The benchmarks were executed using the benchmark scripts dbt2-0.37.50 available at dev.mysql.com, the benchmark program is the flexAsynch program mentioned in some of my earlier blogs. We used 8 LQH threads per data node.
mikaelronstrom.blogspot.com |
5/16/12 10:21 AM
Automate Database Backups with MySqlDump and PowerShell
One of the Database Administrator's most important duties is performing regular scheduled database backups. To this end, tools like mysqldump are indispensable. Now, with the introduction of PowerShell, database backups can be more automated than ever. www.databasejournal.com | 5/16/12 10:01 AM Stalls during DDL
InnoDB locks the buffer pool mutex and scans the LRU to remove pages when DROP TABLE is done for a table that uses innodb_file_per_table. If you read the source you might notice that it scans the LRU twice, once in buf_LRU_drop_page_hash_for_tablespace and then again in buf_LRU_invalidate_tablespace. Locking the buffer pool mutex and scanning the LRU isn't cheap when the buffer pool is large. It takes ~1 second on servers I use and that is too much as nothing gets done during that time by other threads. I changed InnoDB to remove one of the LRU scans and reduced the stall in half. However that was not good enough. The next step was to avoid any LRU scan during DROP TABLE. Several bugs have been filed for this but the primary ones are bug 51325 and bug 64284. To fix bug 51325 InnoDB in trunk and/or MySQL 5.6 has been changed to scan the flush list rather than the LRU. If your server has 10% dirty pages, then the stall should be reduced by a factor of 10. While discussing bug 51325 with the InnoDB engineering team I worked on my version of a fix that avoids scanning either the LRU or the flush list. I tested the performance of this change by using sysbench with 8 client threads. The test was run in three modes as describe below. The results show that the fix is better at preserving performance during DROP TABLE.no drop table - without DROP TABLE running concurrentlyfast drop table - with the fix from the Facebook patch and another connection running DROP TABLE concurrent with queriesslow drop table -without the fix from the Facebook patch and another connection running DROP TABLE concurrent with queries
www.facebook.com |
5/15/12 9:23 PM
Announcing the Explain Analyzer
The explain statement can be an important tool for understanding how a query is being executed and what you can do to make it run better. Although the output of EXPLAIN is relatively straightforward it can be confusing to inexperienced users or can be mangled by terminal wrapping.
To help with these problems as well as provide a pastebin for MariaDB developers to share explains during development we created The MariaDB/MySQL Explain Analyzer. This tool:
Helps unmangle explains (both vertical and tabular format)
Displays explains in an easy-to-read format.
Highlights and provides explanations for some terms.
Links to KB articles for different optimization techniques.
(Optionally) Allows you to save the explain for sharing.
This is the first release so there are still improvements to make. If you have any suggestions, feature requests or bug reports please let us know.
For more information, please see this KB article.
blog.montyprogram.com |
5/15/12 6:48 PM
Some throttling for PECL/mysqlnd_ms 1.4
Users of MySQL Replication sometimes throttle client requests to give slaves time to catch up to the master. PECL/mysqlnd_ms 1.4, the current development version, features some throttling through the quality-of-service filter and global transaction identifier (GTID). Both the plugins client-side GTID emulation and the MySQL 5.6 built-in GTID feature can be used to slow down PHP MySQL requests, if wanted.
How its done
The replication plugin has a neat feature called quality-of-service filter. If, for example, the quality of service you need from a MySQL Replication cluster is "read your writes", you call mysqlnd_ms_set_qos($connection, MYSQLND_MS_QOS_CONSISTENCY_SESSION). This instructs the plugin to either use a master or a slave, that has replicated your writes already, for all further reads. The plugin takes care of picking an appropriate cluster node. Once you are done with "read your writes" you can relax the service quality to make node selection faster.
By default, MYSQLND_MS_QOS_CONSISTENCY_SESSION will enforce reading from the master. This is undesired as it increases the load on the master. However, before the introduction of global transaction identifiers, there was no safe way of knowing whether a slave had replicated a certain update already or not.
Using either the plugins GTID emulation or the MySQL 5.6 build-in GTID feature, one can reliably check the up-to-date status of a slave using a SQL SELECT statement. GTIDs are some kind of unique transaction sequence numbers. If you know the transaction sequence number of a write operation, you can check whether it has been replicated using a statement like, for example, SELECT GTID_SUBSET('gtid_of_write', @@GLOBAL.GTID_DONE) AS trx_id FROM DUAL. Please, check my previous posts for a more precise description of the GTID feature. This statement will check the replication status and return immediately.
SQL_THREAD_WAIT_AFTER_GTIDS(string gtids [, timeout])
Alternatively, a MySQL 5.6 user can issue SELECT SQL_THREAD_WAIT_AFTER_GTIDS('gtid_of_write') which will block until either the slave has replicated the write in question or the statement times out. This is great to throttle clients and prevent them to send new updates before the slaves have caught up. This is what some throttling is about. You can control which logic PECL/mysqlnd_ms shall use when searching for an up-to-date slave.
Strictly speaking, you could do it in 1.3 already, if using MySQL 5.6, which is not GA yet. GTIDs are opaque to the plugin. The configuration contains a SQL statement to fetch gtid_of_write and one to check if gtid_of_write has been replicated already. You have been free to either use SQL_THREAD_WAIT_AFTER_GTIDS or not for those config settings.
The new bit
New is a wait_for_gtid_timeout setting that can be used with the GTID emulation. If wait_for_gtid_timeout is set, the plugin will poll a slaves state for wait_for_gtid_timeout seconds regardless of the SQL statement configured. The plugin first runs the SQL statement to check if gtid_of_write has been replicated already. If not, it checks if there is time left for another poll attempt, sleeps for second and polls the status again.
All this is done transparently in the background. All the application does is formulate its quality of service needs.
Throttling makes synchronization costs visible
Throttling client requests should not be understood as a hack. MySQL Replication happens to be a lazy primary copy system. All updates must be performed on the primary (master). Synchronization of secondaries (slaves) is lazy. Update transactions are finished once the primary has finished them. An update transaction never waits for secondaries to catch up. This is an easy to implement, often fast and simple approach.
The drawbacks are temporarily stale data on the secondaries and limited gains in availability over a single server.
Clients get confirmation for update transactions as soon as they are finished on the primary. As soon as they are saved on just one server. There is no guarantee that the transaction ever makes it to a secondary. In the unlikely worst case, the primary crashes in an unrecoverable manner and transactions are lost before being replicated. Thus, little gain over a single server. If you don’t want that, you need eager synchronization. This is what MySQL Cluster offers, if you want it. For eager synchronization one needs to slow down updates and wait for one or all replicas to confirm the update. If updating one secondary is all you need, go for MySQL Semisynchronous Replication.
It may not be technically valid comparison, however, the slow down and wait reminds me of throttling. MySQL lets you choose whether you want to do the wait on demand and on the client side (MySQL Replication: lazy synchronization) or built-in to the distributed system (MySQL Cluster: eager synchronization). Putting all this in a matrix shows the wide range of database replication options that MySQL has to offer.
Update: where
Primary copy
Update anywhere
Synch.:when
Eager
(MySQL Semisynchronous Replication - one secondary eager, rest lazy)
MySQL Cluster
Lazy
MySQL Replication
(MySQL Cluster WAN mirror using MySQL Replication - automatic conflict detection)
Happy hacking!
@Ulf_Wendel
blog.ulf-wendel.de |
5/15/12 4:11 PM
Finally we have a MySQL User Group in Sweden!
I have always found it strange that we do not have a MySQL user group in Sweden - this is the country where the MySQL saga started.Therefore I am delighted to announce that since today we have a user group in Sweden and I hope it will be a active one!If you want to join our user group in Sweden, join the group here!
mysql-nordic.blogspot.com |
5/15/12 2:33 PM
(My)SQL mistakes. Do you use GROUP BY correctly?
Often I see a SQL problem solved incorrectly and I do not mean inefficiently. Simply incorrectly. In many cases the developer remains unaware that they aren’t getting the results they were expecting or even if a result is correct, it is only by chance, for example because the database engine was smart enough to figure out some non-sense in a query. In a few posts I will try to disclose some of the more common problems.
Aggregate with GROUP BY
Unlike many other database systems, MySQL actually permits that an aggregate query returns columns not used in the aggregation (i.e. not listed in GROUP BY clause). It could be considered as flexibility, but in practice this can easily lead to mistakes if a person that designs queries does not understand how they will be executed. For example, what values an aggregate query returns for a column that wasn’t part of the grouping key?
mysql> SELECT user_id, id, COUNT(1) FROM bets WHERE user_id = 99 GROUP BY user_id;
+---------+-------+----------+
| user_id | id | COUNT(1) |
+---------+-------+----------+
| 99 | 12857 | 12 |
+---------+-------+----------+
id column is a surrogate key and carries a unique value in each and every row, so does 12857 in the query result make any sense, then? Why 12857 and not any other value when user 99 also has eleven other rows in the table? Unless id only had a single value throughout all user’s rows or I actually wanted to see a single randomly chosen value there, the result is probably not what I needed. GROUP BY does not care about columns that are not part of the aggregate key, so if your query requests them, you should not rely too much on the values they return.
At the same time MySQL comes with a number of aggregate functions. They can process data from the individual rows while GROUP BY is executing. For example I can aggregate by user_id, but also remember and then list all the values from id column:
mysql> SELECT user_id,
GROUP_CONCAT(id) _id,
COUNT(1)
FROM bets
WHERE user_id = 99
GROUP BY user_id;
+---------+-------------------------------------------------------------+----------+
| user_id | _id | COUNT(1) |
+---------+-------------------------------------------------------------+----------+
| 99 | 2857,2856,2858,2851,2852,2855,2853,2854,3201,3200,3262,3261 | 12 |
+---------+-------------------------------------------------------------+----------+
Such result probably makes more sense to my application, because it received the complete information rather than only a random piece.
Sorting an aggregation
What if I needed to grab a few users that most recently made an action based on an activity table? The first thing that comes to my mind:
mysql> SELECT user_id, bet_date FROM bets GROUP BY user_id ORDER BY bet_date DESC LIMIT 3;
+------------+---------------------+
| user_id | bet_date |
+------------+---------------------+
| 99 | 2009-12-08 22:51:38 |
| 93 | 2009-11-03 12:39:07 |
| 95 | 2009-09-29 09:23:07 |
+------------+---------------------+
It was easy, wasn’t it? However, is the result correct?
ORDER BY is applied after GROUP BY, so it operates on a set that has been already aggregated and not on the individual rows. And how did the aggregation work? For each user GROUP BY collapsed a number of values from bet_date into a single date. This way user 99 received “2009-12-08 22:51:38″. But why did it get that particular value? Was it the most recent date among the user’s records as per ORDER BY caluse? Let’s examine the raw data:
mysql> SELECT user_id, bet_date FROM bets WHERE user_id = 99 LIMIT 5;
+------------+---------------------+
| user_id | bet_date |
+------------+---------------------+
| 99 | 2009-12-08 22:53:20 |
| 99 | 2009-12-08 22:53:09 |
| 99 | 2009-12-08 22:53:37 |
| 99 | 2009-12-08 22:51:38 |
| 99 | 2009-12-08 22:51:58 |
+------------+---------------------+
Just by looking at these rows it becomes clear that “2009-12-08 22:51:38″ could be anything except the most recent entry, so the answer is ‘no’. So why was it picked for the result? Because MySQL did not evaluate the sort order at the time it was building the aggregation. In fact “2009-12-08 22:51:38″ came from the very first row that MySQL saw for user 99. In different circumstances the same query on the same data set could return a different value there.
This means the original query did not return the result I expected, because ORDER BY was applied on a limited set of somewhat random values picked by GROUP BY and never saw most values from bet_date.
How to make it work?
I can rely on the aggregate functions again, which gives me some access to the raw values:
mysql> SELECT user_id,
MAX(bet_date)
FROM bets
GROUP BY user_id
ORDER BY MAX(bet_date) DESC
LIMIT 3;
+------------+---------------------+
| user_id | MAX(bet_date) |
+------------+---------------------+
| 91 | 2010-05-22 18:49:41 |
| 92 | 2010-05-22 17:18:50 |
| 88 | 2010-05-22 15:14:42 |
+------------+---------------------+
This looks better.
Now, is there a way to learn some other column’s value for these most recent entries? The following query cannot work correctly. Why? For the reasons that were covered in the first part of this post. game_id will carry any value from the aggregated set, so a value that may or may not be related to the row that has the most recent bet_date:
mysql> SELECT user_id,
game_id,
MAX(bet_date)
FROM bets
GROUP BY user_id
ORDER BY MAX(bet_date) DESC
LIMIT 3;
+---------+---------+---------------------+
| user_id | game_id | MAX(bet_date) |
+---------+---------+---------------------+
| 91 | 832 | 2010-05-22 18:49:41 |
| 92 | 831 | 2010-05-22 17:18:50 |
| 88 | 898 | 2010-05-22 15:14:42 |
+---------+---------+---------------------+
A query to solve such problem may need to be a lot more complex and will rarely be very efficient. That is why often it is better to create a summary table for easy querying, instead of running an aggregate query every time such information is required. Here is the example of a query that returns correct result:
mysql> SELECT _d.user_id,
bets.game_id,
_d.bet_date
FROM (SELECT user_id,
MAX(bet_date) bet_date
FROM bets
GROUP BY user_id
ORDER BY MAX(bet_date) DESC
LIMIT 3
)
_d
JOIN bets
USING (user_id, bet_date)
ORDER BY _d.bet_date DESC;
+---------+---------+---------------------+
| user_id | game_id | bet_date |
+---------+---------+---------------------+
| 91 | 1004 | 2010-05-22 18:49:41 |
| 92 | 1004 | 2010-05-22 17:18:50 |
| 88 | 1004 | 2010-05-22 15:14:42 |
+---------+---------+---------------------+
Summary
Using GROUP BY can become tricky beyond doing simple aggregations. Due to the relaxed restrictions in MySQL on how the clause can be used in a statement, it is easy to create queries, which do not work correctly. At the same time database does not issue any warnings of a possible problem, so it is entirely up to you to verify whether results are correct and meet your expectations.
www.dbasquare.com |
5/15/12 1:59 PM
Scale differences between OLTP and Analytics
In my previous post,http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I reviewed the differences between OLTP and Analytics databases.Scale challenges are different between those 2 worlds of databases.Scale challenges in the Analytics world are with the growing amounts of data. Most solutions have been leveraging those 3 main aspects: Columnar storage, RAM and parallelism.Columnar storage makes scans and data filtering more precise and focused. After that – it all goes down to the I/O - the faster the I/O is, the faster the query will finish and bring results. Faster disks and also SSD can play good role, but above all: RAM! Specialized Analytics databases (such as Oracle Exadata and Netezza) have TBs of RAM. Then, in order to bring results for queries, data needs to be scanned and filtered, a great fit for parallelism. A big data range is divided into many smaller ranges given to parallel worker threads that each performs his task in parallel, the entire scan will finish in a fraction of the time.In the OLTP, scale challenges are in the growing transaction concurrency throughput and… growing amounts of data. Again? Didn't we just say growing data is the problem of Analytics? Well, today’s OLTP apps are required to hold more data to provide a larger span online functionality. In the last couple of years OLTP data archiving was changed dramatically. OLTP data now covers years and not just days or weeks. Facebook recently launched its “time line” feature (http://www.facebook.com/about/timeline), can you imagine your timeline ends after 1 week? Facebook’s probably world’s largest OLTP database holds data of a billion users for years back. Today all data is required anywhere anytime, right here, right now, online. Many of today’s OLTP databases go well beyond the 1TB line. And what about transaction concurrency throughput? Applications today are bombarded by millions of users shooting transactions from browsers, smartphones, tablets… I personally checked my bank account 3 times today. Why? Because I can…What can be done to solve OLTP scale challenges?In my next post let's start answering this question with understanding why solutions proposed for the Analytics are limited in the OLTP, and start reviewing relevant approaches.Stay tuned, subscribe, get involved!
database-scalability.blogspot.com |
5/15/12 7:08 AM
Why do threads sometimes stay in ‘killed’ state in MySQL?
Have you ever tried to kill a query, but rather than just go away, it remained among the running ones for an extended period of time? Or perhaps you have noticed some threads makred with killed showing up from time to time and not actually dying. What are these zombies? Why does MySQL sometimes seem to fail to terminate queries quickly? Is there any way to force the kill command to actually work instantaneously? This article sheds some light on it.
Threads and connections
MySQL uses a separate thread for each client connection. A query sent to MySQL is handled by a thread that was previously associated with the connection over which the query arrived. Anyone with sufficient privileges can see the list of currently active threads, along with some additional details, by running SHOW PROCESSLIST command, which returns a table-like view where each connection becomes a separate row:
+-----------+-------------+-------------------+--------+-------------+----------+-------------+---------+
| Id | User | Host | db | Command | Time | State | Info |
+-----------+-------------+-------------------+--------+-------------+----------+-------------+---------+
| 827044892 | production | 10.0.1.100:48596 | proddb | Sleep | 1 | | NULL |
| 827044893 | production | 10.0.1.100:39181 | proddb | Sleep | 1 | | NULL |
| 827044894 | production | 10.0.1.100:48598 | proddb | Sleep | 1 | | NULL |
| 827044895 | production | 10.0.1.100:39183 | proddb | Sleep | 1 | | NULL |
More advanced techniques of dealing with this information are described in the posts titled “Anohter way to work with MySQL process list” and “How to selectively kill queries in MySQL?”.
Any running query or any existing connection from the list can be terminated by using KILL command.
What actually happens when you run KILL?
The command sytnax is KILL [QUERY | CONNECTION] , where thread_id is the value from the first column of the process list output. The optional argument determines whether only running query or should the entire session be terminated. It defaults to the latter, so specifying CONNECTION is not required.
Running the command doesn’t actually do anything except for setting a special flag inside the selected thread. Therefore the kill operation doesn’t happen synchronously with the corresponding request. In many cases it takes some time for a thread or a query to stop. The flag is checked at various stages of statement execution. When it happens exactly, or how frequently, depends on the work a thread is actually doing.
For example:
During ALTER TABLE it is checked before each block of rows is read from the original table while MySQL is rewriting data into a new temporary table. In the process, the temporary table is deleted and the original structure remains unchanged.
For UPDATE and DELETE there is also an additional check after each updated or deleted row. If a query managed to make any changes prior to noticing the request to terminate, they have to be rolled back. Note: if a table’s storage engine does not support transactions (e.g. MyISAM), the changes cannot be rolled back, so the operation will result in partial update!
SELECT also checks it after reading a block of rows.
Of course, these were just basic examples and the behavior can be different in different situations. It is even possible that some queries will be unkillable in certain circumstances. A case of this happened once while reading from INFORMATION_SCHEMA.INNODB_BUFFER_POOL_PAGES_INDEX table. The database was having performance problems at the time and the query execution basically stopped on a lock somewhere deep inside InnoDB and it never got back to the point where the flag value could be checked again.
Threads stay in killed state for a long time. What does it mean?
Actually, there can be two different cases. If KILL was issued, there would be killed in the process list. KILL QUERY doesn’t kill a connection, but rather it only stops a running query within a connection, so in that case query end text may appear instead.
What if you see something like this?
+----+------+-----------+------+---------+------+----------+------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+------+-----------+------+---------+------+----------+------------------------------------+
| 10 | root | localhost | NULL | Query | 0 | NULL | show processlist |
| 14 | root | localhost | NULL | Killed | 27 | Updating | update testdb.sometable set cc=sqrt(id) |
+----+------+-----------+------+---------+------+----------+------------------------------------+
It can either be an effect of a bug (e.g. Bug #52528), or more likely it means the database is performing some work internally to clean up after a task that was terminated.
There is of course no easy way to confirm if this is a bug. So in order to figure it out, you should rather look for evidence that it is not a bug. That what you see is just the effect of a standard operation, which MySQL has to perform to clean up after a query or a transaction.
Probably the most common reason for a thread to stay with either killed or query end for a longer period of time is waiting for a transaction rollback on InnoDB tables. This sometimes can take a lot of time to complete, especially when hundreds of thousands or millions of changes have to be removed.
How to verify that?
Check is the output of SHOW ENGINE INNODB STATUS\G. It can simply print the information if there is a rollback currently in progress:
---TRANSACTION 0 10411, ACTIVE 28 sec, process no 15506, OS thread id 140732309711184 rollback
mysql tables in use 1, locked 1
ROLLING BACK 7585 lock struct(s), heap size 751600, undo log entries 798854
MySQL thread id 14, query id 206 localhost root end
update testdb.sometable set cc=sqrt(id)
The example shows how easily the information can be found. If a thread is marked with killed, or with query end, and the InnoDB engine status reports a rollback for the same thread, just wait until it ends.
What if it isn’t that?
It can be related to removing some temporary table from disk. An ALTER TABLE may need to discard a very large temporary table, while large file removal on some filesystems (e.g. ext3 or ext4) can be rather slow, so it may need a few seconds or sometimes even longer than that. A temporary table can also be created by any DML statement, but usually not nearly as big in size.
It should not usually be necessary as even in the most extreme cases deleting a file should not take more than ten or twenty seconds, but under heavy I/O load it could be much longer, so is there any way to see whether any temporary table were created or not?
In Percona Server or MariaDB you check the contents of INFORMATION_SCHEMA.GLOBAL_TEMPORARY_TABLE and INFORMATION_SCHEMA.TEMPORARY_TABLE. It will only work for manually established temporary tables with CREATE TEMPORARY TABLE statement, not for those created implicitly by MySQL to execute complex queries.
mysql> select * from INFORMATION_SCHEMA.GLOBAL_TEMPORARY_TABLES;
+------------+--------------+------------+--------+-------------------+------------+----------------+-------------+--------------+-------------+-------------+
| SESSION_ID | TABLE_SCHEMA | TABLE_NAME | ENGINE | NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | INDEX_LENGTH | CREATE_TIME | UPDATE_TIME |
+------------+--------------+------------+--------+-------------------+------------+----------------+-------------+--------------+-------------+-------------+
| 28051622 | test | tbl | InnoDB | #sql8c3_1ac08a6_0 | 1265 | 51 | 65536 | 0 | NULL | NULL |
+------------+--------------+------------+--------+-------------------+------------+----------------+-------------+--------------+-------------+-------------+
SESSION_ID is the same as the thread identifier in the process list, so you can connect any killed threads to their temporary tables.
The other type of temporary tables cannot be easily traced. Sometimes it’s possible to spot the information in the SHOW PROCESSLIST output:
mysql> show processlist;
+----------+------+-----------+------+---------+------+----------------------+---------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----------+------+-----------+------+---------+------+----------------------+---------------------------------------------------+
| 1934 | root | localhost | test | Killed | 4 | Copying to tmp table | INSERT INTO test.tbl SELECT * FROM test.testtable |
..
Sometimes you can check what temporary files the instance keeps open:
server ~ # lsof -c mysqld | grep \#sql
mysqld 5626 mysql 138u REG 253,1 1024 43843585 /vol/vol1/mysql/#sql_95fa_0.MYI
mysqld 5626 mysql 139u REG 253,1 227262885 43843605 /vol/vol1/mysql/#sql_95fa_0.MYD
The colored value is the file’s size. These may help you to make an assumption that database could be removing a temporary table.
Conclusions
From time to time it may be normal to see a database thread that was killed, but didn’t die immediately, or a query that can’t seem to end. The real operation is not be performed synchronously with the kill command and afterwards MySQL may still need some time to clean things up properly.
No matter what the real cause is, it is impossible to get rid of such hanging threads without a database restart. They should, however, simply be allowed to go away on their own.
There is also no way of forcing a kill that would execute instantly like kill -9 in Unix systems.
www.dbasquare.com |
5/15/12 1:33 AM
Under the Hood with Connector/J's "rewriteBatchStatements=true"
My old post (http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for) on the performance gains from batch rewritten statements gets regurgitated in various forums, and conference sessions, and often people miss the nuances of it.
Under the hood, what is happening with this feature is the following:
(1) The driver attempts to detect that the SQL being prepared is an INSERT. We (on purpose) don‘t ship a full-fledged parser in the driver, so this works 95% of the time. For the other 5%, you‘re out of luck unless you can simplify your query text.
(2) If the statement is an INSERT, the driver attempts to determine if it can be rewritten as a multi-value INSERT. From the code itself, the conditions are:
// Needs to be INSERT, can‘t have INSERT … SELECT or
// INSERT … ON DUPLICATE KEY UPDATE with an id=LAST_INSERT_ID(...)
If not, the driver will determine whether it is more cost-effective round-trip-wise to enable multistatements on the connection, and send the batch as chunks of semicolon-separated statements. This means that any kind of batch DML can benefit from this feature, just that INSERTs benefit more because of the way MySQL handles multi-value INSERTs.
(3) The driver will send “chunks” that are as close to max_allowed_packet size in bytes as possible, until the batch has been completely sent.
(4) The driver does not re-order the batch
(5) You can use Statement.getGeneratedKeys() with this feature
(6) You can use set*Stream() with this feature
(6) You can use server-side prepared statements with this feature as well, and if the batch contains streams (CLOBs or BLOBs), they will be streamed to the server, using less memory.
As you can see, this is almost a win-win feature to enable, especially if your application or ORM can do DML in batches (most do). The only reason it‘s not enabled by default is because of (1), maybe as this feature sees more use, we can feel safer enabling it by default.
www.jroller.com |
5/15/12 1:21 AM
Speeding up EC2 work by using AWS tools and scripts to bypass the AWS management console
Believe me managing EC2 instances is not as simple or magical as marketers would like for you to believe. The cloud gets complicated when it gets bigger. EC2 really only enables a person to ignore power, network layout (which is bad), and getting more servers that end up costing more then actual servers fast.Things that EC2 is missing to make life easier for the developer:Ability to update all servers with packages and code. Natively they do not support the ability to push files or install new software packages to server groups. Thus install cluster-it and puppet and write your own deploy program.Server names and the EC2 AWS management console do not match. Everything is referenced by instance ids. The interface does not allow one to launch many instances in a named pattern so you have to go back and sync up the instance with the internal named used in the app. Syncing up the names is very tedious and time consuming process through the console.Assigning EBS volumes is a pain in the ass as well. Essentially you need to assign them one by one, where each add takes more then 2 mins todo this is not good time spent. For instance it took me roughly 20 mins to attach 8 63GB EBS volumes to a single server.Amazon is perfectly aware of these limitations-and do not hide from it. They are going after something bigger. They are providing a platform. There is an API for everything you need to make your work easier. Now the pain is to learn the API and use it in your favor. There are companies that built a business on making a better interface for the AWS console but getting it done yourself is cheaper.My personal mantra is to automate things that I have to do more then once. Anytime that I deploy new instances, I take the private IP add it to DNS, make an API call through ec2-describe-instances, find the instance id and update the name through ec2-create-tags. This solves the problem that I have with mapping instance ids to my internal name which the app uses. For instance: #!/usr/bin/perl -w ##use strict;use Data::Dumper;open(HOSTS, "){ my ($ip, $hostname, undef) = split(/\s+/, $_); $hosts->{$ip} = $hostname;} open(FH, "/opt/aws/bin/ec2-describe-instances -C cert.pem -K x509.pem --region us-west-1|") or die($!);while(){ if ($_ =~ /^INSTANCE\t(.*)/){ my (@fields) = split(/\s+/, $1); # 0 - instance # 1 - ami # 2 - public dns # 3 - private dns # 4 - state # 5 - ?? # 6 - ?? # 7 - instance type # 8 - date created # 9 - DC # 10 - ?? # 11 - monitoring state # 12 - public ip # 13 - private ip # 14 - ebs # 15 - ?? if ($fields[4] eq 'running'){ my $role; my $hostname = $hosts->{$fields[13]}; if (!$hostname) { print "$fields[13] is not in the hosts file skipping..\n"; next; } if ($hostname =~ /^job/){ $role = 'gearman-worker'; } if ($hostname =~ /^gearman/){ $role = 'gearman-queue'; } if ($hostname =~ /^www/){ $role = 'webserver'; } if ($hostname =~ /^memc/){ $role = 'memcache'; } if ($hostname =~ /^db/){ $role = 'database'; } if ($hostname =~ /^dbshard/) { $role = 'database-shard'; } if (!$role){ print "$hostname does not have a role\n"; $role = 'other'; } system("./aws/bin/ec2-create-tags -C cert.pem -K x509.pem --region us-west-1 ".$fields[0] ." --tag Name=$hostname --tag Role=$role"); } }}Now to attach disks to an instance, that I am upgrading or re-purposing I wrote a quick script that describes the input instance after translating from my internal name to instance id. Calculates the size of each disk and attaches said disks. For instance: #!/usr/bin/perl -w##use strict;use Data::Dumper;use POSIX qw(ceil); print "Enter Hostname: ";my $hostname = <>;chomp($hostname);my $cmd = './aws/bin/ec2-describe-instances -C cert.pem -K x509.pem --region us-west-1 --filter="tag-key=Name" --filter="tag-value=' . $hostname . '"'; open(FH, "$cmd|") or die ("Awesome death: $!\n"); my $instance = "";my $lastDisk = "";my $diskCount = "";while(){ if($_ =~ /^INSTANCE\t(.*)/){ my (@fields) = split(/\s+/, $1); $instance = $fields[0]; print "Instance=$instance\n"; } if($_ =~ /^BLOCKDEVICE\t(.*)/){ my (@fields) = split(/\s+/, $1); $diskCount++; $lastDisk = $fields[0]; print "$lastDisk\n"; }} print "How many disks you would like to add: ";my $totalAddDisks = <>;chomp($totalAddDisks); print "You picked $totalAddDisks\n";print "What is the total size of the Raid0 Array in GB: ";my $totalSize = <>;chomp($totalSize); print "You picked $totalSize GB\n";my $sizeperdisk = ceil($totalSize/$totalAddDisks);print "The size per disk: $sizeperdisk\n"; $lastDisk =~ /sd(\S)/;my $lastDeviceLetter = $1;my @devicesavail = ($lastDeviceLetter .. 'z'); for(my $i = 1; $i <= $totalAddDisks; $i++){ $cmd = "./aws/bin/ec2-create-volume --size $sizeperdisk --region us-west-1 --availability-zone us-west-1c -C cert.pem -K x509.pem"; my $ret = `$cmd`; my (@output) = split(/\s+/, $ret); $cmd = "./aws/bin/ec2-attach-volume --region us-west-1 -C cert.pem -K x509.pem $output[1] --instance $instance --device /dev/sd$devicesavail[$i]"; $ret = `$cmd`; if ($ret =~ /attaching/){ print "All good do the next one\n"; } else { die("Did not work\n"); }}These are rough and dirty scripts that get the job done for my environment. The end goal when given time is to turn these scripts into a package talking over httpd that makes life easier when working in EC2. Using these two script have reduced the management time from 1 hour per server upgrade to a few minutes.
mysqldba.blogspot.com |
5/14/12 11:58 PM
WordPress on S3: the beauty of simplicity
My first computer program was written almost quarter a century ago on a BK-0010 computer. It was very simple: the program asked the user to enter their name and then greeted the user using the entered name, like “Hello, Artem!”. I was fascinated. A couple of lines written in Vilnius BASIC transformed a piece of metal and silicon into a considerate thing that cared about a person’s name enough to remember it :-). Of course, the first experience doesn’t represent the day-to-day routine of software development, but the moments when I see a couple of lines making an amazing transformation still enchant me, and remind me why I’ve been writing code all this time.
I’ve just experienced this very same first-time feeling as we’ve released Yapixx – a picture sharing web application using the cloud storage. The most amazing thing about Yapixx is that we wrote very little code to make it happen: most of its functionality is provided by WordPress, which by the way we didn’t modify at all.
On one hand Yapixx is just WordPress, enhanced with plugins and configured to provide good picture sharing experience.
On the other hand, Yapixx has gone where WordPress could not go before – Yapixx runs completely on top of Amazon S3, using the enormous power of S3 to make serving the users’ pictures highly scalable and storing all data extremely durable.
As an engineer I’ve been on the never ending quest of finding new ways to do more with less. How do I write less code and provide more functional solution? How do I empower my customers to accomplish more with applying very little effort (or preferably no effort at all)? Yapixx is one of those gems – it shows how unbelievably simple it is to run a beautiful website on top of Amazon S3: you don’t need to learn new APIs, you don’t have to know how to write code, you don’t even have to know that you are relying on MySQL + ClouSE to store data reliably and securely in the cloud!
Yapixx is not a toy application: it’s a fully functional ready-to-run WordPress on Amazon S3. With a few clicks Yapixx can be transformed into anything WordPress can do, while storing all website content in Amazon S3. Just continue doing what you set to do with your Web site – to build a beautifully powerful representation of you, your company, and your cause.
Amazon S3 is a very powerful service designed for building one-of-a-kind massively distributed applications. This is its strength but it’s also its weakness: if you are not building one-of-a-kind massively distributed application, the low-level vendor-specific APIs and eventual consistency guarantees are just the unnecessary complexities that you pay for, but don’t use. WordPress is an example of how to make Amazon S3 a true cloud storage utility service that can be easily used by millions. It’s a people-oriented (as opposed to technology-oriented) approach to building a distributed system – as a constellation of beautiful websites powered by creativity and uniqueness of individuals, like the Internet itself is!
‘Nuff said :-). Get your own WordPress on S3 for free now.
Artem
www.oblaksoft.com |
5/14/12 10:00 PM
Announcement of Percona XtraDB Cluster 5.5.23
Our previous GA release of Percona XtraDB Cluster caused a lot of interest and feedback. I am happy to announce next version Percona XtraDB Cluster 5.5.23, which comes with bug fixes and improvements.
List of changes:
Fixes merged from upstream (Codership-mysql)
Support for MyISAM, now changes to MyISAM tables are replicated to other nodes
Improvements to XtraBackup SST methods, better error handling
New SST wsrep_sst_method=skip, useful when you start all nodes from the same sources (i.e. backup)
Ability to pass list of IP addresses for a new node, it will connect to the first available
Binaries are available from downloads area or from our repositories.
For this release we will provides binaries for Ubuntu 12.04, they are coming soon.
If you want to know more how to migrate to XtraDB Cluster, we will be giving a free webinar on June 6th.
This is an General Availability release. We did our best to eliminate bugs and problems during alpha and beta testing release, but this is a software, so bugs are expected. If you encounter them, please report to our bug tracking system.
Links:
We provide tar.gz and RPM binaries for RedHat (CentOS, Oracle Linux) 5 and 6, and Debian packages.
Downloads: http://www.percona.com/downloads/Percona-XtraDB-Cluster/
Documentation
Codership Wiki
General Discussion group
Launchpad project
Bug reports
www.mysqlperformanceblog.com |
5/14/12 8:33 PM
Tokutek Welcomes Gerry Narvaja!
We are excited to have Gerry Narvaja start today at Tokutek! Gerry has spent more than 25 years in the software industry, most of them working with databases for different kinds of applications, from embedded to large-scale web products. Gerry worked first at MySQL, and then Sun Microsystems supporting the Sales teams. In 2008 he transitioned into being a Senior MySQL DBA. Gerry graduated as an Electronic Engineer from I.T.B.A (Instituto Tecnológico de Buenos Aires) and has an M.B.A. from Universidad del Salvador in collaboration with S.U.N.Y.A (State University of NY at Albany).
Gerry enjoys helping users to solve complex database production issues. For almost a year he has been co-hosting the popular MySQL Community podcast, OurSQL, which was given the MySQL Community Contributor of the Year 2012 award at the recent Percona MySQL Users Conference. Gerry and Martín Farach-Colton, our CTO, will also be speaking next month at the first ever Latin American MySQL / MariaDB Conference in Argentina.
Please feel free to drop Gerry a line at gerry@tokutek.com with your toughest MySQL and MariaDB issues!
www.tokutek.com |
5/14/12 4:09 PM
Hopper for InterBase, 1.0 released
Hopper for InterBase, 1.0 released
[2012-05-09]
Upscene Productions is proud to announce the first release of our new product "Hopper", a Stored Code Debugger for InterBase.
Thanks to the feedback of people who downloaded the betas, we were able to improve Hopper.
Hopper is currently available for InterBase and Firebird, the MySQL Edition will follow shortly.
More information available at the Hopper page, download your copy today via our downloads page, pricing information is available.
www.upscene.com |
5/14/12 3:44 PM
Challenges in reaching 1BN reads and updates per minute for MySQL Cluster 7.2
In an earlier blog we've described the general high-level idea of how to achieve 10X better performance for MySQL Cluster 7.2 compared to MySQL Cluster 7.1.Naturally the development is never as straightforward as the high-level view looks like. In this blog I'll mention a few of the most important roadblocks on the path to improved performance of MySQL Cluster 7.2 that we met and resolved.Initially when we increased the number of LQH threads from 4 to 16 we only saw scaling to 8 LQH threads and we saw no scaling in going to 16 LQH threads. This was very puzzling since we don't really have any mutexes that should be an issue. However we looked into the mutexes that we had and managed to decrease the number of conflicts on the send mutexes by a factor of 15. This did however not improve performance at all.Next we noted using oprofile that there was a few functions that for some reason 50% of the CPU time was spent. This was quite surprising and given that the exactness of those measurements is not always 100%, I was very suspicious about those numbers. Eventually however the reason dawned on me.The reason was that I had some cache lines that was too often updated. This lead to that some instructions took several microseconds to execute since all threads were serialised on updating this cacheline.The first such instance was a piece of code used to check whether send buffers were overloaded. In case the send buffer is more than 75% overloaded we start rejecting client requests to ensure that already ongoing requests are able to complete. This is accomplished using a bitmap with one bit per node we're communicating with. This bitmap is obviously global and this was updated every time we made a remote send to another node. This was obviously quite unnecessary to update it every time, it's enough to update when the state changes, so a simple if-statement resolved that problem.The next problem was even harder to understand how it could be an issue. It turned out that the problem resided in our crash information subsystem. We have a macro called jam() (Jump Address Memory, an acronym we inherited from the AXE system once upon a time). This macro inserts the line number we're currently executing together with sometimes the block number we're executing. When there was only one thread in the MySQL Cluster data nodes then this data structure was global and shared by all others.With the move to multithreaded architecture we changed this to be a data structure per thread such that we can get detailed information on each thread what it did before any crash.Most blocks are only executing in one thread and was fairly straightforward to change this. However in one case we have a code path which is used by all TC threads to ask the distribution handlers which nodes and threads that contain the data for a certain partition of the MySQL Cluster. The distribution handler thus is one block called from many threads and thus we needed to use different jam's dependent on which thread that called this function. When this wasn't done then this code showed up as another bottleneck since many threads tried to update the same cachelines again.With those fixes we were able to reach very good numbers on a single node with up to 16 LQH threads. However we saw that the risk of getting out of send buffer memory had severely increased due to the great increase of threads in the data node. The threads communicate using a lock-free scheme, however this means that there needs to be dedicated memory available for each two threads that communicate. Also the code doesn't always use the send buffer memory in the most efficient manner to speed up communication. This meant that we needed to do something about send buffer memory handling in order to make the data nodes as stable as before. We found three points in the code where we needed to pack send buffer memory, non of these were part of the normal code path but were vital to ensure that we packed things in cases when we got close to run out of send buffer memory. We also went through all send buffer memory configuration defaults and parameters and made them more appropriate to also handle larger data node configurations.As a final step towards getting single node performance working really good we also made sure that all data structures that were global were properly aligned on cacheline sizes.Putting the code to the test in a distributed environment also revealed a few new points to handle. At first we discovered that the free list of connections to the data node in an API had an interesting impact on the balance of the use of TC threads. For some reason, still unclear exactly how, a LIFO queue here had the impact that we used some TC threads up to 10x more than other TC threads which obviously made for very bad scalability with many TC threads. The solution was simple however, a quick change to a FIFO queue and the problem was no longer there.The next problem was yet one more imbalance, this time the imbalance was on LQH threads. The imbalance only showed up on very large clusters. This time the imbalance came from the manner in which we distribute rows into partitions. In order to make on-line reorganisation of tables very efficient we divide the table into a number of virtual partitions using a hashmap, then the virtual partitions are mapped to a real partition. Previously the hashmap always created 240 virtual partitions which was quite sufficient with 4 LQH threads, but not when moving to 16 LQH threads. So we changed to using 3840 virtual partitions instead.Actually the choice of 240 and 3840 is intricate here. 3840 is equal to 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 3 * 5. This means that if using 12 LQH threads (2 * 2 * 3) then the number of nodes in the cluster should either be on the form 2**n or 5 * 2**n. If not things will still work fine, but load will be slightly unbalanced since the real partitions will contain different numbers of virtual partitions. Uisng 16 LQH partitions the numbers of nodes should be on either of the forms 2**n, 3*2**n, 5*2**n or 3*5*2**n where n is less than or equal to 4. Thus we get from this calculation that 30 nodes is better than 32 nodes if using 16 LQH threads since it brings about a more even distribution of rows in the cluster.With these changes the performance of reads in a distributed environment was quite good and was providing very good scalability. However updates still caused issues.The problem for updates was that the receiver thread handling signals between the nodes in the same node group was overloaded. There was simply too many signals to handle. Much of this was due to some work that was left as a TODO after the 7.0 release since it wasn't an issue in 7.0, but now with the higher throughput it has become an issue.So the solution was to properly implement packing of signals such that several commit messages were packed together and similarly for the commit acknowledge from the API. These straightforward changes decreased the load on the receiver thread to a third and made it possible to push through 1.5M updates per second per node group. It is still possible that this becomes a bottleneck but it should be extremely unusual for a real-world application to reach this state.With all these changes implemented we managed to scale update and read performance linearly up to 30 nodes.
mikaelronstrom.blogspot.com |
5/14/12 3:33 PM
Snapshot Isolation Level in SQL Server - What, Why and How Part 2
Introduced in SQL Server 2005, Snapshot Isolation levels improve performance, however, there are some caveats to consider when using this feature. Arshad Ali discusses the two new snapshot based isolation levels in detail, their pros and cons and how they differ from each other. www.databasejournal.com | 5/14/12 10:01 AM Impact of foreign keys absence on replicating slaves
In this post I describe what happens when a slave's Foreign Key setup is different from that of the master. I'm in particular interested in a setup where the slave has a subset of the master's foreign keys, or no foreign keys at all. I wish to observe whether integrity holds.
Making the changes
Which foreign keys do we have and how do we drop them? If you want to do this by hand, well, good luck! Fortunately, common_schema provides with quite a few handy views and routines to assist us. Consider viewing the existing foreign keys on sakila:
master> SELECT create_statement FROM common_schema.sql_foreign_keys WHERE TABLE_SCHEMA='sakila';
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| create_statement |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ALTER TABLE `sakila`.`address` ADD CONSTRAINT `fk_address_city` FOREIGN KEY (`city_id`) REFERENCES `sakila`.`city` (`city_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`city` ADD CONSTRAINT `fk_city_country` FOREIGN KEY (`country_id`) REFERENCES `sakila`.`country` (`country_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`customer` ADD CONSTRAINT `fk_customer_address` FOREIGN KEY (`address_id`) REFERENCES `sakila`.`address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`customer` ADD CONSTRAINT `fk_customer_store` FOREIGN KEY (`store_id`) REFERENCES `sakila`.`store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film` ADD CONSTRAINT `fk_film_language` FOREIGN KEY (`language_id`) REFERENCES `sakila`.`language` (`language_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film` ADD CONSTRAINT `fk_film_language_original` FOREIGN KEY (`original_language_id`) REFERENCES `sakila`.`language` (`language_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film_actor` ADD CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `sakila`.`actor` (`actor_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film_actor` ADD CONSTRAINT `fk_film_actor_film` FOREIGN KEY (`film_id`) REFERENCES `sakila`.`film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film_category` ADD CONSTRAINT `fk_film_category_category` FOREIGN KEY (`category_id`) REFERENCES `sakila`.`category` (`category_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`film_category` ADD CONSTRAINT `fk_film_category_film` FOREIGN KEY (`film_id`) REFERENCES `sakila`.`film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`inventory` ADD CONSTRAINT `fk_inventory_film` FOREIGN KEY (`film_id`) REFERENCES `sakila`.`film` (`film_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`inventory` ADD CONSTRAINT `fk_inventory_store` FOREIGN KEY (`store_id`) REFERENCES `sakila`.`store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`payment` ADD CONSTRAINT `fk_payment_customer` FOREIGN KEY (`customer_id`) REFERENCES `sakila`.`customer` (`customer_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`payment` ADD CONSTRAINT `fk_payment_rental` FOREIGN KEY (`rental_id`) REFERENCES `sakila`.`rental` (`rental_id`) ON DELETE SET NULL ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`payment` ADD CONSTRAINT `fk_payment_staff` FOREIGN KEY (`staff_id`) REFERENCES `sakila`.`staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`rental` ADD CONSTRAINT `fk_rental_customer` FOREIGN KEY (`customer_id`) REFERENCES `sakila`.`customer` (`customer_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`rental` ADD CONSTRAINT `fk_rental_inventory` FOREIGN KEY (`inventory_id`) REFERENCES `sakila`.`inventory` (`inventory_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`rental` ADD CONSTRAINT `fk_rental_staff` FOREIGN KEY (`staff_id`) REFERENCES `sakila`.`staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`staff` ADD CONSTRAINT `fk_staff_address` FOREIGN KEY (`address_id`) REFERENCES `sakila`.`address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`staff` ADD CONSTRAINT `fk_staff_store` FOREIGN KEY (`store_id`) REFERENCES `sakila`.`store` (`store_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`store` ADD CONSTRAINT `fk_store_address` FOREIGN KEY (`address_id`) REFERENCES `sakila`.`address` (`address_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
| ALTER TABLE `sakila`.`store` ADD CONSTRAINT `fk_store_staff` FOREIGN KEY (`manager_staff_id`) REFERENCES `sakila`.`staff` (`staff_id`) ON DELETE RESTRICT ON UPDATE CASCADE |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Most of the foreign key constraints use RESTRICT for DELETE (meaning you are not allowed to delete a parent row when children exist), and CASCADE for UPDATE (meaning changes to parent will propagate to children). This is good, since I want to test behavior of both RESTRICT and CASCADE.
OK, we wish to remove these constraints from the slave. To see what we are going to do, consider:
slave1> select drop_statement from common_schema.sql_foreign_keys where table_schema='sakila';
+-----------------------------------------------------------------------------------+
| drop_statement |
+-----------------------------------------------------------------------------------+
| ALTER TABLE `sakila`.`address` DROP FOREIGN KEY `fk_address_city` |
| ALTER TABLE `sakila`.`city` DROP FOREIGN KEY `fk_city_country` |
| ALTER TABLE `sakila`.`customer` DROP FOREIGN KEY `fk_customer_address` |
| ALTER TABLE `sakila`.`customer` DROP FOREIGN KEY `fk_customer_store` |
| ALTER TABLE `sakila`.`film` DROP FOREIGN KEY `fk_film_language` |
| ALTER TABLE `sakila`.`film` DROP FOREIGN KEY `fk_film_language_original` |
| ALTER TABLE `sakila`.`film_actor` DROP FOREIGN KEY `fk_film_actor_actor` |
| ALTER TABLE `sakila`.`film_actor` DROP FOREIGN KEY `fk_film_actor_film` |
| ALTER TABLE `sakila`.`film_category` DROP FOREIGN KEY `fk_film_category_category` |
| ALTER TABLE `sakila`.`film_category` DROP FOREIGN KEY `fk_film_category_film` |
| ALTER TABLE `sakila`.`inventory` DROP FOREIGN KEY `fk_inventory_film` |
| ALTER TABLE `sakila`.`inventory` DROP FOREIGN KEY `fk_inventory_store` |
| ALTER TABLE `sakila`.`payment` DROP FOREIGN KEY `fk_payment_customer` |
| ALTER TABLE `sakila`.`payment` DROP FOREIGN KEY `fk_payment_rental` |
| ALTER TABLE `sakila`.`payment` DROP FOREIGN KEY `fk_payment_staff` |
| ALTER TABLE `sakila`.`rental` DROP FOREIGN KEY `fk_rental_customer` |
| ALTER TABLE `sakila`.`rental` DROP FOREIGN KEY `fk_rental_inventory` |
| ALTER TABLE `sakila`.`rental` DROP FOREIGN KEY `fk_rental_staff` |
| ALTER TABLE `sakila`.`staff` DROP FOREIGN KEY `fk_staff_address` |
| ALTER TABLE `sakila`.`staff` DROP FOREIGN KEY `fk_staff_store` |
| ALTER TABLE `sakila`.`store` DROP FOREIGN KEY `fk_store_address` |
| ALTER TABLE `sakila`.`store` DROP FOREIGN KEY `fk_store_staff` |
+-----------------------------------------------------------------------------------+
To actually make the DROP, we use common_schema's eval():
slave1> call common_schema.eval("select drop_statement from common_schema.sql_foreign_keys where table_schema='sakila'");
eval() is a handy routine which invokes statements generated by the given query.
This concludes the setup part.
Tests will include:
Attempting to delete a parent row
Attempting to add an invalid child row
Attempting to update parent row
I was thinking there would be a difference between the two binary log file formats: STATEMENT and ROW. But the tests I produced showed no difference.
Tests
Attempting to delete parent row:
master> delete from actor where actor_id=1;
ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`sakila`.`film_actor`, CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON UPDATE CASCADE)
slave1> select * from actor where actor_id=1;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update |
+----------+------------+-----------+---------------------+
| 1 | PENELOPE | GUINESS | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
Good: the master refused the DELETE, and no DELETE occurred on slave. Integrity is intact.
Attempting to add an invalid child row:
master> insert into film_actor (actor_id, film_id, last_update) values (9999, 1, NOW());
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`sakila`.`film_actor`, CONSTRAINT `fk_film_actor_actor` FOREIGN KEY (`actor_id`) REFERENCES `actor` (`actor_id`) ON UPDATE CASCADE)
slave> select * from film_actor where actor_id=9999;
Empty set (0.00 sec)
Integrity is still intact.
Attempting to update parent row: there is nothing invalid about this operation. I'm wondering whether changes are CASCADEd on slave as well as on master:
master> update actor set actor_id=999 where actor_id=199;
master> select count(*) from film_actor where actor_id=999;
+----------+
| count(*) |
+----------+
| 15 |
+----------+
The 999 value wasn't there before on the master, so this verifies the CASCADE works on master. As for slave:
slave> select count(*) from actor where actor_id=999;
+----------+
| count(*) |
+----------+
| 1 |
+----------+
slave> select count(*) from film_actor where actor_id=999;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
Bummer! The actor's row was updated, but cascading did not work on slave.
This is actually documented. However, the documentation only relates to the issue of slave tables being MyISAM. The problem occurs even when the slave tables are InnoDB, and have no foreign key constraints.
Conclusion
My personal interest in the scenario is due to something I'm working on, I'll elaborate on a future post. People sometime hope to get rid of foreign keys, and might wonder whether replication performance would boost having constraints removed on slaves.
When slave does not enforce foreign keys, you cannot rely on integrity with cascading constraints. An ugly patch might be to use triggers so as to simulate their behavior. Performance wise this is very bad.
code.openark.org |
5/14/12 8:52 AM
How to run a flawless technical demo
Why demos? For as long as I can remember in my public speaking activities, I have always planned my presentations with some sort of live demo in it. I am always surprised when a conference venue asks me to provide my slides in advance, to be loaded in an anonymous computer with no chance of demos. I always turn down such offers, as I always want to provide a demo.There have been times when technical or time constraints prevented me from demoing something, and in these cases I felt that the presentation was lacking a vital part. But I always try. I have even given demos during lightning talks, and those were the ones that made me feel really good. I have given hundreds of presentations, and hundreds of demos, and as in every human activity, I have made plenty of mistakes. I believe I have learned some valuable lesson from my mistakes, and this article is my attempt at sharing the joy with wannabe presenters and also with presenters who want to embrace this method. So, why having a (risky) live demo in your presentation? Here are some of the reasons that may also appeal to you. Show trust in your product You want to talk about your product. The audience assumes that you have confidence in whichever product you want to talk about. However, if you limit your presentation to showing slides, no matter how beautiful and dynamic they are, the audience will be left with the dormant impression that you were talking about something not really trustworthy, or even (gasp!) unreal. If your audience start thinking that you are dealing with vaporware, nothing can dispel that thought faster and more convincingly than a demo. If you trust your product, then you should demo it. If you don't include a demo because you are afraid that the demo would fail, then don't present the topic. It's as simple as that. Improve the entertainment level of your presentation A demo makes a presentation more lively. The audience sees that you abandon the comfortable protection of your slide deck, where you are totally in control, and you risk your hide with a live demo of something that (as any technological artifact) can fail and blow up on your face, burning your reputation and your ego at the same time. Entertainment in a presentation is very important. As Guy Kawasaki said [1], if you make your presentation full of important things, chances are that people will forget all of them because your presentation will be boring, but if your make your presentation entertaining, then you can easily sneak in some important stuff, and the audience will remember that. increase your reputation as a guru Including live demos in your presentation will make the audience think of you as some sort of guru, and your reputation will grow. Now, I don't do demos for this reason. As I said before, I like presenting technical things with live examples. And then I realized that people attending my presentations had a high opinion of me, because of my demos. If not your reputation, live demos will increase your self confidence, and sometimes they amount to the same thing. Demos DOs and DON’Ts Down to business. A demo is not a casual happening. A successful demo has a long story behind it.When you are on stage with a demo, you are not a boring presenter. You are a magician pulling rabbits from a top hat. You are a gymnast showing your dexterity. You are the center of attention, and success is within your grasp.All this comes at a price, though. Read on. DO: Master the topic First and foremost. You must be really comfortable with the topic being presented. If you aren’t, it shows, and the audience will feel your fear. Therefore, the first requirement for a good demo is that you really understand what you are doing, and why. Not only because you are unlikely to demo successfully something that you don’t fully understand, but also because you will fall at the first question from the audience. (Incidentally, if a presenter maneuvers the presentation to prevent questions, it may be a sign of lack of self confidence, or even downright ignorance.) DO: make a plan of what you want to show Knowing your stuff doesn’t mean that you can convey your enthusiasm for the product to the audience just by showing some random commands. You must decide beforehand what you want to demo, and design a set of steps to follow during the demo. Think of the reasons why you believe your product is wonderful, and try to define these reasons as a set of examples that will make the audience share your feelings. DO: include in the demo your product’s best features When you plan, you have to give the audience the amazing stuff. If your slides claim that your product can make men walk on water, you will have to bring an inflatable pool on stage, fill it with water, bare your feet, and take a stroll in front of everyone. That’s a bit extreme, as we are dealing with software here, and your claims are, hopefully, less daring, although sometimes you may claim features that could be compared to walking on water.Whatever it is that your product has promised, you must show it live. The audience won’t be satisfied by your demo of secondary marvelous features if you don’t show evidence of your primary goods. There are exceptions, of course. If your goods require 30 minutes of processing to show their full potential, you can’t show all of it live. But you may try to give a reduced demo of whatever can be achieved in the time allotted for your presentation. One thing that I often do is start the presentation with a short demo where I get the process started, inform the audience that this process will take 30 minutes, and then get on with the slide show. 30 minutes later, I resume the demo, explain what has happened in the meantime, and finally I show the magic part. This is simple, honest, and very effective. DO: Practice Your experience with the product is not enough to guarantee a good demo. You must make sure that: What you want to demo is actually feasible. If you promise something that your product can’t deliver, there is no amount of penance that can save your reputation;You know how to perform the tasks that you have planned;The tasks happen in a predictable way, so that you know that a given sequence of events will end up with the result that you want.There are no side effects determined by other tasks running in your computer (or computers) that will prevent a positive result. This means that you will repeat the demo several times, until you are satisfied that nothing can surprise you, and everything goes as planned. This phase is very important for you, and also for your product. You are likely to find important bugs when getting ready for a demo. Two birds with a stone! DO: Time it! Your time for a demo is short. No matter how much you want to show your product live, you can’t go beyond the time allotted for the whole presentation. More realistically, your demo will last from 1/4 to 2/3 of the presentation, with 1/3 being the more common duration. Thus, you need to make sure that your demo doesn’t run out of time. Especially if your punch line is at the end of your demo, you won’t be able to show it if the attendees are rushing from your room to attend the next presentation. Have a plan B Despite your preparation, there are things that may happen that will keep you longer than expected at your demo, and you may find yourself short of time. Then you need to have an alternative demo plan, i.e. a shorter demo that you can show from that moment on instead of the original one. What this means is that you need to practice two plans. And maybe three. Such is life! DO: Practice some more - Make sure your demo is visible When you practice, you are looking at your computer and you may think that what you see is the same thing that your audience will see. Don’t make this assumption! When you are on stage, things are much different from what you see at home. In person, in a large ballroom When you are using a projector, or an external screen, you may have a different experience from what you had at home or at the office, with your dedicated 24in screen, where you did prepare a beautiful demo. If the projector has a maximum resolution of 1024x768 (which is quite common nowadays) or even 800x600, you must review your demo, and be ready to scale down your ambitions. What you need to do: find out in advance, days or weeks before the presentation, if possible, what kind of projector you will be dealing with, and try to test with the same resolution.When you are at the venue, test with the projector before the presentation, and make sure that your demo is visible from every seat in the room. Adjust your demo if needed.If there is no advance testing time allocated, grab an apple or a sandwich and do it at breakfast or lunch time. Skipping a meal is less important than risking your reputation. Online, when giving a webinar When you are presenting online, in addition to the resolution of the software delivering your webinar, there is also the possibility of more limitations or complex setups that will stand in the way of a successful demo. You will need to test the webinar software, possibly with two computers: one to deliver the demo, and one to check what another attendee would see. Don’t ever accept a denial along the lines of “we can’t do a dry run, but the software is a piece of cake, nothing can go wrong.” You know that everything can go wrong, so insist and make sure that you get testing time. Cancel the demo if you can’t get it. DON’T: make mistakes This seems an unnecessary recommendation. It goes together with Practice your demo. But we need to stress some points in the matter of mistakes. There are simple mistakes, like misspelling a command when you are typing (I do a lot of SEELCT instead of SELECT), but this kind of mistakes are not the ones that get you in trouble. They may even increase the audience awareness that they are witnessing a live event, The mistakes you must avoid are the ones that make the demo fail; the ones that may show your lack of familiarity with the product (which won’t happen if you have been practicing). Therefore: focus on the task, and you will win. There are, though, mistakes that you can include in your demo. If one of your product’s features is the ability to recover from mistakes, you can include such mistakes in your demo, provided that: You tell the audience beforehand that you are going to make a deliberate mistake, just to show how your product can save your butt. (You may also try the theatrical trick of making the error and then emphatically announce that you did that on purpose. The result really depends on how good your theatrics are.)You include this mistake in your demo plan, and you practice it as thoroughly as you did the rest. DON’T: Run other applications in background during the demo Depending on the product you are showing, there are many ways of spoiling the demo through applications that run when they should not. Let me give you a non-comprehensive list: A Skype balloon saying I miss you honey bunny will not improve your credibility;Twitter and Facebook notifications with more or less embarrassing remarks should be also avoided;Your computer starts a file reindex when you are showing a resource intensive task using three virtual machines, and performance drops to a crawl;The remote server that you are using for your demo goes down for maintenance;A planned backup starts in youd database server right when you need it to be responsive at its best;A daily test starts on your remote server, and removes your demo setup. There are more, and more. If you can think of it, t may happen! DON’T: Deviate from your well rehearsed script Once you have defined a demo plan, stick to it. Make no exceptions. If you must make exceptions, you must plan for them as well. Therefore: make no exceptions. This recommendation closely resembles the next one. DON’T: Make some brilliant improvements at the last minute You are an expert in your field, and an expert of the product that you are presenting. You may also be one of the developers of that project. It is thus very natural and common that you think of improvements that will make your product behave much better. That’s good and commendable. But don’t make these changes on the build that you will use for the demo. NEVER. EVER. I did it. A few times. And I regretted it. Every time. If you make a change, then you must have time to test the whole demo from scratch, more than once, or else you must wait to apply your changes after the demo. Similarly, you may think of an improvement of the demo. If that implies deviating from the plan that you have tested, don’t do it, unless you have time to test the whole demo again with the change. Summing up Doing a live demo is a lot of work, and what you show on stage is only a tiny part of the work involved. But I can assure you that the thrill of having a flawless demo that amazes the audience is deeply fulfilling. I recommend it to all the public speakers. Try it. And then you will be hooked. At my company, we all are. I don’t remember where I read it, as I have read many books and articles by Guy Kawasaki, but I think it was in Reality Check. ↩
datacharmer.blogspot.com |
5/13/12 10:44 PM
SQL Injection at Reddit
Reddit takes SQL injection very seriously.
How seriously?
Check their headers:
scabral-07890:~ scabral$ curl --head www.reddit.com
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Set-Cookie: reddit_first=%7B%22organic_pos%22%3A%201%2C%20%22firsttime%22%3A%20%22first%22%7D; Domain=reddit.com; expires=Thu, 31 Dec 2037 23:59:59 GMT; Path=/
Server: '; DROP TABLE servertypes; --
Date: Sat, 12 May 2012 13:54:20 GMT
Connection: keep-alive
scabral-07890:~ scabral$
A colleague at PICC showed me this when he learned of my talk on MySQL security!
www.sheeri.com |
5/12/12 5:03 PM
OurSQL Episode 90: Handle With Care
This week we present how to use pt-archiver and pt-find, two Percona Toolkit tools. We focus on the common usage of the tools and the gotchas we ran into using them.
News/Events/Feedback
Conferences:
MySQL Innovation Day Schedule Tuesday June 5th, Redwood Shores, CA. Register here (free). Content will be available via live stream, so save the date!
read more
technocation.org |
5/11/12 11:42 PM
Portuguese Planet
A big welcome to the Portuguese MySQL Community. The MySQL team has recognized your support of MySQL, so we hope you can take advantage of the new Portuguese MySQL Planet. Wagner has started us off with his blogs so please feel free to submit your Portuguese Feeds.
sqlhjalp.blogspot.com |
5/11/12 11:05 PM
Workaround for ‘Authentication failed’ Issue When Connecting through HTTP Tunnel
Some of our users have encountered problems with establishing a connection over the tunnel.php script despite the fact that the tunnel.php script is installed correctly. The following error message occurs:
‘Can’t connect to MySQL server on ‘your.sitename.com’ (10061): Authentication failed.’
The problem appears to be with cached proxy servers and will be fixed in one of the next builds of our product.
Currently it can be fixed by replacing the dbforgemysql.exe.config file, that can be found in dbForge Studio for MySQL installation folder, with the attached one.
blogs.devart.com |
5/11/12 5:50 PM
History of MySQL Cluster architecture development
With the release of MySQL Cluster 7.2.5 we've released a version of MySQL Cluster where each data node is capable of using up to 51 threads in total. This is a remarkable feat for a product that only a few years ago was purely single-threaded. This article tries to explain what it is in the MySQL Cluster architecture that makes it so scalable to new HW demands.The MySQL Cluster architecture is based on a design that is used in the world's most sold telecom switch, the AXE switch. This switch is used in all mobile networks delivered by Ericsson. MySQL Cluster also originates from Ericsson. The architecture of the AXE switch was developed in reaction to an older switch that had quality issues since there were too many global variables and too little modularity. The architecture was inspired by how HW was designed. In HW design each module can only communicate with other HW modules using signals (mostly electrical). The idea of the AXE architecture was to use this architecture also for software. The basic concept of this architecture is blocks and signals. Blocks are the software modules (similar to a HW module). Each block is self-contained and contains all the data and code for its needs. There is no shared data with other blocks. So the only method to get access to data in another block is to send a signal to the block requesting the data.When the storage engine of MySQL Cluster, NDB Cluster was designed, the AXE architecture was used, so the NDB DBMS software contains a set of blocks (currently a little more than 20) that communicate with each other using signals.This is the first reason why it is so easy to make MySQL Cluster scale to many CPUs. Since each block only communicates with each other through signals it means that it's trivial to move blocks between different threads unless they are using signals that are immediate and have dependency on returning to the sender block in the same state as when the signal was sent. The NDB blocks have a number of blocks that are not possible to split, these are LQH (Local Query Handler, handles communication between data storage and indexes, handles scans), TUP (data storage), ACC (hash index), TUX (ordered index) and some parts of TSMAN, PGMAN and LGMAN (tablespace, page and log management for disk data). There is however no such dependency between TC and the Local Data Manager blocks (LDM), also the handling of asynchronous replication is in a separate block that can be easily moved to any thread. We also have a number of schema management blocks that have been separated into their own threads using some special lock techniques. Thus we have already from functional division a separation into the LDM domain, the TC domain, the SPJ domain, the schema management domain, the asynchronous replication domain. Currently TC and SPJ are colocated although it's not really necessary, but TC and SPJ are extremely easy to scale since each transaction is independent of the other. Thus we have 4 thread domains and each of these can be placed into separate domains.In MySQL Cluster 6.3 and previous version everything was done in one single thread and this configuration is still supported since it has nice real-time characteristics when everything is placed into one thread. In MySQL Cluster 7.0, the LDM blocks were separated from the rest of the blocks into separate threads. In addition the schema management blocks and the TC and SPJ domain was separated into one domain. Asynchronous replication was also separated into its own thread domain. Finally given that all of these threads requires communication with the outside world we also created a separate receive thread that receives data on TCP sockets and converts the data into prepackaged signals that the blocks can execute and puts them onto a queue to the thread that will execute the signals. We spent considerable effort in making the implementation of communication between threads extremely efficient and this communication is entirely lock-free and uses memory barriers and careful writing and reading to communicate with each other. On x86-machines this part is partly implemented in assembler to make efficient use of optimal assembler instructions.The LDM used yet one more technique to distribute these blocks onto several threads. It makes use of the partitioning that NDB Cluster already supports, also the REDO log in NDB Cluster was also already separated into 4 log parts and thus it was straightforward to create 4 LDM modules where each block is replicated within different threads. Each LDM instance takes care of one part of the data stored in the data node. From the outside the protocol is still the same although the address of a block now also contains thread id in addition to the node id and block id. The block reference already contained space enough for this purpose so no change to the block code was required to handle this.In MySQL Cluster 7.2 we have advanced the threading yet one more step. The first step was to separate the schema management domain from the TC and SPJ domain. Then given that TC can be partitioned to handle different transactions using a simple round robin allocation scheme, we also separated the TC domain into multiple threads. In principle there is no limit to how many TC threads we could use, we pick a number which is appropriate to the CPU requirements of the TC domain. We selected 16 as the maximum number since we've found no scenario where more TC threads are required. We also extended the number of LDM instances to a maximum of 16 by also extending the possible number of log parts to 16.The next step was to create multiple receiver threads. This was relatively easy as well by partitioning the threads to handle different sockets (one socket is used to communicate with each other node in the cluster).Another step we also took was to separate the send call from the block threads. Thus special send threads can be used, this increases latency slightly but improves throughput. It also minimizes the risk of bottlenecks occurring in the data nodes. We set the maximum of send threads to 8 and similarly for receive threads. Normally 3 threads should suffice for even the biggest configuration to some extent dependent on the efficiency of the OS read and write syscalls.Thus what we can see is that the choice of a block and signal architecture have made it possible for us with very small means to bring data node scalability from 1 CPU, onwards to 8 and now onwards to at least 40. Even more can be achieved. Thus MySQL Cluster is based on a SW architecture which can easily accomodate itself to the new HW developed. It has also from the first codeline had the concept of efficient communication in its genes. The first prototype of NDB Cluster developed in 1994 used two SPARC-stations interconnected with Dolphin SCI technology. The current benchmarks we've executed used Infiniband with 56Gb/s interconnects where latency is extremely short when communicating between different machines.The next generation of HW is likely to revolutionize handling of memory. MySQL Cluster is well prepared for this as well.Last a small historical anecdote. In the 1990's the HW vendors had a race to deliver GHz cpus. The x86 CPU sold early 1990 was the 80486 that run at 25MHz. In 1999 the race was over when the first GHz processor was delivered. We released MySQL Cluster 7.2 two months ago running at 17.6MHz. We're soon announcing the next major improvement to this number. So will the 2010's be the decade where the race is on for which DBMS that can first deliver a GHz query execution? MySQL Cluster is well prepared for such a challenge.
mikaelronstrom.blogspot.com |
5/11/12 4:26 PM
MySQL Cluster: mysqld Sort aborted and error 4006
Just a note on a problem that some people may find useful and may work for you if you have the same issue.A client had a problem this morning with queries being aborted with the error message: Got temporary error 4006 'Connect failure - out of connection objects (increase MaxNoOfConcurrentTransactions)' from NDBCLUSTERHere you would think that increasing the number of MaxNoOfConcurrentTransactions would help, but the root cause is something else (the client is not even close of hitting the default MaxNoOfConcurrentTransactions)It turned out that during the night a couple of updates to the schema had been made (ALTER TABLEs to convert a couple of TEXT to VARCHAR, so that a particular index could be created).Looking in the mysql error logs we had:120511 10:47:05 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted: Got temporary error 4006 'Connect failure - out of connection objects (increase MaxNoOfConcurrentTransactions)' from NDBCLUSTER120511 10:47:27 [ERROR] Got error 4006 when reading table './database_name/table_name'I suspected there was something wrong (corrupted?) with the .frm files that the mysql server is caching in the datadir, and simply did, for all mysql servers:stop mysqldremove all .frm for all ndb tables in the databasestart mysqldand finally a "SHOW TABLES" to get all the .frm files back up from the data nodes into the local datadir of the mysqldNow, the problem is gone, but I can't explain how reading a table can cause error 4006...Hope this helps someone.
johanandersson.blogspot.com |
5/11/12 3:55 PM
Upscene Productions is celebrating Ten Years of Database Development Tools
Upscene Productions is celebrating it's 10 year anniversary with a massive discount on all our products: 70% discount until the end of May.
We produce database development, management and testing tools for:
* Oracle
* Microsoft SQL Server
* MySQL
* InterBase
* Firebird
* SQL Anywhere
* NexusDB
* Advantage Database
* Generic connectivity tools for ADO and ODBC
These include test data generator tools, database design and development tools, auditing tools, a dbExpress driver for Firebird, debugging tools, performance analysis tools.
Coupon code TENYEARS will get you this discount, check www.upscene.com for more information.
www.upscene.com |
5/11/12 10:42 AM
SQL Server 2012 Integration Services - Package and Project Parameters
In SQL Server 2012, Microsoft introduced SQL Server Data Tools to accommodate the dynamic nature of SSIS constructs in the form of package and project parameters. This approach lets you combine multi-package projects into a single unit, eliminating the possibility of breaking dependencies between parent and child packages during subsequent deployments. www.databasejournal.com | 5/11/12 10:01 AM Log Buffer #271, A Carnival of the Vanities for DBAs
They say, “April showers bring May flowers.” They basically say that nature brings different things in different colors aimed at improving the things. That is so true for the blogging world too. This Log Buffer Edition also brings out different blog posts to improve things, so enjoy the Log Buffer #271. Oracle: One of world’s [...]
www.pythian.com |
5/11/12 10:00 AM
Newbie: User and Host question
Today on MySql Forums, there was a question in the newbie section about two users — I have a doubt on db host and db user relationship . What does this mean ?
name
host
tom
%
joe
127.0.0.1
New DBAs are often confused by the quirky methods of authentication that MySQL uses. Heck, extremely experienced MySQL DBAs can get confused.
From the manual, 6.2.4. Access Control, Stage 1: Connection Verification
When you attempt to connect to a MySQL server, the server accepts or rejects the connection based on your identity and whether you can verify your identity by supplying the correct password. If not, the server denies access to you completely. Otherwise, the server accepts the connection, and then enters Stage 2 and waits for requests.
Your identity is based on two pieces of information:
The client host from which you connect
Your MySQL user name
So the first user, Tom, is allowed to connect from ‘%’ and ‘%’ is a wildcard for any system. And the second user is only allowed to connect from a host with the address ’127.0.0.1′. Usually systems have a network loop-back (think short circuit) assigned to 127.0.0.1 and uses that for its own traffic internally1.
Also from the same manual page (abbreviated) to provide a little more clarity.
The following table shows how various combinations of Host and User values in the user table apply to incoming connections.
Host Value
User Value
Permissible Connections
‘thomas.loc.gov’
‘fred’
fred, connecting from thomas.loc.gov
‘thomas.loc.gov’
”
Any user, connecting from thomas.loc.gov
‘%’
‘fred’
fred, connecting from any host
‘%’
”
Any user, connecting from any host
‘%.loc.gov’
‘fred’
fred, connecting from any host in the loc.gov domain
It helps to occasionally re-read the The MySQL Access Privilege System of the manual to help remember how users get into the system as well as the edge cases. It is all too easy to set up multiple users with the same user name value but different privileges depending on where they connect.
And thanks to all who take the time to answer questions on the forums!
Hugh simplification used here for brevity.
opensourcedba.wordpress.com |
5/10/12 11:40 PM
Testing Fusion-io ioDrive2 Duo
I was lucky enough to get my hands on new Fusion-io ioDrive2 Duo card. So I decided to run the same series of tests I did for other Flash devices. This is ioDrive2 Duo 2.4TB card and it is visible to OS as two devices (1.2TB each), which can be connected together via software RAID. So I tested in two modes: single drive, and software RAID-0 over two drives.
I should note that to run this card you need to have an external power, by the same reason I mentioned in the previous post: PCIe slot can provide only 25W power, which is not enough for ioDrive2 Duo to provide full performance. I mention this, as it may be challenge for some servers: some models may not have connector for external power, and for some you may need special “power kit”. So you need to make sure you have compatible hardware before getting Duo card. I personally ended up with setup like this: I use a separate power supply.
Fusion-io ioDrive2 Firmware v6.0.0, rev 107004 Public, Fusion-io driver version: 3.1.1.
Now to the results.
For this test I also use Cisco UCS C250 server, and on the graph I show the results for both single card and raid (Duo).
Random writes, async:
We see stable and predictable write performance, with throughput: 660 MiB/s for single, and 1300 MiB/s for Duo
Random reads:
Again both modes provides stable level of throughput. 1350 MiB/s for single and 2300 MiB/s for Duo.
Now with separation per thread for random read synchronous IO:
There is also excellent response time characteristics. 0.25ms and 0.19ms for 8 threads, single and Duo cases.
In general ioDrive2 seems to provide better and more stable performance results comparing to previous generation ioDrive.
Follow @VadimTk
www.mysqlperformanceblog.com |
5/10/12 11:30 PM
Circus Oraclimus Installatus Upgradimus
Circus Oraclimus is back in town. It was last time only four weeks ago and a few months before that as well, and while it the first few times was quite funny to watch the clowns, it does not remain funny to see the same absurd tricks over and over again.
I am referring to this: http://bugs.mysql.com/bug.php?id=56889.
If you have both MySQL 5.5 and 5.6 installed on the same Windows system using the .msi installer you cannot upgrade 5.5. The 5.5 installer refuses to run claiming that a ‘newer version’ [of 5.5.x] is installed. It never was – and still is not – a problem upgrading 5.1. with 5.5 (and 5.6) installed. So this is a bug and it is verified. And it was actually fixed by Vladislav Vaintrub before he left Oracle for Monty Program 1½ years ago. For some reason the fix failed to be committed and never was included with the installers distributed. I did several attempts to wake up the people at Oracle asking them to find that patch and commit it to where it should be. It never did lead anywhere. Not even to a reply.
Needless to say the MariaDB .msi installers for Windows do not suffer from this. You can upgrade any MariaDb version with a higher version installed. But .msi installers for MariaDB are also maintained by Vladislav Vaintrub!
So what you need to do is before upgrading your Oracle/MySQL 5.5 instance if you also have 5.6 installed is to uninstall 5.6, upgrade 5.5 and install 5.6 again (I am not running the configuration wizard as I want to keep the 5.5 and 5.6 configurations I have). A little annoying, but no data or configuration is lost and after all it takes only 10 minutes. So not a big deal. Untill the situation worsended significantly recently (after 5.5.20).
We continue here: http://bugs.mysql.com/bug.php?id=64909
The above uninstall-upgrade-install procedure stopped working for me when upgrading to 5.5.22 and .23. Both my 5.5 and 5.6 instances become non-functional. On 5,5 the `mysql` database lost some tables and on 5.6 the service was removed from the system.
It is 4 weeks ago I upgraded to 5.5.23 (that is what I refer to as last visit from Circus Oraclimus) and now I see 5.5.24 is available on mirrors. Welcome back, Circus Oraclimus! I hope you bring some artists this time and not just the clowns!
( I know perfectly well that that I can install the service very simple for instance with the “mysqld install servicename” -command and don’t need to use the installers. However like most Windows users I like a unified and consolidated view of what is installed on the system – in case in “Control Panel .. Programs” – just like most users of ‘Red Hat type’ Linux prefer to install using the RPM for exactly the same reason)
Tweet
www.webyog.com |
5/10/12 10:20 PM
Installing Apache2 With PHP5 And MySQL Support On Ubuntu 12.04 LTS (LAMP)
Installing Apache2 With PHP5 And MySQL Support On Ubuntu 12.04 LTS (LAMP)
LAMP is short for Linux, Apache, MySQL, PHP.
This tutorial shows how you can install an Apache2 webserver on an
Ubuntu 12.04 LTS server with PHP5 support (mod_php) and MySQL support.
www.howtoforge.com |
5/10/12 7:45 PM
Percona Live Slides and Video Available: The Right Read Optimization is Actually Write Optimization
In April, I got to give a talk at Percona Live, about why The Right Read Optimization is Actually Write Optimization. It was my first industry talk, so I was delighted when someone in the audience said “I feel like I just earned a college credit.”
Box offered to host everyone’s slides from the conference here (mine is here). A big thanks from me to Sheeri Cabral, for recording my talk and posting it online!
The focus of the talk starts with why write optimization is what you want to do in many situations, especially if you need read optimization. Then I get in to some of the theory on optimizing writes by laying out your data better on disk. We approach this gradually, beginning with how B-trees work and progressing with a few simple rules for getting better performance, and see some of the tradeoffs inherent in these techniques.
www.tokutek.com |
5/10/12 5:06 PM
Hardware Components Failures — Survey Results
When preparing for the the IOUG Collaborate 12 deep dive on deploying Oracle Databases for high Availability, I wanted to provide some feedback on what hardware components are failing most frequently and which ones are less frequently. I believe I have reasonably good idea about that but I thought that providing some more objective data [...]
www.pythian.com |
5/10/12 4:14 PM
MySQL training dates by SkySQL Training!
The good news is that we have just released plenty of MariaDB & MySQL training dates covering the summer months and even into October across the globe.
This new training calendar includes all of our current training courses in plenty of locations both in North America and Europe.
You can chose from the following training options:
Administering a MySQL Database
Administering MySQL Cluster
Developing Applications with the MySQL Database
High Availability for the MySQL Database
Performance Tuning for the MySQL Database
You are of course always welcome to request courses in new locations and we will be happy to set these up for you. Onsite courses are also available, wherever you are.
While setting up these new training dates, we have also been working on an overhaul of all our courses with updates to include new versions and in some cases new products. More details about these changes will follow soon in a separate blog entry.
In the meantime, you can check out our new training calendar: http://www.skysql.com/services/training/schedule
www.skysql.com |
5/10/12 2:19 PM
State of PHPCR
It feels like every minute a PHP developer somewhere on this planet starts implementing something aching to a CMS from scratch. Some do it because their project is "so big" it that it "obviously needs" a custom solution. Some do it because their project is "so small" it "obviously needs" just a few days of hacking .. to build a custom solution. Let me briefly focus on the later group. Working in a company with less than 10 people building websites for customers a project needs a bit of a CMS to manage those 10 "semi static" pages seems to be the poster child example of this group. The devs whip up a DB table, slap an ORM in front, maybe even use some generator for the admin UI. Done. Later the clients also wants versioning and luckily many ORMs provide some solution for that. Easy. Permissions? Most frameworks provide some ACL system. Child pages? ORM has some tree algorithm supported. Fulltext search? Integrate ElasticSearch. Custom page types? Uhm well by now you have enough sunken costs that you will make that happen somehow too. Some morning you wake up and you have created the next Drupal or Typo3. If you did, then it would be hard to claim that you did it wrong because both are very successful projects. What PHPCR aims to be is to provide you with a short cut for this path. Or in other words there should be a PHPCR implementation that is so easy to use, with so many helpful higher level components in your favorite framework, that it becomes the natural choice for your next CMS needing project.
Every time anyone talks about PHPCR, they will mention Jackrabbit sooner or later as it is the basis for the currently most mature PHPCR implementation. Here, I just did it too. Jackrabbit requires Java. More importantly it is not trivial and most of you have never heard about it, let alone used it. I think for many high scale use cases its great that PHPCR has been integrated with Jackrabbit, but it needs to be relegated to a side note you mention when someone starts talking about scaling to millions of documents and many GBs of data. Once there is a PHPCR implementation that works with pure PHP, using any RDBMS (including SQLite which is bundled with PHP) it will become easier to just use that instead of whipping up your own tables. No more convincing the sys-admin guy about adding a new daemon to the setup. Instead being able to tell the client what other features they could get in the next code sprint.
So today, we are not there yet. We have an implementation of PHPCR that works on top of Doctrine DBAL to in theory support any RDBMS. Well in reality it already does though the search API only works with MySQL and PostgreSQL. It also doesn't support versioning and ACLs. Oh and if you drop more than 50 documents into it, search performance will start to hurt quickly. Bummer. But there is good news too. The core infrastructure is laid out thanks to Benjamin. Progress on it, while not rocket fast, is continuous thanks to Liip intern Adrien .. as a matter of fact if you go to our demo page it would run all but the admin interface on top of it just fine (yes this includes the cool inline editing!).
I believe that a decently experienced database developer would need a man month to fix up the searching to perform decently unto lets say 1k documents as well as implement simple versioning on top of SQLite. Another 2 man months should enable implementing permissions and support for even more documents, ACLs and a few other goodies. Is this something that a 10 developer company can commit to when offering that simple CMS solution to one of their customers? Obviously not. This takes a few developers who are willing to invest into the future. Until they come along unfortunately PHPCR will continue to not be a viable option for many small CMS projects.
pooteeweet.org |
5/10/12 1:45 PM
Finding Database Network Abusers with Netstat
DBAs often have to relate the consumption of system resources by a database to the available resources on the database server. Sometimes this is a very difficult task. Netstat can help you see network traffic for Oracle processes. Read on to learn how. www.databasejournal.com | 5/9/12 10:01 AM |