Databases News
| Examining the Oracle Database 10053 Trace Event Dump File Ever wonder what the Oracle Database Cost Based Optimizer (CBO) was
doing behind the scenes or how it comes up with an execution plan?
While other tools or settings show us WHAT the CBO comes up with, the
10053 trace event setting tells us HOW the CBO came to its decision (the
final execution plan).
redir.internet.com |
7/30/10 12:21 AM
North Texans -- vote for your new meeting location The North Texas MySQL Users Group is now a special interest group
withing the Dallas Oracle Users Group. As such, we can meet in oracle's
offices in Plano or Irving. In the past there has been demand for
meetings in the northern part of the Metroplex and demand for meetings
more near the DFW Airport. So we can meet in either office or alternate
between the two. Please state you preference by voting on the North
Texas MySQL Users Group website.
dave-stokes.blogspot.com |
7/29/10 10:04 PM
Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning) At Kscope this year, I attended a half day in-depth session entitled
Data Warehousing Performance Best Practices, given by Maria Colgan of
Oracle. My impression, which was confirmed by folks in the Oracle
world, is that she knows her way around the Oracle optimizer. See part 1
for the introduction and talking about power and hardware. This part
will go over the 2nd “P”, partitioning. Learning about
Oracle’s partitioning has gotten me more interested in how
MySQL’s partitioning works, and I do hope that MySQL partitioning
will develop to the level that Oracle partitioning does, because
Oracle’s partitioning looks very nice (then again, that’s
why it costs so much I guess). Partition – Larger tables or fact
tables can benefit from partitioning because it makes data load easier
and can increase join performance and use data elimination. Parallel
execution can be done with partitioning due to partition pruning. The
degree of parallelism should be a power of 2, because of hash-based
algorithm in hash partitioning. To translate this to the MySQL world,
if you are using LINEAR HASH partitioning, then you should use a degree
of parallelism that is a power of 2 (I checked, and indeed. Otherwise,
use a degree of parallelism that makes sense given the number of
partitions you have. One important note that during Pythian’s
testing of MySQL partitioning, we found that all partitions were locked
when an INSERT occurs, for the duration of the INSERT. Bulk-loading
with MySQL partitioning is not as fast as it would be if MySQL allowed
partition pruning for INSERTs. So, what should be partitioned? For the
first level of partitioning, the goal is to enable partitioning pruning
and simplify data management. The most typical partitioning is range or
interval partitioning on a date column. Interval partitioning is you
say what the partition is (date, month) and partition is automatically
created. MySQL does not have interval partitioning, and I have seen
typical first-level partitioning be range or list based on a date or
timestamp column. Note that if you use a timestamp field, the
partitioning expression is optimized if you use TO_DAYS(timestamp_field)
or YEAR(timestamp_field). In my experience, using anything else (such
as DATE(timestamp_field)) actually makes partitioning slower than not
using partitioning at all. Note that this is based on tests I did a few
months ago, and your mileage may vary. So — how do you decide
partitioning strategy? Ask yourself: What range of data do the queries
touch – a quarter, a year? What is the data loading frequency? Is
an incremental load required? How much data is involved, a day, a week,
a month? The answers to the above questions will tell you about how big
your interval needs to be. The best scenario is that all answers are
the same, “we load every day, and people query by day.” If
the answers are different weight access a higher priority than loading,
because most people care more about query performance than performance
of ETL. This is true even if your intervals have different sizes —
ie sales per day are much bigger in Dec but that’s OK. However,
Maria recommends that the subpartition be as evenly divided as possible.
Easier to look at more partitions than to look at a partition
that’s too big. But you don’t want too many partitions, max
Oracle allows partitions is 1 million partitions, prior to 11g it was
64,000. “Stick closer to 64,000 than 1 million”.
MySQL’s limitation is 1024 per table. For the second level of
partitioning, also called subpartitioning, the goal is to allow for
multi-level pruning and improve join performance. In Oracle, the most
typical subpartition is hash or list – in MySQL, you can only
subpartition by hash or key. How do you decide subpartitioning strategy?
Select the dimension queried most frequently on the fact table OR Pick
the common join column For example, if you want to look at sales per
day, per store, you would choose “per day” as the partition
and “per store” as the subpartition. If you do not have a
good partition on logical elements (like grouping), then you can
subpartition using hash partitioning on common joins — perhaps
surrogate keys, or using join key of the largest table involved in the
join. For example, if the sales table is partitioned and another big
table is product, you can hash subpartition product_id. Because
there’s overhead in partitions (loading metadata, reading
metadata), make sure size of partitions and subpartitions is >20 Mb.
So better to have a 30 Mb subpartition than a 15 Mb subpartition. [I
have no idea if this is true in MySQL or not -- I think the general
concept is true, because there is some overhead, but I have no idea
about the 20 Mb figure and why that's true for Oracle, nor do I know
what is true in MySQL.] One easy calculation is double the # of CPUs,
round up to nearest power of 2. If you’re executing in parallel,
Oracle will use 2x CPUs. (all this advice, by the way, follows 80/20
rule, this is probably good for about 80% of the environments out
there). Of course, MySQL does not do parallel execution very well, so
this probably does not apply. Oracle knows it can get partition
elimination while it does a join. If 2 tables have the same degree of
parallelism (same # of buckets) and are partitioned in the same way on
the join column (say, customer_id in a subpartition of sales and a
partition of customer), Oracle will match the partitions when joining:
sales table joined with customer table can change into 4 small joins:
sales sub part 1 joins with customer part 1 sales sub part 2 joins with
customer part 2 sales sub part 3 joins with customer part 3 sales sub
part 4 joins with customer part 4 And with parallelism, the total time
is now reduced to the time it takes to do one of those smaller joins.
This is also why you want to have a power of 2 for buckets –
because cores/processors come in powers of 2. Partition-wise joins like
this can also be done with range or list, assuming both tables in the
join have the same buckets. I have no idea if MySQL partitioning works
this way, but it’s certainly a functionality that makes sense to me.
www.pythian.com |
7/29/10 10:00 PM
Data Warehousing Best Practices: Comparing Oracle to MySQL, part 1 (introduction and power) At Kscope this year, I attended a half day in-depth session entitled
Data Warehousing Performance Best Practices, given by Maria Colgan of
Oracle. My impression, which was confirmed by folks in the Oracle
world, is that she knows her way around the Oracle optimizer. These are
my notes from the session, which include comparisons of how Oracle works
(which Maria gave) and how MySQL works (which I researched to figure out
the difference, which is why this blog post took a month after the
conference to write). Note that I am not an expert on data warehousing
in either Oracle or MySQL, so these are more concepts to think about
than hard-and-fast advice. In some places, I still have questions, and
I am happy to have folks comment and contribute what they know. One
interesting point brought up: Maria quoted someone (she said the name
but I did not grab it) from Forrester saying, “3NF is typically a
selfless model used by Enterprise data warehouse, which is used by the
whole company. A star schema is a selfish model, used by a department,
because it’s already got aggregation in it.” I thought that
was an interesting way of pointing that out — most people do not
understand why 3NF is not good enough for data warehousing, and I have
had a hard time explaining why a star or snowflake schema should be
used. Another schema-related topic I had a hard time putting into words
before this workshop was the difference between a star and a snowflake
schema: compared to a star schema, in a snowflake schema, you have more
than one fact table and maybe some dimensions that are not used often.
From Maria and the slides: “Oracle says model what will suit your
business best. Don’t get lost in academia. Most schemas are not
100% according to the theoretical models. Some examples: 3NF schema
with denormalized attributes to avoid costly joins, Star schema with
multiple hierarchies in same fact table.” Data warehousing has a
3-step approach — 1) data sources -> staging layer (temp
loading layer) 2) staging layer (temp loading layer)-> foundation
(logical, data store) layer 3) foundation (logical, data store) layer
-> access and performance layer The foundation layer is usually 3NF
the access layer is usually a star or snowflake schema. As for the data
sources, they can be varied, you would hope that they are in 3NF (and if
they are you can skip the first 2 steps) but they are not always that
way. The 3 P’s of best practice for data warehousing (on Oracle)
are power, partitioning, parallelism. The goal of the data warehousing
environment is to minimize the amount of data accessed and use the most
efficient joins – so it is not so index focused. This may be
based on Oracle’s way of doing joins, I am not so sure if it
applies to MySQL as well. Power The weakest link in the chain (the 3
steps above) will define the throughput, so make sure your hardware
configuration is balanced. Maria mentioned that as DBAs, “most of
the time we don’t have control over this, but we’re still
bound to the SLAs.” This includes hardware that immediately comes
to mind such as # of CPUs/cores, speed of CPU, amount of RAM, speed of
disk as well as what we may not think of immediately: speed of network
switches, speed of disk controllers, number and speed of host BUS
adapters. Notes on host BUS adapters (HBAs): Know the # of HBA ports
you have. 4 Gb HBA does 400 Mb/sec. 2 Gb HBA does 200 Mb/sec. Make
sure there’s enough HBA capacity to sustain the CPU throughput
(ie, make sure HBA isn’t the bottleneck). Also the speed at which
it all talks. If you have a 4 Gb machine but a 2 Gb switch, you end up
having 2 Gb throughput. Upgrade the network at the same time you
upgrade machines. Because we are talking about data warehousing, it is
often not possible to eliminate disk I/O, so the goal is to have the
fastest I/O throughput possible. Data warehouses need to be sized on
I/O throughput not number of I/O’s. I made a post earlier about
how to determine I/O throughput for a system, which used information
from this session. Justin Swanhart already pointed out that this is
based on the fact that Oracle can do hash joins and MySQL can only do
nested loop joins. I wonder, though, if there is indeed no case when
using MySQL for which I/O throughput is a more useful metric than iops.
Disk arrays that are expensive are usually sized for iops, not
throughput, and because they’re expensive the disk array is shared
throughout the company. A DBA needs to ask ‘how many connections
into the storage array do I have? How many disk controllers do I have?
Where are my physical disks, and which controllers are they hanging off
of?’ Typical 15k rpm disk can do about 25-35 Mb/sec (per disk)
random i/o’s. Disk manufacturers will throw out numbers like
200-300 Mb/sec but that’s sequential I/O and leading edge of the
drive. Make sure all your LUNs are not coming off the same set of
disks, so that you’re not conflicting on disk seeks. Continue to
part 2, partitioning.
www.pythian.com |
7/29/10 9:53 PM
Databases, The End User Experience Does it matter if the end user knows what the database is?Recently I got
a wonderful view of a database from the end user perspective.While I was
traveling I had found a restaurant where I had decided to let friends
who live locally know where I was at. Part way through my food I got a
message from a local friend that said "Don't eat there, their food
always makes people sick!""Always" is a word that I would
think would be a little too strong when applied to a restaurant,
right?Nope, the next day I got to feel the full truth of the word.A
couple of days later I am telling some friends about this and a local
asked me "Where was this, I want to avoid them." I didn't get
asked this question once, I got it asked a dozen times.I don't know
where the place is. Why is that? Because the system I was using lost the
entire day worth of my data. I don't know how often they loose data, but
from asking a few other folks it appeared to be that it is more frequent
then not.It came up in casual conversation the other day that the site
had moved off Postgres to another system recently. Which suddenly made
everything make sense, because the particular solution they moved too is
not very durable.We talk about databases being "transactional"
or not. We talk about them being "durable". What matters in
the end, to me as an end user, is that when I put my data in a system, I
want a confirmation that the system stored it. I don't want to retype my
data, and I don't want to collect it again. If I was the operator for
the site? I certainly wouldn't want to be losing my users data.In the
MySQL world? MyISAM is the most abandoned storage engine in the stack.
People will pick it initially because it is fast, but the first time
they discover data corruption or have to deal with multiple hours of
recovery time they quickly move away from it.As an operator I wouldn't
want to be having to explain to my users or my boss, that we had to wait
12 hours until the database recovered itself (or that it had corrupted
itself). "Transactional" systems know how to handle recovery.
People will wave their arms about and talk about disk controllers, disk
failures, etc... That is hand waving. A properly configured system is
redundant and sure it can be hit by lighting, but the real issue is most
likely going to be that a plug gets pulled or a program crashes.I look
at, and even work with, some of the "no-sql" solutions. Some
of them I recommend, and other's of them I don't. I look at scale out
needs, usage patterns, and a wide variety of other details.As end user
though?I would like to know that my data was stored, and that I will
reliably be able to retrieve it when I want. I don't like outages. Of
the services online that I pay for or that I have integrated into my
life? I can't imagine wanting to deal with a system which was
unreliable. A free service which does not work most of the time, is not
free. It will consume my time whenever it is not available.There is an
end user experience for the database, site operators ought to remember this.
krow.livejournal.com |
7/29/10 6:11 PM
Determining I/O throughput for a system At Kscope this year, I attended a half day in-depth session entitled
Data Warehousing Performance Best Practices, given by Maria Colgan of
Oracle. In that session, there was a section on how to determine I/O
throughput for a system, because in data warehousing I/O per second
(iops) is less important than I/O throughput (how much actual data goes
through, not just how many reads/writes). The section contained an
Oracle-specific in-database tool, and a standalone tool that can be used
on many operating systems, regardless of whether or not a database
exists: If Oracle is installed, run DBMS_RESOURCE_MANAGER.CALIBRATE_IO:
SET SERVEROUTPUT ON DECLARE lat INTEGER; iops INTEGER; mbps INTEGER;
BEGIN -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO(<DISKS>,
<MAX_LATENCY>,iops,mbps,lat); DBMS_RESOURCE_MANAGER.CALIBRATE_IO
(2, 10, iops, mbps, lat); dbms_output.put_line('max_mbps = ' || mbps);
end; For us MySQL folks, or even the Drizzle or NoSQL folks, Oracle
offers a free standalone tool called Orion. The example given in the
slides was: ./orion –run advanced –testname mytest
–num_small 0 –size_large 1024 –type rand
–simulate contact –write 0 –duration 60 –matrix
column -num_small is 0 because you don’t usually do small
transactions in a dw. -type rand for random I/O’s because data
warehouse queries usually don’t do sequential reads -write 0
– no writes, because you do not write often to the dw, that is
what the ETL is for. -duration is in seconds -matrix column shows you
how much you can sustain I would be interested to see how other folks
measure I/O throughput, and maybe even do a side-by-side comparison of
different tools. Orion is available for: Linux (x86, x86-64, Itanium,
Power) Solaris (SPARC64) AIX (PPC64) zLinux HPUX (PA RISC, Itanium)
Windows I am working on a larger write-up of the session itself, which
had many concise descriptions of data warehousing issues, but I thought
that this merited its own post.
www.pythian.com |
7/29/10 4:23 PM
Shinguz's Blog (en): MySQL Cluster Local Checkpoint (LCP) and Global Checkpoint (GCP) MySQL Cluster is mainly an in-memory database. Nevertheless it requires
a good I/O system for writing various different information to disk. The
information MySQL Cluster writes to disk are the: Global Checkpoints
(GCP) which are the transactions. Local Checkpoints (LCP) which is a
dirty image of the data. Backup. In the following schema you can see
what is related to each other: Please find here the meaning of each
parameter: TimeBetweenLocalCheckpoints DataMemory
DiskCheckpointSpeed DiskCheckpointSpeedInRestart CompressedLCP Odirect
DiskSyncSize RedoBuffer TimeBetweenGlobalCheckpoints
NoOfFragmentLogFiles FragmentLogFileSize InitFragmentLogFiles
www.fromdual.com |
7/29/10 3:29 PM
Fast Company is searching for 2010's Most Influential Person Online. I admit this is not closely related to my day-by-day activities, but I
find the experiment interesting. MySQL is the most popular open source
database, particularly in the Web. Both twitter and facebook are
avid users of MySQL. MySQL has influenced plenty of projects, has
contributed to the creation of a multiplicity of social networks and
it's behind many intriguing entrepreneurial ideas. The question is: how
influential are you? Fast Company started a project to measure how much
we influence the online communities. Here is the description coming from
their website. We started with a simple question: Who are the most
influential people online right now? That's what The Influence Project
is designed to answer. By participating, you will have your picture
appear in the November issue of Fast Company magazine as part of an
amazing photo spread. The more influence you demonstrate, the bigger
your picture will be. You may discover that you're more influential than
you think. Influence is not only about having the most friends or
followers. Real influence is about being able to affect the behavior of
those you interact with, to get others in your social network to act on
a suggestion or recommendation. When you post a link or recommend a
site, how many people actually bother to check it out? And what's the
likelihood of those people then forwarding it on? How far does your
influence spread? This is the type of influence we're looking for. We
want to find the most influential person online. Who knows? It might
even be you. Participate to the project by clicking here.
feedproxy.google.com |
7/29/10 12:57 PM
Building a Data Warehouse Blueprint for Success One of the most integral components and critical success factors of any
enterprise data warehousing initiative is the Solutions Architecture
document, a high-level conceptual model of a data warehousing solution.
Learn why this collaborative effort that addresses the needs of all
major stakeholders, including both the business units and Information
Technology (IT), is essential.
redir.internet.com |
7/29/10 12:30 AM
I/O Performance Tuning Tools for Oracle Database 11gR2 Oracle Database 11g Release 2 (11gR2) makes it easier than ever to
isolate and identify potential root causes of poor I/O performance. This
article focuses on the various Oracle and non-Oracle tools to generate
sample workloads to provide sufficient metrics for detection and
eventual resolution of performance bottlenecks within the database's
underlying input/output (I/O) subsystem.
redir.internet.com |
7/29/10 12:15 AM
Will Oracle kill MySQL? I get asked this question often. It was mentioned again recently in a
NYTECH executive breakfast with RedHat CIO Lee Congdon. The short answer
is No. There is clear evidence that in the short to medium term Oracle
will continue to promote and enhance MySQL. Some of these indicators
include: EU 10 point commitment in December 2009 – See Oracle
Makes Commitments to Customers, Developers and Users of MySQL MySQL
Conference April 2010 – Opening keynote by Edward Screven State of
the Dolphin Oracle Magazine Jul/Aug 2010 – Interview with Edward
Screven Open for Business. It is clear from these sources that Oracle
intends to incorporate MySQL into Oracle Backup and Security Vault
products. Both a practical and necessary step. There is also a clear
mention of focusing on the Microsoft platform, a clear indicator that
SQL Server is in their sights without actually saying it. What is
unknown is exact how and when features will be implemented. Also
important is how much these may cost the end user. Oracle is in the
business of selling, now an entire H/W and S/W stack. They also have a
complicated pricing model of different components with product
offerings. I assume this will continue. There are already two
indications, InnoDBbackup included for Enterprise Backup (from April
Keynote) and 5.1 enterprise split. (Note: while this split may have
existed prior to Oracle, it is now more clearly obvious). MySQL can
never be seen as drawing away from any Oracle sales of the core entry
level database product. It is likely Oracle will provide a SQL Syntax
compatibility layer for MySQL within 2 years, however it will I’m
sure be a commercial add-on. Likewise, I would suspect a PL/SQL lite
layer within 5 years, but again at a significant cost to offset the
potential loss of sales in the low end of the server market. There
continues to be active development in the MySQL Enterprise Monitor,
MySQL Workbench and MySQL Connectors which is all excellent news for
users. Moving forward, how long will this ancillary development of free
tools continue? What will happen to the commercial storage engine, OEM
and licensing model after the 5 year commitment? How will the MySQL
ecosystem survive.? There is active development in Percona, MariaDB and
Drizzle forks, however unless all players that want to provide a close
MySQL compatible solution work together, progress will continue to be a
disappointing disjointed approach. The 2011 conference season will
also see a clear line with competing MySQL conferences in April
scheduled at the same time, the O’Reilly MySQL conference in Santa
Clara California and the Oracle supported(*) Collaborate 2011 in
Orlando, Florida. I have a number of predictions on what Oracle ME MySQL
may look like in 5 years however this is a topic for a personal discussion.
ronaldbradford.com |
7/28/10 10:32 PM
Speaking at Surge Scalability 2010 – Baltimore, MD I will be joining a great list of quality speakers including John
Allspaw, Theo Schlossnagle, Rasmus Lerdorf and Tom Cook at Surge 2010 in
Baltimore, Maryland on Thu 30 Sep, and Fri Oct 1st 2010. My
presentation on “The most common MySQL scalability mistakes, and
how to avoid them.” will include discussing various experiences
observed in the field as a MySQL Consultant and MySQL Performance Tuning
expert. Abstract: The most common mistakes are easy to avoid however
many startups continue to fall prey, with the impact including large
re-design costs, delays in new feature releases, lower staff
productivity and less then ideal ROI. All growing and successful sites
need to achieve higher Availability, seamless Scalability and proven
Resilience. Know the right MySQL environments to provide a suitable
architecture and application design to support these essential needs.
Overview: Some details of the presentation would include: The different
types of accessible data (e.g. R/W, R, none) What limits MySQL
availability (e.g software upgrades, blocking statements, locking etc)
The three components of scalability – Read Scalability/Write
Scalability/Caching Design practices for increasing scalability and not
physical resources Disaster is inevitable. Having a tested and
functional failover strategy When other products are better (e.g.
Static files, Session management via Key/Value store) What a lack of
accurate monitoring causes What a lack of breakability testing causes
What does “No Downtime” mean to your organization.
Implementing a successful “failed whale” approach with
preemptive analysis Identifying when MySQL is not your bottleneck
ronaldbradford.com |
7/28/10 9:53 PM
db4free.net’s 5th birthday To my shame I must admit, I missed it. It happened on June 29, 2005 when
db4free.net was first available to the public. At that time it was
running MySQL 5.0.7 beta. Quite a lot has happened since then, MySQL 5.0
made its way up to 5.0.91 and the current MySQL GA version is 5.1.49,
which is also the version db4free.net is running as of today. The first
phpMyAdmin version that db4free.net was offering to provide easy access
to the user’s databases was 2.6.3. Today I updated phpMyAdmin to
3.3.5. Statistics are not necessarily 100 % accurate, but here is the
best I can come up with. Since its launch, db4free.net had 528,900
visits. The ratio registrations per visits is at about 22 %, so more
than every fifth visit ends in signing up for a new database. Which
means, that about between 110,000 and 120,000 database accounts (meaning
database and user) have ever been created. There have been some cleanups
since then to make resources available to people who actively use their
databases. Which is why the current number of databases is much lower,
at slightly above 13,400. Today I must (also) admit that the code
behind db4free.net’s web application was initially quite poor and
it stayed poor for quite long. Why it was poor from my today’s
point of view is probably due to my learning process in these 5 years.
This year I did a lot of cleanup on the code behind db4free.net’s
web application and removed a few bottlenecks which often made the
website painfully slow. Now it’s in a fairly reasonable state
again, and I have some ideas in the back of my head how to further
improve it. So db4free.net is far from its end of life. Quite the
opposite is the case, it’s time to give it a new boost so that the
balance after its 10th anniversary is even more impressive than
today’s 5 year balance.
www.mpopp.net |
7/28/10 8:34 PM
Early results from admission_control I want to limit the maximum number of concurrent queries by MySQL
account. This will allow some accounts to have many concurrent
connections without the risk of having many concurrent queries. The
feature is enabled by adding admission_control to my.cnf. Early results
for it are promising but I have much more testing to do. Results for
sysbench doing fetch by primary key with 1, 2, 4, 8, 16, 32, 64, 128,
256, 512, 1024, 2048 and 4096 concurrent clients are in this graph. The
results for admission control have the label new. Both new and original
use MySQL 5.1.47 with the Facebook patch. MySQL with admission control
is much better at sustaining peak QPS when the server is saturated. I
have a bit of work to make this ready for production. The implementation
requires a count of running queries per account. This count must be
adjusted when a thread blocks within the server. Things that can make a
thread block include LOCK TABLES, GET_LOCK() and InnoDB row lock waits.
I need to find all places in the server where a thread might block. When
this feature is enabled, this output from SHOW PROCESSLIST displays the
result of a query blocked by admission control: | 1182 | root |
db1:56546 | test | Query | 0 | NULL |
HANDLER sbtest READ `PRIMARY` = (3331584) | | 1188 | root | db1:56552 |
test | Sleep | 0 | wait for max concurrent queries | NULL |
www.facebook.com |
7/28/10 7:20 PM
Install MySQL from Tar ball Download your favourite version of MySQL Server from your local mirror.
We can use the latest GA (5.1.49) or development release (5.5.5m3). Move
your tarball to the /usr/local directory. I'm assuming that you're
logged into your box using the root account. If you're not then you
might need to utilize sudo.Add the mysql user and group
groupadd mysql useradd -g mysql mysqlUntar and Create a
symbolic link cd /usr/local/ tar -zxvf
mysql-{version}.tar.gz ln -s mysql-{version} mysqlFile
and Directory creation/permissions mkdir -p
/var/lib/mysql * chown -R mysql:mysql
/usr/local/mysql chown -R mysql:mysql /var/lib/mysqlInstall
the system databases and place my.cnf cp
/usr/local/mysql/support-files/my-medium.cnf /etc/my.cnf
** scripts/mysql_install_db --user=mysql
--datadir=/var/lib/mysql --basedir=/usr/local/mysql ***Start
mysqld /usr/local/mysql/bin/mysqld_safe &Check your
error log file to keep abreast of any issues that might arise.Some house
keeping is needed to ensure that MySQL will start a boot and also adding
mysql into the path. These are basic Linux tasks. Please let me know if
you'd like to see my recommendations for these tasks.* This command is
going to create your datadir location. This might not fit your
preferences so this is something that you will choose to include or
omit. You should review your my.cnf and make any changes
to accommodate your desired location.** This will copy a default
my.cnf to the preferred my.cnf location. If you have written your own or
have something you wish to use, copy that and not the example file. Your
my.cnf is a pretty big deal as it's going configure your server to run
like a two-legged dog or Usain Bolt!*** These parameters can be
augmented, omitted or kept. You're going to want to use the values in
your my.cnf or change your my.cnf to reflect what you use here.
blog.mysqlboy.com |
7/28/10 2:00 PM
Hybrid licensing strategies for open source monetization One of the issues that has arisen from the ongoing debate about the open
core licensing strategy is the continuing confusion about open core
compared to the use of open source components in a larger proprietary
product – such as IBM’s use of Apache within WebSphere. To
some people there is no difference between the two (since they both
result in products that make use of open source but are not open
source), however it is clear to me that while the end result might be
the same these are very different strategies that involve different
approaches to engaging with open source communities/projects. While open
core has a clear definition there is no agreed term or definition for
the latter category. Over the years we have used a variety of terms to
describe it, including “open and closed”, “embedded
open source”, “open inside” and “open
complement”, while Jack Repenning has referred to it as
“open infrastructure”. Our next categorization of open
source-related business strategies is still a work in progress but the
current thinking is as follows: There are a variety of complementary
strategies employed by vendors to generate revenue from open source
software indirectly. The simplest of these is open complement which
is selling other products and services that are related to but separate
from, and not reliant upon, the open source project. Then there is
encouraging open source development on top of proprietary products to
retain develop interest in that product. This is known as open edge.
Then there is using open source software to create a platform for the
provision of SaaS or cloud or social networking services (for example),
which I am referring to as open platform. Then there is using open
source components as building blocks for a larger proprietary software
product, which I am calling an open foundation licensing strategy.
(This categorization is a work in progress, we welcome and encourage any
feedback) Open core and open foundation have different evolutionary
lineages: open core is a variation on dual licensing as practiced by the
likes of MySQL and Sleepycat that also borrows heavily on the
value-added subscription model as practiced by Red Hat and JBoss.
Meanwhile open foundation has its roots in the commercialization of BSD,
which pre-dates the concepts of open source and free software, as well
as Apache. From a practical perspective, the easiest way to think of the
distinction between open core and open foundation is via an example:
PostgreSQL is an independent, community-developed open source project.
EnterpriseDB offers extensions to the PostgreSQL core, such as
Oracle-compatibility, in the form of Postgres Plus Advanced Server.
PostgreSQL has also been used by many other vendors to create commercial
products. For example Greenplum used PostgreSQL as the foundation of its
Greenplum Database (for other examples see this post). This allowed the
company to build on proven database technology and avoid reinventing the
wheel, but it also involved the creation of an entirely new product,
rather than extensions to an open source project (the company initially
actually started a new project, Bizgres, and created extensions to that
but Bizgres was last seen in August 2008). So while open core involves
offering proprietary extensions targeted at a segment of the open source
project user base, open foundation involves using open source software
to create entirely new products, targeted at a different user base. The
example used above highlights three important points to consider when
comparing open core and open foundation strategies: 1/ While open core
is most readily associated with vendor-controlled projects it can also
be used as a strategy to monetize community-controlled projects. 2/
Open core strategies can be used in conjunction with complementary
strategies. In the Greenplum example the company’s relationship
with Bizgres was open core, while the relationship with PostgreSQL was
open foundation. Similarly there is an open core relationship between
Actuate’s BIRT products and the Eclipse BIRT project, and an open
complement relationship between Actuate 10 and the Eclipse BIRT project.
Meanwhile there is an open core relationship between Day
Software’s CRX content repository and the Apache Jackrabbit and
Sling projects, and a open foundation relationship between CQ5 and
Jackrabbit, Felix and Sling – as well as the numerous other Apache
projects that Day contributes to. 3/ Open core and open foundation are
licensing strategies used as part of a larger business strategy for
engaging with and commercializing open source software, which highlights
the futility in trying to pigeon-hole companies as “open core
vendors” or “open source vendors”. Finally it is worth
thinking about the different tensions that the open core and open
foundation strategies create with their respective communities. As Jorg
Janke notes, “looking for an income stream as an open source
vendor always results in some sort of conflict with the community. So,
you have to pick the community you want to ‘offend’.”
With a vendor-controlled open core strategy the community is a user
community, and as we have previously discussed the conflict is in
deciding what features belong in the core and what features don’t.
With an open foundation strategy the community is the open source
project developer community, and the conflict lies in deciding what
features and resources to contribute to that project. A
community-controlled open core strategy arguable results in conflict
with both the user and developer communities, although since the vendor
does not own or control the project the relationship is much more
comparable to the open foundation strategy. We will be writing more
about other strategies for generating revenue from open source software,
in a follow-up to our Open Source is Not a Business Model report, which
is due to be published latter this year. It will provide more context
for the economic motivators and issues involved in the various models,
as well as updated research on which vendors are following which
strategies, and why, as well as a survey to uncover what software users
make of it all. The report will be freely available to CAOS subscribers.
For more details of the CAOS research practice, and to apply for trial
access, click here.
blogs.the451group.com |
7/28/10 11:58 AM
Drizzle, boost::unordered_multimap++ =========================================================================================================
REGRESSION REPORT
=========================================================================================================
MACHINE: drizzle-build-n01.wc1.dfw1.stabletransit.com RUN ID: 324 RUN
DATE: 2010-07-27T21:48:07.932094 WORKLOAD: innodb_1000K_readonly SERVER:
drizzled VERSION: lp:drizzle/staging REVISION: 1669 COMMENT: 1669:
Brian Aker 2010-07-27 This patch turns the table_cache into
boost::unordered_multimap.
=========================================================================================================
TRENDING OVER LAST 5 runs Conc TPS % Diff from Avg Diff
Norm? Min Max Avg STD
=========================================================================================================
16 1993.10 +1.19% 23.37 within norms 1905.44
2010.28 1969.73 26.56 32 2676.28 +1.75%
46.11 outside norms 2568.20 2685.05 2630.17 33.22 64
2622.82 -2.09% -56.00 outside norms 2613.02 2737.89
2678.82 43.09 128 2621.23 -0.61% -16.15 within
norms 2609.74 2662.30 2637.38 16.41 256 2598.18
+5.50% 135.46 outside norms 2406.50 2599.34 2462.72
68.67 512 2343.83 +52.30% 804.91 outside norms
1318.38 2347.49 1538.92 402.80 1024 1722.08
+104.66% 880.65 outside norms 614.65 1725.52 841.43
440.35 Boost::unordered_multimap versus the hand crafted MySQL
HASH?Boost wins by a long shot. The above is from sysbench. We run it on
each and every push that goes into the main tree looking for regression.
Every so often we get to see the opposite happen :)Thanks to this patch,
"Table" now joins the ranks of what we call a "trusted
object". This means we can start safely assuming that the
destructor on it is working (most of the MySQL codebase was crafted in
such a way that you can only use a very limited subset of C++). In all
of the new code in Drizzle we can easily make use of C++. There are
still a few older bits where we cannot. Having "Table" now
work means we can safely work in a number of new areas in the server.
One of the biggest changes that will be coming soon is the removal of
LOCK_open for a number of new cases.
krow.livejournal.com |
7/28/10 6:47 AM
Five Handy Tips for MySQL's Powerful UPDATE Statement Rob Gravelle explores the MySQL UPDATE statement, which is used to
modify existing records in a table. Among its many features, he looks
at how to update multiple tables, avoiding mistakes, and how to limit
how many rows are updated.
redir.internet.com |
7/28/10 12:25 AM
If you missed MySQL Idiosyncrasies that BITE I recently gave a webinar to the LAOUC and NZOUG user groups on MySQL
Idiosyncrasies that BITE. For the benefit of many viewers that do not
use English as a first language my slides include some additional
information from my ODTUG Kaleidoscope presentation in June. Thanks to
Francisco Munoz Alvarez for organizing. MySQL Idiosyncrasies That Bite
2010.07 View more presentations from Ronald Bradford.
ronaldbradford.com |
7/28/10 12:24 AM
N900 – control all of your accounts with this script If you own a Nokia N900 cellular device you might be interested in the
ability to control all of your IM accounts from the command line. For
those that do not know, the N900 runs Maemo Linux and is capable of
running MySQL embedded if you so choose. Here’s a quick script I
wrote to provide that functionality for IM accounts. It’s at the
bottom of the page, called “im-connections”. wiki:
http://wiki.maemo.org/N900_Mission_Control#Set_all_SIP_accounts_to_online_or_offline
pastebin: http://pastebin.com/qAC57E1N
themattreid.com |
7/27/10 11:42 PM
MySQL Connector/Net 6.3.3 (beta 2) has been released MySQL Connector/Net 6.3.3, a new version of the all-managed .NET driver
for MySQL has been released. This is a beta release and is intended to
introduce you to the new features and enhancements we are planning. This
release should not be used in a production environment. It is now
available in source and binary form from
http://dev.mysql.com/downloads/connector/net/6.3.html] and mirror sites
(note that not all mirror sites may be up to date at this point of time
– if you can’t find this version on some mirror, please try
again later or choose another download site.) The new features or
changes in this release are: Visual Studio 2010 RTM support New sql
editor. Create a new file with a .mysql extension to see it in
action Please check the changelog and release notes for more information
What we know may be broken Documentation is not updated yet and is not
integrated into VS 2010. Please let us know what else we broke and how
we can make it better!
www.reggieburnett.com |
7/27/10 10:01 PM
HOWTO screw up launching a free software project Josh Berkus gave a great talk at linux.conf.au 2010 (the CFP for
linux.conf.au 2011 is open until August 7th) entitled “How to
destroy your community” (lwn coverage). It was a simple, patented,
10 step program, finely homed over time to have maximum effect. Each
step is simple and we can all name a dozen companies that have done at
least three of them. Simon Phipps this past week at OSCON talked about
Open Source Continuity in practice – specifically mentioning some
open source software projects that were at Sun but have since been
abandoned by Oracle and different strategies you can put in place to
ensure your software survives, and check lists for software you use to
see if it will survive. So what can you do to not destroy your
community, but ensure you never get one to begin with? Similar to
destroying your community, you can just make it hard: “#1 is to
make the project depend as much as possible on difficult tools.”
#1 A Contributor License Agreement and Copyright Assignment. If you
happen to be in the unfortunate situation of being employed, this means
you get to talk to lawyers. While your employer may well have an
excellent Open Source Contribution Policy that lets you hack on GPL
software on nights and weekends without a problem – if
you’re handing over all the rights to another company –
there gets to be lawyer time. Your 1hr of contribution has now just
ballooned. You’re going to use up resources of your employer (hey,
lawyers are not cheap), it’s going to suck up your work time
talking to them, and if you can get away from this in under several
hours over a few weeks, you’re doing amazingly well –
especially if you work for a large company. If you are the kind of
person with strong moral convictions, this is a non-starter. It is
completely valid to not want to waste your employers’ time and
money for a weekend project. People scratching their own itch, however
small is how free software gets to be so awesome. I think we got this
almost right with OpenStack. If you compare the agreement to the Apache
License, there’s so much common wording it ends up pretty much
saying that you agree you are able to submit things to the project under
the Apache license. This (of course) makes the entire thing pretty
redundant as if people are going to be dishonest about submitting things
under the Apache licnese there’s no reason they’re not going
to be dishonest and sign this too. You could also never make it about
people – just make it about your company. #2 Make it all about the
company, and never about the project People are not going to show up, do
free work for you to make your company big, huge and yourself rich.
People are self serving. They see software they want only a few patches
away, they see software that serves their company only a few patches
away. They see software that is an excellent starting point for
something totally different. I’m not sure why this is down at
number three… it’s possibly the biggest one for danger
signs that you’re going to destroy something that doesn’t
even yet exist… #3 Open Core This pretty much automatically means
that you’re not going to accept certain patches for reasons of
increasing your own company’s short term profit. i.e. software is
no longer judged on technical merits, but rather political ones. There
is enough politics in free software as it is, creating more is not a
feature. So when people ask me about how I think the OpenStack launch
went, I really want people to know how amazing it can be to just not
fuck it up to begin with. Initial damage is very, very hard to ever
undo. The number of Open Source software projects originally coming out
of a company that are long running, have a wide variety of contributors
and survive the original company are much smaller than you think.
PostgreSQL has survived many companies coming and going around it, and
is stronger than ever. MySQL only has a developer community around it
almost in spite of the companies that have shepherded the project. With
Drizzle I think we’ve been doing okay – I think we need to
work on some things, but they’re more generic to teams of people
working on software in general rather than anything to do with a
company. Share this on Facebook Tweet This!
Share this on del.icio.us Digg this! Post on Google Buzz
www.flamingspork.com |
7/27/10 7:40 PM
Does Size or Type Matter? MySQL seems to be happy to convert types for you. Developers are rushed
to complete their project and if the function works they just move on.
But what is the costs of mixing your types? Does it matter if your are
running across a million rows or more? Lets find out. Here is what the
programmers see. mysql> select 1+1; +-----+ | 1+1 | +-----+
| 2 | +-----+ 1 row in set (0.00 sec) mysql> select
"1"+"1"; +---------+ | "1"+"1" |
+---------+ | 2 | +---------+ 1 row in set
(0.00 sec) Benchmark What if we do a thousand simple loops? How
long does the looping itself take? The BENCHMARK() function
executes the expression expr repeatedly count times. It may be
used to time how quickly MySQL processes the expression. The
result value is always 0. mysql> select benchmark(1000000000, 1);
+--------------------------+ | benchmark(1000000000, 1) |
+--------------------------+
|
0 | +--------------------------+ 1 row in set (5.42 sec) mysql>
select benchmark(1000000000, "1" );
+-----------------------------+ | benchmark(1000000000, "1" )
| +-----------------------------+
|
0 | +-----------------------------+ 1 row in set (5.40 sec) So maybe
type doesn’t matter? About five seconds just to loop but the type
didn’t change it. What if we add 1+”1″?
mysql> select benchmark(1000000000, 1+1);
+----------------------------+ | benchmark(1000000000, 1+1) |
+----------------------------+
|
0 | +----------------------------+ 1 row in set (12.65 sec) mysql>
select benchmark(1000000000, 1+"1");
+------------------------------+ | benchmark(1000000000,
1+"1") | +------------------------------+
|
0 | +------------------------------+ 1 row in set (35.58 sec) mysql>
select benchmark(1000000000, "1"+"1");
+--------------------------------+ | benchmark(1000000000,
"1"+"1") | +--------------------------------+
|
0 | +--------------------------------+ 1 row in set (51.59 sec) It looks
like type does matter. But does it always matter? mysql> select
benchmark(1000000000, sum(1+1)); +---------------------------------+ |
benchmark(1000000000, sum(1+1)) | +---------------------------------+
|
0 | +---------------------------------+ 1 row in set (9.69 sec)
mysql> select benchmark(1000000000,
sum("1"+"1"));
+-------------------------------------+ | benchmark(1000000000,
sum("1"+"1")) |
+-------------------------------------+
|
0 | +-------------------------------------+ 1 row in set (9.94 sec)
mysql> select benchmark(1000000000,
sum("1.23456789"+"1.23456789"));
+-------------------------------------------------------+ |
benchmark(1000000000,
sum("1.23456789"+"1.23456789")) |
+-------------------------------------------------------+
|
0 | +-------------------------------------------------------+ 1 row in
set (10.32 sec) So, not all functions are the same. But it looks
like size might matter! mysql> select benchmark(1000000000, 1.1+1.1);
+--------------------------------+ | benchmark(1000000000, 1.1+1.1) |
+--------------------------------+
|
0 | +--------------------------------+ 1 row in set (34.90 sec)
mysql> select benchmark(1000000000, "1.1"+"1.1");
+------------------------------------+ | benchmark(1000000000,
"1.1"+"1.1") |
+------------------------------------+
|
0 | +------------------------------------+ 1 row in set (1 min 15.32
sec) mysql> select benchmark(1000000000,
"1.123456789"+"1.123456789");
+----------------------------------------------------+ |
benchmark(1000000000, "1.123456789"+"1.123456789") |
+----------------------------------------------------+
|
0 | +----------------------------------------------------+ 1 row in set
(1 min 53.32 sec) Sorry. Looks like size does matter. This doesn't
seem logical. mysql> select benchmark(1000000000, 1=1);
+----------------------------+ | benchmark(1000000000, 1=1) |
+----------------------------+
|
0 | +----------------------------+ 1 row in set (12.75 sec) mysql>
select benchmark(1000000000, 1="1");
+------------------------------+ | benchmark(1000000000,
1="1") | +------------------------------+
|
0 | +------------------------------+ 1 row in set (40.78 sec) mysql>
select benchmark(1000000000, 1=true); +-------------------------------+
| benchmark(1000000000, 1=true) | +-------------------------------+
|
0 | +-------------------------------+ 1 row in set (12.73 sec)
mysql> select benchmark(1000000000, 1="true");
+---------------------------------+ | benchmark(1000000000,
1="true") | +---------------------------------+
|
0 | +---------------------------------+ 1 row in set, 65535 warnings (3
min 5.72 sec) mysql> select benchmark(1000000000,
"true"="true");
+--------------------------------------+ | benchmark(1000000000,
"true"="true") |
+--------------------------------------+
|
0 | +--------------------------------------+ 1 row in set (57.25 sec)
Maybe we should CAST our work? mysql> select benchmark(1000000000,
cast("1" as unsigned));
+----------------------------------------------+ | benchmark(1000000000,
cast("1" as unsigned)) |
+----------------------------------------------+
|
0 | +----------------------------------------------+ 1 row in set (32.27
sec) mysql> select benchmark(1000000000, cast("1" as
unsigned) + cast("1" as unsigned));
+----------------------------------------------------------------------+
| benchmark(1000000000, cast("1" as unsigned) +
cast("1" as unsigned)) |
+----------------------------------------------------------------------+
|
0 |
+----------------------------------------------------------------------+
1 row in set (1 min 7.24 sec) Maybe not! Conclusion: Be careful
with your data types. If you are taking user input, do the type
conversion ONCE in your program. Don’t let MySQL do the type
conversions for you. query = “SELECT * FROM table where $INPUT =
1″; could be doing your wrong. References:
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_benchmark
http://dev.mysql.com/doc/refman/5.0/en/numeric-type-overview.html http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html
www.mysqlfanboy.com |
7/27/10 7:08 PM
Online Verification That Master and Slaves are in Sync In October 2008, Baron posted How to Check MySQL Replication Integrity
Continually. Here at Pythian we have developed a method based on that
post, and added “verifying that masters and slaves are in
sync” to our standard battery of tests. We call it
“Continual replication sync checking”. This article will
explain how it works, how to test and make the procedure non-blocking,
benchmarks from the real world, issues we encountered along the way, and
finally Pythian’s procedure of setting up continual replication
sync in a new environment. At the 2010 MySQL User Conference & Expo,
my co-worker Danil Zburivsky did a presentation about this, and you can
get the slides in ODP (Open Office) format or watch the 46-minute video
on YouTube. How it works On the master, mk-table-checksum is run. In
order to make the checksum operation online, we use the modulo and
offset features of mk-table-checksum to checksum only part of the data
at a time. The checksum is run (from cron) on the master and replicates
to the slave. The results are captured in a result table, and a
separate process checks the result table and notifies us of any
discrepancies. Testing resource usage for non-blocking replication sync
checking Each environment has a different data size, different
resources, and a different level of what “intrusive” means.
We start with a baseline of the size of the database, which we get from:
SELECT SUM(INDEX_LENGTH+DATA_LENGTH)/1024/1024/1024 as sizeGb FROM
INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE='BASE TABLE'; I included
INDEX_LENGTH because the queries that generate the checksum information
may or may not depend on indexes. Note that for InnoDB tables, the
metadata is approximate, so this will not necessarily be the exact size
of the database, but it will be in the right range. In general, running
this query in the INFORMATION_SCHEMA did not consume too many resources,
though in many cases the query took a few minutes to complete. On
databases with several thousand tables, this query can take hours to
complete. Of course there is the caveat that it may take a VERY long
time to run this in your environment, and some folks have reported
crashing (this is VERY rare, I personally have not seen it even with the
high demand client we have had, but others have reported it). So if you
prefer to be on the safe side, you could look at the size of your ibdata
files (whether you are innodb_file_per_table or not) plus the sizes of
all the tables on disk (MyISAM, CSV, etc). That will give you a sense
of how big the data is, although the size of the files on disk also
includes any fragmentation. In practice, I have seen fragmentation
reduce the size of a large database by 40%, so beware that neither of
these methods is perfect. Once we get the data size in gigabytes, we can
come up with a modulo to test. In general, we have found that
mk-table-checksum is quick enough that it is non-blocking when we use a
modulo value that corresponds to approximately 1 Gb of data, and a
chunk-size of 100. We actually started out using a checksum of 1000,
but locking 1000 rows to test a checksum proved to be longer than was
acceptable for clients with lots of transactions — the problem was
not the locking on the master, the problem was that it created a slave
lag on the slaves. We tried using the --sleep option to
mk-table-checksum, which worked, but changing the chunksize to be
smaller caused less slave lag than adding a sleep parameter. Another
issue when using chunk-size is that it requires a numeric (but
non-decimal) index. We ran across clients using VARCHAR(32) as a
PRIMARY key (yes, on InnoDB, and yes, performance was an issue there),
or tables that did not have any indexes at all (such as a logging
table). If mk-table-checksum does not find a suitable index it will
just do a checksum on the whole table. To find tables that would be
problematic, here is an INFORMATION_SCHEMA query that can be run (again,
the usual caveats about INFORMATION_SCHEMA apply): SELECT
CONCAT("SHOW CREATE TABLE ",
table_schema,".",table_name,"\\G") FROM TABLES LEFT
JOIN ( SELECT distinct table_schema,table_name FROM STATISTICS
INNER JOIN COLUMNS USING (table_schema,table_name,column_name)
WHERE (data_type like '%int' or data_type like 'date%' or data_type
like 'time%' or data_type='float') and seq_in_index=1 ) AS
good_for_checksum USING (table_schema,table_name) WHERE
good_for_checksum.table_name IS NULL and table_schema not in
('information_schema'); Frequency of checksum runs The frequency that
we run the checksum is also very flexible, so we take the size and
translate that into a modulo that is “even” in a time-based
way. For example, on a server that reported 113 Gb in size from the
INFORMATION_SCHEMA query above, we set the modulo to 120. The checksum
took 10 minutes and 6 seconds from the time the master started to the
time the slave finished. There was no excessive slave lag caused and
other resource checks showed that this test was acceptable, including
application response time for non-administrative queries. Given a modulo
of 120 that takes about 10 minutes to run and the environment, we
decided to run the checksum 9 times per day (hourly for 9 hours during
off-peak time). This resulted in the entire data set being checked
during a period of just under 2 weeks (120 parts / 9 times per day =
13.333 days). This means that if there is a data discrepancy, it is
discovered within 2 weeks in this particular environment. Though that
is not ideal, it is much better than not discovering data discrepancies
at all, which is how replication currently works. Benchmarks From the
Real World The first row in the table below is the example we just went
through. The subsequent rows of the table are results from some of the
other production environments we run the checksum in. As you can see,
we try to keep the data checked at one time to about 1 Gb. Total Data
SizeModuloTest timeChecksum RateFrequencyPeriod 113 Gb12010 min 6
seconds1.59 Mb / sec9x / day2 weeks 9 Gb2151 seconds8.4 Mb / sec3x /
day1 week 29 Gb219 min 16 seconds2.6 Mb / sec3x / day1 week 70 Gb2128
seconds2650 Mb / sec(data wasfreshly defragmented!)3x / day1 week 5.1
Gb214 min 22 sec0.958 Mb / sec3x / day1 week 314.5 Gb33636 min 3
seconds0.44 Mb / sec16x / day3 weeks In all of these environments, slave
lag was 10 seconds or less at any given point on the slaves. Issues
Encountered Some of the issues we encountered have workarounds, so I
wanted to discuss and explain them here before giving our procedure,
which contains the workarounds. Bug 304 – mk-table-checksum
deletes all prior checksum results in the result table on every run.
There is no fix for this yet, but if you are using mk-table-checksum
only for the procedure described in this article (and in particular are
not using the --resume-replication option), you can comment out the
following code from mk-table-checksum: # Clean out the
replication table entry for this table. if ( (my
$replicate_table = $final_o->get('replicate'))
&& !$final_o->get('explain') ) {
use_repl_db(%args); # USE the proper replicate db my
$del_sql = "DELETE FROM $replicate_table WHERE db=? AND
tbl=?"; MKDEBUG && _d($dbh, $del_sql, $db,
$table->{table}); $dbh->do($del_sql, {}, $db,
$table->{table}); } It is in different places in
different versions, but last I checked, searching for "DELETE
FROM" in mk-table-checksum only matched three lines of code, and it
was pretty clear (due to the inline comment) which block of code to
delete. The block shown above is from lines 5154 – 5161 in
mk-table-checksum changeset 6647. Running the checksum may cause
“statement not safe for replication” errors, especially in
5.1. This is usually OK to ignore, because mk-table-checksum works
specifically because you can run the same command on the master and
slave and get different results. In MySQL 5.1, CSV tables for the
general and slow logs exist by default, even if they are not being
written to, and “Statement is not safe to log in statement
format” errors show up. You will need to redirect stderr and
expect to see those statements in the MySQL error log. Note that
mk-table-checksum works regardless of whether you are using
statement-based, row-based or mixed replication. mk-table-checksum is
not perfect, and sometimes shows false positives and false negatives.
This is a hard to deal with problem, and we encourage making bug reports
when they are found. However, I will note that if mk-table-checksum
finds even one undetected data integrity issue, then it is useful,
because right now there is no other way of detecting issues in an
automated fashion. As more people use mk-table-checksum and can help
the developers figure out how to fix the false positives/false
negatives, I am sure it will be even better. Pythian’s procedure
to set up continual replication sync Check to make sure all tables have
appropriate indexes, as above. If they do not, the nibbling algorithm
can be used, though as a caveat I have not tested nibbling with
mk-table-checksum. Figure out the modulo value based on data size, as
above Decide what database in which to put the tables that
mk-table-checksum uses. We either use our monitoring database or a
database called “maatkit”. Note that it is important to use
a database that actually gets replicated! Get mk-table-checksum and
comment out the lines that always delete from the replicated table, as
above Create and populate the table mk-table-checksum will need for the
modulo value: CREATE TABLE IF NOT EXISTS `checksum_modulo` (
`modulo_offset` smallint(5) unsigned NOT NULL default '0' primary key )
ENGINE=MyISAM DEFAULT CHARSET=utf8; INSERT IGNORE INTO checksum_modulo
(modulo_offset) VALUES (0); Do a test run of mk-table-checksum: perl
mk-table-checksum -u avail --ask-pass --algorithm BIT_XOR \ --replicate
maatkit.checksum --create-replicate-table \ --modulo 120 --chunk-size
100 \ --offset 'modulo_offset FROM maatkit.checksum_modulo' localhost
And update the modulo table: update maatkit.checksum_modulo set
modulo_offset = ((modulo_offset+1)%21); And check the results on the
slave: SELECT * FROM maatkit.checksum WHERE this_crc!=master_crc AND db
not in ('maatkit','mysql'); If the test run came out OK, set up a script
to run via cron such as: # run checksum table 9 times per day, during
off-peak times: 30 0,1,2,3,4,5,21,22,23 * * *
/home/pythian/bin/checksumtbls.sh >>
/home/pythian/logs/checksum.log 2>&1 And the checksumtbls.sh
script looks like: #!/bin/sh DB=maatkit USER=maatkit_user PASS=password
REPLTBL="$DB.checksum" MODULO=120 CHUNKSIZE=100
OFFSETTBL="checksum_modulo" ALGORITHM=BIT_XOR
LOG=/home/pythian/logs/checksum.log /usr/bin/perl
/home/pythian/bin/mk-table-checksum -u $USER -p $PASS \ --modulo $MODULO
\ --algorithm $ALGORITHM --chunk-size $CHUNKSIZE \ --offset
"modulo_offset FROM $DB.$OFFSETTBL" \ --replicate $REPLTBL
--create-replicate-table localhost >> $LOG /usr/bin/mysql -u $USER
-p$PASS -e "update $DB.$OFFSETTBL set
modulo_offset=((modulo_offset+1)%$MODULO)" >> $LOG And of
course, do not forget to periodically check on the slave to see where
the issues are: SELECT * FROM maatkit.checksum WHERE
this_crc!=master_crc AND db not in ('maatkit','mysql'); I hope this
helps; It is extremely important to make sure
www.pythian.com |
7/27/10 5:09 PM
451 CAOS Links 2010.07.27 New projects. Old arguments. And more. Follow 451 CAOS Links live
@caostheory on Twitter and Identi.ca “Tracking the open source
news wires, so you don’t have to.” New projects # Gemini
Mobile Technologies released Hibari, a new open source non-relational
database for big data. # Lockheed Martin launched the Eureka Streams
open source project for enterprise social networking. # Sony Pictures
Imageworks expanded its open source initiative with the release of
OpenColorIO. Old arguments # Kirk Wylie discussed the importance of
natural split in open core , OpenGamma’s approach. # Alan Shimel
offered 10 commandments for open core. Mostly sensible, #6 will ruffle
some feathers though. # Simon Phipps maintained that open source does
not need “monetising”. # Carlo Daffara discussed property
and efficiency as the basis of OSS business models. # Jorg Janke
continued his discussion of various open source business strategies in
relation to Compiere. # Henrik Ingo explained what you can do to help
get rid of open core if you are so inclined. # dotCMS went open core
with the release of version 1.9. # IBM faces EU antitrust investigation
linked to TurboHercules complaint. # The FT reported that IBM is
blaming Microsoft for the EU investigation into its mainframe business
practices. # TechDirt explained how WordPress and Thesis have settled
their differences over themes and the GPL. The best of the rest #
Novell introduced SUSE Gallery for publishing and sharing Linux-based
appliances. # VoltDB released version 1.1 of its open source database.
# EnterpriseDB released Postgres Plus Advanced Server 8.4 and added Rob
Bearden to its board. # SAP has adopted Black Duck’s Suite to
manage the use of open source software in its software development
process. # Oracle provided details of the MySQL Sunday event at Oracle
Open World. # SearchEnterpriseLinux reported that Ubuntu is gaining
ground as a data center OS at the expense of SUSE Linux. # Physorg.com
explained how Georgia Institute of Technology researchers are helping
the US military benefit from OSS. # GENIVI Allianced has reportedly
opted for MeeGo for its in-vehicle infotainment platform.
blogs.the451group.com |
7/27/10 4:58 PM
Implementing SSIS Package Configurations Marcin Policht demonstrates how to leverage SQL Server Integration
Services (SSIS) package variables in order to modify SSIS properties
without directly editing package content.
redir.internet.com |
7/27/10 12:23 AM
IBM DB2 pureScale: The Next Big Thing or a Solution Looking for a Problem? In October 2009, IBM announced that some rather neat technology known as
data sharing would be delivered for the IBM DB2 for Linux, Unix and
Windows product, in an optional facility dubbed pureScale. Julian
Stuhler takes a closer look at this technology and the possible
implications for the entire IBM DB2 world.
redir.internet.com |
7/27/10 12:08 AM
MariaDB and the MySQL Sandbox Tokutek tests its TokuDB Fractal Tree storage engine with multiple MySQL
distributions. We make extensive use of the MySQL Sandbox in our test
automation. We tweaked the regular expressions that match binary
tarball names in the MySQL Sandbox so that MariaDB releases can be
easily loaded by the MySQL Sandbox. These changes can be found in
launchpad lp:~prohaska7/mysql-sandbox/mariadb.
tokutek.com |
7/26/10 8:49 PM
Webinar: What you need to know for a MySQL 5.0 -> 5.1 upgrade IOUG has a free series of three webinars on upgrading MySQL. Each
webinar is an hour long, and it starts with a webinar by me tomorrow at
12 noon Central time (GMT-5) on “Why and How to Upgrade to MySQL
5.1″. The webinar assumes you are upgrading from MySQL 5.0 to
MySQL 5.1, and talks a little bit about the new features, server
variables, and what you need to know when upgrading to MySQL 5.1. The
software used is GoToWebinar (formerly GoToMeeting), so you will need to
install that software. To register, use the links on the IOUG MySQL
Upgrade Webinar Series page. The complete list of webinars in the MySQL
Upgrade Series is: * MySQL 5.1: Why and How to Upgrade Sheeri
Cabral, The Pythian Group Tuesday, July 27, 12:00 p.m. –
1:00 p.m. CT (GMT-5) * MySQL Upgrades With No Downtime Sean
Hull, Heavyweight Internet Group Wednesday, July 28, 12:00 p.m.
– 1:00 p.m. CT (GMT-5) * MySQL Upgrade Best Practices
Matt Yonkovit, Percona Thursday, July 29, 12:00 p.m. – 1:00
p.m. CT (GMT-5) (note, I am not sure if it is free for everyone or just
free for IOUG members; my apologies if it is the latter)
www.pythian.com |
7/26/10 5:02 PM
UNHEX for Integers in MySQL The following is a very, very specific subject matter, but I wish this
blog post had existed when I was looking for the solution to a problem I
was having. Most likely it has no interest to anyone who is not
specifically looking for it, so feel free to stop reading...MySQL has a
few built-in functions for handling binary data. One of them is HEX
which converts any data into hexadecimal representation. The function
which you would expect to do the opposite (as the manual states) is
UNHEX which takes a hexadecimal representation and turns it into
characters.So, if you try to do:SELECT UNHEX(HEX('a'));You get an
"a" back. But if you try that on a integer:SELECT
UNHEX(HEX(1)); You get back the char corresponding to the ASCII value of
1, which is not what was intended.The correct way to do it, and the real
opposite of HEX is the CONV function which converts betweens
bases:SELECT CONV(HEX(1), 16, 10);This time the result is the number
"1" as expected.
blog.shlomoid.com |
7/26/10 2:00 PM
OSCON and OpenStack The past two weeks have been both exciting and extremely busy, first
traveling to Austin, TX for the first OpenStack Design Summit, and then
back home to Portland, OR for The O’Reilly Open Source Conference
(OSCON) and Community Leadership Summit. The events were great in
different ways, and there was some overlap with OpenStack since we
announced it on the first day of OSCON and created quite a bit of buzz
around the conference. I want to comment on a few things that came up
during these two weeks. New Role I’m now focusing on OpenStack
related projects at Rackspace. I’m no longer working on Drizzle,
but I will still be involved in the MySQL and database ecosystems
through future projects and conferences (see you at OpenSQL Camp). I
will also still be working on a couple of Gearman related projects in my
spare time. At OSCON I gave two presentations on Gearman and Drizzle,
you can find the slides here. The Five Steps to Open One question that
came up a few times over the past couple weeks is what the term
“Open” means when a business or organization decides to
adopt the open source philosophy. It turns out this means many different
things to folks, and when an organization decides to go open, they need
to make a decision on how open they are willing to be. Here are the
various layers we’ve seen over the years: Open API –
You’ve decided to take the first step to being open and released a
well documented API to work with your web service or project. Everything
behind the API is still a black-box though. Open Core – Beyond the
APIs, you’ve decided to release part of the code open source, but
you still keep some of the bits proprietary in an attempt to keep a
competitive advantage. This is a hot debate lately on whether it is a
viable Open Source business model. Open Source – You’ve
decided keeping some code proprietary doesn’t help, and actually
even hurts your project or adoption. You put all of the code out in the
open for everyone to see. While everyone can see all of the source code,
there still isn’t a whole lot of interaction going on. Open
Development – Putting the source code out wasn’t enough. You
want to enable users and external developers to be able to file bugs,
submit patches, and track the development process to see what to expect
next. This usually involves running your project on a public project
site such as github or Launchpad. Open Decision Making –
You’ve postponed the inevitable for long enough. Feature requests
and bug reports are pouring in, and the community wants to have a say in
what gets prioritized. Should we focus only on stability? Performance?
New features? Porting to mobile platforms? Let the community decided the
direction of the project. There have been examples of success for
organizations who have stopped at each of these steps. Given the proper
environment, any can work. My preference is to work on projects that are
fully open, where company and organizational boundaries do not exist
between developers and users. I’m thrilled to say that we’ve
gone all in with OpenStack. We’re hosted on Launchpad and have a
governance structure that allows all parties within the community to
have a say in the future of the project. Preventing Vendor Lock-in
During the Cloud Summit at OSCON, there was a debate titled: “Are
Open APIs Enough to Prevent Lock-in?”. Most folks came to the
conclusion that the answer is “no,” and I agree. While I
feel open APIs are necessary, they are by no means sufficient. Even if a
project is open source and allows for open development, it probably will
not prevent vendor lock-in. The key is to provide some incentive for
vendors to adopt and invest resources within a project. Much like
customers don’t want vendor lock-in when choosing a platform,
vendors do not want project or feature lock-in when choosing the
software to power their business. Each vendor who chooses to participate
must have the ability to voice their opinion on the direction of APIs,
features, and other project priorities. This is why it is critical that
any open source project must take all the steps described above to give
the project a chance of being adopted and becoming the de facto
standard. There is of course no guarantee that adoption and prevention
of vendor lock-in will happen, but I see them as necessary steps. This
is another area where OpenStack has done the correct thing. We are
planning on having another developer summit in November, and then once
every six months after that time. All design discussions and decision
making will happen in public forums such as the mailing list and IRC. We
want all participants in the community to have a chance to respond to
topics being discussed, and we believe the more we have, the more
successful the project will be. Having many voices allows the project to
be more applicable to different environments. For example, Rackspace and
NASA have different requirements for their compute architectures, but
they also share many components as well. Through open participation we
can ensure all needs are accounted for. Much like the LAMP stack has
powered universities, governments, and competing business, we hope
OpenStack can do the same. Contributor License Agreement (CLA) During
the past couple of weeks a few folks asked what the CLA was all about.
When the foundations of OpenStack were forming, the requirement of
having a CLA came up from the legal side. Having been involved with open
source projects that had very invasive CLAs, initially I had quite a bit
of concern. The CLA is actually quite innocuous, and it does NOT require
assignment or dual-ownership of copyright. You are the sole owner of
code you contribute. For all intents and purposes it is a signed version
of the Apache 2.0 license, the CLA just makes these terms more explicit.
The CLA is handled through digital signatures, so no papers, pens, or
faxing is required. Get Involved! Expect to see more posts on my blog
related to OpenStack topics. If you would like to get involved, you can
join the IRC channel (#openstack on irc.freenode.net), join the mailing
list, or start contributing code! There are even jobs around OpenStack
popping up already!
oddments.org |
7/26/10 8:33 AM
JavaOne News Update 1 An update on some recent News on JavaOne 2010. As you know JavaOne San
Francisco is Sep 19-23, 2010. The Official page has links to the
Registration Page and the Online Catalog. News updates include:
• A surprisingly useful & manageable Catalog-as-tweets via
@javaoneconf • Availability of Schedule Builder (post) •
Open enrollment in Java University (post) • Announcement of dates
for JavaOne Brazil and JavaOne China (post). • The day before
there is a MySQL Sunday! • And, the Duke Awards submissions page
seems to still be active. Also, this year will be the 15th
anniversary for Java, and the 5th for GlassFish. Don't know if
there will be a BDay party for Java; still hoping we can put something
together for GlassFish, we will see! More related news are tagged JavaOne.
blogs.sun.com |
7/26/10 1:46 AM
Why don't you use X? Sometimes I am asked why don't I use X instead of official MySQL. The
answer is simple. I like to use it because I have been using it, the
MySQL development team (including InnoDB) has done great work this year
and because change is expensive. The cost of change includes the cost of
evaluating the alternatives and the cost of deploying them. The cost of
change also includes the features I won't work on because I am doing an
evaluation. I also use it because the quality of new 5.1 releases has
been very high this year. I know because I test them and some of the
alternatives.My initial evaluation criteria are simple. I don't like
compiler or valgrind warnings. The alternative should not introduce new
ones. I like regression tests. The alternative should not disable or
fail existing tests. If the existing test is somewhat bogus, then it
should be fixed. I love buildbot as done by MariaDB, fixes in official
MySQL to reduce compiler warnings and all of the work done by Drizzle to
not tolerate compiler warnings. When the alternative adds new features
it must add new regression tests (hooray for status_user.test in
MariaDB).I spent a lot of time debugging valgrind warnings that occur in
MySQL 5.1.47. All of them were bogus and I think future versions of
MySQL will prevent these. That is good news as I prefer to not repeat
that effort the next time I upgrade to a new MySQL release.Percona and
MariaDB confront the same issues and more when considering code to
incorporate. In addition to what I wrote about above, they must review
patches to make sure the code is reasonable. There is a lot of duplicate
effort done by groups that patch or fork MySQL. I wish this weren't the
case. If the external fork & patch effort is consolidated around
MariaDB, then we can reduce some of this.
mysqlha.blogspot.com |
7/25/10 8:18 PM
MySQL versions at Wikipedia More of information about how we handle database stuff can be found in
some of my talks. Lately I hear people questioning database software
choices we made at Wikipedia, and I’d like to point out,
that… Wikipedia database infrastructure needs are remarkably
boring. We have worked a lot on having majority of site workload
handled by edge HTTP caches, and some of most database intensive code
(our parsing pipeline) is well absorbed by just 160G of memcached arena,
residing on our web servers. Also, major issue with our databases is
finding the right balance between storage space (even though text is
stored in ‘external store’, which is just set of machines
with lots of large slow disks) – we store information about every
revision, every link, every edit – and available I/O performance
per dollar for that kind of space needed. As a platform of choice we
use X4240s (I advertised it before) – 16 SAS disks in compact 2u
package. There’s relatively small hot area (we have 1:10 RAM/DB
ratio), and quite a long tail of various stuff we have to serve. The
whole database is just six shards, each getting up to 20k read queries a
second (single server can handle that), and few hundred writes (binlog
is under 100k/s – nothing too fancy). We have overprovisioned some
hardware for slightly higher availability – we don’t have
always available on-site resources – the slightly humorous logic
is we need four servers, in case one goes down, another will be
accidentally brought down by fixing person, then you got one to use as a
source of recovery and remaining one to run the site. Application
doesn’t have too many really expensive queries, and those
aren’t the biggest share of our workload. Database by itself is
minor part of where application code spends time (looking at profiling
now – only 6% of busy application time is inside database,
memcached is even less, Lucene is way up with 11%). This is remarkably
good shape to be at, and it is much better than what we used to have
when we had to deal with insane (“explosive”) growth. I am
sure, pretty much anything deployed now (even sqlite!) will work just
fine, but what we used has been created during bad times. Bad times
didn’t mean that everything was absolutely overloaded, it was more
that it could get overloaded very soon, if we don’t take
appropriate measures, and our fundraisers were much tinier back then. We
were using 6-disk RAID-0 boxes to be able to sustain good performance
and have required disk space at the same time (or of course, go
expensive SAN route). While the mainstream MySQL development with its
leadership back then was headed towards implementing all sorts of
features that didn’t mean anything to our environment (and from
various discussions I had with lots of people, many many other web
environments): utf8 support that didn’t support Unicode Prepared
Statements that don’t really make much sense in PHP/C environments
unoptimized subqueries, that allow people to write shitty performing
queries later in 5.0 – views, stored routines, triggers
etc… … nobody was really looking at MySQL performance at
that time, and it could have insane performance regressions
(“nobody runs these things anyway”, like ‘SHOW
STATUS’) and a forest full of low hanging fruits. From operations
perspective it wasn’t perfect either – replication
didn’t survive crashes, crash recovery was taking forever, etc.
Thats when Google released their set of patches for 4.0, which
immediately provided incredible amount of fixes (thats what I wrote
about it back then). To highlight some of introduced changes:
Crash-safe replication (replication position is stored inside InnoDB
along with transaction state) – this allowed to run slaves with
innodb log flushing turned off on slaves and having consistent recovery,
vanilla MySQL doesn’t have that yet, Percona added this to XtraDB
at some point in time Guaranteed InnoDB concurrency slot for replication
thread – however loaded the server is, replication does not get
queued outside and can proceed. This allowed us to have way more load
pointed towards MySQL. This is now part of 5.1 Multiple read-ahead and
write-behind threads – again, allowed to bypass certain
bottlenecks, such as read-ahead slots (though apparently it is wiser
just to turn off read-ahead entirely) – now part of InnoDB Plugin
Multiple reserved SUPER connections – during overloads systems
were way more manageable Running these changes live have been
especially successful (and that was way before Mark/Google released
their 5.0 patch set which was then taken in parts by
OurDelta/Percona/etc) – and I spent quite some time trying to
evangelize these changes to MySQL developers (as I would have loved to
see that deployed at our customers, way less work then!). Unfortunately,
nobody cared, so running reliable and fast replication environments with
mainline MySQL didn’t happen (now one has to use either XtraDB or
FB build). So, I did some merging work, added few other small fixes and
ended up with our 4.0.40 build (also known as four-oh-forever), which
still runs half of shards today. It has sufficient in-memory performance
for us, it can utilize our disk capacity fully, and it doesn’t
have crash history (I used to tell about two 4.0 servers, both whitebox
raid0 machines, having unbroken replication session for two years). By
todays standards it already misses few things (I miss fast crash
recovery mostly, after last full power outage in a datacenter ;-)
– and developers would love to abuse SQL features (hehe, recently
a read-only subquery locked all rows because of a bug :-) I’m way
more conservative when it comes to using certain features live, as when
working at MySQL Support I could see all the ways those features break
for people, and we used to joke (this one was about RBR :): Which is the
stable build for feature X? Next one! Anyway, even knowing that stuff
breaks in one way or another, I was running a 5.1 upgrade project,
mostly because of peer pressure (“4.0 haha!”, even though
that 4.0 is more modern from operations standpoint). As MediaWiki is
open-source project, used by many, we already engineer it for wide range
of databases – we support MySQL 4.0, we support MySQL
6.0-whatever-is-in-future, and there’s some support for different
vendor DBMSes (at various stages – PG, Oracle, MS SQL, etc)
– so we can be sure that it works relatively fine on newer
versions. Upgrade in short: Dump schema/data Load schema on 5.1
instance Adjust schema, as we can do it, set all varchar to varbinary to
maintain 4.0 behavior Load data on 5.1 instance Fix MySQL to replicate
from 4.0 (stupid breakage for nothing) Switch master to 5.1 instance We
had some 5.0 and 5.1 replicas running for a while to detect any issues,
and as there weren’t too many, the switch could be nearly
immediate (English Wikipedia was converted 4.0->5.1 over a weekend).
I had an engineering effort before to merge Google 5.0 patches into
later than 5.0.37 tree, but eventually Mark left Google for Facebook and
“Google patch” was abandoned, long live the Facebook patch!
:) At first FB-internal efforts were to get the 5.0 environment working
properly, so 5.1 development was a bit on hold. At that time I
cherry-picked some of Percona’s patch work (mostly to get
transactional replication for 5.1, as well as fast crash recovery)
– and started deploying this new branch. Of course, once Facebook
development focus switched to 5.1, maintaining separate branch is
becoming less needed – my plan for the future is getting FB build
deployed across all shards. The beauty of FB-build is that development
team is remarkably close to operations (and operations team is close to
development), and there is lots of focus on making it do the right stuff
(make sure you follow mysql@facebook page). The visibility of systems
(PMP!) we have at Facebook can be transformed into code fixes nearly
instantly, especially when compared with development cycles outside.
I’m sure some of those changes will trickle to other trees
eventually, but we have those changes in FB builds already here, and
they are state of the art of MySQL performance/operations engineering,
while maintain great code quality. Yes, at Wikipedia we run a mix of
really fresh and also quite old/frozen software (there will be
unification, of course), but…. it runs fine. It isn’t as
fascinating anymore as years ago, but it allows not paying any attention
for years. Which is good, right? Oh, and of course, there’s full
data on-disk compatibility with standard InnoDB builds, in case anyone
really wants to roll back or switch to the next-best-fork.
mituzas.lt |
7/25/10 7:36 PM
MySQL Server Protocol Bug A few months ago I wrote a tool that verified MySQL and Drizzle protocol
compatibility, along with testing for all sorts of edge cases. In
analyzing protocol command interactions in mysqld, I found that the
MySQL server will happily read an infinite amount of data if you exceed
the maximum packet size while using a special sequence of protocol
packets. The reasoning behind this behavior is so that the server can be
polite and flush your data before sending a “max packet
exceeded” error message, but perhaps there should be a limit to
one’s politeness. What’s more interesting is that you can do
this during the client handshake packet without authorization, so anyone
could do this to any open MySQL server. The appropriate thing to do here
would be to set some maximum limit of data to read and force a
connection close when it is reached, otherwise your bandwidth and CPU
could be consumed (essentially a DoS attack). This portion of code was
ripped out entirely in Drizzle, so there are no risks there. I submitted
this as a bug to MySQL and MariaDB back in February and they both have
patches available to fix this as well. You can find the bug here and a
patch here. If you have publicly accessible MySQL or MariaDB servers,
you probably want to upgrade binaries or patch this.
oddments.org |
7/25/10 1:47 AM
Caching could be the last thing you want to do I recently had a run-in with very popular PHP ecommerce package which
makes me want to voice a recurring mistake I see in how many web
applications are architected. What is that mistake? The ecommerce
package I was working with depended on caching. Out of the box it
couldn’t serve 10 pages/second unless I enabled some features [...]
www.mysqlperformanceblog.com |
7/24/10 7:39 PM
Using MySQL Partitioning Instead of MERGE Tables One common question I get is how to use partitioning instead of MERGE
tables. The process I use involves using stored procedures to create
and drop partitions. This article will go over the stored procedures I
use; special thanks to Roland Bouman for taking a look and giving great
feedback to optimize this process. First, a simple table, not
partitioned (yet): use test; DROP TABLE IF EXISTS my_part; CREATE TABLE
IF NOT EXISTS my_part ( id int NOT NULL, creationDate datetime NOT
NULL, PRIMARY KEY (id,creationDate) ) ENGINE=InnoDB; In real, life
there is more to the table than just id and creationDate. The most
important part is that the partitioned field(s) need to be part of the
primary key. Now, add the partition definition. This can be done in
the CREATE statement, but I have found that it is easier for me to think
about what fields and indexes I want first, and then worry about
partitioning, so I naturally gravitate towards defining the fields and
indexes first and then altering the table to add partitioning. This
also helps me think about how to modify an existing table to add
partitioning. ALTER TABLE my_part PARTITION BY RANGE
(TO_DAYS(creationDate)) ( partition 2010_07_01 values less than
(to_days('2010-07-02')), partition 2010_07_02 values less than
(to_days('2010-07-03')), partition 2010_07_03 values less than
(to_days('2010-07-04')), partition 2010_07_04 values less than
(to_days('2010-07-05')) ); This makes it pretty clear what is happening
— the idea is to give the partition names actual dates that they
hold, so that it is easy to see what partitions need to be added and
deleted. Deleting partitions I find that making stored procedures makes
things easy….so I will define a procedure called partition_drop
to drop partitions. The partition_drop stored procedure takes in a
table name and schema name to drop partitions from, and a date to delete
up through and including that date. Here’s the procedure:
DELIMITER || DROP PROCEDURE IF EXISTS partition_drop || CREATE
PROCEDURE partition_drop (IN through_date date, IN tbl varchar(64), IN
db varchar(64)) BEGIN DECLARE delete_me varchar(64); DECLARE notfound
BOOL DEFAULT FALSE; DECLARE pname CURSOR FOR SELECT PARTITION_NAME FROM
INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME=tbl AND TABLE_SCHEMA=db
AND DATE(PARTITION_NAME)!= 0 AND DATE(PARTITION_NAME) IS NOT NULL AND
DATE(PARTITION_NAME)<=through_date; DECLARE CONTINUE HANDLER FOR NOT
FOUND SET notfound:=TRUE; OPEN pname; cursor_loop: LOOP FETCH pname
INTO delete_me; IF notfound THEN LEAVE cursor_loop; END IF; SET
@alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl,"
DROP PARTITION ",delete_me); # sanity check commented out for
production use # SELECT @alter_stmt; PREPARE stmt_alter FROM
@alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE stmt_alter; END
LOOP; CLOSE pname; END || DELIMITER ; Go ahead and run CALL
partition_drop('2010-07-02','my_part','test'); to verify that SHOW
CREATE TABLE my_part; shows that the desired partitions have been
dropped. Adding partitions Adding partitions is similar to deleting
partitions — using a stored procedure that takes in the date to
add partitions up to that date. It will not try to add so many
partitions that the table will have more than 1024, and it won’t
add any partitions that already exist. DELIMITER || DROP PROCEDURE IF
EXISTS partition_add || CREATE PROCEDURE partition_add (IN through_date
date, IN tbl varchar(64), IN db varchar(64)) BEGIN DECLARE add_me
char(10); DECLARE max_new_parts,add_cnt smallint unsigned default 0;
SELECT 1024-COUNT(*) AS max_new_parts, SUM(CASE WHEN
DATE(PARTITION_NAME)>=through_date then 1 else 0 END) INTO
max_new_parts, add_cnt FROM INFORMATION_SCHEMA.PARTITIONS WHERE
TABLE_SCHEMA = db AND TABLE_NAME = tbl; IF add_cnt=0 THEN BEGIN SELECT
MAX(DATE(PARTITION_NAME)) INTO add_me FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME=tbl and TABLE_SCHEMA=db AND
DATE(PARTITION_NAME)<through_date; # to do: declare handler for
exceptions here IF DATEDIFF(through_date,add_me)+1 < max_new_parts
THEN BEGIN WHILE add_me<through_date do BEGIN SET
add_me:=DATE_FORMAT(add_me + INTERVAL 1 DAY,"%Y_%m_%d"); SET
@alter_stmt:=CONCAT("ALTER TABLE ",db,".",tbl,"
ADD PARTITION (PARTITION ",add_me," VALUES LESS THAN
(TO_DAYS('",add_me+INTERVAL 1 DAY, "')))" ); PREPARE
stmt_alter FROM @alter_stmt; EXECUTE stmt_alter; DEALLOCATE PREPARE
stmt_alter; END; END WHILE; END; END IF; END; END IF; END || DELIMITER
; Here’s how to call that stored procedure: CALL partition_add
('2010_07_10', 'my_part','test'); Caveat: This stored procedure will
only add in partitions from the maximum partition name/date until the
date you add it in, it will not fill in any gaps. However, it is
possible to add in another input parameter to be the “start
date” and change the stored procedure to add the partition if it
does not exist, from the start date through the end date. Note: It was
pointed out by Roland Bouman that it would be better to change both the
add and drop stored procedures to do one ALTER TABLE instead of
sequential ones. He wrote: It’s probably better to generate a
single statement to drop / add all partitions. So the general pattern
would be: - generate and concatenate lines for each partition - one
single sequence of prepare, execute, deallocate to execute one DDL
statement. For the drop partition procedure, this approach would allow
you to do away with the cursor. You can simply use GROUP_CONCAT in a
single SELECT…INTO statement to generate the entire DDL
statement. Note: Roland also mentioned that these stored procedures
could be generalized to use any interval, for example if each table
partition held 7 days. The only limitation is that to get the optimal
partition performance with dates, use either TO_DAYS(date_field) or
YEAR(date_field) as the partitionining function. Putting it all together
The procedure: daily I would run the following in MySQL, to keep only
the previous “x” days: CALL
partition_drop('CURRENT_DATE()-INTERVAL x DAY',
'tbl_name','schema_name'); To add new partitions, I would do: CALL
partition_add('CURRENT_DATE()-INTERVAL 32 DAY',
'tbl_name','schema_name'); It is run daily just in case; the side
benefit is that it will catch any gaps if you do not modify the stored
procedure to include a start date for when to add. Adding a partition
does not take a long time, because there’s no data to reorganize.
I would make a daily MySQL event, and then everything is stored nicely
in the database, and backed up with the database. Others may choose to
run the stored procedures in a cron script or scheduled task (Windows),
but that requires a password to be stored somewhere — either in
the script, somewhere the script can read, or in a config file such as .my.cnf.
www.pythian.com |
7/24/10 3:19 AM
Going Mobile: Data Management Solutions for Mobile Devices Crunched by the economy and squeezed by competition, businesses and
consumers alike are turning to mobile computing. This report wraps up
our overview of the top contenders in the mobile database market with a
look at Sybase SQL Anywhere and SQLite.
redir.internet.com |
7/24/10 12:23 AM
Gain the Competitive Advantage: Data Management Solutions for Mobile Devices Today's workforce and consumers have become increasingly mobile.
Businesses crunched by the economy or squeezed by competition look to
mobile computing to gain a competitive advantage. This report examines
the top contenders in the mobile database market space, starting with
SQL Server Compact, Oracle Database Lite and IBM DB2 Everyplace.
redir.internet.com |
7/24/10 12:08 AM
Dynamic Data Driven Variable Hierarchical Structures in SQL SQL can support the creation of variable hierarchical structures in a
number of differently controlled operations that can be combined.
Michael M. David describes a powerful and flexible automatic method of
dynamic structure generation.
redir.internet.com |
7/23/10 11:53 PM
Selling the Business Intelligence Implementation Increasingly more organizations are turning to Business Intelligence
professionals to provide historical, current, and predictive views of
business operations on their critical information. As a Business
Intelligence consultant, your challenge is not usually the sourcing or
delivery of the information, but how that information is used going
forward once you have left the organization.
redir.internet.com |
7/23/10 11:38 PM
Log Buffer #196, A Carnival of the Vanities for DBAs Welcome to Log Buffer, the weekly roundup of database industry news.
For your reading pleasure this week we have Log Buffer #196: Charles
Hooper blogs about an in-depth investigation on what can cause Oracle to
ignore a hint. Doug Burns reminds his readers that there are only two
weeks left to submit papers for UKOUG. The deadline is Aug. 2. A while
back Greg Rahn at Structured Data blog thought that the best way to get
results out of Exadata is by changing your application to get the most
out of Exadata. He was very happy to see that Pythian thinks the same.
On the subject of Exadata and data warehousing, Greg posts this week on
the core performance fundamentals of Oracle Data Warehousing – set
processing vs row processing. Jonathan Lewis links to instructions and
explanations on how to switch to a different UNDO tablespace. It is
trickier than it sounds and Jonathan provided additional traps to watch
out for. Jonathan also continues his fragmentation series with an
explanation of table fragmentation and its causes. Alex Fatkulin
explains about ASM mirroring and disk partnership and why you may have
less redundancy than you thought. On the same subject, Jeremy Schneider
of Ardent Performance blog explains about hot disks, raid and what it
means for ASM mirroring. Back to blogging after a recent trip to
TechInsights 2010, Edwin Sarmiento answers questions on what needs to be
done as part of the installation of a SQL Server 2008 R2 Failover
Cluster on Windows Server 2008 R2. A number of things related to
Windows Clustering need to be considered. Willie Favero introduces the
“IBM zEnterprise System”, on his blog Getting the Most out
of DB2 for z/OS and System z. On Join-fu! the Art of SQL blog, Jay Pipes
talks about getting started developing Nova on Linux, as he’s
involved in a new OpenStack project. Peter Zaitsev, on MySQL Performance
Blog, posts about estimating replication capacity so that replication
load can be dealt with before slave is unable to catch up. Paul Randal
publishes his survey results around the purchase and use of SSIDs. And,
if you happen to be attending Oracle OpenWorld, register before July 30
to take advantage of early bird rates.
www.pythian.com |
7/23/10 10:09 PM
Database Scalability Patterns - OSCon 2010 Howdy folks, slides are up for my talk, "Database Scalability
Patterns", which I gave this week at OSCon 2010. You can get them
from the OSCon page, from slideshare, or just watch it below Database
Scalability PatternsView more presentations from Robert Treat.
www.xzilla.net |
7/23/10 8:42 PM
Building MariaDB with the InnoDB plugin This post was inspired by a couple events. But I won't explain them
other than to say I think there have been too many subjective comments
(or FUD) about quality. This is an attempt to quantify whether the grass
is greener on the other side.Today I tried to build MariaDB with the
InnoDB plugin. I was told his is now supported. The last time I checked
XtraDB replaced the InnoDB plugin in MariaDB. MPAB had a reasonable
reason for doing this as they don't want to test both the InnoDB plugin
and XtraDB. But I prefer choice, despite the many great features in
XtraDB.First I tried using the 5.3 release. That failed fast. I prefer
fast failures over obscure ones:./configure --enable-thread-safe-client
--with-plugins=partition,csv,blackhole,myisam,heap,innodb_plugin
--without-plugin-innobase --with-fast-mutexes --with-extra-charsets=all
--with-debug C_EXTRA_FLAGS="-fno-omit-frame-pointer
-fno-strict-aliasing -Wall"...configure: error: unknown plugin:
innodb_pluginThen I tried the 5.2 MariaDB release. This time the
configure command worked. After running make, both XtraDB and the InnoDB
were compiled. Time to try a test. There were no failures! Alas, all
tests were skipped../mysql-test-run.pl
--suite=innodb_plugin...innodb_plugin.innodb
[ skipped ] No innodb
supportinnodb_plugin.innodb-analyze
[ skipped ] No innodb supportinnodb_plugin.innodb-autoinc
[ skipped ] No innodb
supportinnodb_plugin.innodb-autoinc-44030 [ skipped
] No innodb supportinnodb_plugin.innodb-consistent
[ skipped ] No innodb
supportinnodb_plugin.innodb-index
[ skipped ] No innodb
supportinnodb_plugin.innodb-index_ucs2
[ skipped ] No innodb supportinnodb_plugin.innodb-lock
[ skipped ]
No innodb supportinnodb_plugin.innodb-replace
[ skipped ] No innodb
supportinnodb_plugin.innodb-semi-consistent [ skipped ]
No innodb supportinnodb_plugin.innodb-timeout
[ skipped ] No innodb
supportinnodb_plugin.innodb-use-sys-malloc [ skipped
] No innodb supportinnodb_plugin.innodb-zip
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug21704
[ skipped ] No innodb supportinnodb_plugin.innodb_bug34053
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug34300
[ skipped ] No innodb supportinnodb_plugin.innodb_bug35220
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug36169
[ skipped ] No innodb supportinnodb_plugin.innodb_bug36172
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug38231
[ skipped ] No innodb supportinnodb_plugin.innodb_bug39438
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug40360
[ skipped ] No innodb supportinnodb_plugin.innodb_bug40565
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug41904
[ skipped ] No innodb supportinnodb_plugin.innodb_bug42101
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug42101-nonzero [ skipped ]
No innodb supportinnodb_plugin.innodb_bug44032
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug44369
[ skipped ] No innodb supportinnodb_plugin.innodb_bug44571
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug45357
[ skipped ] No innodb supportinnodb_plugin.innodb_bug46000
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug46676
[ skipped ] No innodb supportinnodb_plugin.innodb_bug47167
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug47621
[ skipped ] No innodb supportinnodb_plugin.innodb_bug47622
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug47777
[ skipped ] No innodb supportinnodb_plugin.innodb_bug51378
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug51920
[ skipped ] No innodb supportinnodb_plugin.innodb_bug52663
[ skipped ] No innodb
supportinnodb_plugin.innodb_bug52745
[ skipped ] No innodb
supportinnodb_plugin.innodb_file_format [
skipped ] No innodb supportinnodb_plugin.innodb_information_schema
[ skipped ] No innodb supportinnodb_plugin.innodb_trx_weight
[ skipped ] No innodb
supportThen I checked for compiler warnings. I really dislike compiler
warnings. MySQL has recently done a lot of work to remove them from 5.1
(thanks Davi). I think all of this work was considered one bug, but it
was a lot of work and will make MySQL better. They have also begun to do
some builds with Werror. See bug 53445 for all of the details.After
compiling storage engines with -Wall, there are no warnings for the
official InnoDB plugin (storage/innodb_plugin). There are warnings for
XtraDB (storage/xtradb):btr/btr0btr.c:2871: warning: null argument where
non-null required (argument 1)btr/btr0cur.c:1841: warning: null argument
where non-null required (argument 2)btr/btr0cur.c:1860: warning: null
argument where non-null required (argument 1)btr/btr0cur.c:1967:
warning: null argument where non-null required (argument
1)fil/fil0fil.c:3106: warning: pointer targets in passing argument 2 of
‘dict_table_get_index_on_name’ differ in
signednessibuf/ibuf0ibuf.c:775: warning: null argument where non-null
required (argument 1)ibuf/ibuf0ibuf.c:950: warning: null argument where
non-null required (argument 1)os/os0file.c:4194: warning: pointer
targets in assignment differ in signednessI want to compare the results
from running all regression tests with valgrind. But that might take
some time. I am able to run all InnoDB tests without valgrind warnings
using the Facebook-patched MySQL 5.1.47. That required a few small
changes that are likely in the recent 5.1.49 release. Have all
regression tests been run under valgrind for MariaDB using either the
InnoDB or XtraDB plugin?Stay tuned for part 2.
mysqlha.blogspot.com |
7/23/10 6:55 PM
MyTOP 1.9 released MyTOP is a console-based (non-gui) tool for monitoring the threads and
overall performance of a MySQL created by Jeremy Zawodny. For months
now I have been adding updates to mytop. When I started using version
1.6 it worked but didn’t return some data fields. After fixing
these bugs I began to ideas for improvements. Here is a quick list of
what I have done. New ‘!’ command to force fix replication
errors. Display rows sorted per second. Added new ‘Cmd’
column to display the state of the query along with the statement. New
‘M’ command to change the [Mode] to
‘status’. This was in the code but covered up by
another command. If you find any bugs or would like to see your own
updates added please email me at mark @ grennan.com. Enjoy!
www.mysqlfanboy.com |
7/23/10 5:57 PM
How To: Export/Import Large MySQL Database MySQL is frequently referred to as a database for Web applications.
Partially it is really so, because MySQL became popular owing to its
simplicity, high speed, and bounding with PHP. Developers of small Web
projects often choose MySQL as a back end of their sites. Does this mean
that MySQL can be used only for small databases? Not at all. There are
lots of databases size of data in which is measured in gigabytes.
Besides MySQL servers are frequently clustered to increase their
performance. When a DBAs work with large amounts of data, they
frequently have to make backup copies correctly and effectively, i. e.
to export MySQL databases to SQL (or MySQL backup). It is exteremely
important to import MySQL database from SQL correctly is when restoring
a corrupted database and when migrating a database from one server to
another. What should be taken into account when exporting a large MySQL
database? Making a backup copy of a large database takes a lot of time.
During this time some user can try to modify data in this database. But
we want to get an all-of-a-piece database snapshot. For different tables
this can be done in different ways: for InnoDB tables a separate
transaction should be started MyISAM tables should be locked by FLUSH
TABLES WITH READ LOCK Except taking a lot of time export and import of
a MySQL database implies transferring a large amount of data between
client and server. The most effective way of reducing the amount of data
transferred through the net and the amount of the used disk space is
compression. All points mentioned above are taken into account in the
utility for making database backups included in the set of tools
available in dbForge Studio for MySQL. The rest of the article is
written referring to this utility. Step-by-step MySQL backup procedure
1. Set compression for the connection Using compression for a connection
allows to reduce net traffic owing to strings compression. The Use
Compression option is available on the Advanced page of the Database
Connection Properties dialog. But if the amount of string data per
record is not large, you won’t gain much this way. 2. Choose a
database for export and open Database Backup Wizard In the Database
Explorer tree choose the nod of the needed database or connection.
Choose Backup Database from the pop-up menu. On the image below we show
the main settings of MySQL backup. Pay your attention to the fact that
we are using compression to reduce the disk space usage. Besides using
compression allows to add comments to the backup. 3. Set options to
create an all-of-a-piece database snapshot Earlier in this article we
wrote about the necessity of avoiding data corruption while creating a
database backup. There are options to achieve this, and they are marked
red on the image below: 4. Make a backup of your MySQL database After
performing all necessary settings press the Backup button and wait until
the application finishes export. During the operation you will be able
to see the progress of backup creation and its stages. Step-by-step
MySQL import procedure So, we’ve performed database export and
compressed the results into a ZIP-archive. And now we have to import
this database on the new server. Will we have to decompress a large
script and try to execute it manually? Not at all. Database Import can
be performed neatly and easily with the help of Database Restore Wizard
1. Choose a database to import and open Database Restore Wizard In the
Database Explorer tree choose the nod of the needed database or
connection. Choose Restore Database from the pop-up menu. Choose the
backup file in the window that opened. After you’ve chosen the
file you should see approximately the same as shown on the image below:
The program finds the *.sql file in the archive automatically and shows
the comment we’ve added when creating the backup. Now we are ready
to import our MySQL database. 2. Import the database After pressing the
Restore button the program will automatically restore the database from
the backup. Conclusion In this article we’ve reviewed some aspects
of making backups of MySQL databases and illustrated capabilities of
dbForge Studio for MySQL regarding export/import. You can download
dbForge Studio for MySQL here.
www.devart.com |
7/23/10 5:11 PM
SQL Server Integration Services 2008(SSIS) and MySQL In my previous article I had created linked server, to access and query
a database. However, it is just the basic step to try and bring two
different RDBMS systems linked together. In this article I will discuss
about creating a SSIS package to transfer data from MS SQL Server 2008
to MySQL 5.5 Prerequisite:
___________________________________________________ SQL Server Business
Intelligence Development Studio MySQL Connector Net 5.2.7 You can
download this package from http://www.mysql.com/downloads/connector/net/
Connector/Net is a fully-managed ADO.NET driver for MySQL. MySQL
Connector/ODBC 5.1 (Which is already installed, in our previous article)
http://www.mysql.com/downloads/connector/odbc/ Brief about SSIS
___________________________________________________ SSIS provides a
graphical front end to design control flow data processing logic.
Once designed, these ‘packages’ are compiled into
‘.dtsx’ packages which can then be distributed and run on
any machine with the SSIS tools installed. Packages contain two main
logic flows, a ‘Control Flow’ which defines a sequence of
logical operations which are processed in sequence. Each step is
completed before the next starts e.g. 1. Empty out work tables in
a database 2. Populate the work tables with data 3. Perform
calculations and update the values in the work table 4. Update
OLAP cubes with the data from the work tables 5. Run reports
against the OLAP cubes. This level of control also allows processing
loops to be defined e.g. For each file in a specified folder, read the
contents of the file and write it into a specified table. The second
main logic flow is the ‘Data Flow’. This allows for
the processing of data at the record level. Data is read from a
‘Data Source’ and passes down a series of ‘Data
Transformations’ to a ‘Data Destination’. These
transformations can be as simple as changing the data type of fields
e.g. varchar(4000) to varchar(2000) or decimal(18,2) to decimal(8,2), or
can be more complex like data merges, joins, pivot tables, multicasts
etc. Each transformation is represented by an icon in the designer and
the icons are linked together to define the logic path. Creating SSIS
package to transfer data from MS SQL Server 2008 to MySQL
___________________________________________________ Follow the below
steps to create a SSIS package for data transfer. Go to START>MS SQL
SERVER 2008>SQL SERVER BUSINESS INTELLIGENCE DEVELOPMENT STUDIO and
Click on File>New>Project Under Business Intelligence Projects,
select Integration Services Project and rename the Project Name. This
will open a Package Designer Screen, where you can start designing the
data flow for your package. Configure Connection Managers for you
package ___________________________________________________ Right Click
in the area where it shows the Connection Manager tab and select New
ADO.Net Connection. You will see the Configuration Manager Editor
window Click New Now you need to define connection manager
configuration for source (i.e. MS SQL Server 2008) in our case. In the
Connection Manager Editor, by default the Provider is set to .Net
Providers\SqlClient Data Provider Use your test server name or IP for
Server Name and under connect to database, select the database to use.
I’m using ssistest. Similarly, you need to create connection
manager configuration for Destination (i.e. MySQL) Right Click, under
Connection Manager’s tab and Select ADO.NET Connection as
mentioned above and click New on Connection Manager Editor. For
Provider click on the drop down arrow to select .Net Providers\odbc data
Provider. In the previous article we had created System DSN name MySQL.
We will use the same here for Use user or system data source name. Enter
your login information and Test Connection. It should succeed, and then
click Ok. Now, on Configure ADO.NET Connection Manager screen, you can
see both the source and destination are configured. It’s now time
to add control flows to the package, ensure you are on Control Flow tab
and Drag Data Flow Task from the Control Flow Items under ToolBox. Now
click on the Data Flow tab above and Drag ADO NET Source and ADO NET
Destination as shown below. You can rename the Source and Destination
Names from Properties. Right click on ADO NET SOURCE i.e. MS SQL Server
2008 and click EDIT. Make the below changes as mentioned on the screen.
As mentioned before I have ssistest database which has few sample
tables, that I have exported from AdventureWorks database. I’m
selecting one of the tables named HumanResources.Department for this
example. In the above screen you can see the test server selected with
database and table to transfer the data from. Similarly right click on
MySQL (Destination) and click Edit. On this screen, under ADO.NET
Connection Manager select MySQL.root from the drop downlist. Since we
do not have any destination table to map with the source, we need to
click on New. It will show you the below message, click OK Now remove
the quotes and Click OK to create a table on destination as shown. It
will come back to the editor, Click on the Mappings tab and check if
all the columns are mapped and click OK. This will complete the data
flow design of our package. Saving the SSIS Package
___________________________________________________ Once the package is
created, save the copy of it to SSIS Package store or MSDB. Follow the
below steps, Click on File> Save Copy of Package.dtsx As from the
menu it will open a window, Select SSIS Package Store as the Package
Location and Select the name of your test server. Enter /File
System/Export2MySQL as the package path and Click OK. Select File >
Save Copy of Package.dtsx as again from the menu and Select SQL Server
as the Package Location. Select the name of your test server and fill in
your authentication information. Enter Export2MySQL as the package path
and Click OK. Once the copies are saved you can see them under
Integration Services Stored Packages as shown below. Changes on MySQL
___________________________________________________ Before running the
Package, make this change on the MySQL Server mysql> SET GLOBAL
sql_mode= ‘ANSI’ Executing the Package
___________________________________________________ You can execute the
package either from SSMS and BI Development Studio. I’m doing it
from the BI, on the right hand side your BI screen you will find
solution explorer, which has the Package.dtsx listed. Right click and
select Execute Package. The moment you click execute the debugging
process start and the Output is show below on the BI screen. If
the Data Flow Task is changed to Green, it means the package has
succeeded, and if it is Red it means somewhere in the flow there was an
error. On the Data Flow Tab, you can see the number of rows transferred
in the execution. There is a progress tab on the screen, where you can
check the steps taken in the flow, it also list errors and warnings if
any. Query the MySQL Server to see if the data transfer was successful
as show below. This was about transferring data from MS SQL Server 2008
to MySQL, you can do vice verse by changing the source and destination
while designing the package on Data Flow tab. Hope this article is useful.
dbperf.wordpress.com |
7/23/10 1:34 PM
Install, configure and run MySQL Cluster – demo video There is a new video available: Demonstration of installing, configuring
and running MySQL Cluster (LINUX) to accompany the MySQL Cluster Quick
Start guides. The Flash video video lasts for about 7 minutes. If
you aren’t able to view Flash on your device then a (poorer
quality) version is included here – watch the Flash version if
you’re able to!
www.clusterdb.com |
7/23/10 11:27 AM
VistA scenarios, and other controversies at the Open Source health care track The history and accomplishments attributed to VistA, the Veterans
Administration's core administrative software, mark it as one of the
most impressive software projects in history. Still, lots of smart
people in the health care field deprecate VistA and cast doubt that it
could ever be widely adopted.
feedproxy.google.com |
7/23/10 7:21 AM
How to detect the user’s preferred language – smarter than Google Sometimes I find Google annoying. Don’t get me wrong, I am not one
of the many Google haters out there. They have done amazing things and
especially since they have released WebM as Open Source and royalty free
video format, they have earned a bonus in my attitude towards them. And
even though Google Chrome is not my personal favorite, I have to admit
that Chrome has brought a great momentum into the development of web
browsers (and therefor the WWW as a whole) which users of any browser
benefit from. Last but not least, I use a lot of Google’s services
and they do their job quite well. So all in all, Google’s record
from my point of view is pretty good. But one thing really sucks. I use
an English operating system. I use an English Firefox (locale en-US).
All the language preferences in the Google profiles of my accounts are
set to English. Nevertheless, I often get pages from Google delivered in
German by default. How they do that is quite obvious, or lets say, there
is only one explanation how they may do this: they identify my home
country based on my IP address. There are services that provide the data
and they are easy to implement, sometimes even for free, like
MaxMind‘s GeoLite Country database. Also I am using these services
and I believe there is nothing wrong in doing so, except for a few
things – and one of them is to identify the assumed web page
visitor’s language preferences based on his or her home country.
Thankfully, I am only a lightweight victim of this practice, because I
speak the language which I get served based on their guesses. But how
happy would a Japanese tourist on vacation in Austria be to get his
pages delivered in German, only because he uses an Austrian IP address?
There are many reasons why a person located in one country may not
understand the language that is spoken there. I only have to drive about
20 kilometers to get into such a country (the Czech Republic). So to
assume that a person’s preferred language is the language spoken
by the majority of a country (what about minorities or countries that
have more than one language?) is very wrong. What are better solutions?
As I mentioned, I use an English operating system and an English
browser. Almost every browser sends a user agent string. My currently
most frequently sent user agent string for example is “Mozilla/5.0
(X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.7) Gecko/20100713
Firefox/3.6.7″. Every web server that receives a request from a
user agent (usually a browser) can read this user agent string and
process it accordingly. What can we read in mine, amongst other
information? en-US. So the locale tells a web server that it’s a
browser (a Firefox 3.6.7) in American English. Now guess what my
preferred language will be? Is it likely that I use a browser in a
language which I don’t want? Is it likely that a tourist in a
different country uses a browser in a language that he or she
doesn’t want? It’s certainly possible (like if there is a
terminal in the hotel room or an Internet Café), but nowadays
most people have their laptops, notebooks, smart phones etc. And their
browser language is still their preferred one, regardless where they are
currently located. So the browser language serves as a much better
criterion to assume a web site visitor’s preferred language.
Alright, no big rocket science so far, and to be honest, there is no big
rocket science to come. At one point, when I was annoyed about
Google’s failure to identify correctly which language to deliver
me, I wondered, how hard would it be to implement the (in my humble
opinion) better solution, using the web browser’s locale. The
answer is far from being rocket science. The solution can be as easy as
this: First, get yourself a recent copy of the Zend Framework and
install it so you make its libraries accessible to your code. You can
add an __autoload() function, something like this: function
__autoload($classname) { if (substr($classname, 0, 4) ==
"Zend") { $classname = str_replace("_",
"/", $classname); require_once $classname . '.php';
} } or even easier, use the Zend Framework AutoLoader: require_once
'Zend/Loader/Autoloader.php'; Zend_Loader_Autoloader::getInstance();
Everything we need to identify the browser language and locale can be
found inside Zend_Locale. The constructor of the Zend_Locale class
allows to set the desired language or locale. In addition to that, there
are three predefined constants named Zend_Locale::BROWSER,
Zend_Locale::ENVIRONMENT and Zend_Locale::FRAMEWORK. According to the
Zend_Locale introduction manual page they do the following:
Zend_Locale::FRAMEWORK – “When Zend Framework has a
standardized way of specifying component defaults (planned, but not yet
available), then using this constant during instantiation will give
preference to choosing a locale based on these defaults. If no matching
locale can be found, then preference is given to ENVIRONMENT and lastly
BROWSER.” Zend_Locale::ENVIRONMENT – “PHP publishes
the host server’s locale via the PHP internal function
setlocale(). If no matching locale can be found, then preference is
given to FRAMEWORK and lastly BROWSER.” Zend_Locale::BROWSER
– “The user’s Web browser provides information with
each request, which is published by PHP in the global variable
HTTP_ACCEPT_LANGUAGE. if no matching locale can be found, then
preference is given to ENVIRONMENT and lastly FRAMEWORK.” And this
is what we want! Lets get to the code: $zend_locale = new
Zend_Locale(Zend_Locale::BROWSER); // returns en for English, de for
German etc. $browser_language = $zend_locale->getLanguage(); //
returns en-US for American English, en-GB for British English etc.
$browser_locale = $zend_locale->toString(); Three lines of code to
get all the information we need. And how to make a choice which language
to serve based on that? Lets assume we have a site in English, French
and German. French browsers should get French, German browsers should
get German and everybody else should get English: $site_language =
"en"; switch ($browser_language) { case "de" :
$site_language = "de"; break; case "fr"
: $site_language = "fr"; } And that’s it.
That’s all Google would have to do to make me happy. I assume,
their IP address based identification code requires more lines, and is
still a poorer solution in my opinion. There are additional benefits to
this solution. Want to display date and time according to the locale?
Here is what to do: $zend_date = Zend_Date::now(); $date_time =
$zend_date->get(Zend_Date::DATETIME_FULL, $zend_locale); print
$date_time; My default browser now shows me “Friday, July 23,
2010 2:14:58 AM Europe/Vienna”. If I fire up a German browser, the
same page delivers me “Freitag, 23. Juli 2010 02:17:11
Europe/Vienna” and there may be answers like “vendredi 23
juillet 2010 02:17:56 Europe/Vienna” (French), “viernes 23
de julio de 2010 02:18:29 Europe/Vienna” (Spanish),
“pátek, 23. července 2010 02:18:53
Europe/Vienna” (Czech) or
“2010年7月23日金曜日2時19分17秒
Europe/Vienna” (Japanese) as well. It’s pretty cool how the
Zend Framework can make such tasks very simple . Where is a solution
like mine being used in the wild? I don’t know where as of the
time of this writing, but I know where very soon. The MySQL website will
soon launch new localized content, delivered to people who we think
really want to see these languages. We try very hard not to annoy people
by enforcing a language upon them which they would not like to choose.
And remember: to make good guesses is good. The best solution however is
to provide choice! If you make assumptions and even if the assumptions
are well-founded, always leave people the choice to select what they
want. There may be situations which you are not thinking of and because
of some unusual circumstances, somebody may still want to choose
differently than you would think.
www.mpopp.net |
7/23/10 1:21 AM
|