The Pith of Performance: 2012

Wednesday, December 26, 2012

Season's Greetings 2012

Best wishes to all during this 2012 season and thank you for your patronage. Looking forward to doing more of the same in the new year.

Image made with Mathematica 7.0 for Mac OS X x86 (64-bit).

Tuesday, December 18, 2012

As described previously, the main purpose of Release 6.0.1 Build 121512 is improved compatibility and stability between PDQ and the R statistical environment. For example, many of the PDQ models previously found in the ../examples/ directory can now also be accessed via the demo() command in the R console. Testing was carried out using R version 2.15.2 (2012-10-26).

Operationally, PDQ, in any of the supported languages, should appear cosmetically the same as Release 5.0; no additional programming required. Since the PDQ-R source can be compiled separately, this release will be of special interest to Microsoft Windows users.

If you're new to PDQ, here's a simple PDQ-R model you can paste directly into the R-console:


library(pdq)
# input parameters
arrivalRate <- 0.75
serviceRate <- 1.0
## Build and solve the PDQ model
Init("Single queue model")         # initialize PDQ
CreateOpen("Work", arrivalRate)    # open workflow
CreateNode("Server", CEN, FCFS)    # single server
SetDemand("Server", "Work", 1/serviceRate) # service time
Solve(CANON)                       # solve the model
Report()                           # tabulated output
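If you don't have PDQ installed yet, the numbers for this single-queue model can be cross-checked by hand. The sketch below is in Python (PDQ also has Python bindings, but this sketch does not use them). It assumes the node behaves as an M/M/1 queue (Poisson arrivals, exponential service), for which ρ = λS, R = S/(1 − ρ) and the mean number in the system is ρ/(1 − ρ). The function name mm1_metrics is hypothetical and not part of PDQ.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Closed-form M/M/1 metrics for a single open FCFS queue."""
    service_time = 1.0 / service_rate          # S, service demand per request
    rho = arrival_rate * service_time          # utilization
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be < 1")
    residence = service_time / (1.0 - rho)     # R = S / (1 - rho)
    waiting = residence - service_time         # time spent queueing
    in_system = rho / (1.0 - rho)              # mean number in system (= lambda * R)
    return {
        "utilization": rho,
        "residence_time": residence,
        "waiting_time": waiting,
        "number_in_system": in_system,
    }


if __name__ == "__main__":
    m = mm1_metrics(arrival_rate=0.75, service_rate=1.0)
    for name, value in m.items():
        print(f"{name:>18s}: {value:.4f}")
```

With λ = 0.75 and S = 1, this gives ρ = 0.75, R = 4 and a mean of 3 requests in the system.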

Please see the online release notes for the download links and more detailed information, as well as the top-level README file in the distribution. Beyond that, check out the relevant books and training classes.

Merry Xmas from the PDQ Dev Team!

Thursday, November 29, 2012

PDQ 6.0 from a Developer Standpoint

This is a guest post by Paul Puglia, who contributed significant development effort to the PDQ 6.0 release, especially as it relates to interfacing with R. Here, Paul provides more details about the motivation described in my earlier announcement.

PDQ was designed and implemented around a couple of basic assumptions. First, the library would be a C-language API running on some variant of the Unix operating system, where we could reasonably assume that we'd be able to link it against a standard C library. Second, programs built using this API would be "stand-alone" executables in the sense that they'd run in their own dedicated memory address spaces, could route their I/O through the standard streams (stdout or stderr), and would have complete control over how error conditions were handled.

Not surprisingly, the above assumptions drove a set of implementation decisions for the library, namely:

All I/O would be pushed through the standard stream library functions like printf and fprintf
Memory for internal data structures would be allocated and released through calls to the standard library functions calloc and free
Error conditions would result in PDQ causing the model execution to stop with an explicit call to exit().

These aren't unusual decisions for a C API.

With the arrival of PDQ 2.0, we introduced foreign interfaces (PERL, Python and R) that allowed PDQ to be called from these other programming environments. All these new foreign interfaces were built and released using the SWIG interface-building tool, which allows us to build these interfaces with absolutely no modification to the underlying PDQ code: a major benefit when you've got a mature, debugged API that you really want to keep that way. For the most part, this arrangement worked pretty well, at least for those environments where it was natural to write and execute PDQ models like stand-alone C programs (you can also read this as PERL and Python).

When it came to R, however, our early implementation decisions weren't such a great fit for how R is commonly used: as an interactive environment, similar to programs like Mathematica, Maple, and Matlab. As with these other environments, R users do most of their interaction through a REPL (Read-Eval-Print Loop), usually wrapped in either a full-fledged GUI or a terminal-like interface called the console.

It turns out that most of PDQ's implementation decisions could (and do) interfere with using R interactively. In particular:

Calling the exit() function results in the entire R environment exiting – not a good feature for an interactive environment.
Writing directly to stdout and stderr using fprintf bypasses R's own internal I/O mechanisms and prevents R's I/O functions (like the sink() command) from working properly.
Using the calloc() and free() functions interferes with R's own internal memory-management mechanisms and would prove to be a major impediment for any Windows version of the interface.

Not only do these severely degrade the interactive experience for R users, their use also gets flagged by R's extension-building mechanism when it does a consistency check. And not passing that check would be a major impediment to getting PDQ's R interface accepted on CRAN (Comprehensive R Archive Network).

Luckily, none of the fixes for these issues is particularly hard to implement. Most are either fairly simple substitutions of R API calls for C library routines and/or localized changes to the PDQ library. And, while all of this does potentially create a risk of introducing bugs into the PDQ library, the reward for taking that risk is a stable R interface that can eventually be submitted to CRAN. A version of the PDQ library can be easily built under Windows™ using the Rtools utilities.
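The fixes described above follow a general pattern for any library embedded in an interactive host: hand errors back to the caller instead of terminating the process, and write output to a stream the host supplies. The sketch below illustrates only that pattern, in Python. It is not PDQ code, and the names ModelError, Solver and solve_open_queue are hypothetical.

```python
import io
import sys
from typing import TextIO


class ModelError(Exception):
    """Raised instead of calling exit(), so the host session survives."""


class Solver:
    def __init__(self, out: TextIO = sys.stdout):
        # All output goes through a caller-supplied stream, analogous to
        # routing output through R's own I/O so that sink() can capture it.
        self.out = out

    def solve_open_queue(self, arrival_rate: float, service_time: float) -> float:
        rho = arrival_rate * service_time
        if rho >= 1.0:
            # A stand-alone C program might print a message and call exit(1);
            # an embedded library reports the problem to the host instead.
            raise ModelError(f"utilization {rho:.2f} >= 1: queue is unstable")
        residence = service_time / (1.0 - rho)
        print(f"Residence time: {residence:.4f}", file=self.out)
        return residence


if __name__ == "__main__":
    buffer = io.StringIO()
    solver = Solver(out=buffer)
    solver.solve_open_queue(0.75, 1.0)
    try:
        solver.solve_open_queue(1.5, 1.0)
    except ModelError as err:
        print("caught:", err)   # the interactive session keeps running
    print(buffer.getvalue(), end="")
```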

Monday, November 12, 2012

PDQ 6.0 is On Its Way

PDQ (Pretty Damn Quick) version 6.0.β is in the QA pipeline. Although this is a major release, cosmetically, things won't look any different when it comes to writing PDQ models. All the big changes have taken place under the hood in order to make PDQ more consistent with the R statistical environment.


R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

> library(pdq)
> source("/Users/njg/PDQ/Test Suites/R-Test/mm1.r")
                ***************************************
                ****** Pretty Damn Quick REPORT *******
                ***************************************
                ***  of : Thu Nov  8 17:42:48 2012  ***
                ***  for: M/M/1 Test                ***
                ***  Ver: PDQ Analyzer 6.0b 041112  ***
                ***************************************
                ***************************************
...

The main trick is that the Perl and Python versions of PDQ will remain entirely unchanged while at the same time invisibly incorporating significant changes to accommodate R.

Guerrilla Training in 2013

The preliminary Guerrilla CaP training schedule for 2013 has been posted.

Book early, book often.

Tuesday, November 6, 2012

Hotsos 2013: Superlinear Scalability

As readers of this blog know, the Universal Scalability Law (USL) is a framework for quantifying performance measurements and extrapolating load-test data. Applied as a statistical regression model, the two USL contention (α) and coherency (β) parameters numerically indicate the degree of sublinear scalability in the data, i.e., how much linear scaling you're losing due to sharing and consistency overheads. Some examples of USL scalability analysis applied to databases include:

VoltDB
Postgres
My USL analysis of the Postgres data
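For reference, the USL expresses relative capacity as C(N) = N / [1 + α(N − 1) + βN(N − 1)], and when β > 0 it peaks at N* = √((1 − α)/β), which follows from setting dC/dN = 0. A minimal Python sketch; the parameter values in the example are illustrative only and are not fitted to any of the databases above.

```python
import math


def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the Universal Scalability Law."""
    return n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0))


def usl_peak(alpha: float, beta: float) -> float:
    """Load N* at which C(N) is maximal; only defined for beta > 0."""
    if beta <= 0.0:
        raise ValueError("no finite peak unless beta > 0")
    return math.sqrt((1.0 - alpha) / beta)


if __name__ == "__main__":
    alpha, beta = 0.05, 0.0002    # illustrative values only
    n_star = usl_peak(alpha, beta)
    print(f"N* = {n_star:.1f}, C(N*) = {usl_capacity(n_star, alpha, beta):.2f}")
    for n in (1, 8, 32, 64, 128):
        print(n, round(usl_capacity(n, alpha, beta), 2))
```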

More recently, it was brought to my attention that the USL fails when it comes to modeling superlinear performance (e.g., see this Comments section). Superlinear scalability means you get more throughput than the available capacity would be expected to support. It's even discussed on Wikipedia (so it must be true, right?). Nice stuff, if you can get it. But it also smacks of an effect like perpetual motion.

Every so often, you see a news report about someone discovering (again) how to beat the law of conservation of energy. They will swear up and down that it works and it will be accompanied by a contraption that proves it works. Seeing is believing, after all. The hard part is not whether to believe their claim, it's debugging their contraption to find the mistake that has led them to the wrong conclusion.

Similarly with superlinearity. Some data are just plain spurious. In other cases, however, certain superlinear measurements do appear to be correct, in that they are repeatable and not easily explained away. In that case, it was assumed that the USL needed to be corrected to accommodate superlinearity by introducing a third modeling parameter. This is bad news for many reasons, but primarily because it would weaken the universality of the universal scalability law.

To my great surprise, however, I eventually discovered that the USL can accommodate superlinear data without any modification to the equation. As an unexpected benefit, the USL also warns you that you're modeling an unphysical effect: like a perpetual-motion detector. A corollary of this new analysis is the existence of a payback penalty for incurring superlinear scalability. You can think of this as a mathematical statement of the old adage: If it looks too good to be true, it probably is.

I'll demonstrate this remarkable result with examples in my Hotsos presentation.
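Ahead of that presentation, here is one way to see how the unmodified USL could admit superlinear data. This is an assumption about the mechanism, not a preview of the talk: suppose the fitted contention parameter comes out negative. With α < 0 < β, C(N) > N exactly when (N − 1)(α + βN) < 0, i.e., for 1 < N < −α/β. Beyond that crossover the curve drops below linear scaling, which is one way to picture a payback penalty, and a negative contention cost is itself unphysical. The parameter values below are purely illustrative.

```python
def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the unmodified USL."""
    return n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0))


def superlinear_crossover(alpha: float, beta: float) -> float:
    """Load beyond which C(N) falls back below N, assuming alpha < 0 < beta."""
    if not (alpha < 0.0 < beta):
        raise ValueError("superlinear regime requires alpha < 0 < beta")
    return -alpha / beta


if __name__ == "__main__":
    alpha, beta = -0.05, 0.001    # illustrative: negative alpha mimics superlinearity
    print(f"crossover at N = {superlinear_crossover(alpha, beta):.0f}")
    for n in (10, 25, 40, 60, 100):
        c = usl_capacity(n, alpha, beta)
        regime = "superlinear" if c > n else "sublinear"
        print(f"N={n:3d}  C(N)={c:7.2f}  {regime}")
```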

Thursday, October 11, 2012

Guerrilla Stickers Stick It to the London Tube

It looks like the guerrilla subversive spirit of GCaP tactics has reached across The Pond to the London Underground in the form of patron-placed alternative signage.

The small irony here is that I refer to the London Tube map in my GCaP classes as a paradigm for performance models:

2.4 More Like The Map Than The Metro

When I was writing the GCaP book, I asked the London tube authority for permission to use their classically dense map. In an incredible bout of British bureaucratic officiousness, they offered the tube map at £100 per impression, an offer my publisher was only too keen to refuse. Hence, the map you see on p. 9 of the GCaP book is that of BART, an authority who were pleased to see it further advertised by a taxpayer, as long as it was sourced: an offer my publisher was only too happy to abide by.

Guerrilla Class of October 2012

This is what you missed last week: another fine Guerrilla al fresco lunch at the BlueAgave restaurant. Clearly, capacity planning is a tough business (when you do it right).

G'rillas at the grill: Joshua, NJG, Tim, Kevin, Tad, Manju

Here's some of what went down:

Little's Law and IO Performance

Next Tuesday, August 7th, I'll be presenting at the Northern California CMG meeting^*. My talk will be about Little's law and its implications for storage IO performance.

As a performance analyst or capacity planner, you already know all about Little's law—it's elementary. Right? Therefore, you completely understand:

How Little's law relates inventory and manufacturing cycle time
John Little (now 84) is not a performance analyst
John Little did not invent Little's law
Little's law was known to A. K. Erlang more than 100 years ago
That there are actually ~~two~~ three versions of Little's law
Little's law is not based on queueing theory
Little's law expresses the fact that response time decreases with increasing throughput
However, on the SPEC website you'll see that response time increases with increasing throughput. WTF !!!?

If you're feeling slightly bewildered about all this, you really should come along to my talk (assuming you're in the area). Otherwise, you can read the slide deck embedded below.

3-dimensional view of Little's law

I'll show you how I discovered the resolution to the apparent contradiction between items 7 and 8 (above) by representing Little's law in 3-dimensions. It's very cool! Even John Little doesn't know about this.
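This is a simple numerical illustration of the tension between items 7 and 8, not the 3-dimensional resolution from the talk. Little's law, N = XR, rearranged as R = N/X, says that if the number of requests in the system is held fixed, higher throughput means shorter response time. In an open M/M/1 queue, throughput equals the arrival rate and R = S/(1 − XS) rises with throughput; N = XR grows as well, so Little's law still holds there too. The values N = 10 and S = 10 ms are arbitrary.

```python
def little_response_time(n_in_system: float, throughput: float) -> float:
    """Little's law N = X * R, solved for R with N held fixed."""
    return n_in_system / throughput


def mm1_response_time(throughput: float, service_time: float) -> float:
    """Open M/M/1 queue: R = S / (1 - X*S), valid while X*S < 1."""
    rho = throughput * service_time
    if rho >= 1.0:
        raise ValueError("throughput exceeds capacity")
    return service_time / (1.0 - rho)


if __name__ == "__main__":
    N, S = 10, 0.01   # illustrative: 10 resident requests, 10 ms service time
    for X in (20, 40, 60, 80, 90):   # requests per second
        r_fixed = little_response_time(N, X)
        r_open = mm1_response_time(X, S)
        # In the open queue N is not fixed: N = X * R grows with X.
        print(f"X={X:3d}/s  fixed-N R={r_fixed:.3f} s  "
              f"M/M/1 R={r_open:.3f} s (N={X * r_open:.2f})")
```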

Oh yeah, and I'll also explain how Little's law reveals why it's possible to make your application IOs go 10x to 100x faster. IOPS bandwidth has become irrelevant.

Some of these conclusions are based on recent work I've been doing for Fusion-io. You might've heard of their billion-IOPS benchmark and, more recently, of their association with the SSDAlloc software from Princeton University.

^* If you're not an ncCMG member, there's a one-time $25 entry fee, which then makes you a life member. See the bottom of their web page for payment and contact details.

Wednesday, July 4, 2012

Characterizing Performance Bottlenecks

If you do a Google search using keywords like: performance, bottleneck, analysis, you get quite a bewildering list of responses, and none of them seems to clearly define what they mean by the term bottleneck.^†

The word bottleneck refers to a choke point or narrowing, literally like the neck of a bottle, that causes the flow to take longer than it would otherwise. The effect on performance is commonly seen on the freeway in an area undergoing roadwork. Multiple lanes of traffic are forced to converge into a single lane and proceed past the roadwork in single file. Going from parallel traffic flow to serial flow means the same number of cars will take longer to get through that same section of road. As we all know, the delay at a freeway bottleneck can be very significant.

The same is true on a single-lane country road. If you come to a section where roadwork slows down every car, it takes longer to traverse that section of the road. Bottlenecks are synonymous with slow downs and delays, but they really determine a lot more than delay.
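One thing a bottleneck determines, beyond delay, is the maximum throughput of the whole flow. A minimal sketch using the standard operational bound X ≤ 1/D_max, where D_max is the largest per-request service demand among the stages; the stage names and demands are made up for illustration.

```python
from typing import Dict, Tuple


def bottleneck(demands: Dict[str, float]) -> Tuple[str, float]:
    """Return the stage with the largest service demand and the
    throughput ceiling (requests per second) that it imposes."""
    name = max(demands, key=demands.get)
    return name, 1.0 / demands[name]


if __name__ == "__main__":
    # Hypothetical per-request service demands, in seconds
    stages = {"web": 0.004, "app": 0.010, "db": 0.025}
    name, x_max = bottleneck(stages)
    print(f"bottleneck: {name}, max throughput: {x_max:.0f} requests/s")
```

No matter how fast the other stages become, throughput cannot exceed the bottleneck's ceiling; only reducing its demand raises the bound.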

Is the Turing Test Tough Enough?

In the recent GDAT class, we covered machine learning (ML) applied to performance data analysis and particularly the application of so-called support vector machines. In that section of the course I have to first explain what the word "machine" means in the context of ML. These days the term machine refers to software algorithms, but it has its roots in the development of AI and the history of trying to build machines that can think. That notion of intelligent machines goes back more than sixty years to Alan Turing, who was born a hundred years ago today.

The Turing Test (TT) was introduced as "the imitation game" in Computing Machinery and Intelligence, Mind, Vol. 59, No. 236, pp. 433-460 (1950):

The new form of the problem can be described in terms of a game which we call the "imitation game." It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A." The interrogator is allowed to put questions to A and B.

Queues in the News

You might not have noticed, but queues have been in the news a lot lately. Not from the standpoint of computer performance, but of people performance or, more accurately, crowd control. Most recently, queues popped up in the context of long delays expected at Heathrow airport due to large crowds arriving for the London Olympics.

Fears over Heathrow queues during Olympics
The Waiting Game by The Numbers Guy
QANTAS to Get No Queues

In a queue, at least everyone is pointing in the same logical direction. Moreover, if you snake the queue, as they do at Disneyland, people always feel close to the destination and can see it getting even closer as customers ahead of them are processed. That helps to minimize their level of frustration: the control part. For some reason, the post office hasn't figured this out yet.

And, last but not least, queues and computer performance still remain an inevitable perennial, most recently in connection with the Internet.

Load Testing with Uniform vs. Exponential Arrivals

In a couple of recent blog posts about generating exponential loads and why that is important for load testing and performance testing, it was not completely clear to some readers what was motivating my remarks. In this post, I will try to provide a more visual elaboration of that aspect.

My fundamental point is this. When it comes to load testing^*, presumably the idea is to exercise the system under test (SUT). Otherwise, why are you doing it? Part of exercising the SUT is to produce significant fluctuations in the number of requests residing in application buffers. Those fluctuations can be induced by the pattern of arriving requests issued by the client-side driver (DVR): usually implemented as a pile of PCs or blades.
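A minimal simulation sketch of that point, under illustrative assumptions: a single FIFO server with a constant 0.8 s service time, driven by arrivals whose inter-arrival times have a 1 s mean, drawn either uniformly on [0, 2] s or exponentially. Both drivers offer the same average load (ρ = 0.8); only the variability differs. Waiting times follow Lindley's recursion W(n+1) = max(0, W(n) + S − A(n+1)). None of this models a particular load-test tool.

```python
import random


def mean_wait(interarrivals, service_time):
    """Mean queueing delay at a single FIFO server via Lindley's recursion."""
    wait, total = 0.0, 0.0
    for gap in interarrivals:
        total += wait                                   # wait of the current request
        wait = max(0.0, wait + service_time - gap)      # wait of the next request
    return total / len(interarrivals)


if __name__ == "__main__":
    rng = random.Random(42)
    n, mean_gap, service = 200_000, 1.0, 0.8
    uniform = [rng.uniform(0.0, 2.0 * mean_gap) for _ in range(n)]
    expo = [rng.expovariate(1.0 / mean_gap) for _ in range(n)]
    print(f"uniform arrivals:     mean wait {mean_wait(uniform, service):.3f} s")
    print(f"exponential arrivals: mean wait {mean_wait(expo, service):.3f} s")
```

The exponential case is an M/D/1 queue, whose theoretical mean wait is ρS/(2(1 − ρ)) = 1.6 s, so the simulated value should land near that, while the uniform driver produces noticeably shorter queues.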

PostgreSQL Scalability Analysis Deconstructed

In 2010, I presented my universal scalability law (USL) at the SURGE conference. I came away with the impression that nobody really understood what I was talking about (quantifying scalability) or, maybe DevOps types thought it was all too hard (math). Since then, however, I've come to find out that people like Baron Schwartz did get it and have since applied the USL to database scalability analysis. Apparently, things have continued to propagate to the point where others have heard about the USL from Baron and are now using it too.

Robert Haas is one of those people, and he has applied the USL to Postgres scalability analysis. This is all good news. However, there are plenty of traps for new players, and Robert has walked into several of them, to the point where, by his own admission, he became confused about what conclusions could be drawn from his USL results. In fact, he analyzed three cases:

PostgreSQL 9.1
PostgreSQL 9.2 with fast locking
PostgreSQL 9.2 current release

I know nothing about Postgres but thankfully, Robert tabulated on his blog the performance data he used and that allows me to deconstruct what he did with the USL. Here, I am only going to review the first of these cases: PostgreSQL 9.1 scalability. I intend to return to the claimed superlinear effects in another blog post.
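For readers who want to try this kind of analysis, here is a sketch of one common way to estimate the USL parameters by regression. It is not a reconstruction of what Robert did. Rearranging the USL gives N/C(N) − 1 = βx² + (α + β)x with x = N − 1, so a no-intercept quadratic fit y = ax² + bx to that deviation from linearity yields β = a and α = b − a. The data in the test are synthetic, generated from known parameters so the fit can be checked; they are not the PostgreSQL measurements.

```python
from typing import Sequence, Tuple


def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the USL."""
    return n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0))


def fit_usl(loads: Sequence[float], throughputs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares estimate of USL (alpha, beta).

    Assumes loads[0] == 1, so throughputs[0] is the single-user throughput X(1).
    """
    x1 = throughputs[0]
    xs, ys = [], []
    for n, x in zip(loads[1:], throughputs[1:]):
        capacity = x / x1                 # C(N) = X(N) / X(1)
        xs.append(n - 1.0)                # x = N - 1
        ys.append(n / capacity - 1.0)     # y = N / C(N) - 1
    # Fit y = a*x^2 + b*x (no intercept) by solving the 2x2 normal equations.
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    t2 = sum(x ** 2 * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - s3 * t1) / det
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a                       # alpha = b - a, beta = a


if __name__ == "__main__":
    loads = [1, 2, 4, 8, 16, 32, 64]
    # Synthetic throughputs from alpha = 0.03, beta = 0.0005, X(1) = 100
    tput = [100.0 * usl_capacity(n, 0.03, 0.0005) for n in loads]
    alpha, beta = fit_usl(loads, tput)
    print(f"alpha = {alpha:.4f}, beta = {beta:.6f}")
```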

Sex, Lies and Log Plots

From time to time, at the Hotsos conferences on Oracle performance, I've heard the phrase, "battle against any guess" (BAAG) used in presentations. It captures a good idea: eliminate guesswork from your decision making process. Although that's certainly a laudable goal, life is sometimes not so simple; particularly when it comes to performance analysis. Sometimes, you really can't seem to determine unequivocally what is going on. Inevitably, you are left with nothing but making a guess—preferably an educated guess, not a random guess (the type BAAG wants to eliminate). As I say in one of my Guerrilla mantras: even wrong expectations (or a guess) are better than no expectations. In more scientific terms, such an educated guess is called a hypothesis and it's a major way of making scientific progress.

Of course, it doesn't stop there. The most important part of making an educated guess is testing its validity. That's called hypothesis testing in scientific circles. To paraphrase the well-known Russian proverb, in contradistinction to BAAG: Guess, but justify^*. Because hypothesis testing is a difficult process, it can easily get subverted into reaching the wrong conclusion. Therefore, it is extremely important not to set booby traps inadvertently along the way. One of the most common visual booby traps arises from the inappropriate use of logarithmically scaled axes (hereafter, log axes) when plotting data.

Linear scale: Each major interval has a common difference $(d)$, e.g., $200, 400, 600, 800, 1000$ if $d=200$.
Log scale: Each major interval has a common multiple or base $(b)$, e.g., $0.1, 1, 10, 100, 1000$ if $b=10$.

The general property of a log axis is to stretch out the low end of the axis and compress the high end. Notice the unequal minor interval spacings. Hence, using a log scaled axis (either $x$ or $y$) is equivalent to applying a nonlinear transformation to the data. In other words, you should be aware that introducing a log axis will distort the visual representation of the data^†, which can lead to entirely wrong conclusions.
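To make the stretching concrete: on a base-10 log axis, a value v is plotted at position log10(v). The sketch below computes those positions. Equal ratios land at equal distances, while the gaps between the minor ticks 1, 2, ..., 10 shrink steadily toward the top of each decade.

```python
import math


def log10_positions(values):
    """Axis positions of positive values on a base-10 logarithmic scale."""
    return [math.log10(v) for v in values]


if __name__ == "__main__":
    major = [0.1, 1, 10, 100, 1000]
    print("major tick positions:", [round(p, 3) for p in log10_positions(major)])
    pos = log10_positions(range(1, 11))
    gaps = [round(b - a, 3) for a, b in zip(pos, pos[1:])]
    print("gaps between 1, 2, ..., 10:", gaps)
```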

How to Generate Exponential Delays

This question arose while addressing Comments on a previous blog post about exponentially distributed delays. One of my ongoing complaints is that many, if not most, popular load-test generation tools do not provide exponential variates as part of a library of time delays or think-time distributions. Not only is this situation bizarre, given that all load tests are actually performance models (and who doesn't love an exponential distribution in their performance models?), but without the exponential distribution you are less likely to observe such things as buffer overflow conditions due to larger than normal (or uniform) queueing fluctuations. Exponential delays are both simple and useful for that purpose, but we are often left to roll our own code and then debug it.
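A minimal sketch of rolling your own exponential delays by inverse-transform sampling: if U is uniform on [0, 1), then −m·ln(1 − U) is exponentially distributed with mean m. Using 1 − U rather than U avoids taking ln(0). The function name exp_delay is hypothetical; Python's own random.expovariate does the same job and makes a handy cross-check.

```python
import math
import random


def exp_delay(mean: float, rng: random.Random) -> float:
    """Exponential variate with the given mean, by inverse-transform sampling."""
    u = rng.random()                  # uniform on [0.0, 1.0)
    return -mean * math.log(1.0 - u)  # 1 - u lies in (0, 1], so log() is safe


if __name__ == "__main__":
    rng = random.Random(2012)
    mean_think = 5.0                  # illustrative mean think time, seconds
    samples = [exp_delay(mean_think, rng) for _ in range(100_000)]
    m = sum(samples) / len(samples)
    sd = math.sqrt(sum((s - m) ** 2 for s in samples) / (len(samples) - 1))
    print(f"sample mean {m:.3f} s, sample std dev {sd:.3f} s")
```

For an exponential distribution the standard deviation equals the mean, so both printed values should be close to 5 s. A uniform distribution with the same mean, spread over [0, 10] s, has a standard deviation of only about 2.9 s, which is why it produces smaller queueing fluctuations.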

Plan for Capacity Planning Training in May

Bookings are open for both Guerrilla Boot Camp (GBoot) and Guerrilla Capacity Planning (GCaP) classes in May 2012 at the Early Bird rate.

Entrance Larkspur Landing hotel Pleasanton California

As usual, classes will be held at our lovely Larkspur Landing location. Click on the image for booking information.

Before registering, take a look at some highlights students contributed from previous Guerrilla classes:

You too can be part of that educational experience.

Attendees should bring their laptops, as course materials are provided on CD or flash drive. The venue also offers free wi-fi to the internet.

Wednesday, March 7, 2012

The SSD World Will End in 2024

So says the Non-Volatile Systems Lab at UC San Diego. The claim is, in order to achieve higher densities, flash manufacturers must sacrifice both read and write latencies. I haven't had time to explore this claim in any detail, but I thought it might be useful for you to know about it. Some highlights include:

They tested 45 different NAND flash chips from six vendors that ranged in size from 72 nm circuitry to the current 25 nm technology.
They then took their test results and extrapolated them to the year 2024, when NAND flash development road maps show flash circuitry is expected to be only 6.5 nm in size. At that point, read/write latency is expected to increase by a factor of two or more.
They did not use specialized NAND flash controllers such as those used by Intel, OCZ or Fusion-io. Their results can be viewed as "optimistic" because they didn't include latency added through error correction or garbage collection algorithms.
Considering the diminishing returns on performance versus capacity, Grupp said, "it's not going to be viable to go past 6.5 nm ... 2024 is the end."

The technical paper entitled, The Bleak Future of NAND Flash Memory (PDF), was presented and published at the FAST'12 conference held in San Jose, CA on February 14–17, 2012.

Related post: Green Disk Sizing

Friday, February 24, 2012

On the Accuracy of Exponentials and Expositions

The following is a slightly edited version of my response to a Discussion on the LinkedIn CPPE group, which is accessible to Members Only. It's written in the style of a journal reviewer. The original Discussion topic was based on a link to a blog-post. I've been asked to make my LinkedIn review more widely available so, here tiz...

The blog-post Capacity Planning on a Cocktail Napkin is a really good example of a really bad explanation. There are so many things that are misleading, at best, and flat-out wrong, at worst, it's hard to know where to begin (or where to stop). Nevertheless, I'll try to keep it brief [I failed in that endeavor. — njg].
The author applies the equation:
\begin{equation} E = λ \end{equation}
Why? What is that equation? We don't know because the author does not say yet what all the symbols mean. It's all very well to impress people by slinging equations around, but it's more important to say what the equations actually mean. After all, the author might have chosen the wrong one.

Hotsos Symposium 2012

Time Bandits: How to Analyze Fractal Query Times

Tues, March 6, 2012 @ 2:15 pm

That's the title of my presentation at this year's Hotsos Symposium and no, I won't be trying to make any obscure connections between Terry Gilliam's famous movie and Oracle database products (as interesting as that exercise might be).

Instead, I'll be talking about fractals in time and how they can impact performance—especially Oracle database performance. The responsiveness of your Oracle application can be lost for longer than expected periods of time, ostensibly stolen by time bandits.

Preview Slides (2012). A more detailed explanation of the fractal technique used is now provided in the Guerrilla Data Analytics (GDAT) class: How to Get Beyond Monitoring from Linear Regression to Machine Learning.

Friday, February 3, 2012

Green Disk Sizing

I finally got around to completing item 5 on my 2011 list concerning electrical power consumed by a magnetic hard disk drive (HDD). The semi-empirical statement is:

\begin{equation} \text{Power} \propto N_{\text{platters}} \times \Omega^{2.8} \times D^{4.6} \tag{1} \end{equation}

where N_platters is the number of platters on the spindle, Ω is the rotational speed in revolutions per minute (RPM) and D the platter diameter in inches. The power consumed is then measured in Watts.

In principle, this makes (1) valuable for doing green HDD storage capacity planning. The bad news is, it is not in the form of an equation but a statement of proportionality, so it can't be used to calculate anything as it stands. More on that shortly. The good news is that all of the quantities in (1) can be read off from the data sheet of the respective disk vendor^†. Note that the disk capacity, e.g., GB (the usual capacity planning metric) does not appear in (1).
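Because (1) is a proportionality, the unknown constant cancels when comparing two drives, so ratios can still answer what-if questions. A minimal sketch, assuming the exponents 2.8 and 4.6 quoted above; the two example drives (platter count, RPM, platter diameter) are hypothetical, not specific vendor models.

```python
def relative_power(platters_a, rpm_a, diam_a, platters_b, rpm_b, diam_b):
    """Power of drive A relative to drive B, using Power ∝ N * Ω^2.8 * D^4.6.

    The unknown proportionality constant cancels in the ratio.
    """
    return ((platters_a / platters_b)
            * (rpm_a / rpm_b) ** 2.8
            * (diam_a / diam_b) ** 4.6)


if __name__ == "__main__":
    # Hypothetical drives: A has 4 platters of 3.5 in at 7200 RPM,
    # B has 2 platters of 2.5 in at 15000 RPM.
    ratio = relative_power(4, 7200, 3.5, 2, 15000, 2.5)
    print(f"drive A draws about {ratio:.2f}x the power of drive B")
```

The steep exponent on diameter means that shrinking the platters can more than offset a large increase in rotational speed.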

The outstanding question is: where do those funny non-integral exponents come from?

Throughput-Delay Curves

A colleague of mine at Yahoo.com asked me if I'd ever seen curves like this:

Not only is the answer, yes (it's a throughput-delay plot or XR plot in my notation), but that particular plot comes from my GCaP course notes. There, I use it to analyze the comparative performance of a functional multiprocessor (NS6000) and a symmetric multiprocessor (SC2000). Note how the two curves cross at around 1500 OPS. You can ask yourself why and if you can't come up with an explanation, you should be registering for a Guerrilla class. :)
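As a reminder of why an XR plot is curved rather than straight, consider the simplest case: a single M/M/1 queue, where R = S/(1 − XS). Response time stays close to S at low throughput and climbs without bound as X approaches 1/S. A minimal sketch with an illustrative 0.5 ms service time; it is not a model of the NS6000 or SC2000 systems.

```python
def mm1_xr_curve(service_time, points=10):
    """(throughput, response time) pairs for an M/M/1 queue up to 95% utilization."""
    x_max = 1.0 / service_time              # saturation throughput
    curve = []
    for i in range(1, points + 1):
        x = 0.95 * x_max * i / points       # evenly spaced throughputs
        r = service_time / (1.0 - x * service_time)
        curve.append((x, r))
    return curve


if __name__ == "__main__":
    for x, r in mm1_xr_curve(service_time=0.5e-3):
        print(f"X = {x:7.1f} ops/s   R = {r * 1e3:6.2f} ms")
```

Equal steps in throughput produce ever larger steps in response time, which is the nonlinearity referred to next.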

The above XR plot also serves as a useful reminder that the throughput and response-time metrics are not only dependent on one another, but they are generally dependent in a nonlinear way—despite what some experts may claim:

My Year in Review 2011

Some days I wonder if I ever actually accomplish anything anymore. Maybe it's time to just pack it in and become a greeter at Walmart. I know a bit about how queues work, so that should put me a few notches ahead of the competition. And I would expect the competition to be fierce because it's a pretty cushy job; but not every day, apparently.

Before taking the big leap, I decided it might be a good idea to note down some of the technical projects I've worked on this year (over and above the daily grind):

Wednesday, December 26, 2012

Tuesday, December 18, 2012

Thursday, November 29, 2012

Monday, November 12, 2012

Friday, November 9, 2012

Tuesday, November 6, 2012

Thursday, October 11, 2012

Sunday, October 7, 2012

Wednesday, August 1, 2012

Wednesday, July 4, 2012

Saturday, June 23, 2012

Wednesday, May 23, 2012

Monday, May 14, 2012

Wednesday, April 11, 2012

Sunday, April 1, 2012

Wednesday, March 21, 2012

Sunday, March 11, 2012

Wednesday, March 7, 2012

Friday, February 24, 2012

Thursday, February 9, 2012

Friday, February 3, 2012

Sunday, January 22, 2012

Sunday, January 1, 2012