Saturday, May 15, 2010

Emulating Web Traffic in Load Tests

One of the recurring questions in the GCaP class last week was: How can we make web-application load tests more representative of real Internet traffic? The sticking point is that conventional load-test simulators like LoadRunner, JMeter, and httperf, represent the load in terms of a finite number of virtual user (or vuser) scripts, whereas the Internet has an indeterminately large number of real users creating load.

From a queueing theory standpoint, we can view the question slightly differently. The test rig corresponds to a closed queueing system because no requests arrive from outside the system so, the sum of the queue lengths can never be larger than the number of vusers. On the other hand, a system driven by Internet traffic is referred to as an open queueing system in that the queues can be of an unbounded length (but not infinite). Moreover, we could put a probe at the entry point into a web server and measure the actual arrival rate in terms of connections per second. In any steady-state measurement period, the average arrival rate will be constant.

Since I don't happen to have the right kind of measurements from a real test rig, I'm going to use PDQ to demonstrate how the question can be addressed. I'm fully aware that there are other tools and techniques to emulate Internet traffic, but that was not the question that came up in class.

Standard Load Test (fixed Z)

Suppose that a load test typically starts with $N = 1500$ vusers, each having a mean think time $Z = 10$ seconds, and continues adding load up to some prescribed maximum, say 15,000 vusers. The results might look like this:


       N   Z  N/Z    X    Q/Z
    1500  10  150 149.47     0.53
    3000  10  300 199.87   100.13
    4500  10  450 199.96   250.04
    7500  10  750 199.99   550.01
    9000  10  900 199.99   700.01
   10500  10 1050 200.00   850.00
   12000  10 1200 200.00  1000.00
   13500  10 1350 200.00  1150.00
   15000  10 1500 200.00  1300.00

Note that the numbers in the $Z$ column all have the same value of 10 seconds. The throughput ($X$) increases up to the point where some resource saturates and becomes the bottleneck. We know from queueing theory that the bottleneck is the resource with the longest service time. Referring to the PDQ model listed at the end, the longest service time is $S_{max} = 0.005$ seconds, and it belongs to the queueing node labeled AppBox1. We also know from queueing theory that inverting Smax tells us the maximum possible system throughput. Since $1/S_{max} = 200$ connections per second, we expect to see that number as the saturation throughput and indeed, we do, at $N = 10,500$ vusers.

The last column in Table 1 just shows that the total number of requests (total queue length, $Q$) in the SUT increases with increasing N. Here, the $Q$ value is normalized by $Z$ for reasons that will become clear in the next two sections.

Internet Load Test (scaled Z)

Now, suppose we know that the mean Internet arrival rate = 150 connections per second at the production web site during some period of interest, e.g., the one-hour peak traffic at 11 am each day. How can we modify the test procedure in the previous section to emulate that rate on the same test rig?

It seems obvious that we need to increase N to some large number—much larger than the 15,000 vusers in Table 1. The trick is, when we scale $N$ up, we also need to increase the think time $Z$ in the same proportion, so that the ratio $N/Z$ remains constant. If we rerun the previous load test using this procedure, the results look like this:


       N    Z  N/Z    X  Q/Z
    1500   10  150 149.47 0.53
   15000  100  150 149.95 0.05
   30000  200  150 149.97 0.03
   45000  300  150 149.98 0.02
   75000  500  150 149.99 0.01
  150000 1000  150 149.99 0.01
  300000 2000  150 150.00 0.00

The first row is identical to the standard load test in Table 1. This time, however, the throughput approaches the value $X = 150$ connections per second, not the saturation value of 200 connections per second. Note also that the quantity $Q/Z$ now becomes vanishingly small. The queues are growing, as you would expect in an open queueing model, but not as fast as the scaled value of $Z$ is growing. Therefore, $Q/Z$ gets smaller.

I chose the initial values of N and Z to match the known Internet traffic arrival rate. You can do the same thing to emulate Internet loads. Just choose the ratio $N/Z$ to be equivalent to the Internet connections per second and then incrementally increase $N$ and $Z$ together in your client scripts until the measured throughput on the test rig matches that Internet arrival rate.

Why It Works

To understand why this principle works, you should review my previous blog post "Using Think Times to Determine Arrival Rates." There, I explain that the effective arrival rate in a closed queueing system (like the SUT in a test rig) is given by:

\begin{equation} \lambda_N = \dfrac{N − Q}{Z} \end{equation}

It says that if all the requests were in the SUT, then $Q = N$ and the effective arrival rate into the SUT during the next sample period would be zero. Conversely, if no requests have been issued to the SUT, then $Q = 0$ and the effective arrival rate is expected to be anything between 0 and $N$, in the next sample period. In other words, the effective arrival is variable in a closed queueing system, whereas we are trying to emulate an open queueing system that has a constant mean arrival rate, by definition.

Dividing through by Z, I can rewrite eqn.(1) as:

\begin{equation} \lambda_N = \dfrac{N}{Z} − \dfrac{Q}{Z} \end{equation}

The first term is the ratio $N/Z$ we discussed in the previous section, while the second term corresponds to the last column in Tables 1 and 2. As we make $Z$ larger (along with increasing $N$), that second term contributes less and less. Consequently, $\lambda_N$ gets closer and closer to $N/Z$, the desired constant arrival rate for an open system in steady state.

Whether or not you can actually muster the required massive number of vusers is another question whose solution will depend on other constraints such as: licensing costs, machine costs, and so on. Setting $Z = 0$ (or close to zero in this discussion) is one approach that is commonly employed as a workaround.

PDQ-R Model

Here is the PDQ code in R that I used to generate Tables 1 and 2. Refer to the online PDQ manual for details about the PDQ functions.


# Created by NJG on Sat May 15 14:46:13 2010
library(pdq)

workname  <- "httpGet"
servname1 <- "AppBox1"
servname2 <- "AppBox2"
servname3 <- "AppBox3"
servtime1 <- 0.005
servtime2 <- 0.004
servtime3 <- 0.003
usrinc    <- c(1,2,3,5,6,7,8,9,10)
sflinc    <- c(1,10,20,30,50,100,200)

cat(sprintf("%8s\t%4s\t%4s\t%4s\t%4s\n", 
   "N", "Z", "N/Z", "X", "Q/Z")) # Header

for(scalefactor in sflinc) {
   vusers    <- 1500 * scalefactor
   thinktime <- 10   * scalefactor
   
   Init("HTTPd Closed Model")
   CreateClosed(workname, TERM, vusers, thinktime)
   CreateNode(servname1,CEN,FCFS)
   CreateNode(servname2,CEN,FCFS)
   CreateNode(servname3,CEN,FCFS)
   
   SetDemand(servname1,workname,servtime1)
   SetDemand(servname2,workname,servtime2)   
   SetDemand(servname3,workname,servtime3) 
   
   Solve(APPROX)
   
   tp <- GetThruput(TERM,workname)
   q1 <- GetQueueLength(servname1,workname,TERM)
   q2 <- GetQueueLength(servname2,workname,TERM)
   q3 <- GetQueueLength(servname3,workname,TERM)

   cat(sprintf("%8.f\t%4.f\t%4d\t%6.2f\t%4.2f\n", 
     vusers, thinktime, vusers/thinktime, tp, 
     (q1+q2+q3)/thinktime))
}

No comments: