Sunday, April 27, 2008

Don Knuth on Open Source, Multicores and Literate Programming

Donald Knuth, the father of TeX and author of the long unfinished multi-volume set of books entitled "The Art of Computer Programming", sounds off in this interesting interview.

His comment on unit testing (or rather, on an overzealous reliance on it) seems a bit obscured here. I am more inclined to concur with his other famous quote, "premature optimization is the root of all evil," because unit testing tends to promote exactly that bad strategy with respect to system performance. I'll be sharing some personal war stories that illustrate this in my Guerrilla Boot Camp class this week.

However, do read and heed what he says about multicores and multithreaded programming. It should sound familiar. BTW, where he apparently says "Titanium", he means Itanium.

His riff on "literate programming" is a bit of a yawn for me because we had that capability at Xerox PARC 25 years ago. Effectively, you wrote computer code (Mesa/Cedar in that case) using the same WYSIWYG editor that you used for writing standard, fully formatted documentation. The compiler didn't care. This encouraged very readable commentary within programs. In fact, you often had to look twice to decide whether you were looking at a static document or dynamic program source. As I understand it, Knuth's worthy objective is to make this capability available for other languages. The best example I am aware of today that comes close to what we had at Xerox is the Mathematica notebook. You can also use Mathematica Player to view any Mathematica notebook without purchasing the Mathematica product.

Wednesday, April 23, 2008

Postscript to "When is 325,000 Greater Than 325,425?"

When I originally blogged this item, I had an idea about an alternative interpretation but I couldn't quite express it. Now, I think I can.

When we see a high-precision number, we perceive it as being associated with something particular (e.g., this pig). When we see a rounded number, even if it is smaller in magnitude, we perceive it as being associated with an interval or collection of numbers (e.g., the collection of pigs in the drawing). That's more or less what's going on between Sylvie and Bruno: she's thinking in round numbers about the whole collection of pigs, whereas Bruno is focused on the particular four pigs that he can see right under the window.

A rounded number can be read as standing for an interval: a range of numbers that could encompass the precise number and whose upper end therefore exceeds it. So, for example, $325,425 is larger than $325,000 in absolute magnitude, but the latter could be interpreted (i.e., perceived in our mind) as representing the interval $325,000 to $326,000, which encompasses $325,425. Hence, the particular may be perceived to be smaller than the general. The Cornell researchers did not test for this effect.
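To make the interval reading concrete, here is a minimal sketch. It assumes, purely for illustration, that a round price is read as accurate only to its last nonzero digit, so its trailing zeros set the width of the interval, and that the interval runs upward from the stated value, as in the $325,000 to $326,000 example. The helper names are hypothetical.

```python
def implied_granularity(price: int) -> int:
    """Return 10**k, where k is the number of trailing zeros in price.

    Assumption: a round price is perceived as accurate only to its
    last nonzero digit (hypothetical model, not from the study).
    """
    if price <= 0:
        raise ValueError("price must be a positive integer")
    step = 1
    while price % (step * 10) == 0:
        step *= 10
    return step


def perceived_interval(price: int) -> tuple[int, int]:
    """Half-open interval [price, price + step) that a round price may evoke."""
    step = implied_granularity(price)
    return price, price + step


def falls_inside(precise: int, rounded: int) -> bool:
    """True if the precise price lies inside the rounded price's interval."""
    lo, hi = perceived_interval(rounded)
    return lo <= precise < hi


print(perceived_interval(325_000))
print(falls_inside(325_425, 325_000))
```

Under a round-to-nearest reading, the interval for $325,000 would instead be centered on it ($324,500 to $325,500); that still encompasses $325,425, so the argument doesn't hinge on which convention the buyer has in mind.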

Sunday, April 20, 2008

When is 325,000 Greater Than 325,425?

According to a recent Cornell business school report, the answer is: when it involves money. The statistical study actually involved housing prices (rather apropos, given the meltdown in the mortgage market) and found that a house priced with a "precise" number like $325,425 (i.e., one with more significant digits) was perceived to be cheaper than one priced at the rounded-down value of $325,000.

Interestingly enough, prices like $299,999 (akin to the usual sales ploy one sees in department stores and on TV) were deliberately excluded from the Cornell study because numbers ending in all 9's, although precise by the researchers' definition, are considered too close to the rounded value ($300,000) to give a statistically reliable distinction.
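The distinction between "precise," "round," and all-9's prices comes down to counting significant digits. The sketch below is a hypothetical classifier, not the Cornell researchers' actual criterion: it treats trailing zeros as not significant, calls a price "round" if it has at most three significant digits, and sets aside prices ending in a run of at least three 9's. Both cutoffs are arbitrary assumptions chosen for illustration.

```python
def significant_digits(price: int) -> int:
    """Count significant digits, treating trailing zeros as not significant."""
    if price <= 0:
        raise ValueError("price must be a positive integer")
    return len(str(price).rstrip("0"))


def trailing_nines(price: int) -> int:
    """Length of the run of 9's at the end of the price."""
    s = str(price)
    return len(s) - len(s.rstrip("9"))


def classify(price: int, max_round_digits: int = 3, min_nines: int = 3) -> str:
    """Hypothetical classifier returning 'nines', 'round', or 'precise'.

    max_round_digits and min_nines are assumed cutoffs, not the study's.
    """
    if trailing_nines(price) >= min_nines:
        return "nines"    # e.g. 299,999: too close to 300,000 to be useful
    if significant_digits(price) <= max_round_digits:
        return "round"    # e.g. 325,000
    return "precise"      # e.g. 325,425


for p in (325_425, 325_000, 299_999):
    print(f"${p:,}: {significant_digits(p)} significant digits, {classify(p)}")
```

Note that $299,999 has six significant digits, just like $325,425, which is why a pure digit count can't separate the two and an explicit all-9's rule is needed.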

One has to wonder, what are the implications for the way capacity planning reports are perceived? We'll discuss that point next week as part of the section on significant digits in the Guerrilla Boot Camp class.

Even if you think such statistical reports might be a bit suspect, you've gotta love their opening rubric from Lewis Carroll:

`I'm counting the Pigs in the field!' (Bruno, looking out the window)
`How many are there?' I enquired.
`About a thousand and four,' said Bruno.
`You mean "about a thousand",' Sylvie corrected him.
`There's no good saying "and four": you can't be sure about the four!'

`And you're as wrong as ever!' Bruno exclaimed triumphantly.
`It's just the four I can be sure about;
'cause they're here, grubbling under the window! It's the thousand I isn't pruffickly sure about!'

--- "Sylvie and Bruno Concluded" (1893), Ch. 5, p. 3.

Guerrilla Manual Updated

The section of the Guerrilla Manifesto that outlines my Universal Scalability Law has been updated with diagrams that show the explicit components of the model (equation 1). Such effects are now being recognized more widely, so I'll be explaining more about this in my Guerrilla Boot Camp class next week.
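Since equation 1 isn't reproduced in this post, here is a minimal sketch of the Universal Scalability Law in its commonly published form, C(N) = N / (1 + α(N - 1) + βN(N - 1)), where N is the number of processors (or users), α is the contention (serialization) coefficient, and β is the coherency (crosstalk) coefficient. The symbols may differ from those in the manual (σ and κ are also common). Setting dC/dN = 0 puts the peak capacity at N* = √((1 - α)/β). The parameter values below are made up for illustration and are not fitted to any real system.

```python
import math


def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) under the Universal Scalability Law.

    alpha: contention (serialization) coefficient
    beta:  coherency (crosstalk) coefficient
    """
    contention = alpha * (n - 1)      # queueing for shared resources
    coherency = beta * n * (n - 1)    # pairwise exchange of data
    return n / (1 + contention + coherency)


def usl_peak(alpha: float, beta: float) -> float:
    """Load N* = sqrt((1 - alpha) / beta) where C(N) is maximal (beta > 0)."""
    if beta <= 0:
        raise ValueError("C(N) has no finite peak unless beta > 0")
    return math.sqrt((1 - alpha) / beta)


# Illustrative (made-up) coefficients.
alpha, beta = 0.05, 0.001
for n in (1, 8, 16, 32, 64):
    print(n, round(usl_capacity(n, alpha, beta), 2))
print("peak near N =", round(usl_peak(alpha, beta), 1))
```

With α = β = 0 the model reduces to linear scaling, C(N) = N; a nonzero β is what makes capacity eventually decrease beyond N*.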

Saturday, April 19, 2008

The Woolliness of the Wild Wild Web

WWW is the acronym for World Wide Web, but it more often seems to stand for the Wild and Woolly Web.

Call me old-fashioned, but one of the things that drives me up the wall about publication on the web in general, and technical expositions in particular, is the lack of both time-stamps and citations. These two things have been part of scientific communication since before formal journal publication existed. For example, 17th-century scientists like Newton and Hooke wrote missives to each other, and it was the convention then, as it is today, to begin a letter with the date. That's how we know that Hooke came very close to the law of gravitation that is now attributed to Newton (an attribution also aided by the latter meticulously eliding all reference to Hooke after the first edition of The Principia). Could we know such things today if they had been using the Web? It's not clear. It depends. And that's the problem: a lack of consistency, and a lack of web tools to enforce consistency.

Friday, April 11, 2008

Internet Needs Flow Control

In a post, Larry Roberts (co-founder of the Internet) proposes a flow control solution to congestion control problems on the Internet. This has been an important issue ever since the Internet's congestion collapse circa 1986. Because it relates to queueing policies, I discuss this problem in Section 1.8.3, "Metastability on Networks," of my Perl::PDQ book.

Roberts claims that, contrary to popular belief, the congestion problem lies with the networks, not with the TCP protocol. Rather than overhaul TCP, he says, we need to deploy flow management and selectively discard no more than one packet per TCP cycle. In his view, flow management is the only alternative to probing into everyone's network and the only way to distribute Internet capacity fairly. The comments on his post are also worth reading because they compare his proposal with already defined mechanisms, such as WRED and DiffServ.
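The "no more than one packet per TCP cycle" rule can be illustrated with a toy per-flow drop limiter. This is a hypothetical sketch, not Roberts' actual design: it assumes a router already knows when it is congested and identifies flows by some key, and it approximates a "TCP cycle" by a fixed interval `cycle_s` (roughly one round-trip time). One loss per cycle is enough to make a standard TCP sender cut its congestion window, which is presumably why the limit is set at one.

```python
from dataclasses import dataclass, field


@dataclass
class FlowDropLimiter:
    """Toy per-flow drop policy: at most one discard per flow per cycle.

    cycle_s is an assumed stand-in for a TCP cycle (about one RTT).
    All names here are hypothetical, not from Roberts' proposal.
    """
    cycle_s: float = 0.1
    last_drop: dict[str, float] = field(default_factory=dict)

    def should_drop(self, flow_id: str, now: float, congested: bool) -> bool:
        """Decide whether to discard this packet of flow_id at time now."""
        if not congested:
            return False
        last = self.last_drop.get(flow_id)
        if last is not None and now - last < self.cycle_s:
            return False  # this flow already lost a packet in this cycle
        self.last_drop[flow_id] = now
        return True


limiter = FlowDropLimiter(cycle_s=0.1)
for flow, t in (("A", 0.00), ("A", 0.05), ("B", 0.05), ("A", 0.20)):
    print(flow, t, limiter.should_drop(flow, t, congested=True))
```

Because drops are spread across flows rather than concentrated on whichever packets happen to arrive at a full queue, each flow gets the same back-off signal per cycle, which is one way to read the fairness argument.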

Podcast: "Diving into Capacity Planning"

A podcast that I did for TeamQuest Corporation back in December is now available. It's a somewhat unconventional take on the motivations for doing CaP (capacity planning), one that takes into account the seemingly frustrating, but otherwise very realistic, perspective of management. During the podcast, I refer to the CMG keynote given by Jerred Ruble (CEO of TeamQuest Corp.), entitled "Is Capacity Planning Still Relevant?"

Simple registration is required to download the 25 MB MP3 file. This podcast also gives you an idea of some of the things we will be covering in the Guerrilla Boot Camp class on April 28-29, 2008.