Jun 8 11

Evolving Flying Fish, Running Birds, and Swimming Lizards: Resolving Paradigm Shifts

by Benjamin L. Russell

Approximately two weeks ago, on Wednesday, May 25, 2011, I posted the following question on the Haskell Beginners Mailing List:

Subject: question for advice on packages for a retro-style pocket money ledger program

For some time, I have been considering rewriting my original pocket money ledger program in Haskell, but am not sure as to which packages would be most useful. My original pocket money ledger program was first written circa 1983 in N80-BASIC (a JIS-compatible ROM-based line BASIC) on an NEC PC-8001 mkII [1].

The main difficulty is that I would like to preserve the original look and feel of my original program while still rewriting it in Haskell. However, my original program ran in an age before any kind of Macintosh-style or Windows-style GUI had become common, and used basic built-in ROM-based graphics and sound commands to draw a double-line colored graphical border on a centered-colored-text screen, and saved files to a floppy disk drive (floppy disks were actually considered advanced for that age, since most personal computers then used a cassette tape drive for external file storage).

As a result, I am uncertain as to which packages would be useful for preserving the original look and feel of my program. At minimum, I would like my program to exhibit the following behavior:

1. Start out by drawing a lime-green-colored double two-pixel wide box surrounding the screen, with the title and author name displayed as separately-colored text centered in the screen.

2. If possible, play a suitable music file while on this screen. Also, display an option to mute this music, and if this music is once muted, save this option for future runs of this program, so that the program will start out with the music muted as the default option. Also, display an option to control the volume of this music, and if possible, also display a separate option to choose a different tune to play. The tunes should be selectable from an expandable menu.

3. If possible, include a suitable screen saver to be displayed automatically, separate from the GUI screen saver, upon a user-defined period in which there has been no user input. The screen saver should be configurable in a separate configuration menu.

4. Upon the user hitting the Return key, stop playing the music (if playing), and move to the main menu screen.

5. The options displayed in the menu screen should include, at minimum, the following:

a. Create a new ledger book.

b. Load a ledger book (preferably from floppy disk).

c. Add an entry to the ledger book.

d. Delete an entry from the ledger book.

e. Erase the ledger book.

f. Save the current ledger book (preferably to floppy disk).

g. Tabulate and display income and expenditures (leads to a separate screen requesting a time period for tabulation). (The income and expenditures should be displayed in detail both as numerical charts and as colored graphs in a preferred form (pie chart, bar graph, etc.).)

h. Set configuration options (leads to a separate screen for configuring default background music, default volume, default application screen saver, default theme, default fonts, default menu behavior, default keyboard remappings, default screen height/width ratio, etc.).

i. Exit the application.

6. For each option (except for option i), move to a separate sub-menu screen. When input/output and processing have been completed on the separate sub-menu screen, return to the main menu screen.

Most of the functionality of this program is related to some form of input and output: colored formatted combined graphical and textual title screen with background music, manual data entry by a user, saving files to a floppy disk, reading files off a floppy disk, deleting files from a floppy disk, configuring various default music/volume/screen saver/theme/font/menu behavior/keyboard remapping/screen height and width ratio options, and combined numerical and graphical representation of statistical ledger information. Most of the computation involves only simple arithmetic (mainly addition and subtraction, with perhaps some multiplication and/or division, and no matrix manipulation).

The main point of this program is that program usage should be interactive, and not require a separately entered file. The idea is that the program will play the role of an interactive personal ledger assistant.

However, I am not sure how to implement this program easily in Haskell. Because most of the program is concerned with side effects, it is difficult to write while preserving referential transparency. This type of program seems relatively straightforward to write in N80-BASIC (which is no longer available in the original version), but considerably less trivial in a purely functional programming language such as Haskell. On the other hand, I am tired of spaghetti code: although writing each individual line of code in N80-BASIC is trivial, managing control flow is not, and it is very difficult to avoid spaghetti code in a dialect of line BASIC. Moreover, most of the associated graphics and sound commands are proprietary and implementation-dependent, and I am not sure how to rewrite that part of the functionality in an implementation-independent language without spending lots of time on API-related issues.

– Benjamin L. Russell

[1] _OLD-COMPUTERS.COM Museum ~ NEC PC 8001 MK 2_. NYI (New York Internet). n.d. Web. May 25, 2011.

Unfortunately, so far, nobody has replied, and I have a nagging suspicion that nobody will. The problem is that my question concerns translating a program originally written in N80-BASIC into Haskell, a language usually used for entirely different purposes. The original program used a number of language-specific, platform-dependent features (such as superimposing colored line graphics around the borders of a textual screen, and reading from and saving to a floppy disk).

For some languages, such features translate into corresponding features in the target language. However, Haskell is a purely functional programming language; as such, it is designed to eschew side effects, while my original program focused almost entirely on side effects. Therefore, translating my solution actually involves a paradigm shift.
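To make the shift concrete, here is a minimal sketch, in Haskell, of how the kind of interactive main-menu loop described in my question might be structured. To be clear, this is my own illustrative sketch, not the original program: the `Ledger` type, the particular menu options, and the use of plain `Int` amounts are all assumptions. The point is that the side effects (printing, reading input) are confined to the IO monad, while the bookkeeping itself stays referentially transparent.

```haskell
import System.IO (BufferMode (NoBuffering), hSetBuffering, stdout)

-- A hypothetical ledger: a list of (description, amount) entries,
-- with income positive and expenditures negative.
type Ledger = [(String, Int)]

-- Pure function: tabulating income and expenditures needs no IO.
balance :: Ledger -> Int
balance = sum . map snd

-- The impure shell: read a menu choice, act, and loop back to the menu.
mainMenu :: Ledger -> IO ()
mainMenu ledger = do
  putStrLn "c) Add entry  g) Tabulate  i) Exit"
  choice <- getLine
  case choice of
    "c" -> do
      putStr "Description: "
      d <- getLine
      putStr "Amount: "
      a <- readLn
      mainMenu ((d, a) : ledger) -- return to the main menu with the new entry
    "g" -> do
      print (balance ledger)
      mainMenu ledger
    "i" -> putStrLn "Goodbye."
    _ -> mainMenu ledger -- unrecognized input: redisplay the menu

main :: IO ()
main = hSetBuffering stdout NoBuffering >> mainMenu []
```

The interactive, side-effecting shell can be kept thin in this style; everything that actually computes (here, `balance`) remains a pure function that can be reasoned about and tested in isolation.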

This is not the first time that I have encountered a paradigm shift. In an earlier entry on this blog, entitled “Paradigm Shift: Back to the Past, and No Small Talk About Smalltalk,” dated August 25, 2009, I had to deal with another paradigm shift, that one involving a transition from the functional paradigm of Haskell to the message-passing paradigm of Smalltalk (a problem with which I am still struggling).

The problem is that most programmers who feel comfortable in one paradigm do not seem very eager to deal with paradigm shifts. For example, I have not read many writings by hardcore C++ programmers who enjoy translating C++ systems programming code into referentially transparent Haskell code. Neither have I read many papers by Prolog programmers about translating concurrent thread-manipulation Erlang code into pure Prolog code. Similarly, I have not read many papers by Smalltalk programmers about translating referentially transparent implementations of arrows in Haskell into Smalltalk code for which proofs of correctness can be written.

Most fish don’t seem to enjoy flying. Most birds don’t seem to enjoy running. Most lizards don’t seem to enjoy swimming.

However, exceptions exist: Flying fish fly, ostriches run, and marine iguanas swim. Evolution is possible.

The same can hold true, with varying degrees of adaptation, for programming languages. It is possible to push the proverbial cart from its side across town without ever using its wheels.

I, for one, think that there is a novel dimension of exhilaration associated with conquering a paradigm shift, one of a different quality from that achieved by solving a programming problem. I have encountered a number of paradigm shifts in my life: imperative BASIC to pseudo-functional Scheme, pseudo-functional Scheme to functional Haskell, and functional Haskell to message-passing Smalltalk. Such shifts resemble the culture shocks that I encountered when first moving from California to Tokyo, then from Tokyo to New Haven, and finally from New York back to Tokyo again.

To draw an analogy:

N80-BASIC:Haskell:Smalltalk::Tokyo:New Haven:Palo Alto

For some reason, I have never felt truly comfortable in any programming language. When programming in N80-BASIC, I struggled with spaghetti code. When programming in Scheme, I felt frustrated at not knowing how to draw colored superimposed lines around colored text on a screen specifically designed to allow superposition of graphics onto text. When programming in Haskell, I felt frustrated at not knowing how to write reflective programs that did not compute any value and that dealt only with side effects. When programming in Smalltalk, I felt frustrated at not knowing how to write referentially transparent functions for computing arrows.

What I would really wish for is a programming language that could do ALL OF THE ABOVE.

For the time being, I would be willing to settle for an easy way to translate my original N80-BASIC personal checkbook program into something that is algorithmic, referentially transparent, reflective, cross-platform portable, equipped with a rich set of libraries, and easily makes use of platform-specific features: a combination of Scheme, Haskell, Smalltalk, Clojure, and N80-BASIC.

Essentially, I need the artificial, programming language counterpart to a natural duck-billed platypus. Any ideas?

Jun 7 11

Paul Gauguin Couldn’t Paint, But He Still Became A Great Painter

by Benjamin L. Russell

Today, I discovered a blog post, “Soloists vs. The Choir,” by Andy Leonard, commenting on a blog entry by Joel Spolsky, founder of Fog Creek Software, about the correlation (or rather, the lack thereof) between time spent and the resulting quality of programming. Leonard wrote:

Is there really that great a difference between good and great programmers?

Joel cites some impressive statistics gleaned from Yale Professor Stanley Eisenstat, who teaches a software development class. The scatter plot says it all.

As Joel notes, “There’s just nothing to see here, and that’s the point. The quality of the work and the amount of time spent are simply uncorrelated.”

While that may be so, companies do not determine what people want to do; people do. This kind of reasoning leads to the conclusion that only star programmers should program. By that kind of reasoning, only great writers should write, only great translators should translate, and only great painters should paint. Anomalies such as Paul Gauguin, who became a famous painter despite lacking any aptitude for painting, are simply ignored.

The problem with this kind of reasoning is that it equates people with their skills but completely ignores their interests. Not all people are exceptionally talented at what they are interested in, nor are all people necessarily interested in what they are talented at; some people are relatively talented in a subject that is not profitable enough to make a living from (such as painting or poetry).

However, by this kind of reasoning, what are painters and poets, for instance, supposed to do? Painting and poetry are usually not profitable enough to earn a living from; yet if people are only supposed to do what they are exceptionally talented at, then people whose abilities lie in unprofitable fields such as painting or poetry should just starve to death, since only skills matter, and those skills are unprofitable.

This is the kind of reasoning that causes certain professors (Professor Stanley Eisenstat being one example with which I am personally familiar, since I took a class (CS 323a: “Introduction to Systems Programming,” fall 1993) under him) to try to “weed out” students who aren’t exceptionally gifted in programming. Granted, he wasn’t alone; Alan Perlis (also a Yale computer “science” professor) was reportedly much, much more severe, and reportedly once gave, as a first graduate programming assignment, five non-trivial problems (including writing an artificial intelligence program to solve the “Eight Queens Puzzle”), each to be solved in five different languages, all in one week, only to announce deliberately, when the assignment was due, that it was a joke, and that one solution to one problem in one language was sufficient. On a scale of course difficulty from 0 to 100, with higher numbers denoting greater difficulty and Perlis at 100, I would probably place Eisenstat at about 15, especially for his class in fall 1993 (he once mentioned in his systems programming class that one of his later, relatively difficult assignments, his infamous “encode-decode” assignment, used to be his second assignment). However, among the professors under whom I took courses in college in computer “science” (I put “science” in quotes because computer “science” is not really a science at all, but a procedural epistemology), Eisenstat ranked among the toughest taskmasters.

Well, should only the most gifted be allowed to pursue their interests? This solution only works if all people have at least one area in which they are especially gifted (people with well-rounded but average-level abilities may not), if all work areas are equally profitable, and if all people are interested in what they are gifted at. However, none of these conditions holds.

Let’s consider where this style of reasoning leads. Assume that there exists a society, Utopia, where only people with exceptional skills are allowed to work in their areas of special ability. I.e., only great programmers are allowed to program, only great translators are allowed to translate, only great painters are allowed to paint, and so on. All others are strongly discouraged from working. What happens?

Well, most artifacts become works of art. Mostly great programs are written, mostly great translations are translated, mostly great paintings are painted, and so forth. I say “mostly,” not “only,” because in practice, even great workers occasionally produce poor work. Even a genius has an occasional dog day.

So far, so good, it seems. What else? Well, eventually all schoolchildren are classified, while still young, into classes corresponding to their abilities. Their interests are simply ignored.

Since interests no longer matter, anything unrelated to skills is also strongly discouraged. Anime is outlawed. Chocolate is outlawed. Games, except for certain puzzle and mathematical games, are outlawed. Movies are outlawed. None of these are essential to increasing productivity directly, are they? The big companies decide that we don’t need them, so they lobby Congress to eliminate them. Congress wants the funds from the lobbyists, so laws are passed to outlaw them. The Second Prohibition begins.

Hey, the more productivity, the better, and the more focus on that productivity, the better, right? The big companies are still not satisfied with productivity. They need more productivity, more money. “Let’s make the people concentrate more on their work,” they say. Poetry is outlawed. Painting is outlawed. Music is outlawed. Unproductive entertainment in general is outlawed. After all, if it isn’t profitable, it isn’t important, right? Productivity and profit are all that matter, right?


WRONG. Something is missing here. What is missing? The value of personal interests. People do not usually become interested in something only as a result of being skilled/gifted at the subject. People usually become interested in something because that something is *fun*. Why? The reason is that they simply like it. I.e., they are interested in it. These interests occasionally lead to works of genius from certain people, but those results usually originate in some form of initial interest. Without initial interest, works of genius do not usually arise.

The point is that personal interests matter. Preferences matter. Likes and dislikes matter. Allowing a gifted person to produce gifted work is one thing; preventing less gifted people from even trying is quite another. While I do believe that gifted people should be encouraged to develop their skills, I do *not* believe that those who are less gifted should be discouraged. Some people develop skills late in life. Others come up with ways around problems. If one cannot program well in C++ in a software company, one might be able to program in Haskell, Scheme, or Smalltalk alone on a project uniquely suited to the particular programming language in an environment where most of the software components have already been designed by other programmers and one is paying one’s own salary from another source of income.

It is one thing to tell someone, “You are a great C++ programmer! You are encouraged to use your skills!” It is quite another thing to tell someone, “You really suck [excuse my French] at C++ programming! You definitely should not program in any language anywhere!” Excuse me? Any language anywhere? What if the person wants to work alone on a personal project using a language with built-in libraries uniquely suited to the language, and has a separate source of income? What if the person can’t write a program worth a dime in C++, but programs relatively decently in, say, Scheme, Haskell, or Smalltalk? I once met a first-order logic student who hated mathematics, but loved logic. I later met a different person who felt comfortable at programming in C, but just couldn’t program in C++. What if the person doesn’t feel comfortable in programming in C++, but is a genius at, say, Common Lisp, and eventually sells their business worth millions of dollars to the Yahoo! company?

(This actually happened once; see “The Old Joel on Software Forum – Yahoo Stores rewritten from Lisp to C++ and Perl” [curiously, this link is posted on Joel's own site!]. According to the article “Exploring e-commerce for innovative products,” by Piotr Wozniak, the site sold for 45 million dollars.)

The idea that *the choice of the programming language matters* is not new. For example, Paul Graham deliberately chose Common Lisp for ViaWeb, the site he eventually sold to Yahoo! as “Yahoo Stores” for forty-five million dollars. In his essay “Beating the Averages,” he writes,

So you could say that using Lisp was an experiment. Our hypothesis was that if we wrote our software in Lisp, we’d be able to get features done faster than our competitors, and also to do things in our software that they couldn’t do. And because Lisp was so high-level, we wouldn’t need a big development team, so our costs would be lower. If this were so, we could offer a better product for less money, and still make a profit. We would end up getting all the users, and our competitors would get none, and eventually go out of business. That was what we hoped would happen, anyway.

What were the results of this experiment? Somewhat surprisingly, it worked. We eventually had many competitors, on the order of twenty to thirty of them, but none of their software could compete with ours. We had a wysiwyg online store builder that ran on the server and yet felt like a desktop application. Our competitors had cgi scripts. And we were always far ahead of them in features. Sometimes, in desperation, competitors would try to introduce features that we didn’t have. But with Lisp our development cycle was so fast that we could sometimes duplicate a new feature within a day or two of a competitor announcing it in a press release. By the time journalists covering the press release got round to calling us, we would have the new feature too.

It must have seemed to our competitors that we had some kind of secret weapon– that we were decoding their Enigma traffic or something. In fact we did have a secret weapon, but it was simpler than they realized. No one was leaking news of their features to us. We were just able to develop software faster than anyone thought possible.

Some might argue, “Well, Paul Graham was a genius, and what was important was that he just happened to be a genius, not that he chose such a language as Common Lisp. What is really important is that the programmer be a star programmer, not that the language be similar to Lisp.”

However, the choice of the programming language can be a decisive factor in whether the person becomes interested in programming in the first place. In the excerpt “High School Computing: The Inside Story,” Natasha M. Chen writes,

In the four months it took me to complete my course in Scheme, I learned more about computer programming than I had in my two years of Pascal. In less than five minutes after I began reading the text, almost everything I learned more than three years previously in our aborted Logo course came back to me. Five minutes, not the two days it took to recover from just one summer away from Pascal. There were hardly any rules of syntax to remember. Furthermore, throughout the entire four months, I never touched a computer. The ease of learning and using Scheme gave me such confidence in the programs I wrote that I didn’t feel the need for the security of a compiler to check my work.

Let’s compute a little: it took Chen four months of studying Scheme (another dialect of Lisp) to learn more about computer programming than she had learned in two years of Pascal. Two years is twenty-four months, or six times four months. So choosing Scheme over Pascal resulted in at least a six-fold increase in learning speed.

Furthermore, the choice of Scheme over Pascal influenced her decision to return to learning programming. She writes,

After my sixth grade BASIC experience, I never wanted to take another computer course again. Of course, when you are eleven years old, G.P.A. and class rank don’t mean much to you. But by the time I was about to enter my junior year in high school, I started thinking about those things … and college … and the classes I needed to take. To avoid another BASIC nightmare, I decided to bypass Computer Programming I (BASIC) and go straight into Computer Programming II (Pascal). Pascal was different enough from BASIC to make me think that it had to be better. I found out that the improvement was far less than I had hoped. We jumped right into the syntax of Pascal: program (input, output), begin-end, etc. Even after two years of studying Pascal, I still can’t remember all the rules.

She continues,

As a senior, I had a study hall period that I sometimes spent in my math classroom doing homework. It was on one of these days that I happened to overhear my math teachers talking about Scheme. I was already tearing my hair out in my Pascal class trying to learn something for the upcoming A.P. exam—in fact, all I was learning was the page number of the reference section in our textbook, which I frequently consulted to see whether type declarations or variable declarations came first or to re-check how to declare a record for a linked list. Enticed by what I heard, I willingly gave up my study hall to come in four days of every week to learn Scheme on my own for no credit at all, using `The Schemer’s Guide’ [1]. My reward was that I regained the enthusiasm and interest I thought I had lost six years earlier.

So Scheme enabled her to regain the “enthusiasm and interest” in programming that BASIC had almost caused her to lose six years earlier.

I myself had a similar experience, albeit with different programming languages. I first started out in programming approximately six years before matriculation, in N80-BASIC on an NEC PC-8001 mkII personal computer, circa 1983. In college, similarly to Chen, I had the misfortune of learning Pascal in an introductory course intended for non-majors (I had not yet decided to major in computer “science”). One of the assignments required writing pointers in Pascal. That felt like doing gymnastics in a straitjacket. I had to spend so much time and effort focusing on syntax that I couldn’t concentrate on the algorithm.

Just to illustrate that different languages suit different programmers: my experience with Common Lisp differed from Graham’s. My first course for majors in computer “science,” in fall 1991, required programming in both Common Lisp and Scheme. The professor, Drew McDermott, actually gave us a handout entitled “Common Lisp for Schemers” outlining differences between the two languages. Unlike Graham, I felt uncomfortable in Common Lisp because using that language required looking up almost every function in a library reference book that was over a thousand pages long. I read slowly and have poor memory, so using that reference book so frequently forced me to worry more about syntax again than about the algorithm. However, Scheme was different: the entire R5RS specification fit in 50 pages, and there was no hefty library reference book. Programming in Scheme was fun. About three years later, when I audited a later version of the course under a different professor as a refresher, a TA actually commented that I wrote a better program for one assignment than he himself had.

The choice of the programming language does matter. In fact, it can make a crucial difference, not just in programming efficiency, but in basic motivation as well. People tend to do better at what they enjoy doing.

To sum up: Yes, Professor Stanley Eisenstat, I do agree that there is a vast, insurmountable difference between good and great programmers. I also believe that there is a vast, insurmountable difference between using a programming language in which one feels comfortable and one in which one doesn’t, that professors have no business telling students what they should be interested in, and that there is no correlation between interest level and ability level. Hey, Paul Gauguin couldn’t paint, but he still became famous as a great painter. So it is with great artists.

Apr 18 10

ANN: Twitter Account

by Benjamin L. Russell

As of Sunday, April 18, 2010, those of you who are interested may now read Benjamin L. Russell (DekuDekuplex) on Twitter. There, since each post is limited to 140 characters, I plan to jot down frequent notes on anything interesting I discover. Feel free to write to me there as well!

While setting up my account there, since I wasn’t satisfied with any of the default themes, I searched and managed to find some exquisite fantasy-theme backgrounds. For my current theme, I have chosen Blacklight Fantasy, which reminds me of fractals. What do you think of it?

Mar 15 10

ANN: Theme changed from Rubric to Titan.

by Benjamin L. Russell

This is an announcement that the theme for this blog has just been tentatively changed from the previous Rubric theme to a new theme, Titan, just announced on March 9, 2010.

Apparent advantages:

* The color scheme is less harsh on the eyes.

* The width is narrower, and hence easier to scan.

* An “About” page is now visible.

* A list of links is now visible.

* The overall look is cleaner.

Apparent disadvantages:

* The titles of recommended publications are no longer visible.

* Long lines must now be scrolled horizontally.

* My trademark logo, the blue butterfly, is no longer visible.

Please let me know what you think of this new theme by either posting a comment or filling out the pop-up poll. If you have any other suggestions for themes, please post your suggestions as comments, either here or in the poll, as well.

Mar 13 10

An Algorithm for Algorithms: One Student’s Quest to Pass a Course in the Design & Analysis of Algorithms, and a Poem Upon Reflection

by Benjamin L. Russell

[The material in the following post was quoted, slightly edited for corrections and hyperlinking of titles and terms, and expanded from a personal message that I had sent to one student in an introductory algorithms course who had asked how I had managed to pass a similar course in college.]

The first time I took “Computer Science 365a: Design and Analysis of Algorithms” at Yale University, circa 1991 (the suffix ‘a’ indicates a fall semester course, while ‘b’ indicates a spring semester course), I didn’t actually take it for credit; I happened to be on a Leave of Absence while present on campus, and audited the course instead. Fortunately, the people in the Department of Computer Science didn’t know that I was on a Leave of Absence, and I was able to take a dry run of the course, so to speak.

The reason that I did this was that I had previously had great difficulty conquering a prerequisite course in discrete mathematics, “Computer Science 202a: Mathematical Tools for Computer Science,” in fall of 1990, which had covered various topics, some easy and some painful. I found axiomatic set theory and graph theory relatively easy; propositional calculus tedious but not terribly difficult; number theory interesting; discrete probability theory painful; and linear algebra, taught at a fast pace at 8:30 in the morning when I was extremely sleepy and tired, unmanageable. The whole point of the entire course was to teach students to write proofs, and almost every problem of every problem set required writing a proof of some kind, most based on mathematical induction.

I eventually dropped this course for credit, but continued attending class until the end of the semester so that I would know what to expect next time. Then I audited this course again the next year, at the same time as auditing CS 365a, and was able to pass all the problem sets and examinations. (Later, I somehow managed to convince the Department of Computer Science to exempt me from taking the course again, since I had essentially completed it successfully once.)

Of all the courses that I was required to take in college, the algorithms course required by far the most preparation, and essentially dominated my life for almost the entire period between fall 1991 and fall 1993, when I eventually managed to pass it. The textbook used was Introduction to Algorithms (First Edition), by Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. (The first and second editions of the book differ substantially in content, so be sure to distinguish between them. There is even a third edition, just published on September 30, 2009, Introduction to Algorithms, Third Edition, by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein, which reportedly is significantly easier to read.)

(My roommate at the time, Scott P. Horne, who at first was himself pursuing the major in Computer Science (he eventually switched to Chinese Literature, even though he was far more proficient in both mathematics and programming than I, because he claimed to have gotten “bored” with the major in Computer Science), once claimed that it would take me “seven years” to complete that course; I managed to do it in about two and a half years, but only after countless nights of little or no sleep, and two head-splitting migraine headaches after working on mathematical problem sets (one on discrete probability theory in the CS 202a course, and the other on context-free grammars in a later course, “Computer Science 366b: Models of Computation,” on automata theory and formal languages, which unfortunately I didn’t pass while auditing, but which provided much practice in writing of formal proofs, and which still seemed less work (but more abstract work) than CS 365a, and boosted my confidence in one day eventually being able to pass CS 365a). (I also once overextended my stamina during an open-book day-long examination in a course in first-order logic, which happened to be in the Department of Philosophy, but required writing proofs that spanned several pages per proof; although I didn’t get a headache at the time, my body broke down and felt poisoned within from accumulation of stress, and the University Health Services (colloquially known by the acronym “DUH,” from its previous title of “Department of University Health,” which changed about twenty years ago) required me to have an enema.))

Conquering the algorithms course required first carrying out the following steps beforehand:

1. Conquering mathematics phobia.

2. Reviewing algebra and becoming proficient at writing mathematical proofs, including mathematical notation.

3. Finding a way to study mathematics without incurring too much stress.

Step 3 above implied the following sub-step, whose completion would automatically solve Step 3:

3a. Finding a way to discover fun in mathematics.

In order to conquer Step 1 above, I managed to enlist the assistance of an undergraduate mathematics student, Andrew Barnes, who agreed to tutor me once a week for about an hour per lesson gratis during the summer of 1991. He gave me what he referred to as tutoring in “elementary mathematics at an advanced level,” by which he meant that although the tutoring started out from axiomatic set theory and did not assume any background in mathematics above high school level algebra, most problems were of a theoretical nature, and focused on writing proofs of lemmas and theorems, rather than just problem-solving.

By the end of that summer, I was sufficiently prepared to take “Mathematics 270a: Set Theory,” a course in axiomatic set theory in the Department of Mathematics which required writing many proofs of theorems. To my surprise, while I found that course not difficult and even fun, another undergraduate student in the course, who was majoring in Electrical Engineering and had more background than I in mathematical problem-solving but much less in proofs, failed the mid-term examination because of an inability to write proofs of theorems, and subsequently dropped the course. It was at this point that I realized that maybe I wasn’t quite so stupid in mathematics as I had believed, and that with sufficient effort and persistence, I might yet have some glimmer of hope of eventually passing even the algorithms course.

Related to Steps 3 and 3a, and related to the above-mentioned issue, there were also four books which were of monumental importance:

i. Naive Set Theory (Undergraduate Texts in Mathematics), by Paul R. Halmos

ii. Elements of the Theory of Computation (Prentice-Hall software series) (First Edition), by Harry R. Lewis and Christopher Papadimitriou
This textbook was recommended to me by a former student in the Computer Science and Mathematics major, Robert Kissel, who had also been my supervisor at CADWARE, a software company in New Haven, Connecticut, in 1992, where I had worked as a part-time English-Japanese technical translator. He then recommended that I ask Dana Angluin, a researcher at the Department of Computer Science at Yale University and wife of Stanley C. Eisenstadt (the professor at the same department who later taught my much-feared “Computer Science 323a: Introduction to Systems Programming” course in fall of 1993, which I almost didn’t pass), for help with the book. Ms. Angluin had the opposite personality to the stern Professor Eisenstadt, who gave out so much homework that students had very little time to do anything but work on his problem sets, and who gave very few hints in class, out of a concern not to “bore students out of [their] skull[s] by ’spoon-feeding’ them”; instead, Ms. Angluin was one of the kindest and most helpful people whom I had ever met in the department. She agreed to give me a short tutorial on that book once a week, which helped me in understanding such concepts as diagonalization, finite-state automata, Turing machines, and various models of computation. This book is extremely thorough, rigorous, and abstract; although I myself enjoyed it very much (partly because I did not need to complete any problem sets based on it), students tend either to love it or hate it, depending on how abstract their thinking is.

iii. Introduction to Set Theory (Pure and Applied Mathematics) (also used as the textbook for Mathematics 270a), by Karel Hrbacek and Thomas Jech (currently, there is an updated edition, Introduction to Set Theory, Third Edition, Revised and Expanded (Pure and Applied Mathematics), by the same authors)

iv. Compared to What?: An Introduction to the Analysis of Algorithms (Principles of Computer Science Series) (used as a supplementary textbook when I finally actually passed the algorithms course, CS 365a, in fall of 1993), by Gregory J. E. Rawlins
This was the book on algorithms that I enjoyed reading the most by far. Unlike Introduction to Algorithms, this book treated each algorithm by starting out with a problem which had not yet been solved, and leading up to how the solution was discovered. This approach had the advantage of not intimidating the reader by presenting each solution as if it had been arrived at by a genius, by magic, out of thin air; instead, it showed that each algorithm, far from being a magical artifact spontaneously and instantaneously conceived by such a genius, was in fact only the result of a series of stepwise refinements that any layperson of sufficient mathematical maturity could have arrived at with sufficient thought. The importance of this point cannot be overstated: one reason that at least some students, myself included, at first have difficulty approaching the subject of algorithms is that they do not understand how the solutions are arrived at, and jump to the conclusion that algorithms somehow require a person of unusual mathematical ingenuity to come up with, and that they are either arrived at almost instantaneously by geniuses, or never arrived at by almost everyone else. This is simply not true. While other books tend to mask the process of thought required to arrive at the solutions, this book covers this area in great detail, while also adding an appealing cover and a number of quotes from Alice’s Adventures in Wonderland to invoke the atmosphere of that story. (I also covered my copy of the book with a kaleidoscopic book cover from Peacock Paper Products (a company which doesn’t seem to exist anymore) to enhance this effect.)

Another book that I personally found very helpful in establishing self-confidence in mathematics, but which was irrelevant to the subject area of algorithms, and which may be irrelevant in your case (depending on one’s mathematical background and area(s) of expertise), was a book on calculus, one of the very few on that subject that I found fun to read. At the time, I had felt extremely uncomfortable with calculus because of a bad experience in my first year in college: somebody had stolen my calculus textbook and class notes, which I had left in the law school library while visiting my economics professor just after mid-term, when I was already down to three classes from having dropped introductory French; if I had then dropped or failed any more courses, I would have flunked out of college completely. The book that finally caused me to regain confidence in this subject area was the following:

v. A First Course in Calculus (Undergraduate Texts in Mathematics) (also used as the textbook for my summer courses in calculus in 1990), by Serge Lang
Be forewarned: The subject area of this book is completely irrelevant to algorithms. The only purpose of this book was to reestablish confidence in my ability to do mathematics in general, and calculus in particular, after the above-mentioned incident. However, this book actually was interesting to read, because, if I remember correctly, it showed me how to appreciate, among other aspects, beauty in mathematics; in particular, it demonstrated elegant solutions to problems as opposed to ugly ones. Although the material itself was elementary, the exposition was eloquent, and the proofs were well-written. The treatment was very user-friendly, and the author was even able to inject humor at one point. Reading this book made me feel that there was a live human being writing it who understood the importance of eloquence in writing and elegance of treatment in describing mathematics, rather than simply beating every definition, lemma, and theorem into the logical ground by presenting it in strict logical order, regardless of difficulty level. With this book, I did not constantly need to skip around or refer to other books, because every point followed directly, in a reader-friendly manner, from the preceding points. In addition, the instructor, Jay Jorgensen, of the Department of Mathematics, was very helpful, and had the habit of repeating important points twice, thus making them easier to remember (he once explained that the real reason he did this was that the course was taught at 10 AM, after he had pulled an all-nighter doing his mathematics research, and that he was simply too sleepy to think of something new to say for every single sentence).

Also related to Steps 3 and 3a were a number of other courses and their textbooks that I also studied in the Department of Mathematics, which I found at least tangentially related to algorithms:

A. Course Title: Mathematics 456b: Recursive Function Theory
Professor: Ron Sigal
Textbook Title: [first edition of the following title] Computability, Complexity, and Languages, Second Edition: Fundamentals of Theoretical Computer Science (Computer Science and Scientific Computing), by Martin Davis, Ron Sigal, and Elaine J. Weyuker
This is the second edition of the textbook for the course taught by Ron Sigal, one of the authors of this book; I used the first edition (for which I cannot seem to find an online reference at the moment). This book provides a good background in elementary computability theory, which is closely related to complexity theory, the field to which the analysis of algorithms belongs.

B. Course Title: Mathematics 800 (a graduate-level course): Recursion Equations [domain theory]
Professor: Ron Sigal
Textbook Title: <none> (we used class notes for the above-mentioned book for the course in Recursive Function Theory, prepared by Ron Sigal)
This course discussed domain theory, including partial orderings and complete partial orderings (CPOs), and was a sequel to the above-mentioned course in Recursive Function Theory.

C. Course Title: Computer Science 430b: Formal Semantics
Professor: Paul Hudak
Textbook Title: Semantics of Programming Languages: Structures and Techniques (Foundations of Computing), by Carl A. Gunter
This was one of the few courses in the major in Computer Science which I actually found to be fun. The topic was the lambda calculus, the theoretical basis of Scheme. Professor Hudak’s exposition was very lucid and understandable. I later audited another of his courses, Computer Science 201a: Introduction to Computer Science, after completing the major and graduating, in fall of 1993 (if I remember correctly), just to have an opportunity to program in Scheme. The mid-term project for that course, Schemetris, in which students were each paired with a partner and required to use Scheme graphics libraries provided by the instructor to build a Scheme version of Tetris, greatly increased my confidence in being able to program in Scheme, because my partner chiefly told me which functions to write while I did most of the actual coding, and we got the project done after three days of nearly constant work (during this period, I only slept between about 6 PM and 10 PM every evening, and then pulled an all-nighter afterwards). Additionally, the TA for that course also commented that one solution which I wrote for one of the problems was much more elegant than the solution arrived at by the TA himself. Professor Hudak’s influence at that time is partially responsible for my current interest in the functional programming language Haskell, of which he was one of the architects.
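The flavor of the lambda calculus, the course’s topic, can be sampled in any language with first-class functions; the following is a minimal illustrative sketch of my own (not material from the course), showing Church numerals, a classic lambda-calculus construction, in Python:

```python
# Church numerals: natural numbers represented as pure functions.
# The numeral n is the function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def to_int(n):
    # Decode a Church numeral by counting applications of f.
    return n(lambda k: k + 1)(0)

three = succ(succ(succ(zero)))
print(to_int(three))  # prints 3
```

In the pure lambda calculus there are no built-in numbers at all; everything, including arithmetic, is encoded as function application, which is exactly the sense in which it serves as a theoretical basis for Scheme.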

I also took some courses, and read their required textbooks, in the Department of Philosophy, taught by Professor Patricia (“Patty”) Blanchette, on first-order logic and the logic of Gottlob Frege. However, their exact course titles and related textbooks do not spring to mind at the moment. While they were helpful in providing practice in logical reasoning and in writing formal proofs, they were not as crucial to my preparation for algorithms as the others; nevertheless, I would suggest taking a course in first-order philosophical logic if you have time. At the least, this experience provides an interesting background side story to mathematical logic; in my case, it opened the avenue of philosophy as a hobby, and led to my reading other books in philosophical logic and related topics, especially by Bertrand Russell.

Finally, related to Step 2 above, I went to the local library at the Department of Mathematics and borrowed an old hardcover book with a brown cover, whose title I don’t remember, published sometime in the 1960’s (if I remember correctly), which contained extremely detailed explanations of the rules of elementary high school algebra and dozens of mini exercises at as elementary a level as the FOIL rule. Then I went home and practiced with this book for about an hour a day (I couldn’t work with it any longer than that, because I would get tired, bored, and sleepy after that time). The title and authors of this book did not stick in my memory because I didn’t enjoy working with it; I just happened to come across it as the least-bad option that I could find after searching through most of that section of the library for a book that would give me extensive practice in high-school-level algebra, which I had practiced before matriculation but mostly forgotten from lack of practice. One can probably find many other, far more interesting books if one looks hard enough. The most important point about this book was that it was very old and came from a generation of teachers who believed that this area of mathematics was best taught by lots of rote practice; while I found this method extremely boring, the amount of practice did work in my case.

Additionally, to become proficient at mathematical notation, I practiced writing various mathematical symbols which could be easily confused with other writing symbols. In particular, I wanted to be absolutely certain that I could distinguish between the uppercase letter ‘I,’ the lowercase letter ‘l,’ and the numeral ‘1’; the lowercase letter ‘o,’ the uppercase letter ‘O,’ the numeral ‘0,’ and the uppercase Greek “theta” letter; the lowercase ‘a’ letter and the lowercase Greek “alpha” letter; the uppercase ‘N’ letter (used to represent a constant integer) and the symbol ‘N’ (used to represent the set of natural numbers); ‘N(sub-0)’ (where ‘N’ represents a constant integer), ‘N(sub-0)’ (where ‘N’ represents the set of natural numbers), and the Hebrew symbol aleph-null (used to represent the cardinality of the set of natural numbers in set theory); the lowercase ‘x’ letter, the non-mathematical multiplication symbol ‘x,’ and a small closing parenthesis ‘)’ followed without a space by a small opening parenthesis ‘(’; the ‘.’ symbol and the mathematical multiplication “dot” symbol (only slightly raised); and the uppercase ‘C’ letter and a left-parenthesis ‘(’ symbol.

For example, I wanted to be able quickly to distinguish between the following one-line lists of expressions:

C) alpha dot x. (l(sub-I) – I(sub-l))(sub-1) =? o.0.O.N (where ‘N’ represents the set of natural numbers)


() a . )(. (I(sub-l) – l(sub-I))(sub-l) =? theta(O.0.o.N (where ‘N’ represents a constant integer))

(Try writing both lines out quickly as if following a crazed, sleepy, and absent-minded mathematics or computer science theory professor with sloppy handwriting in the middle of a rushed class towards the end of a crucial proof which is guaranteed to appear on the next examination.)

Therefore, I set some time aside to practice writing each letter of the Greek alphabet in both uppercase and lowercase, and to establish my own typeface rules to distinguish carefully between the above letters:

The uppercase letter ‘I’ must have serifs on both vertical ends; the lowercase letter ‘l’ must be written in a cursive font; the numeral ‘1’ must be written such that the stroke starts slightly below the top of the numeral and then springs immediately back down all the way to the baseline.

The lowercase ‘o’ letter must be written small; the uppercase ‘O’ letter must be written large and never crossed out with a single line; the numeral ‘0’ must always be crossed out with a single line proceeding out of both sides; and the uppercase Greek “theta” letter must always be crossed out with a single line proceeding out of only the right side.

The lowercase Greek “alpha” letter must have both serifs on the right side pointing slightly upwards and slightly downwards, respectively, both toward the right; the lowercase ‘a’ letter must have a single serif on the lower-right corner pointing straight down.

The uppercase letter ‘N’ (used to represent a constant integer) must be written as a plain uppercase letter ‘N’; the special symbol ‘N’ (used to represent the set of natural numbers) must have a double-line instead of a single-line for its left side.

‘N(sub-0)’ (where ‘N’ represents a constant integer) must be written with a plain uppercase letter ‘N’; ‘N(sub-0)’ (where N represents the set of natural numbers) must be written such that the ‘N’ symbol comprises a double-line instead of a single-line for its left side; the Hebrew symbol aleph-null (used to represent the cardinality of the set of natural numbers in set theory) must be written such that it has a pronounced diagonal line jutting out from both ends, with relatively smaller vertical lines extending on mutually opposing sides from this diagonal line from inside both end points.

The lowercase ‘x’ must be written as a horizontally inverted lowercase ‘c’ letter joined to an immediately following small ‘c’ letter (alternatively, it can be written as an extended diagonal line extending from the upper-left corner to the lower-right corner, intersected by a relatively small perpendicular intersecting line extending from the upper-right corner to the lower-left corner); the non-mathematical multiplication symbol ‘x’ is written as a regular lowercase ‘x’ letter, but in a smaller size; and a small closing parenthesis ‘)’ followed without a space by a small opening parenthesis ‘(‘ is written such that both parentheses are relatively obtusely curved (i.e., lightly curved) compared to a horizontally inverted lowercase ‘c’ letter followed without a space by a lowercase ‘c’ letter.

The ‘.’ symbol is always written such that the symbol intersects the baseline; the mathematical multiplication “dot” symbol (only slightly raised) is always written such that the symbol is written slightly above the baseline.

The uppercase letter ‘C’ is always written such that the right ends of the letter are curved sharply to the right; a left-parenthesis symbol ‘(‘ is always written such that the ends of the symbol are relatively softly curved to the right.
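For comparison, typeset mathematics resolves most of these ambiguities with distinct fonts and symbols rather than handwriting conventions; in LaTeX notation, for instance (a sketch of standard commands, not anything from my own notes):

```latex
$N$            % italic capital N: a constant integer
$\mathbb{N}$   % blackboard-bold N: the set of natural numbers
$\aleph_0$     % aleph-null: the cardinality of $\mathbb{N}$
$N_0$ vs.\ $\mathbb{N}_0$   % subscripted constant vs. subscripted set
$x$, $x \times y$, $)($     % letter x, multiplication cross, parentheses
$a \cdot b$ vs.\ $a.b$      % raised multiplication dot vs. period
$O$, $0$, $\Theta$, $o$     % capital O, zero, capital theta, lowercase o
$\alpha$ vs.\ $a$           % Greek alpha vs. Latin a
$I$, $l$, $1$               % capital I, lowercase l, numeral 1
```

On a blackboard or in handwritten notes, of course, no such typographic help is available, which is why the personal conventions above were necessary.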

Finally, in fall of 1993, I took CS 365a. I performed acceptably on the problem sets, and although I somehow still failed the mid-term exam (an incident over which the professor, Michael J. Fischer, was very upset), I passed the final and the course. Finally, “the CS 365 experience” (as one of the graduate TA’s termed it) was over (this person must have known that it was a special experience, since nobody else had ever referred to “the CS xyz experience,” where x, y, and z were any other integers)! I breathed a long sigh of relief, happy that no other challenge of comparable significance stood in the way of completing the Computer Science major.

My preparation for this course did come at a high cost, however: I had spent so much time and effort preparing for this one course that I had neglected my programming during most of this period, and had become extremely rusty at even constructing linked lists. As a result, I was unable to complete most of the problem sets for another required course that I took during that semester, the above-mentioned “CS 323a: Introduction to Systems Programming,” and didn’t have enough time to take courses in compilers and interpreters or in operating systems, or to learn linear algebra; consequently, I did not become a programmer, for lack of confidence in programming, nor proceed to graduate school in computer science, for lack of knowledge of such topics as eigenvectors, eigenvalues, and determinants in linear algebra. However, I did continue studying computer science (in particular, the field of programming language theory, particularly pertaining to the programming languages Scheme and Haskell) as a hobby in my spare time.

Instead, I plan to start a project to create a virtual world where people can spend most of their lives, including working to earn real-world funds, studying such topics as functional programming and translation, and performing various banking and financial errands which are currently usually performed in the real world, from within the virtual world; I just need a collaborator, preferably familiar with the language Squeak (an implementation of Smalltalk) and the virtual world construction tools Croquet and Cobalt.

My experiences in college and after graduation later inspired me to write the following poem (which is still a work in progress), The Sea of Time, or Once Upon A Program, à la La Divina Commedia by Dante Alighieri, thematically based on my trials and tribulations with the Computer Science major.

(Canto I is based on my experiences in college; Canto II is based on my experiences after graduation from college, but before returning to Tokyo (Francis T. McGrath III was my roommate for a portion of this time) (I had previously resided in Tokyo between August of 1979 and August of 1989, before matriculation at college, graduation, moving to New York, and finally returning to Tokyo in June of 2004); Canto III is based on my experiences after returning to Tokyo. In a sense, the structure of this poem is a microcosm of the structure of La Divina Commedia: Canto I roughly corresponds to the Inferno, in that my life in college was, roughly put, a living hell; Canto II to the Purgatorio, in that my life after college, while significantly less stressful than that in college, seemed like one long odyssey to try, somehow, to obtain enough funds to return to a place within commuting distance of Akihabara; and Canto III to the Paradiso, in that, at long last, after countless trials and tribulations, I finally managed to return to my power source in visiting the mecca of anime and RPG’s: my other two hobbies):

The Sea of Time, or Once Upon A Program

Canto I

To C, or not to C,
Oh, what a mystery

Does life so have to be;
Yet, ever shining sea,

So fresh, so sweet, so deep;
Ever enticingly,

Risk all, for Greatness sake,
In haste, commands so, she.

With trepidation, I
Set warily about.

The sky, so overcast:
Across it race dark clouds,

Yet in the distance shine
Those rays: hopes, zest, and dreams.

Beguilingly, they spark
And beckon to their womb.

Let Daring be my sword,
With Fortitude my shield,

May Youth now be my boat,
With Zest of Life my sail.

The adversary, Time,
Unbeckoned, fans a gale.

He taunts, Thy trip is far,
Your boat shall soon decay,

And when it is no more,
Consume you shall the sea.

So set I on my quest,
In search of new honor,

The storm, so ominous,
Loomed dead ahead, nearing.

As when Poseidon loomed,
His Trident shining bright,

Yet he, Ulysses showed
No fear, but sailed on.

So kept I on my quest,
Regardless of Time’s words,

Full knowing, should I fail,
That doom would lay ahead.

Canto II

‘Twas two months short a year
One day, when sun cast light

Upon a youthful boat
Whom Chance gave luck to meet.

“‘Tis Lafcadio Hearn
Who calls me yonder here,

“That, one day, I might see
A Land of Rising Sun.”

So booms, within the boat,
A voice, so loud and clear

No fear or doubt can rock
Its manifest resolve.

“Whose voice is this I hear,”
Ask I, as we face off.

“‘Tis Francis T. McGrath,
The Third, who asks of thee

“That thou shalt tutor me
On that ancient country.”

“Thou may have, in return,
Such means necessary

“For in that Empire State
A Big Apple to see.”

This contract we both vow,
and onward we journey.

Until one cannot miss
This sight, amid its steeps,

Looming, high above, this
City that Never Sleeps.

Such land where Chaos reigns
Never before have seen!

Inhabitants around
Push others all about,

‘Til not a soul is stout,
Still standing on the ground.

Each warily about,
Gazes, always to doubt,

Whether ’tis safe to trust,
Or else be made to bust.

Canto III

Circle of Hell, each year,
In Big Apple, it was,

As Dante watched from far,
Purgatory to come.

Yet one day at last came:
The Rising Sun arose,

Icarus, stay away!
Let not my wings melt now!

The heavy bird landed,
Its wings made of metal,

Thus dawned a new journey:
Land of the Rising Sun.

The Eastern Capital:
Its Rainbow Bridge at night,

Gleamed bright in the starlight,
The monorail zoomed by.

City of Lights it was,
Scintillating at night.

Each halo gleaming bright,
A ray of hope in sight.

Akihabara now!
At long last, finally!

So many years have passed:
A harsh taskmaster, Time.

But wait shall be no more,
For I am back to see

The land of anime
and role-playing magic.

But wait! Computer shows:
Where have they disappeared?

And all the showrooms gone?
Only an Apple Store?

Something has gone amok!
Audio centers: none!

Roppongi Hills is here,
A labyrinth must-see.

Yet I would rather be
Back in 1980.

– by Benjamin L. Russell
March 13, 2008

Dec 10 09

From Hofstadter’s “Prolegomena to Any Future Metacat” to Marshall’s “Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception”: What Is The Mind’s I?

by Benjamin L. Russell

(The content of this post is based on my post [1] entitled “Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception” on the USENET newsgroup comp.lang.scheme.)

Recently, I stumbled across a rather fascinating project entitled “Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception” [2], by James B. Marshall.

According to the Overview (see the hyperlinked site above),

Metacat is a computer model of analogy-making and perception that
builds on the foundations of an earlier model called Copycat. Copycat was
originally developed by Douglas Hofstadter and Melanie Mitchell as part of
a research program aimed at computationally modeling the fundamental
mechanisms underlying human thought processes. Central to the
philosophy of this research is the belief that the mind’s ability to perceive
connections between apparently dissimilar things, and to make analogies
based on these connections, lies at the heart of intelligence. According to
this view, to understand the analogical mechanisms of thinking and
perception is to understand the source of the remarkable fluidity of the
human mind, including its hidden wellsprings of creativity.

For those of you who may have read the book Fluid Concepts And Creative Analogies: Computer Models Of The Fundamental Mechanisms Of Thought [3], by Douglas Hofstadter, Copycat was a computer model of analogy-making and perception that sought to examine the underlying process of human creativity by focusing on analogies between patterns of sequences of letters within words.
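To make the letter-string domain concrete: the classic Copycat puzzle asks, “if abc changes to abd, what does ijk change to?” The toy Python sketch below is my own illustration, not Copycat itself (the real program is a stochastic, emergent architecture, not a hard-coded rule); it merely applies one candidate rule, “replace the last letter by its alphabetic successor”:

```python
def successor(ch):
    # Next letter in the alphabet; 'z' has no successor in this toy domain.
    if ch == 'z':
        raise ValueError("'z' has no successor")
    return chr(ord(ch) + 1)

def replace_last_with_successor(s):
    # Candidate rule: "replace the last letter by its successor."
    return s[:-1] + successor(s[-1])

# If "abc" changes to "abd"...
assert replace_last_with_successor("abc") == "abd"
# ...then, under this rule, "ijk" changes to:
print(replace_last_with_successor("ijk"))  # prints ijl
```

Copycat’s interest lies precisely where such a rigid rule breaks down (for example, on “xyz,” where ‘z’ has no successor), forcing the program to perceive the strings in a different way.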

The name “Metacat” reminded me of a chapter in the book entitled “Chapter 7. Prolegomena to Any Future Metacat,” itself probably a pun on the book Prolegomena to Any Future Metaphysics [4], by the German philosopher Immanuel Kant (22 April 1724 – 12 February 1804). Although the chapter title is not referenced on the above-mentioned site, since the architect, Marshall, worked together with Hofstadter, it is highly likely that that chapter title is at least partially responsible for the name of the project.

Copycat lacked the ability of “self-watching”; it was unable to examine how it arrived at its answers, and hence was unable to draw conclusions on a meta-analogical level. Metacat addresses this issue, as follows (see the above-mentioned Overview):

Metacat focuses on the issue of self-watching: the ability of a system
to perceive and respond to patterns that arise not only in its
immediate perceptions of the world, but also in its own processing of
those perceptions. Copycat lacked such an “introspective” capacity,
and consequently lacked insight into how it arrived at its answers. It
was unable to notice similarities between analogies, or to explain the
differences between them or why one might be considered to be
better or worse than another. In contrast, Metacat’s self-watching
mechanisms enable it to create much richer representations of
analogies, allowing it to compare and contrast answers in an insightful
way. Furthermore, it is able to recognize, remember, and recall
patterns that occur in its own “train of thought” as it makes analogies.
For instance, by monitoring its own processing, Metacat can often
recognize when it has fallen into a repetitive cycle of behavior,
enabling it to break out of its “rut” and try something else.

I tried out the Metacat simulation (it runs using Petite Chez Scheme + the Scheme Widget Library (SWL) + Tcl/Tk (version 8.3 or later)) on Windows XP Professional, Service Pack 3 (for which there is a self-extracting installer that installs Petite Chez Scheme, SWL, and Tcl/Tk combined), and it worked! Although it took up a lot of memory and ran rather slowly (the execution speed is adjustable), it graphically represented the analogy-making process in real time.

This kind of self-referencing cognitive project seems ideally suited to Scheme. If anybody knows of any similar self-referencing or reflective [5] projects for which the strengths of Scheme stand out, I would appreciate it if you could post related information below.

Now, back to the main question in the title of this post: What really is The Mind’s I?

[1] Russell, Benjamin L. “Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception.” Online posting. 11 Nov. 2009. 10 Dec. 2009. <news://comp.lang.scheme>. Also available at <>.

[2] Marshall, James B. “Metacat: A Self-Watching Cognitive Architecture for Analogy-Making and High-Level Perception.” James B. Marshall. 17 Oct. 2009. 10 Dec. 2009. <>.

[3] Hofstadter, Douglas R. Fluid Concepts And Creative Analogies: Computer Models Of The Fundamental Mechanisms Of Thought. New York, NY: Basic Books, Inc., March 21, 1996. <>.

[4] Kant, Immanuel. Prolegomena to Any Future Metaphysics. Cambridge: Cambridge University Press, 1783. <>.

[5] “Reflection (computer science) – Wikipedia, the free encyclopedia.” Wikipedia, the free encyclopedia. 9 Sept. 2003. 10 Dec. 2009. <>.

Dec 10 09

H. L. Mencken’s Law, and the Inverse Relationship Between Knowledge and Happiness

by Benjamin L. Russell

While walking through the halls of Arthur K. Watson Hall in the Department of Computer Science at my alma mater (a dismal school with Gothic architecture located somewhere in New Haven, whose mood of academic tortu*, ahem, of academic pursuit, rather, from 1989 to 1994 (including my LOA) was characterized by a local art poster which once depicted a screaming man being run over by a train while muttering “This too shall pass.”), one day circa 1992 to 1994, I encountered a sheet of paper pasted to the door of one of the professors of computer science, bearing the following inscription:

Those who can, do.
Those who can’t, teach.
Those who can’t teach, do research.

Later, I came to learn that the first two lines have been attributed to H. L. Mencken (American journalist, essayist, magazine editor, satirist, critic of American life and culture, and student of American English (September 12, 1880 – January 29, 1956)); however, it is not clear who came up with the last line.

Just this evening, in a brief mental journey back to that very location, I had a dream in which I was back at Old Campus at my college, but this time as a guide showing a visitor around campus while jokingly mimicking the other students. Upon waking up, I somehow came up with my own extension to Mencken’s Law, which I have termed “Russell’s Extension to Mencken’s Law” below:

Those who can, do.
Those who can’t, teach.
Those who can’t teach, write.

– myself

Later, I learned that the extension that I had come up with was very similar to a different extension that someone else had also devised:

Those who can, do.
Those who can’t, teach.
Those who can’t teach, criticize.

– Solomon Short

After some research, I discovered that Solomon Short is actually a fictional alter ego of David Gerrold (American science fiction author (24 January 1944 – )), author of The Galactic Whirlpool [1].

Just to be fair, Nicolas Martin had also previously, without my knowledge, come up with what is sometimes known as Martin’s Extension to H. L. Mencken’s Law:

Those who can, do.
Those who can’t, teach.
Those who can’t teach, teach education.

– Nicolas Martin


These quotes bring to mind another saying that I had also seen posted on another door in the same department on another occasion. Although I can’t remember the exact wording (and am unable to find the reference now), the gist was somewhat like the following:

Before going to college, we think that we know everything.
When in college, we learn that there are some things that we don’t know.
When in graduate school, we learn that there are quite a few things that we don’t know.
When in a Ph.D. program, we learn how truly enormous and insurmountable are the things that we really don’t know.

(If anybody knows the source or exact wording of the above-mentioned quote, I would greatly appreciate an explanatory comment with the corresponding information below.)

Incidentally, there can be an inversely proportional relation between knowledge and happiness. In my case, before going to college, I felt much happier than just after graduating, because I did not know how much I did not know. In particular, I felt that I could easily become an astronomer or astrophysicist or any other professional with enough study. Before going to college, I was greatly inspired by the book Cosmos [2] by Carl Sagan (American astronomer, astrochemist, and author (November 9, 1934 – December 20, 1996)), which described a variety of scientific topics related to astronomy and astrophysics. Back then, I used to dream of becoming an astronomer or astrophysicist.

Unfortunately, I had one major problem: Because of financial circumstances, I was unable to attend high school in Tokyo before matriculating at college (I was almost entirely self-taught), and was therefore unable to participate in physics-related lab experiments, which required access to a laboratory. Furthermore, I did not have access to sophisticated books or tutors for physics, and my home environment was very noisy and distracting. Worst of all, my concentration was repeatedly interrupted every thirty minutes or so by an over-protective parent at home who insisted on bringing pink grapefruit to my desk, and who demanded to know what I was thinking about every time that I tried to contemplate anything deeply. Essentially, I matriculated at college without having had much exposure to physics.

Then I discovered that the introductory courses in physics all assumed more knowledge of physics than I had, so I wound up never taking any courses in physics for lack of time to catch up. Instead, I wound up pursuing computer science, since I had had some exposure to programming. Although I eventually graduated with a Bachelor of Science in computer science, my lack of significant aptitude in mathematics (I initially had to overcome mathematics phobia as well) caused me to lose much sleep in trying to catch up, and I eventually chose not to pursue computer science in graduate school because of fear of over-exhaustion due to even more lack of sleep; I had not mastered linear algebra, in particular, in part because my visiting professor had only spent two weeks on the subject in my discrete mathematics class, and I was very concerned about whether I could catch up with such a serious deficiency, since I was keenly aware of how fundamental and essential this topic would be in any graduate program in computer science.

Paradoxically, however, my great difficulty in overcoming the mathematical prerequisites for computer science did confer one hidden advantage: The process made me much better at explaining mathematical concepts to beginners and to students who are relatively weak at mathematics. After graduation, I once tutored my roommate in mathematics, and he commented then that my explanations were significantly easier to understand than those of his regular mathematics tutor. Later, one of my employers in Manhattan told me that he was much better at teaching a subject than others who knew more about the subject than he did, because he was able to see the subject from the student’s perspective. Paradoxically, my explanations for mathematics have therefore generally been evaluated much more favorably than those for English, which is my forte, because when teaching mathematics, I can understand why the student finds a concept difficult, whereas in English, most concepts seem so trivially elementary that I simply cannot understand how any student could possibly not understand them. Therefore, I have the curious advantage of being much better at explaining those mathematical concepts that I do understand than mathematicians who have never had to struggle in understanding them.

Back to physics, however. Curiously, before matriculating at college, I had always found the few books on physics that I had studied to be much more interesting and approachable than equivalent-level books on high school mathematics (in particular, although I enjoyed set theory and, later on, graph theory, I had always had a strong antipathy toward algebra and, in general, toward any area which did not allow visualization of the concepts). I enjoyed visualizing structures mentally, but had difficulty in doing so in algebra; however, I found visualization much easier in physics. Even in college, when I occasionally encountered physics problems that a tutor nearby was explaining to other students, I was surprised that the problems did not seem difficult, since I could readily visualize them. However, I was so overloaded in coping with computer science that I had no spare time in which to explore physics at the time.

In college, I discovered that my visual approach to learning put me at a distinct disadvantage in learning topics that could not readily be approached visually, such as algebra. This discovery made me deeply unhappy. My writing professor told me that I had a flair for writing and actually walked me to the library microfiche collection in recommending that I apply for a graduate program in English, whereas Drew McDermott, my computer science professor for Computer Science 201a: Introduction to Computer Science, had told me that he thought that I was “not cut out for computer science”; nevertheless, I was determined to prove McDermott wrong, and persevered in computer science, eventually completing the major after going through much difficulty in overcoming Computer Science 365a: Design and Analysis of Algorithms. (Curiously, I found the following course, Computer Science 465b: Advanced Algorithms, to be much more manageable than the introductory course, even though the material was more sophisticated, because there was much less material to cover.)

It was not until many years later, a few days ago, while watching a series of educational programs on NHK public television in Japan, that I realized that I actually found the topics in science to be distinctly easier to understand than topics of an equivalent level in algebra; until then, I had thought that the difference would be insignificant. This was when I realized that I might have been better off in a natural science; in particular, physics.

In order to undo my depression from the college experience, I had to restore my mental continuation (to borrow a concept from the Scheme programming language) to just before entering college, and to set aside mentally the entire college experience and accompanying negative emotional weight as a separate continuation, pretending that a different mental process had proceeded with that mental computation. This required visiting some memorable places before college and recovering some emotional memories by walking along some of the same paths and drinking some of the same soft drinks as back then. This brought back the mathematics phobia that I had previously overcome, but at least I was no longer depressed: I had, in a sense, gone back in time mentally and restored my old self, so to speak.

If more knowledge can lead to less happiness, then, by the inverse rule, less knowledge may also lead to more happiness (or at least to a recovery of happiness preceding the greater knowledge).

A corollary is that great souls are often burdened with great unhappiness. Albert Einstein (German-born American physicist (14 March 1879 – 18 April 1955)) once said the following [3]:

If I had only known, I would have been a locksmith.

Wolfgang Amadeus Mozart (composer born in the Archbishopric of Salzburg (27 January 1756 – 5 December 1791)) died at only thirty-five years of age. Thomas Edison (American inventor, scientist, and businessman (February 11, 1847 – October 18, 1931)) suffered hearing impairment when his chemical laboratory caught fire and the train conductor smacked him on the ears and threw him off the train. Ludwig van Beethoven (German composer and pianist (17 December 1770 – 26 March 1827)) reportedly suffered a severe form of tinnitus which caused him great difficulty in hearing his own music; he was also tragically unsuccessful in romance.

I may not be a “great soul” (yet!), but recently, I have come to believe that until I become one, I at least have the benefit of being less unhappy (at least for now).

Recently, I have become more interested in resurrecting my old study of physics, and in seeing how far I might be able to pursue this subject formally. In particular, I am tentatively entertaining the (admittedly perhaps wild) idea of re-doing my entire undergraduate degree in physics instead of computer science, just to see how far I could go. I have a curious feeling that, because of my visual way of thinking and my greater aptitude for physics than mathematics, I might actually be able to reach graduate school faster by redoing my entire undergraduate education in physics than by trying to continue directly on to graduate school in computer science. As a first step, I am considering reviewing some of the Feynman Lectures on Physics [4] at the next available opportunity.

[1] Gerrold, David. The Galactic Whirlpool. New York, NY: Random House Publishing Group, 1997.

[2] Sagan, Carl. Cosmos. New York, NY: Random House, Inc., 1980.

[3] Moncur, Michael. “Michael Moncur’s (Cynical) Quotations.” 10 Dec. 2009. 10 Dec. 2009.

[4] Feynman, Richard P., Robert B. Leighton, and Matthew Sands. The Feynman Lectures on Physics. London, U.K.: Addison Wesley Longman, 1970.

Oct 23 09

To Scheme, or Not to Scheme: Scheming Schemers and Non-Scheming Schemers, or Keeping the Fun in Scheme

by Benjamin L. Russell

Do you use the Scheme programming language? If so, do you program mainly in a serious mood to write applications, or in a crafty mood to have fun? In other words, do you consider yourself a non-Scheming Schemer, or a Scheming Schemer? I consider myself a Scheming Schemer: I program in Scheme mainly in a crafty mood just to have fun. To quote Alan Perlis from the dedication in SICP:

“I think that it’s extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don’t become missionaries. Don’t feel as if you’re Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don’t feel as if the key to successful computing is only in your hands. What’s in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more.”

Alan J. Perlis (April 1, 1922-February 7, 1990)

Nevertheless, as many of you probably know, some recent developments in the evolution of the Scheme programming language have reduced the influence of the language. In particular, the R5RS vs. R6RS schism and the replacement of the 6.001 course at MIT, based on Scheme, with a 6.01 course, based on Python, are two events that have created much controversy.

Concerned over such events, I recently posted a thread entitled “Ideas for an SICP’?” on the comp.lang.scheme USENET newsgroup, asking for suggestions for an alternative to SICP with similar content, as follows:

[W]ith Scheme replaced by Python at MIT, this special role of Scheme
as the vehicle for teaching sophisticated students of computer science
has been greatly diminished. Arguments against SICP as being too
difficult for an introductory textbook notwithstanding, the presence
and usage of that textbook contributed greatly to the significance of
Scheme as a tool in teaching introductory computer science.

It seems that SICP could use a replacement. What is needed is an
alternative textbook to use Scheme in a role that cannot be fulfilled
by such languages as Python, in order to foster creativity and
originality in programming for future freethinking hackers. In
addition, such an alternative textbook would need to be actively used
by leading educational institutions of introductory computer science
in raising a new generation of future Scheme hackers.

Does anybody have any suggestions for a plan that could lead to the
birth and growth of such an alternative leading textbook? Many
programmers tend to be strongly influenced by the first textbook that
they encounter in learning programming; whether that language is
Scheme or Python could have great effect on the future influence of
such languages. The SICP phenomenon has been done once; why not give
rise to a new SICP’ phenomenon?

There were several responses. In particular, one user, Ray, responded as follows:

[N]obody who’s grown up with the web, and who thinks of
computers as being primarily communications devices, will
believe that that makes a language anything other than a
crippled toy if you can’t interface with the hardware
capabilities of the machine, enabling you to do something
as “simple” as writing a web browser in it, managing network
connections, handling Graphical UI elements, and rendering
text and graphics on the screen.


Now, consider what Scheme’s got that Python doesn’t got.
It comes down to syntactic abstraction and continuations.
Courses based on SICP don’t use them, so MIT had nothing
to lose by going to Python.

Perhaps. But has Scheme really lost the essence of its appeal?

I disagree. Recently, I started reading a guide (albeit in Japanese, since I can read that language as well) by Shiro Kawai on the Gauche implementation of Scheme, and opened up a chapter on continuations. (For those of you who do not know, a continuation is a control mechanism in Scheme that allows the flow of control to be captured in a value, allowing a process to be “continued” (hence the name) from the point where the continuation was saved.) Since I had not yet fully understood continuations, I found the chapter extremely interesting, and could not stop reading. At one point, I encountered the following procedure, written in continuation-passing style (a.k.a. “CPS”), in which, rather than assigning the continuation directly to a variable, the continuation is explicitly passed as a parameter, to calculate the factorial function:

(define (fact/cps n cont)
  (if (= n 0)
      (cont 1)
      (fact/cps (- n 1) (lambda (a) (cont (* n a))))))

This procedure returns the same results as the following (much simpler) one (listed on the same page), which does not use a continuation:

(define (fact n)
  (if (= n 0)
      1
      (* n (fact (- n 1)))))

In particular, I stared at the mysterious “(a)” variable in the following piece of code above, wondering what that variable represented:

(fact/cps (- n 1) (lambda (a) (cont (* n a))))

Suddenly, it dawned upon me: The “(a)” variable stored the parameter that was passed to the continuation (“(lambda (a) (cont (* n a)))”), which, in turn, captured the control state of the procedure at that point of execution, which was, in turn, passed back as a parameter to the enclosing procedure! In short, the continuation was a microcosm of the execution context of the procedure at that point in time, encapsulated in a lambda abstraction!

Here, “fact/cps” is the name of the procedure, which stands for “factorial in CPS (continuation-passing style) form.”

Suppose we call “fact/cps” in the simplest case: a value of 0 for the first parameter, and, for the second parameter, a function that simply returns whatever is passed to it:

(fact/cps 0 (lambda (a) a))

Then, in “fact/cps”, “(= n 0)” is true in the if-statement, so “(cont 1)” is called, where “cont” is simply the second parameter of the enclosing “fact/cps” function, or “(lambda (a) a)” (the continuation), which returns the value of the parameter “a”, which is 1 in this case, so 1 is returned.

Let’s be a little bolder, and use a value of 1 for the first parameter:

(fact/cps 1 (lambda (a) a))

This time, “(= n 0)” is false in the if-statement, so the following recursive call is made (substituting 1 for n):

(fact/cps (- 1 1) (lambda (a) (cont (* 1 a))))

In the recursive call, (- 1 1) is substituted for the first parameter of “fact/cps”, and “(lambda (a) (cont (* 1 a)))” is substituted for the second parameter of “fact/cps”. This is the same as the following call (reducing the first parameter to a value):

(fact/cps 0 (lambda (a) (cont (* 1 a))))

However, since we had passed an identity function, “(lambda (a) a)” for “cont” in the enclosing function, this reduces to the following call (expanding “cont” to “(lambda (a) a)”):

(fact/cps 0 (lambda (a) ((lambda (a) a) (* 1 a))))

Here, the explicit continuation “(lambda (a) ((lambda (a) a) (* 1 a)))” first takes whatever is handed to it as the parameter “a”, and passes “(* 1 a)” to the inner function (lambda abstraction, actually, but we’ll dismiss that point here), so we reach “((lambda (a) a) (* 1 a))”. But this is just the identity function applied to the inner “(* 1 a)”. So this part reduces to “(* 1 a)”, and since multiplying by 1 changes nothing, that is the same as just “a”, which is whatever value is passed to this continuation; so this continuation is just the identity function.
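We can check this reduction directly in a live Scheme REPL. Here is a quick sketch (the name “k” is ours, chosen for brevity, and is not from the original code):

```scheme
;; Bind the built-up continuation to a name (k is a hypothetical
;; name, introduced here only for illustration):
(define k (lambda (a) ((lambda (a) a) (* 1 a))))

;; Applying it to any number just hands the number back,
;; confirming that it behaves as the identity function:
(k 7)  ; => 7
(k 42) ; => 42
```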

So when “fact/cps” is recursively called with 0 for the first parameter n and this identity continuation “(lambda (a) ((lambda (a) a) (* 1 a)))” for the second parameter cont, we first reach “(= n 0)” as the condition in the if-statement, which is true. This leads to evaluating the following:

(cont 1)

But “(cont 1)” is just this identity continuation called with 1, so it just returns the parameter 1. So, “(fact/cps 1 (lambda (a) a))” evaluates to the following:

1
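The nesting of continuations becomes easier to see with a slightly larger argument. Here is a sketch of the reduction for a first parameter of 3, with the intermediate continuations given hypothetical names k0 through k3 in the comments (the names are ours, for illustration only):

```scheme
;; Reduction sketch for (fact/cps 3 (lambda (a) a)):
;;
;;   (fact/cps 3 k0)   ; k0 = (lambda (a) a)
;;   (fact/cps 2 k1)   ; k1 = (lambda (a) (k0 (* 3 a)))
;;   (fact/cps 1 k2)   ; k2 = (lambda (a) (k1 (* 2 a)))
;;   (fact/cps 0 k3)   ; k3 = (lambda (a) (k2 (* 1 a)))
;;   (k3 1) => (k2 1) => (k1 2) => (k0 6) => 6
(define (fact/cps n cont)
  (if (= n 0)
      (cont 1)
      (fact/cps (- n 1) (lambda (a) (cont (* n a))))))

(fact/cps 3 (lambda (a) a)) ; => 6
```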
Of course, we didn’t need to pass the identity function “(lambda (a) a)” as the second parameter to “fact/cps”. We could have formatted the output, for example, by passing a formatting function, instead (to borrow the syntax of Gauche Scheme):

(lambda (a) (format #t "The factorial value is ~a." a))

Then we could have invoked “fact/cps” with that formatting function for the function to be passed as the continuation, as follows:

(fact/cps 3 (lambda (a) (format #t "The factorial value is ~a." a)))

This would have returned the following:

The factorial value is 6.#<undef>

Alternatively, we could have chosen to multiply whatever was returned by 2, just to screw up the function, as follows:

(fact/cps 3 (lambda (a) (* 2 a)))

This would have returned the following:

12
Hey, why not combine the two functions, and get Scheme to say something funny?

(fact/cps 3 (lambda (a) (format #t "I do solemnly swear that the factorial value is ~a." (* 2 a))))

Scheme then would have returned the following:

I do solemnly swear that the factorial value is 12.

Despite (or maybe because of?) all this exploratory monkey-business, in the “Aha!” moment described above, I felt an ecstasy of enlightenment that I do not often experience elsewhere (er, elsewhen, rather).

Such “Aha!” moments are crucial to appreciating the fun in computer science. They are commonly found whenever a deeper level of understanding is achieved by contemplating something which is not obvious at first. I have noticed that they are found more easily in Scheme than in some other programming languages, because of the flexibility that the language allows in even esoteric expressions.

In order to enjoy programming, one must appreciate the fun in programming, and it is difficult to appreciate this factor without experiencing an “Aha!” moment. The deeper the understanding, the more intense the associated exhilaration. Continuations and syntactic abstraction, in particular, are very abstruse (some may even say “arcane”) topics, especially when first encountered, and can require relatively deep understanding. Hence, by providing opportunities to learn such concepts, Scheme can provide an ideal opportunity to experience the fun of programming. Thus the continued need for Scheme.

Not all books that use Scheme adopt this approach. An alternative approach is to structure the curriculum so that the learning becomes a linear process, rather than a series of leaps, so that all the parts fit together neatly like solving a jigsaw puzzle, rather than like climbing a mountain.

Not to be critical of this approach, but not everybody prefers it. In one critique, José Antonio Ortega Ruiz, in his blog, “A Scheme bookshelf « programming musings,” contrasts one well-known textbook that uses this alternative approach, How to Design Programs (a.k.a. “HtDP”), with SICP, as follows:

The most cited alternative to SICP is How to Design Programs by Felleisen, Findler, Flatt and Krishnamurthi. Its authors have even published a rationale, The Structure and Interpretation of the Computer Science Curriculum, on why they think SICP is not well suited to teaching programming and how their book tries to fix the problems they’ve observed. I won’t try to argue against such eminent schemers, but, frankly, my feeling is that HtDP is a far, far cry from SICP. HtDP almost made me yawn, and there’s no magic to be seen.

Ortega-Ruiz is somewhat harsh in his critique of HtDP. After all, according to the authors of the paper explaining the rationale behind the book, The Structure and Interpretation of the Computer Science Curriculum, HtDP was created in the first place to rectify two major perceived problems with SICP (page 9 of the paper):

… SICP doesn’t state how to program and how to manage the design of a program. It leaves these things implicit and implies that students can discover a discipline of design and programming on their own. The course presents the various uses and roles of programming ideas with a series of examples. Some exercises then ask students to modify this code basis, requiring students to read and study code; others ask them to solve similar problems, which means they have to study the construction and to change it to the best of their abilities. In short, SICP students learn by copying and modifying code, which is barely an improvement over typical programming text books.

SICP’s second major problem concerns its selection of examples and exercises. All of these use complex domain knowledge….

While these topics are interesting to students who use computing in electrical engineering and to those who already have significant experience of programming and computing, they assume too much understanding from students who haven’t understood programming yet and they assume too much domain knowledge from any beginning student who needs to acquire program design skills. On the average, beginners are not interested in mathematics and electrical engineering, and they do not have ready access to the domain knowledge necessary for solving the domain problems. As a result, SICP students must spend a considerable effort on the domain knowledge and often end up confusing domain knowledge and program design knowledge. They may even come to the conclusion that programming is a shallow activity and that what truly matters is an understanding of domain knowledge.

While these are all valid points, Ortega-Ruiz’s last clause, “there’s no magic to be seen,” actually describes the key conflict here. What exactly is this “magic?” To be experimentally borderline facetious, according to Arthur C. Clarke,

Any sufficiently advanced technology is indistinguishable from magic.

Arthur C. Clarke, “Profiles of The Future”, 1961 (Clarke’s third law)
English physicist & science fiction author (1917 – 2008)

So maybe we’re actually referring to “any sufficiently advanced technology.” What do we mean by “sufficiently advanced?” I would suggest (to use a recursive definition), “sufficiently advanced to the point that deep understanding unachievable superficially is required to understand the material.”

Whether or not this “magic” is to be used in pedagogy actually relates to the fundamental design philosophy behind HtDP, as opposed to that behind SICP. To quote the above-referenced paper explaining the rationale behind HtDP, “The Structure and Interpretation of the Computer Science Curriculum,” as follows (page 11):

The recipes also introduce a new distinction into program design: structural versus generative recursion. The structural design recipes in the first half of the book match the structure of a function to the structure of a data definition. When the data definition happens to be self-referential, the function is recursive; when there is a group of definitions with mutual cross-references, there is a group of function definitions with mutual references among the functions. In contrast, generative recursion concerns the generation of new problem data in the middle of the problem solving process and the re-use of the problem solving method.

Compare insort and kwik, two standard sort functions:

;; (listof X) -> (listof X)
(define (insort l)
  (cond
    [(empty? l) empty]
    [else (insert (first l)
                  (insort (rest l)))]))

;; (listof X) -> (listof X)
(define (kwik l)
  (cond
    [(empty? l) empty]
    [else (append (kwik (larger (first l) l))
                  (list (first l))
                  (kwik (smaller (first l) l)))]))

The first function, insort, recurs on a structural portion of the given datum, namely, (rest l). The second function, kwik, recurs on data that are generated by some other functions. To design a structurally recursive function is usually a straightforward process. To design a generative recursive function, however, almost always requires some ad hoc insight into the process. Often this insight is derived from some mathematical idea. In addition, while structurally recursive functions naturally terminate for all inputs, a generative recursive function may diverge. HtDP therefore suggests that students add a discussion about termination to the definition of generative recursive functions.
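The quoted definitions depend on three helper procedures, insert, larger, and smaller, which the excerpt leaves undefined. The following self-contained sketch supplies one consistent reading (with these definitions, both functions sort into descending order, and this simplified kwik drops duplicate copies of the pivot), together with shims for the HtDP-style list names:

```scheme
;; Shims for the HtDP teaching-language names, so this runs
;; in an ordinary Scheme (an assumption on our part):
(define empty '())
(define (empty? l) (null? l))
(define (first l) (car l))
(define (rest l) (cdr l))

;; insert: place x into an already-sorted (descending) list.
(define (insert x l)
  (cond [(empty? l) (list x)]
        [(>= x (first l)) (cons x l)]
        [else (cons (first l) (insert x (rest l)))]))

;; larger: the elements of l strictly greater than pivot.
(define (larger pivot l)
  (cond [(empty? l) empty]
        [(> (first l) pivot)
         (cons (first l) (larger pivot (rest l)))]
        [else (larger pivot (rest l))]))

;; smaller: the elements of l strictly smaller than pivot.
(define (smaller pivot l)
  (cond [(empty? l) empty]
        [(< (first l) pivot)
         (cons (first l) (smaller pivot (rest l)))]
        [else (smaller pivot (rest l))]))

;; The two sort functions from the quotation:
(define (insort l)
  (cond [(empty? l) empty]
        [else (insert (first l) (insort (rest l)))]))

(define (kwik l)
  (cond [(empty? l) empty]
        [else (append (kwik (larger (first l) l))
                      (list (first l))
                      (kwik (smaller (first l) l)))]))

(insort '(3 1 2)) ; => (3 2 1)
(kwik '(3 1 2))   ; => (3 2 1)
```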

HtDP takes pains to remove the requirement for this “ad hoc insight” into the problem-solving process. The authors of the book then make the following claim (same page):

Distinguishing the two forms of recursion and focusing on the structural case makes our approach scalable to the object-oriented (OO) world.

That may be so, but that contrasts sharply with the spirit of the original quotation by Alan Perlis above:

Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house.

So we have two sharply contrasting approaches: One to use Scheme for fun, and the other to use Scheme for scalability. Again, this basically is a matter of taste.

In his above-mentioned post in the above-mentioned thread, Ray contrasted the advantages of Scheme vs. Python as follows:

[S]cheme still has no standard means of managing network connections
or rendering anything on the screen. Python has these things.

Now, consider what Scheme’s got that Python doesn’t got.
It comes down to syntactic abstraction and continuations.
Courses based on SICP don’t use them, so MIT had nothing
to lose by going to Python.

SICP doesn’t use syntactic abstraction. In the first
edition, this was because Scheme didn’t have them yet.
In the current edition …. well, here’s the footnote
about macros from the current edition, page 373:

Practical Lisp systems provide a mechanism that
allows users to add new derived expressions and
specify their implementation as syntactic
transformations without modifying the evaluator.
Such a user-defined transformation is called a
/macro/. Although it is easy to add an elementary
mechanism for defining macros, the resulting
language has subtle name-conflict problems. There
has been much research on mechanisms for macro
definitions that do not cause these difficulties.
See, for example, Kohlbecker 1986, Clinger and
Rees 1991, and Hanson 1991.

(Aside: “practical” lisp systems have them; the dialect
covered in the book does not. Students can and do draw
the obvious conclusion….)
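For readers who have not yet met syntactic abstraction, here is a small sketch of a hypothetical macro (the name my-unless is ours, not from the post or from SICP), written with the standard syntax-rules mechanism, which is hygienic and so avoids the name-conflict problems the footnote mentions:

```scheme
;; A hypothetical `my-unless` form, defined via syntax-rules.
;; The macro rewrites (my-unless test body ...) into an `if`
;; *before* evaluation, so the body is evaluated only when
;; `test` is false; an ordinary procedure could not guarantee that.
(define-syntax my-unless
  (syntax-rules ()
    ((_ test body ...)
     (if test #f (begin body ...)))))

(my-unless (= 1 2) 'surprise) ; => surprise
(my-unless (= 1 1) 'surprise) ; => #f
```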

Granted, specific implementations do have these functions. But there is no single main implementation of Scheme that everybody uses, and the libraries that implement these functions are not necessarily portable across implementations. Therefore, Scheme as a language (as opposed to a specific implementation) does not have these functions.

But is that necessarily bad? I don’t think so. I think that the whole point of Scheme, as a language, is, to quote Alan Perlis above, that we are “[not] responsible for the successful, error-free perfect use of these machines, [but] for stretching them, setting them off in new directions, and keeping fun in the house.”

To sum up, when I first approached SICP, I found it too challenging to digest. I had to quit reading it repeatedly, and then return to it later, and I still have read only a portion of the book. But I became entranced with the magic of computer science as demonstrated by such creatures as tail recursion in Scheme in SICP. And it was precisely this magic that kept me returning to computer science in general, and to Scheme in particular.

On the other hand, books such as HtDP are very comforting and reassuring. While SICP sometimes makes me wonder why I am such an idiot, HtDP makes me feel as if I am no longer an idiot. I no longer need to think for hours and hours during my sleep about how to overcome a particular problem. Books such as HtDP make the material very straightforward. However, by doing so, they also remove all the magic, and break the spell.

I feel that an intermediate approach is better. The magic is necessary, but the sorcery in SICP can be too much at first. However, the jigsaw-puzzle approach of HtDP seems too straightforward. There is not enough exploration to maintain interest after a certain level of reader sophistication. Paradoxically, although I can read HtDP much, much faster than SICP, I also fall asleep reading it just as quickly, and actually haven’t read very far in it. A gentler approach than that of SICP, which still offers more exploration than HtDP, would be a better compromise.

Also, I feel that the greatest strength of Scheme lies in its flexibility for exploratory programming. Scheme shares a quality with such addictive games as Tetris: it is relatively simple to learn, yet extremely difficult to master. Writing simple procedures to calculate such functions as the factorial function or the Fibonacci sequence is deceptively simple at first. But when the student ventures into such deeper areas as tail recursion, continuations, and syntactic abstractions, the procedures can become tantalizingly complex.
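The factorial function illustrates this progression nicely. Here is a minimal Scheme sketch (any R5RS-compatible implementation should accept it) contrasting the deceptively simple naive version with a tail-recursive one:

```scheme
;; Naive recursion: each call defers a multiplication, so the
;; stack grows linearly with n.
(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))

;; Tail recursion: the recursive call is the last operation, so a
;; conforming Scheme runs it in constant space (proper tail calls
;; are required by the language standard).
(define (factorial-iter n)
  (let loop ((n n) (acc 1))
    (if (= n 0)
        acc
        (loop (- n 1) (* n acc)))))

(factorial 5)      ; => 120
(factorial-iter 5) ; => 120
```

The two procedures compute the same value, but only the second describes an iterative process; seeing why is exactly the kind of exploration SICP rewards.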

To conclude, shouldn’t Scheme really be a language for scheming programmers to figure out mainly how to have fun? I like to be a Scheming Schemer, always scheming plots for stretching the lambda abstractions to set them off in new directions, mainly just to have fun. Are you a Scheming Schemer?

Aug 26 09

Conquering the Fear of Reading Research Papers: Computer Science Research Papers for Non-Computer Scientists

by Benjamin L. Russell

Any non-mathematician, non-computer scientist layperson who has approached programming languages originally spawned in academia, such as Haskell or Scheme, has no doubt been intimidated by the academic rigor and density of many research papers on such subjects. Even papers labeled “gentle,” such as “A Gentle Introduction to Haskell” [1], can turn out to be less “gentle” than expected for those not familiar with the field.

Although I myself have a background in computer science, I work as a patent translator, not as a mathematician, computer scientist, or programmer, and I tend to approach such papers more as an amateur programming language theory hobbyist and writer. Having tried to read a number of such papers, I have discovered that although many of them can be difficult to approach, some tend to be more approachable than others.

In particular, most recently, I was rather surprised to encounter a rather lengthy research paper on the history of one such programming language, Simula, which, although detailed, nevertheless turned out to be surprisingly approachable in not assuming advanced technical knowledge of the field (although it still required great attention to detail): “Compiling SIMULA: A Historical Study of Technological Genesis” [2].

This paper, rather than focusing solely on the technical development of the language, conducts a sociotechnical analysis of the broader historical background surrounding the Simula project. There are no formulas or even code snippets; instead, even though the paper is a research paper published in the IEEE Annals of the History of Computing, it is written as a history paper which just happens to be about the historical background of a programming language. Even more surprising, according to the endnote of the paper, the author, Jan Rune Holmevik, at the time of publication, was a graduate student in history at the University of Trondheim, Norway, and the paper itself “was written as part of [his] dissertation thesis in history Hovudfag [3] at the University of Trondheim” (page 36). In other words, this is a detailed research paper on a programming language, published in an academic journal, which does not assume any ability to program.

Until reading this paper, I had assumed that rigorous research papers on computer science published in academic journals were either written by mathematicians or computer scientists, or at least assumed a background in either mathematics or computer science to read. While this paper is definitely thoroughly researched, documented, and described, and approaches its topic in excruciating detail, it does not assume any background in either mathematics or computer science.

In other words, one need not necessarily be a mathematician or a computer scientist to write, much less read, a research paper on computer science; in fact, there are even some very thorough and detailed research papers on computer science published in academic journals which do not assume any background in either mathematics or computer science, such as this paper.

This discovery came as a revelation.

If it is possible to write a research paper without a background in either mathematics or computer science, then it must definitely also be possible to read at least some such papers. Furthermore, this is most likely not the only such research paper.

In fact, so far, I have encountered a number of computer science research papers which similarly require no or very little background in mathematics or computer science. Here is a list of some other interesting, yet approachable, research papers which (1) are either devoid of, or substantially devoid of, mathematical formulae; (2) are either devoid of, or substantially devoid of, code snippets; and (3) are either devoid of, or substantially devoid of, technical content assuming a background in mathematics or computer science:

i) “The Structure and Interpretation of the Computer Science Curriculum” [4]. By Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi.
Published in the Journal of Functional Programming in 2004, this paper discusses the motivation and design rationale for the book How to Design Programs [5] by comparison and contrast with the book Structure and Interpretation of Computer Programs [6].

ii) “Haskell: Batteries Included” [7]. By Duncan Coutts, Isaac Potoczny-Jones, and Don Stewart. Published in Proceedings of the first ACM SIGPLAN symposium on Haskell (2008).
This paper outlines the motivation for and structure of the Haskell Platform, a “Haskell for the masses” versioned packaging system of Haskell and included libraries. Although this paper does not specifically assume familiarity with mathematics or computer science, it does make use of such technical terminology as “libraries,” “packages,” “source code,” and “package description file.”

iii) “Teaching Programming Languages in a Post-Linnaean Age” [8]. By Shriram Krishnamurthi. Published in First SIGPLAN Workshop on Undergraduate Programming Language Curricula in 2008.
This paper claims that programming languages should be viewed as aggregations of features, rather than languages defined by taxonomies, and asserts that the term “paradigm” is ill-defined and should play no role in classifying programming languages. The paper also addresses the issue of the split between textbooks that are “rich in mathematical rigor but low on accessibility, and those high on accessibility but lacking rigor (and, often, even wrong),” and offers an alternative.

iv) “The Early History of Smalltalk” [9]. By Alan C. Kay. Published in History of Programming Languages: The second ACM SIGPLAN conference on History of programming languages in 1993.
This paper describes the historical background behind the evolution of Smalltalk, a pure object-oriented language. It also, in part, describes the visit of Steve Jobs, Jef Raskin, and others, then of Apple Computer, Inc., to the Xerox PARC laboratory, which led to the subsequent development of the Macintosh user interface, based on the Smalltalk user interface.

v) “Design Principles Behind Smalltalk” [10]. By Daniel H. H. Ingalls. Published in BYTE Magazine, August 1981.
This paper is a non-technical exposition of the design principles behind the Smalltalk-80 programming system. Illustrated with descriptive figures, the paper focuses not just on the programming language issues behind Smalltalk as a language of description, but also the user interface issues behind Smalltalk as a language of interaction. In particular, the paper describes how the research, in two- to four-year cycles, has paralleled the scientific method in repeatedly making an observation, formulating a theory, and making a prediction that can be tested, and summarizes key concepts as one- to two-sentence nutshell statements.

Last (but not least), the following paper, although containing a significant number of code snippets and assuming some background in computer science, is sufficiently interesting to be worthy of special mention; the first few sections of it can be safely read by a reader unfamiliar with the subject matter:

vi) “A History of Haskell: Being Lazy With Class” [11]. By Paul Hudak, John Hughes, Simon Peyton Jones, and Philip Wadler. Published in The Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III) in 2007.
This paper provides a very interesting description of the motivation and historical background of the functional programming language Haskell. In particular, the paper describes the influence of a precursor to Haskell, Miranda (pages 3 to 4); mentions how Gerry Sussman and Guy L. Steele briefly considered the idea of introducing lazy evaluation in Scheme (page 3); provides a timeline of the development of Haskell (page 7); and then proceeds to describe the syntax and semantics (pages 11 to 28), implementations and tools (pages 28 to 35), and applications and impact (pages 35 to 46). I sometimes return to this paper when I feel frustrated with the dryness of many other papers on Haskell, since this is one of the few papers on the language which conveys a sense of the excitement surrounding the birth and early development of the language; many other papers on Haskell tend to focus solely on technical issues, without discussing the role of human beings in the context.

As pointed out in Holmevik [2], programming languages do not “evolve in a technical or scientific vacuum” (page 35). This point is often ignored in many other papers about Haskell; luckily, it is dealt with in depth in this paper.

In my experience, becoming accustomed to reading research papers is a gradual, rather than instantaneous, process. After reading a number of such papers, one tends to become used to reading them; to recognize that failure to understand the content is not necessarily the fault of the reader, but often that of a woolly exposition that either fails to describe prerequisites to the content, or does not describe them adequately; and to realize that the best research papers are not necessarily those that describe the most difficult content, but those that offer the most readily understandable exposition of the material to the intended target audience.

One must understand that many research papers are not necessarily written to be easy to read, but to fulfill a specific need, such as a part of a requirement for a degree, and are hence qualitatively different from most textbooks, which tend to be written so as to be easy to understand for a broader audience. Hence, it is actually quite normal for a research paper of mediocre quality to be written in such a way as to expect the reader to fill in the prerequisite content, which may be assumed but not stated. (Of course, the best research papers tend to fill in any prerequisite content.)

Often, the best researchers are not the best writers; many of them cannot understand why a topic which is of trivial difficulty to them can possibly be of non-trivial difficulty for another reader. A reader aware of this fact can often approach research papers with a better plan for mastering the content therein.

Lastly, if I might add a personal expectation, I tend to enjoy reading papers that acknowledge that a programming language is an artifact resulting from a complex interplay of many different human desires, needs, and expectations surrounding its birth, and does not develop in a social vacuum. Computers do not design programming languages; people do. Therefore, discussing a programming language as if it were merely a logical extension of prior developments in syntax and semantics ignores a significant factor in its evolution. I have found that the best research papers tend to be those that, while providing a rigorous treatment of the subject material, do not assume any prerequisite material not normally possessed by the intended target audience; acknowledge that some readers may find certain points that seem trivial to the author to be non-trivial; provide sufficient elucidation to cope accordingly; and discuss the human issues surrounding the design and evolution of the language.

[1] Hudak, Paul. “A Gentle Introduction to Haskell, Version 98.” New York, NY: ACM SIGPLAN Notices 27:5 (1992): 1-52. <>. An updated, free 1998 version is also available at <>.

[2] Holmevik, Jan Rune. “Compiling SIMULA: A Historical Study of Technological Genesis.” Washington, D.C.: Annals of the History of Computing 16:4 (1994): 25-36. <>.

[3] Regarding the term “Hovudfag,” Holmevik writes (page 36, footnote), “Hovudfag may be regarded as the Norwegian equivalent to a master’s degree, although it carries considerably more workload and normally takes two to three years to complete.”

[4] Felleisen, Matthias, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. “The Structure and Interpretation of the Computer Science Curriculum.” Cambridge: Journal of Functional Programming 14:4 (2004): 365-378. <>.

[5] Felleisen, Matthias, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. How to Design Programs. Cambridge, MA: The MIT Press, 2003. <>.

[6] Abelson, Harold and Gerald Jay Sussman with Julie Sussman. Structure and Interpretation of Computer Programs, Second Edition. Cambridge, MA: The MIT Press and New York: McGraw-Hill, 1996. <>.

[7] Coutts, Duncan, Isaac Potoczny-Jones, and Don Stewart. “Haskell: Batteries Included.” Victoria, BC, Canada: Proceedings of the first ACM SIGPLAN symposium on Haskell (2008): 125-126. <>.

[8] Krishnamurthi, Shriram. “Teaching Programming Languages in a Post-Linnaean Age.” Cambridge, MA: First SIGPLAN Workshop on Undergraduate Programming Language Curricula (2008): 81-83. <>.

[9] Kay, Alan C. “The Early History of Smalltalk.” Cambridge, Massachusetts: History of Programming Languages: The second ACM SIGPLAN conference on History of programming languages (1993): 69-95. <>. Also available at <>.

[10] Ingalls, Daniel H. H. “Design Principles Behind Smalltalk.” BYTE Magazine, August 1981. <>.

[11] Hudak, Paul, John Hughes, Simon Peyton Jones, and Philip Wadler. “A History of Haskell: Being Lazy With Class.” San Diego, CA: The Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III) (2007): 12-1 – 12-55. <>.

Aug 25 09

Paradigm Shift: Back to the Past, and No Small Talk About Smalltalk

by Benjamin L. Russell

Those who have been reading my posts may have noticed a decided shift in their nature starting on June 18, 2009.

Specifically, prior to this date, the majority of my posts focused on Haskell, Scheme, and category theory, with a focus on purely functional programming. While I am still interested in purely functional programming, one of my other major interests is in creating a three-dimensional virtual world with some innovative functionality (I have something specific in mind).

At first, I was intent on finding a way to create such a world using a purely functional programming language. However, most purely functional programming languages do not have enough libraries to enable easy creation of such a world. Furthermore, in order to create such a world, most purely functional programming languages would require a rather sophisticated knowledge of linear algebra, which is one area of mathematics that my visiting discrete mathematics professor in college did not adequately cover, and which I have never had enough time to study fully on my own; by contrast, my favorite areas of mathematics are all related to set theory, recursive function theory (a.k.a. “computability theory”), and philosophical logic.

Therefore, I began searching for a programming language which would enable feasible writing of a three-dimensional virtual world without requiring explicit knowledge of linear algebra. Furthermore, since I was interested in programming language theory, I wanted a programming language that was at least based on a general-purpose programming language, as opposed to a domain-specific language.

After an intermittent search that lasted several months, I eventually came across a tool called “Cobalt,” based on Croquet, which is in turn based on Squeak, a dialect of Smalltalk. Unfortunately, Smalltalk is a pure object-oriented language, not a functional programming language, and after repeated attempts at approaching Squeak, the basis for Cobalt, I found the GUI-based interface rather difficult to get used to, having come from an Emacs-based textual environment. In addition, having come from a functional programming background, I found the concept of object-oriented programming highly counter-intuitive. (Apparently, I’m not the only person who has experienced this problem; similar arguments have been advanced by Paul Hudak [1] and Jonathan A. Rees [2].)

In short, I was encountering a paradigm-shift problem (with apologies to Shriram Krishnamurthi, who claims [3] that paradigms are ill-defined and hence largely meaningless).

Here, I was faced with a dilemma: If I tried using a functional programming language, I would probably need to do a lot of work in writing the necessary libraries for the three-dimensional manipulation of graphical objects, which would additionally require learning linear algebra, which, between my full-time translation job and my busy weekends, I simply did not have enough time to learn. On the other hand, if I tried using Squeak, then every time I tried to learn the language, I would feel uncomfortable with the object-oriented paradigm, and with the GUI-based environment, and keep returning to such familiar programming languages as Scheme and Haskell, and to such development environments as Emacs.

After some thought, I realized that the problem with learning Squeak did not have to do with any inherent difficulty in Squeak itself; rather, I needed, at least temporarily, to unlearn functional programming and unlearn working in Emacs. In short, I needed to restore a blank mental slate. Well, where did I first learn functional programming and Emacs? Ah, that’s right: in college.

Although I couldn’t actually un-attend college, I could, in a sense, restore my mental state to just before attending college: What I needed to do was to go back to the past, mentally speaking, to just before college, and approach Squeak with a fresh mind. To borrow Scheme terminology, I needed to resume the continuation in the process of my life from just before attending college: Then it would be straightforward.
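The continuation metaphor here is literal Scheme: call-with-current-continuation can capture a point in a computation so that it can be resumed later, from outside the original flow of control. A minimal sketch (the names life and resume-point are my own, chosen for the metaphor):

```scheme
;; Capture a continuation partway through a computation, then
;; resume it later from the outside.
(define resume-point #f)

(define (life)
  (+ 1000
     (call-with-current-continuation
       (lambda (k)
         (set! resume-point k)  ; remember this point in the process
         989))))                ; first pass falls through with 989

(life) ; => 1989

;; Later, invoking the saved continuation re-enters the suspended
;; (+ 1000 _) computation with a new value; in a typical REPL,
;; (resume-point 20) resumes it as (+ 1000 20).
```

Resuming the continuation of my life-process at 1989 works much the same way: the rest of the computation is replayed from the saved point, with a fresh value supplied from outside.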

One night, just after midnight on Thursday, June 18, 2009, I was walking back home, reminiscing: Let’s see … what was I doing back then. Going back in time …

2007 (changed jobs again, and became patent translator),
2006 (became patent checker, then changed jobs, and became software project leader),
2004 (moved from Manhattan to Tokyo),
2002 (political difficulties at work; job further downgraded to English teacher),
2001 (WTC disaster in Manhattan, where I lived; severe downsizing at workplace resulted; job downgraded to Materials Coordinator),
1999 (became Localization Coordinator at Berlitz),
1997 (first moved to Manhattan from New Rochelle, NY),
1996 (began first major job as Systems Engineer in White Plains; moved to New Rochelle from Jersey City),
1995 (first moved to Jersey City from New Haven),
1994 (graduated from college),
1993 (took courses in recursive function theory, philosophical logic, and the lambda calculus, especially enjoying the lambda calculus; first exposure to Haskell in auditing a course on Haskell),
1992 (finished leave of absence and self-study of discrete mathematics; took a course in axiomatic set theory),
1991 (began leave of absence and self-study of discrete mathematics),
1990 (took a course on Pascal and hated it; embarked on Computer Science major; started learning Common Lisp and Scheme: hated Common Lisp because of all the funcalls and idiosyncrasies, but enjoyed Scheme because of the relative simplicity and regularity of structure of the language; started learning Emacs; learned how much I did not know, and how stupid I was, and became chronically depressed),
1989 (moved from Tokyo to New Haven, and matriculated at college).

1989. Ah, there: Continue from the continuation of my life-process at that point: the early afternoon of August 31, 1989, just before leaving for Narita Airport to go to New York to take the bus therefrom to New Haven to begin my (dreaded) college studies.

No Emacs. No Scheme. No Haskell. No category theory. No chronic depression. Return of math phobia. Return of Japanese popular music. Return of a simple mind which is not depressed because it does not know how much it does not know. Return of interest in multimedia. Aha!

Multimedia: the missing link! At that time, I was very interested in the Fujitsu FM Towns, a Japanese personal computer modeled on the Macintosh, the interface of which was based on Smalltalk [4]! Proceeding to Smalltalk from this continuation would be relatively trivial!

Sometimes, one needs to move backward in order to move forward.

So I decided to resume my continuation from this point on, with a fresh mind.

Resuming continuation….

I awoke, as if from a trance.

The next day, I returned to my computer, continuing the continuation. Suddenly, this strange text-based interface on my screen called “Emacs” seemed like a monstrosity that some text-based hacker must have concocted just for the sheer challenge of mastering arcane keystroke-combinations. Yuck! There must be a way to do programming without having to master arcane keystroke-combinations.

Let’s see; where can I find a point-click-drag interface that allows me to program without having to use a textual editor … preferably, one similar to the graphical user interface of the Fujitsu FM Towns, based on the user interface of the Macintosh….

Aha! What’s this mouse-face-icon on my desktop labelled “Squeak”? Double-clicking on the icon labelled “Squeak”….

Hmm … a colorful background with illustrations. Sound. Multimedia. Point. Click. Drag. How intuitive: just like the Macintosh interface! Hmm … some research shows that it is an implementation of a language called “Smalltalk,” the interface of which was the basis for the Macintosh … what a coincidence … how curious…. I wonder who put it here….

Hmm … found a note here. It says, “Note to myself: Learn Squeak and Cobalt, and build a virtual world using Cobalt, using ideas described on the attached sheet.” Sure; why not?

[1] Hudak, Paul. “[Haskell-cafe] a regressive view of support for imperative programming in Haskell.” Online posting. 8 Aug. 2007. 25 Aug. 2009. <news://>. Also available at <>.

[2] Rees, Jonathan A. “JAR on Object-Oriented.” Online posting. 11 May 2003. 25 Aug. 2009. <>.

[3] Krishnamurthi, Shriram. “Teaching Programming Languages in a Post-Linnaean Age.” Cambridge, MA: First SIGPLAN Workshop on Undergraduate Programming Language Curricula (2008): 81-83. <>.

[4] Kay, Alan C. “The Early History of Smalltalk.” Cambridge, Massachusetts: History of Programming Languages: The second ACM SIGPLAN conference on History of programming languages (1993): 69-95. <>. Also available at <>.