Of course, one doesn't often change the grammar...
The problem is error handling, which is really bad in YACC and really easy in an RD parser.
It's generally thought that using YACC makes it much easier and faster to write parsers. I would submit that no one who says this seriously has ever done larger parsers both ways. In the old days, when YACC was written, it was uncommon for programmers to produce many thousands of lines of code on a short schedule. If you could do that, you were a wizard. But today, everyone really ought to be able to do that. (What's different now? Beats me .. that would be a good essay subject for old-timers though.)
Anyway, writing code isn't actually hard; it's writing hard code that's hard. It's debugging difficult problems with limited visibility that's hard. RD parsers are easy. Dealing with YACC wackiness and the inevitable shift/reduce conflicts is hard. LALR(1) isn't a terribly powerful grammar class, and the disambiguating rules actually differ between YACC versions and platforms. (Yes, the same YACC grammar might work on system A and not on system B.) Bison even has a kludge statement, %expect, where you specify the number of shift/reduce conflicts you expect; if the number comes out different, you get an error return. Older systems would often note in the Makefile the number of shift/reduce conflicts that were considered normal. Gee, if I change the grammar and get more, or fewer, do I just change the expect number?
What typically happens is that the initial YACC grammar is quickly banged out, but then the project bogs down in grammar conflicts and weird bugs where, for no obvious reason, the system isn't doing what you want. In an RD parser, it's essentially impossible to get stuck on a parse problem; with a little more code, arbitrarily conflicting grammars are easily parsed. Furthermore, you will always get exactly what you ask for .. but in YACC, the generated parser decides.
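To make that concrete, here is a minimal sketch of an RD parser in C, using a toy expression grammar invented purely for illustration. Each nonterminal becomes one ordinary function, any ambiguity is resolved by plain visible code, and note how easy it is to emit a precise error message at the exact point of failure:

    #include <ctype.h>
    #include <stdio.h>

    /* Toy grammar:  expr   -> term ('+' term)*
     *               term   -> factor ('*' factor)*
     *               factor -> DIGIT | '(' expr ')'
     */
    static const char *p;                 /* cursor into the input string */

    static int expr(void);

    static int factor(void)
    {
        if (*p == '(') {
            p++;                          /* consume '(' */
            int v = expr();
            if (*p == ')')
                p++;                      /* consume ')' */
            else
                fprintf(stderr, "expected ')' before \"%s\"\n", p);
            return v;
        }
        if (isdigit((unsigned char)*p))
            return *p++ - '0';
        fprintf(stderr, "expected digit or '(' before \"%s\"\n", p);
        return 0;
    }

    static int term(void)
    {
        int v = factor();
        while (*p == '*') {
            p++;
            v *= factor();
        }
        return v;
    }

    static int expr(void)
    {
        int v = term();
        while (*p == '+') {
            p++;
            v += term();
        }
        return v;
    }

    int main(void)
    {
        p = "2+3*(4+1)";
        printf("2+3*(4+1) = %d\n", expr());   /* prints 17 */
        return 0;
    }

Compare that to rearranging a grammar until an LALR(1) table generator stops complaining.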
In fact, you could even write the fixed program back, but no one does this, perhaps because all the parsers are now written in YACC or a YACC-descendent.
Many people in the circles I frequent simply assume that a standard must have had all the right inputs and must be much better than what any real-world computer manufacturer would have come up with on their own. I disagree, and here is why.
At the time - 20 years ago - all of the world's scientific computing had been done on Cray, VAX, IBM, and CDC computers. Of these companies, only the manufacturer of the VAX sent a committee member.
There is a perception that only work done recently in computers matters. This may be true in systems -- the old kernels and user environments were awful, except for unix -- but it's not true in the number-crunching racket. There, most of today's codes actually date back to the older eras.
Quite a bit of advanced work had been done in aerodynamics, finite-element-analysis, computational fluid dynamics, and many other fields. The state of the art in floating point support was not in any way primitive, yet none of those companies, except DEC, sent a committee member. The standard only applied to microprocessors, and at the time, they were useless for this kind of work.
The one committee member from DEC led the opposition to the standard.
But in fact, computerized floating point arithmetic is no more about internally consistent theoretical mathematical systems than a line on a drafting board is a mathematical line. Both are simply tools used to solve problems, and the line on the paper is only in a vague way related to a line in algebra.
If the line hits the edge of the paper, it is acceptable for the engineer to start over with a different scale factor. Likewise, when computerized arithmetic overflows the supported number formats it is acceptable for the program to trap and reject the attempted operation.
Very little can be accomplished by stretching the piece of paper. Likewise, little is gained by adding the small band aids that IEEE 754 adds to computerized arithmetic. It remains only a distant approximation to any abstract real number system and all of the IEEE 754 features change that in only small ways.
Here is how they compared on multiply:
70 µs | Intel, full IEEE 754 support in HW |
 5 µs | NSC, no IEEE 754 support |
Example two, about 17 years later...
Now almost all computing is done on microprocessor-based systems. Intel is still the only system that implements 754 in hardware. The DEC Alpha is a very 754-hostile microprocessor that, like the National NS32 17 years earlier, implements virtually none of 754 in hardware. (Both claim 754 compatibility because they use the same exponent widths and because tons of kernel trap-handling software can in theory handle all of the traps, but for the most part neither chip is ever used in that mode except to demonstrate conformance, and I'm not aware that the NS32 ever did even that, though it was theoretically possible.)
Here is how they compared on SpecFP95:
 9 | Intel, Pentium II, IEEE 754 compatible |
50 | DEC Alpha 21264, IEEE 754 hostile |
Today, Intel has broken into double digits and claims some results in the 15-30 range, but the Alpha is now into triple digits.
And here is a telling example...
I tried to say something good about IEEE 754 Floating Point here, but I realized that it is actually the worst item of all.
Unfortunately, the net result of this new behavior in practice is something much darker than a useful new program organization and a more convenient error checking paradigm. No, instead what happened -- and with 20-20 hindsight this seems inevitable -- is that people aren't finding the stupid divide-by-zero bugs any more.
Is this really a big problem? It's bigger than you might think. There are 2,000 programs in The NetBSD Packages Collection. Many of these programs use at least a little bit of floating point, often for noncritical functions such as statistics or image conversion.
Almost every last one of these is developed or maintained on the PeeCee. Consequently, it isn't very important for those developers to fix the stupid math bugs, especially if the bug just affects the statistics in cycle zero, or the color chosen when the intensity bits are zero, or whatever. The problem is that with error-ignoring IEEE 754 floating point, the breakage just isn't serious enough to force a fix.
As a result, at any given time, many of these packages are broken on any system that doesn't implement the IEEE 754 floating point standard: when run on such a system, the program traps and stops. That the trapping computation didn't matter in the case at hand is now of no help. While one can argue that it's just great that 754 enabled the buggy program to run, before 754 this kind of error was immediately caught and corrected.
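For what it's worth, trapping can usually be turned back on. Here is a minimal sketch, assuming glibc's feenableexcept(3) (a GNU extension, not portable C99; link with -lm): the first division silently produces infinity and the bug sails on, while the second delivers SIGFPE and stops the program on the spot, the way pre-754 machines did:

    #define _GNU_SOURCE           /* feenableexcept() is a glibc extension */
    #include <fenv.h>
    #include <stdio.h>

    int main(void)
    {
        volatile double zero = 0.0;   /* volatile defeats constant folding */

        /* Default IEEE 754 behavior: no trap; the error is silently
         * absorbed and the program keeps computing with garbage. */
        printf("1/0 = %g\n", 1.0 / zero);     /* prints "inf" */

        /* Restore the old behavior: the same expression now raises
         * SIGFPE and the bug is caught immediately. */
        feenableexcept(FE_DIVBYZERO);
        printf("1/0 = %g\n", 1.0 / zero);     /* killed by SIGFPE here */
        return 0;
    }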
So I conclude that 754 is a virus, infecting individual programs, and making them unable to run on non-IEEE-754 hardware.
Why is there bad code in the first place? The problem, I think, is that few people even attempt to understand the actual C standards. Many standards are badly done, as in fact the previous essay on this page complains. But C89 and C99 are well written standards. (Another nicely done standard is PCI.)
Not only do people not know what conforming code is, they don't know why the rules are there in the first place. Interestingly enough, conforming code is always portable code. That's one of the reasons for the rules. The most common confusion seems to be between what C allows you to do with expressions, and what C requires you to do with data. Because the compilers are required to accept all C expressions, they must compile programs even when those programs violate the rules on using data. Believe it or not: all C standards require you to dereference pointers and access data in the data's original type or as char *. Type punning is simply prohibited, except when the new type is char *.
You are allowed to legally change the type of pointers, you can move pointers around using memcpy(3), you can store pointers in scalars, you can write them to files, you can move them around as void *, you can do arithmetic on them and even make them up in scalar expressions. But when you finally dereference one, it has to point to an object of its type, except for char *, which is exempted.
That one rule means that the compiler has control of all data alignment. Imagine a data file or network protocol that streams together many objects of disparate types. The obvious and wrong programming approach involves casting a pointer into a buffer to first one type and then another in order to process the stream. This breaks all sorts of rules and is nonportable, yet it's what most people do.
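Here is a sketch of that wrong approach, with a hypothetical 12-byte record (a uint32_t count followed by a little-endian double) standing in for the stream:

    #include <stdint.h>
    #include <stdio.h>

    /* NONCONFORMING: casts the buffer to first one type, then another.
     * This violates the alignment and effective-type rules, yet the
     * compiler is still obliged to compile it, so it "works" .. until
     * it doesn't. */
    int main(void)
    {
        unsigned char buf[12] = {      /* pretend these came off the wire */
            1, 0, 0, 0,                /* uint32_t 1  (little-endian) */
            0, 0, 0, 0, 0, 0, 240, 63  /* double 1.0  (little-endian) */
        };

        uint32_t n = *(uint32_t *)buf;        /* UB: wrong effective type */
        double   d = *(double *)(buf + 4);    /* UB: and misaligned too   */
        printf("count = %u, value = %g\n", n, d);
        return 0;
    }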
The alternative, which almost no one uses but which is about the only conforming approach, is to declare objects of the various types and then memcpy(3) sections of the stream into these objects before accessing them. I don't care whether you like that approach or not, or whether you think it's ugly. It's the only conforming way, and anyway memcpy(3) is a built-in almost everywhere now. Hint: make the memcpy(3) destination a union and your code may even look beautiful, appear to play all kinds of games with types, and yet be perfectly conforming and portable.
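And the conforming counterpart, a minimal sketch of that memcpy(3)-into-a-union trick, using the same hypothetical record layout as above:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* CONFORMING: 'rec' was allocated by the compiler with correct
     * alignment, the bytes arrive via memcpy(3), and every access is
     * through an object of the right type. */
    int main(void)
    {
        unsigned char buf[12] = {      /* same hypothetical record */
            1, 0, 0, 0,                /* uint32_t 1  (little-endian) */
            0, 0, 0, 0, 0, 0, 240, 63  /* double 1.0  (little-endian) */
        };

        union {                        /* one aligned scratch object, */
            uint32_t n;                /* reusable for every field    */
            double   d;                /* type in the stream          */
        } rec;

        memcpy(&rec.n, buf, sizeof rec.n);
        printf("count = %u\n", rec.n);

        memcpy(&rec.d, buf + 4, sizeof rec.d);
        printf("value = %g\n", rec.d);
        return 0;
    }

The union isn't doing any punning here; it's just a convenient, correctly aligned destination for each field in turn.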
You can extrapolate from this into a way to discriminate between breaking rules foolishly and breaking rules reasonably. It doesn't make sense for me to drastically exceed the speed limit when I drive into work over California 154, a narrow, twisting mountain road. Although I work at sea level in California, there is occasionally snow on top of San Marcos Pass, the route of CA 154. On the other hand, if I take the slightly longer way around on US 101, a nice, straight multi-lane divided highway traversing the much lower Gaviota Pass, I can go 100 MPH or more without risking running off the road.
Both approaches break the speed limit and are prohibited, but the second still makes far more sense than the first. If you want to change the type of something for a good reason: go ahead and break the rules; C almost comes out and specifies that even programs which break the rules must compile. (To restate the observation above: the compiler would have to prohibit a legal expression in order to prohibit an illegal data conversion, so that can't realistically happen.) But break the rules by using memcpy(3) or char *, in such a way that when you finally dereference or otherwise use the data, it sits in an object of the appropriate type, allocated somewhere by the compiler with the correct alignment, and the bits got there via memcpy(3) or some char * manipulation.
What's really sad is: it seems to add at least as many bugs as it fixes. It may be that it adds way more bugs than it fixes: we have no way of knowing. Perhaps it fixes more than it inserts. I doubt it.
The problem is that, despite the protests of the sweepers, they end up making the following priority decision: having untested but linted code is more important than having tested but unlinted code.
Now, if you actually were to put that in words and ask them to admit it, no one would agree. But when they commit untested code, in the rare cases that someone complains, the complainer is invariably chastised for causing trouble and assured that ``I tested it as well as I could.''
The philosophy seems to be: yes, it is critical to test. But in this case, I didn't have that device, so I couldn't, or I didn't have any way to get the program to execute that case.
What these excuses really boil down to is: it compiled on i386. But the way it is usually phrased, in the rare cases where someone dares to question the bad decisions, is: ``I tested it as well as I could.''
This is the king of all phony excuses. Let's analyze it. What is being said is:
In other words: a priority ordering was in fact made, and it concluded that having untested linted code was more important than having tested unlinted code. The priorities came out: (A) linted, tested code, if testing was easy; (B) linted, untested code, if testing was hard; (C) tested but unlinted code, the lowest priority of all, because it was harder to arrange for, and so it didn't happen.
Do I really have to spell out how stupid this is?
Well, maybe I do! The most completely innocuous changes can and will cause failures from time to time, usually by uncovering a separate, latent bug. Even if you do know your change is correct by inspection, that's not good enough.
I think one problem with sweeping through the system editing files is that people see certain expert developers do that for particular reasons and feel that as a general practice it is then OK.
When I tried out the arguments from this essay on someone recently, he said ``well, Joe Turbodeveloper does that too, when he changes kernel interfaces on every port''.
My response to that was:
Invariably, a contract (say, a license) is a “memorandum of the understanding”. That is, it is the written record of the actual understanding between the parties, and not the actual definition of the understanding.
Most people don't understand this; they think that only the written words are important, and that they represent the “real” or “core” agreement. Software types are particularly prone to misunderstand this, because they are used to dealing with invariant technicalities: it certainly makes no difference what you intended a program to do.
But with law, it's the opposite. For example, if parties A and B make an agreement, and party A drafts the contract and inserts into the middle of it a clause that effectively reverses the deal, that clause will be thrown out by every court in most countries, and certainly by every court in the countries I listed. In fact, one principle of law is that courts interpret ambiguities and contradictions in favor of the party that did not draft the contract. This is a consequence of the fact that the contract itself is just the written record of the real intention and understanding. It is the understanding between the parties that matters, not the actual written words of the contract or license.
It's still important to get the wording right, of course. For one thing, everyone generally agrees on what the written words are. For another, contracts stated in writing, while no more valid in principle than oral contracts, are not as easily disavowed or creatively reinterpreted. The parties with well-written contracts tend to avoid needing to appear in court at all. Sometimes, the only thing you have to work with is the written words. But it's important to never forget that the contract itself is just the written record of the true agreement.
So, when people who read a license or contract ask “what did you intend?”, they are really asking a reasonable question .. i.e., what is it you want? One can't simply refer them back to the written document for clarification, since it is the understanding of the parties to the contract that is the primary source of authority for the details.
Although it's very good practice to get the wording precise and correct, the agreement is the understanding, not the technicalities of the wording.
Example: this is a case in statutory criminal law, not common law, but it illustrates the way courts apply rules.
Several years ago, OpenBSD moved crypto code from the USA to Canada, then felt they could export it freely. They thought the technicalities that “first hop is legal”, “second hop is not covered by US law” would protect them.
Actually, that's not true at all: many people have been put in jail for violating US export restrictions via that exact step .. USA to Canada, Canada to elsewhere. An interesting case of exactly this was that of master artillery engineer Gerald Bull, where the export went USA .. Canada .. South Africa. Gerald Bull spent a year or two in club fed for that.
Anyway, technicalities are no defense. The court just said “started in the US, ended in South Africa, guilty.” It was only the intention of the exporter (and of the reg) that mattered. In the case of software exports, the courts were very suspicious of the BXA regs, djb beat them with his EFF-backed defense, and no one at OpenBSD ever paid for their wrongdoing. But it was clearly illegal nonetheless. That doesn't mean you shouldn't do it; I see no moral or ethical conflict per se with defying a State, but usually other people are involved, so there are ethical issues with exposing them to prosecution.
It's important to realize that OpenBSD crypto exporters within reach of USA courts could easily have been convicted if anyone cared and if the regs themselves had withstood the constitutional test. That is, they won simply because (1) they were under the radar, no one cared about a few BSD exports because the regs were written to stop Microsoft from equipping every Windows PC with encryption; (2) the regs were unconstitutional and (3) they were ultimately withdrawn. But it had absolutely nothing to do with any technicalities the BXA regs may have had in their application.
In fact, the entire popular picture of defendants being let off on technicalities by liberal judges is entirely false. In most jurisdictions, well over 90% and often even over 95% of all defendants are convicted. Basically, you go to court, you get convicted. It's best, it would seem, to stay out of court.