Thursday, May 22, 2014

When Quotation Marks Are Not Quotation Marks

Today, at the end of an extra-long work day, I caught up with a colleague who, being new to the system we are working on, was stuck trying to get something to work. So I joined in to see what I could see. The particular task we were looking at, while relatively easy, can be a bit tricky because it is sensitive to setup and input. First, the "check the wires" test: make sure all the input files are in the right places. Check. Second, make sure the environment is pointed to the right place. Check. Third, hunt for typos. Nothing immediately visible. OK, all the easy things looked covered, so break down the problem and test individual pieces. Doing that narrowed it down to one line of input: a line that was almost identical to another working one and had nothing visibly wrong with it. And this is where my reason shut down and my experience kicked in.

Rather than scratching my head over why the line wasn't working, I copied the line that was, made the few minor alterations needed to make it look identical to the non-functioning line, and removed the non-functioning line. Run the test again and, voilà, it worked.

*Sigh* And this is the victory and defeat of a programmer summed up. I immediately got a surge of pride, that yes, I have learned things in my thirteen years as a professional bit-slinger. And then came the wry smile, because most of what I have learned falls into the category of expecting the stupidest, most impossible things to be pretty common, actually.

So what was going on with my co-worker's problem? Programs are very sensitive to changes in their data. Computers don't understand intent; they just see numbers. And sometimes the data has numbers in it you can't see. Programmers favor particular kinds of fonts for this very reason. My personal test for a good font goes like this: is it monospace (every character takes up the same amount of space, so things line up both vertically and horizontally, giving essentially a grid of text), and can you tell the difference between these characters at a quick glance: Il1| and oO0Q? A good font helps with these tiny errors, but there are more lurking. White-space characters like tabs and spaces may look the same in an editor, but cause a program to behave differently. (I'm looking at you, make.) Control characters and other "non-printable" characters, such as carriage return, line feed, null, bell, and a host of others, may be hiding in the data, completely invisible to normal viewing.
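If you want to see those hidden numbers for yourself, here is a minimal sketch in Python. The function name reveal and the sample strings are my own inventions for illustration, not anything from the system we were working on. It leaves printable ASCII alone and spells out everything else with its code point and, where one exists, its Unicode name.

```python
import unicodedata


def reveal(text):
    """Spell out every character that is not printable ASCII."""
    parts = []
    for ch in text:
        if ch.isascii() and ch.isprintable():
            # Ordinary visible character (or a plain space): keep as-is.
            parts.append(ch)
        else:
            # Control characters such as tab have no Unicode name,
            # so fall back to a placeholder label.
            name = unicodedata.name(ch, "no name")
            parts.append(f"<U+{ord(ch):04X} {name}>")
    return "".join(parts)


# Two lines that can look alike in an editor: one uses a tab, one uses spaces
# and ends with a carriage return and line feed.
print(reveal("all:\tbuild"))
print(reveal("all:    build\r\n"))
```

Running a suspicious line and a known-good line through something like this usually makes the difference jump out, which beats squinting at the screen.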

My current favorite is one that the web and related technologies have made more and more common: when is a quotation mark not a quotation mark? See if you can spot this one side-by-side: " vs. ”. Seems like the same thing, doesn't it? Yeah, one is fancier looking than the other, but they are both quotation marks. Just not the same quotation mark. The one on the left is known in HTML as &quot; and it corresponds to the number 34 in ASCII code. The one on the right in HTML is &rdquo; and it isn't in the standard ASCII code at all. So if the program is looking for one, but the other is used instead, whoops, it doesn't work.
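To make the quotation mark case concrete, here is a small Python sketch under my own assumptions: the helper first_difference and the two sample lines are hypothetical, made up to mirror the "working line vs. broken line" situation described above. It walks two strings in step and reports the first position where they disagree.

```python
def first_difference(working, broken):
    """Return (index, working_char, broken_char) at the first mismatch, or None."""
    for i, (a, b) in enumerate(zip(working, broken)):
        if a != b:
            return i, a, b
    if len(working) != len(broken):
        # One string is a prefix of the other; report where the shorter one ends.
        i = min(len(working), len(broken))
        return i, working[i:i + 1], broken[i:i + 1]
    return None


# Hypothetical input lines: the second uses curly quotes (U+201C and U+201D).
working = 'name = "value"'
broken = "name = \u201cvalue\u201d"

diff = first_difference(working, broken)
if diff is not None:
    index, good, bad = diff
    print(f"position {index}: expected code {ord(good)}, found code {ord(bad)}")

# The straight quote is ASCII 34; the right curly quote is outside ASCII.
print(ord('"'), ord("\u201d"), "\u201d".isascii())
```

The two characters look nearly the same on screen, but the numbers behind them are nowhere near each other, and the numbers are all the program ever sees.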
