All Unkept

Irregular Expressions

Posted in: Rants, Microsoft

I've been using Microsoft Visual Studio 2003 for development at work, and it has mainly been very good. I discovered today that its find and replace function includes support for Regular Expressions. "Awesome", I thought -- and anyone who knows about Regular Expressions (reg exps for short) will know why. Regular expressions are essentially a very powerful syntax--almost a language really--for doing pattern matching and replacing. The task I needed to do today was to remove all whitespace (spaces and tabs) from the beginning of every line in a text file, but nowhere else. A regular expression search and replace makes this trivially easy -- just s/^\s*//g as a Perl command, for instance.
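For anyone who hasn't met regular expressions before, here is a rough sketch of the same job in Python rather than Perl. The function name strip_leading_whitespace and the sample text are made up for illustration. It uses [ \t]+ instead of \s* because \s also matches newlines, so in multi-line mode it could swallow blank lines as well.

```python
import re


def strip_leading_whitespace(text: str) -> str:
    """Remove spaces and tabs at the start of every line, and nowhere else.

    Hypothetical helper for illustration; not part of any tool named in the post.
    """
    # re.MULTILINE makes ^ match at the start of each line, not only at the
    # start of the whole string.
    # [ \t]+ matches one or more spaces or tabs; unlike \s it never matches a
    # newline, so blank lines are left alone.
    return re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)


sample = "    first line\n\t\tsecond  line\n\n  third\tline\n"
print(strip_leading_whitespace(sample))
```

Whitespace in the middle of a line (the double space in "second  line", the tab in "third\tline") is untouched, which is the "but nowhere else" part of the task.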

The \s bit wasn't working, so I looked at the help, and then came the shock: Microsoft's idea of regular expressions is superficially very similar to real reg exps, but in fact very different. There was agreement on the meaning of . * ? ^ $ [ ] and a few of the escape sequences (\n and a few others), but after that things quickly went astray. While Perl, sed, JavaScript, PHP and probably every other language share one syntax for regexps, with agreement on all the essentials and general backwards compatibility for the rest, Microsoft have decided to introduce a different syntax and call that Regular Expressions.

My question is this: why? why? WHY? As if the computing world has not been scarred enough already by Microsoft design decisions! Perhaps Microsoft's dislike for standards is now so ingrained that they just can't help it -- they simply have to create things that go part way with a standard and then deviate, even when it makes no sense at all. It's not like they can't follow it -- they've written correct regular expression libraries for JavaScript and VBScript and probably lots of others. But the IDE makes you learn a different syntax just to confuse you.

OK, rant over.
