= A lesson in bad security coding =

== Introduction ==

I was looking at an article on the Orange County Register:

[code]
http://www.ocregister.com/ocregister/opinion/abox/article_730292.php
[/code]

The article's actually pretty bad. It's by a guy from the Discovery Institute, a right-wing, religious, fascist organization that's pro-creationism/ID, anti-environment, pro-religious/state-interference, anti-assisted-suicide... just basically wrong on almost every issue.

The name of the institute is rather funny, as I recently watched the movie The Name of the Rose (based on the Umberto Eco novel). In it, Sean Connery's character (a Franciscan monk with a scientific bent) remarks that religious fundamentalists believe that no new knowledge can be created -- that their only purpose is to catalogue and *discover* the knowledge that "God" bestows upon his people. That's the mindset of the Intelligent Design crowd -- we won't be figuring out nature with new knowledge -- we'll just be discovering processes that were planned ahead of time -- for there's nothing truly innovative in evolution. It's such a sad worldview.

In any case, I was going to comment, but I noticed a pretty bad-looking captcha that would be easy to reverse. However, before putting any work into it, I thought I should just try the URL.
== The URL ==

In this case, the captcha said "f62f7", and the image URL was:

[code]
http://widgets.freedom.com/webtool_ocr/captcha.php?key=345%2525-%2525-free%2525dom-2%2529%2528%252A%255E%2526%2525%255E%2523%2524%2524%2523%2540%2521f62f7%2524%2525%255E%2526%252A%2528%2528%2524%2523%2524%2525%255E%255E%2523R%255E%252A%2526%25214d0c5d1022b5ebe83d00aa30b9dd5522
[/code]

So I used a trusty Perl one-liner to turn the %HexHex escapes back into raw characters:

[code]
perl -pe 's/%([0-9A-Fa-f]{2})/pack("C", hex($1))/ge;'
[/code]

I pasted in:

[code]
345%2525-%2525-free%2525dom-2%2529%2528%252A%255E%2526%2525%255E%2523%2524%2524%2523%2540%2521f62f7%2524%2525%255E%2526%252A%2528%2528%2524%2523%2524%2525%255E%255E%2523R%255E%252A%2526%25214d0c5d1022b5ebe83d00aa30b9dd5522
[/code]

and I got back:

[code]
345%25-%25-free%25dom-2%29%28%2A%5E%26%25%5E%23%24%24%23%40%21f62f7%24%25%5E%26%2A%28%28%24%23%24%25%5E%5E%23R%5E%2A%26%214d0c5d1022b5ebe83d00aa30b9dd5522
[/code]

which looks like another urlencoded string, so I pasted that in and got:

[code]
345%-%-free%dom-2)(*^&%^#$$#@!f62f7$%^&*(($#$%^^#R^*&!4d0c5d1022b5ebe83d00aa30b9dd5522
                              ^^^^^ look here
[/code]

That looks like an attempt to obfuscate the captcha string.
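The same two decoding passes can be sketched in JavaScript with the built-in `decodeURIComponent`. The helper names `decodeTimes` and `splitKey` are made up for this illustration, and the key layout that `splitKey` relies on (junk, `!`, five characters of text, more junk, `!`, 32 hex characters) is only what the decoded string appears to show, not anything the site documents.

```javascript
// Key copied from the captcha image URL above.
const key =
  "345%2525-%2525-free%2525dom-2%2529%2528%252A%255E%2526%2525%255E%2523" +
  "%2524%2524%2523%2540%2521f62f7%2524%2525%255E%2526%252A%2528%2528%2524" +
  "%2523%2524%2525%255E%255E%2523R%255E%252A%2526%2521" +
  "4d0c5d1022b5ebe83d00aa30b9dd5522";

// Apply URL-decoding a fixed number of times (two here, like the two Perl passes).
function decodeTimes(s, passes) {
  let out = s;
  for (let i = 0; i < passes; i++) {
    out = decodeURIComponent(out);
  }
  return out;
}

// Assumed layout of the decoded key: <junk>!<5 chars of text><junk>!<32 hex chars>
function splitKey(decoded) {
  const parts = decoded.split("!");
  return { text: parts[1].slice(0, 5), session: parts[2] };
}

const decoded = decodeTimes(key, 2);
console.log(decoded);
console.log(splitKey(decoded));
```

A third pass is not possible with this function: the fully decoded string contains bare `%` characters, which `decodeURIComponent` rejects with a `URIError`.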
I did it again:

[code]
345%-%-free%dom-2)(*^&%^#$$#@!c9d3f$%^&*(($#$%^^#R^*&!80bdf31d5a7a52cbf6a67dd4a390d933
                              ^^^^^ same spot
[/code]

Then I noticed that you really only need the first decoding pass to see it:

[code]
345%25-%25-free%25dom-2%29%28%2A%5E%26%25%5E%23%24%24%23%40%21c9d3f%24%25%5E%26%2A%28%28%24%23%24%25%5E%5E%23R%5E%2A%26%2180bdf31d5a7a52cbf6a67dd4a390d933
                                                              ^^^^^ right there
[/code]

But actually, if you look at the original urlencoding, you see that you don't even need one pass!

[code]
345%2525-%2525-free%2525dom-2%2529%2528%252A%255E%2526%2525%255E%2523%2524%2524%2523%2540%2521c9d3f%2524%2525%255E%2526%252A%2528%2528%2524%2523%2524%2525%255E%255E%2523R%255E%252A%2526%252180bdf31d5a7a52cbf6a67dd4a390d933
                                                                                              ^^^^^
[/code]

This makes perfect sense, given that urlencoding normally leaves alphanumeric characters untouched. They probably just used the built-in PHP function for it. They could at least have written their own encoder to obfuscate the alphanumeric characters as well.

== The HTML Source and making your own image ==

Then, looking at the page source, you can see that the image URL is not generated via JavaScript. What happens is that the server creates a random string and associates it with a session ID (located at the end of the URL, after the second exclamation point). It puts the random string -- which is actually five characters of hex after the first exclamation point -- in the image URL along with the session ID. To generate the image, the server just picks those characters out from among the weird characters -- it already knows where to find them. (This is only a guess -- it could be much worse.)

I just changed those characters and resubmitted the URL, and it sent back an image with my own characters.
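A rough sketch of that swap in JavaScript, working directly on the still-encoded key. `MARKER` and `replaceCaptchaText` are hypothetical names, and the sketch assumes the text always sits right after the first double-encoded exclamation point (`%2521`), as it does in the two keys shown above.

```javascript
// Double-encoded "!" that precedes the captcha text in the keys shown above.
const MARKER = "%2521";

// Hypothetical helper: replace the five characters that follow the first
// marker, leaving the rest of the key untouched.
function replaceCaptchaText(key, newText) {
  if (newText.length !== 5) {
    throw new RangeError("expected exactly five characters");
  }
  const markerAt = key.indexOf(MARKER);
  if (markerAt === -1) {
    throw new Error("marker not found in key");
  }
  const textStart = markerAt + MARKER.length;
  return key.slice(0, textStart) + newText + key.slice(textStart + 5);
}

// Second key from the section above (captcha text "c9d3f").
const originalKey =
  "345%2525-%2525-free%2525dom-2%2529%2528%252A%255E%2526%2525%255E%2523" +
  "%2524%2524%2523%2540%2521c9d3f%2524%2525%255E%2526%252A%2528%2528%2524" +
  "%2523%2524%2525%255E%255E%2523R%255E%252A%2526%2521" +
  "80bdf31d5a7a52cbf6a67dd4a390d933";

console.log(replaceCaptchaText(originalKey, "hello"));
```

No decoding or re-encoding is needed, because the alphanumeric text passes through urlencoding unchanged. The result goes after `key=` in the image URL, as in the following example.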
[code]
http://widgets.freedom.com/webtool_ocr/captcha.php?key=345%2525-%2525-free%2525dom-2%2529%2528%252A%255E%2526%2525%255E%2523%2524%2524%2523%2540%2521fuck!%2524%2525%255E%2526%252A%2528%2528%2524%2523%2524%2525%255E%255E%2523R%255E%252A%2526%252180bdf31d5a7a52cbf6a67dd4a390d933
[/code]

I then did a binary search to figure out which characters could be removed, and got it down to this:

[code]
http://widgets.freedom.com/webtool_ocr/captcha.php?key=faker%2524%2525%255E%2526%252A%2528%2528%2524%2523%2524%2525%255E%255E%2523R%255E%252A%2526%2521
[/code]

which is this, decoded twice:

[code]
faker$%^&*(($#$%^^#R^*&!
[/code]

They just have some "magic" in there. Change the first five chars to whatever you want.

== The JavaScript and client-side validation ==

Then I did some more looking:

[code]
wget -O - --referer='http://www.ocregister.com/ocregister/opinion/abox/article_730292.php' 'http://widgets.freedom.com/webtool_ocr/js/webtool_comment.js' 2> /dev/null
[/code]

I was going to look at that code, but then I noticed that the source code for the page says:

[code]
var gTheWord = '6824b';
[/code]

The captcha's image text is already loaded in the page, very clearly visible! After I stopped laughing, I looked at the JavaScript. I had to see whether they were doing client-side validation.

[code]
if(frm.password.value != gTheWord)
{
    alert("Please enter the correct password.");
    frm.password.focus();
    return;
}
[/code]

YEP! The bad-character filter for the name, the 1900-character limit, and the profanity filter are client-side as well. They probably don't even filter them out at the server. I'll leave those for the reader.

== Conclusion ==

Even if they did check the captcha on the server, the fact that the client checks it too means the answer has to be shipped to the browser, so figuring out the captcha automatically is trivial -- otherwise the client couldn't do the check.
That's on top of the fact that the captcha is derived from a poorly obfuscated double-urlencoded string that actually does no real obfuscating.
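For contrast, here is a minimal sketch of the alternative: the answer stays on the server, and the browser only ever sees an opaque random token. This is not how the site works. `issueChallenge`, `textForImage`, and `verifyAnswer` are hypothetical names, the in-memory `Map` stands in for a real session store with expiry, and rendering the image is left out entirely.

```javascript
const crypto = require("node:crypto");

// Hypothetical in-memory store: token -> expected captcha text.
const pending = new Map();

// Create a challenge and return only the opaque token. The token is random
// and carries no information about the captcha text.
function issueChallenge() {
  const token = crypto.randomBytes(16).toString("hex");
  const text = crypto.randomBytes(3).toString("hex").slice(0, 5);
  pending.set(token, text);
  return token;
}

// Called by the server-side image endpoint only; the text itself is never
// placed in a URL or in the page source.
function textForImage(token) {
  return pending.get(token);
}

// Check a submitted answer on the server. Each token is single-use, so a
// solved or failed challenge cannot be replayed.
function verifyAnswer(token, answer) {
  const expected = pending.get(token);
  pending.delete(token);
  return expected !== undefined && answer === expected;
}
```

With this shape, the page embeds something like `captcha.php?token=...` and posts the token back with the comment. Any client-side checks then become a convenience for the user, not the gate.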