
How to [not] get fired

Marco already posted on this, but I thought I'd pitch in our side of the story.

At php|works, this year (a couple weeks ago), our /fear(some|ful)/ leader was absent. He had some personal stuff that conflicted with the conference's schedule, so he left it in our (Paul, Arbi and myself) mostly-capable hands.

I think we did a good job, even without him, but to deter him from deserting us at our [bigger!] spring conference, we came up with an idea... a good idea (-:

Tradition states that Marco should give the closing keynote at our conferences. This time around, we had excellent internet connectivity, thanks to the nice folks at OneRing Networks and a little experience on our part (read: "don't trust the hotel's AV company for networking needs"), so we (Marco and I) decided that he could reliably give his keynote via iChat. The idea had not yet been conceived...

We try very hard to make our conferences professional but not uptight. Some ideas work, some don't. This one did (-:

We devised a way that we could harness the audience to "caption" Marco's closing keynote. It involved 15 minutes of coding, 2 laptops, 2 projectors and an unsuspecting boss.

The first projector displayed the output from my laptop's iChat window, so everyone could see Marco's lovely mug. The 2nd projector ran a web browser that displayed the audience-sourced caption (updated every 5 seconds).

Oh, and as I alluded, he had absolutely no idea we'd done it until a few days later when I let the cat out of the bag–I figured it best that he hear it from one of us than read about it on someone else's blog.

We had fun, and I distinctly remember hearing "Best Closing Keynote EVER" after we logged off. Hope you had fun, too (-:

Handling Downtime: Job Well Done

Yesterday, if you tried to call me (either at work, or at home, or even my mobile), you were probably unable to reach me.

The telephone (VOIP) wholesaler I use, Unlimitel, was down for around 9 hours. Normally, 9h of downtime would really bother me, but I can honestly say I've never been happier with their service.

Here are a few reasons why:

  • This is the first significant downtime we've had in 2 years (since I started using Unlimitel)
  • We pay 1.1¢/min for on-net service (local calls in a number of Canadian cities) and 2¢ for the rest of North America. Low rates for most of the rest of the world, too. $2.50/month for our DIDs. I don't expect completely flawless service for this price.
  • Stephan, the president, emailed customers to explain the problem. The first mail was before we even noticed that calls weren't working.
  • The actual cause of the outage was a screwup at Rogers (NOT Unlimitel's fault):
    • A truck accident somehow caused a bundle of fiber at Rogers to be cut. (Initial reports were that the lines were cut by a backhoe.)
    • Rogers' redundancy somehow failed. The actual cut was 25 km away from Unlimitel's datacenter.
  • Throughout the downtime, Stephan kept us well-informed of the situation, relaying ETA data from Rogers whenever possible.
  • Unlimitel routed all possible traffic to non-Rogers circuits as soon as possible. (Outbound calls started working, but inbound lines are on Rogers, and Rogers' redundancy failed, as mentioned.)
  • The few times that Unlimitel has actually made mistakes, they've kept us well-informed, and owned up to these mistakes quickly (they implemented CallerID poorly, a while back, and quickly fixed it, for example).

All told, I'm very happy with them. I can understand why some people would be upset over ~9h of unplanned downtime, but all things considered, I think Unlimitel did an excellent job of handling the crisis.

For what we pay, I couldn't expect better. Kudos to the team over at Unlimitel.

Essential PHP Security

Quite a while ago, O'Reilly sent me a copy of my friend and colleague, Chris Shiflett's book, Essential PHP Security.

When I received it, I read through it quickly, and knew it was a good book, but didn't have much else to say about it (lest I join the ranks of the me too!ers (everyone was saying it's a good book)).

Today, I was wondering about session ID regeneration. I know it's important, but I was looking for a "best practice," or opinion on an appropriate level of session ID regeneration.

After a few quick Web searches, I remembered that I have a copy of the aforementioned book. I respect Chris' opinion on such matters, so I pulled it out of my pile.

A glance at the index shows:

session identifier
obtaining, 43
regenerating at session,  46
regenerating for change in privilege, 46
regenerating on every page, 47

Turns out page 47 contains exactly what I was looking for. It's too long to quote here, but the gist is: regenerate only on a change in privilege (such as logging in), not on every page. Regenerating on every page works for the most part, but it causes problems with the back/forward buttons and needlessly annoys users.
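A minimal sketch of that advice, assuming a plain $_SESSION-based login (the login_user() name and the user_id key are hypothetical, not from the book): regenerate once, at the moment the privilege level changes, rather than on every request.

```php
<?php
// Hypothetical helper: call it only after the user's credentials have been
// verified, i.e. at the moment their privilege level changes.
function login_user(int $userId): void
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    // Issue a fresh session ID and delete the old session's data (true),
    // so an ID that was fixed or sniffed before login is useless afterwards.
    session_regenerate_id(true);
    $_SESSION['user_id'] = $userId;
}
```

Ordinary page views keep the same ID, so the back/forward buttons keep working.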

Thanks, Chris!

PHP Pie?

I've often had to manipulate large blobs of text—no, make that many files containing large blobs of text.

Of course, my IDE can usually handle simple search-and-replace operations, but on most occasions I appreciate the simplicity of the command line interface.

That's one of the reasons I love working in a unixy environment, I think. There are a bunch of utilities that embrace the command line, taking simple input and delivering equally simple output. I've employed sed and awk in the past, and I still use them to perform some very simple parsing. For example, I can often be found doing something like ps auxwww | grep ssh | awk '{print $2}' to get a list of ssh process IDs.

But almost anyone who's ever been enlightened to perl pie delights in its power. In a nutshell, I can do something like perl -p -i -e 's/foo/oof/g' somefile from the command line, and perl will digest every line of somefile and perform the substitution. Perl is very well suited to this type of operation, what with its contextual variables and all.

I updated the code a little, below. You now must explicitly set $_.

Read on for my PHP-based solution (lest planet-php truncate my post). I've often found myself looking for a PHP equivalent: not to do simple substitutions, of course, but complex ones. And since I'm most comfortable with PHP, and I have a huge library of snippets that I can dig out to quell a problem that I may have solved years ago, I've been meaning to fill this void for a while.

Tonight, I had to come home from a dinner party, early, because my daughter was sick. Too bad, it looked like it was going to be an amazing feast, but I digress. The home-on-a-Saturday-night time left me with a bit of free time to solve one of the problems that's been floating around in my head for who-knows-how-long.

Thus, I'm happy to present my—at least mostly—working PHP pie script.

#!/usr/bin/php
<?php

// Change the shebang line above to point at your actual PHP interpreter

$self = array_shift($_SERVER['argv']); // the path this script was invoked as
$script = array_shift($_SERVER['argv']);
$files = array_filter($_SERVER['argv']);

if (!$script) {
	fwrite(STDERR, "Usage: $self <script> [files]\n");
	fwrite(STDERR, "  Iterates script over every line of every file.\n");
	fwrite(STDERR, "  \$_ contains data from the current line.\n");
	fwrite(STDERR, "  If files are not provided, STDIN/STDOUT will be used.\n");
	fwrite(STDERR, "\n");
	fwrite(STDERR, "  Example: ./pie.php '\$_ = preg_replace(\"/foo/\",\"oof\",\$_);' testfile\n");
	fwrite(STDERR, "    Replaces every instance of 'foo' with 'oof' in testfile\n");
	fwrite(STDERR, "\n");
	exit(1);
}

// set up function: wrap the user's code in a closure that receives the
// current line as $_ and returns the (possibly modified) $_.
// create_function() was removed in PHP 8, so the closure is built with eval().
$func = eval('return function ($_) {' . $script . ";\nreturn \$_;\n};");

if (!$files) {
	// no files: filter STDIN to STDOUT, one line at a time
	while (($line = fgets(STDIN)) !== false) {
		echo $func($line);
	}
} else {
	foreach ($files as $f) {

		if (!is_file($f) || !is_writable($f)) {
			fwrite(STDERR, "Can't write to $f (or it's not a file)\n");
			continue;
		}

		$buf = '';
		foreach (file($f) as $l) {
			$buf .= $func($l);
		}
		file_put_contents($f, $buf);
	}
}

?>

Hope it helps someone out there.

Update: I've had some people ask me why I'm reinventing the wheel. I did cover this above: I have plenty of existing PHP code snippets, and almost no perl. I'm also very comfortable in PHP, but it's been years since I've been comfortable in perl.

Here's an example of something I hacked up, today. I can (relatively) easily turn this:

dmesg | tail -n5

... which returns this:

[17214721.004000] sdc: assuming drive cache: write through
[17214721.004000]  sdc: sdc1
[17214721.024000] sd 7:0:0:0: Attached scsi disk sdc
[17214721.024000] sd 7:0:0:0: Attached scsi generic sg1 type 0
[17214722.464000] FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!

(the first field is the time since boot... useless for my feeble human brain)

into:

dmesg | ./pie.php 'static $prev = false; static $boot = false; if (!$boot) {
list($boot) = explode(" ", file_get_contents("/proc/uptime"));
$boot = time() - (int) $boot;} if (!$_) return; list($ts, $log) = explode(" ", $_, 2);
$ts = str_replace(array("[","]"), array("",""), $ts); $_ = date("H:i:s", $boot + $ts);
if ($prev && ($diff = round($boot + $ts - $prev, 2))) $_ .= " (+". $diff .")"; 
$_ .= " ".$log; $prev = $boot + $ts;' | tail -n 5

(line breaks added for easier reading)... which returns:

17:07:44 sdc: assuming drive cache: write through
17:07:44  sdc: sdc1
17:07:44 (+0.02) sd 7:0:0:0: Attached scsi disk sdc
17:07:44 sd 7:0:0:0: Attached scsi generic sg1 type 0
17:07:45 (+1.44) FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!

That's the sort of thing I wouldn't be comfortable doing in perl, but I hacked it up on the command line in PHP.
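For anyone who'd rather not squint at that one-liner, here's a rough sketch of the same logic unpacked into a function. The dmesg_to_clock() name is hypothetical, and it takes the boot time as a parameter (the one-liner computes it from /proc/uptime) so it can be tested; like the one-liner, it assumes the "[seconds.microseconds] message" format shown above.

```php
<?php
// Hypothetical, unpacked version of the dmesg one-liner.
// $bootTime is the Unix timestamp of boot; $prev carries the previous
// line's absolute time between calls (start it at null).
function dmesg_to_clock(string $line, float $bootTime, ?float &$prev): string
{
    // Expected shape: "[17214721.004000] message text"
    if (!preg_match('/^\[\s*(\d+\.\d+)\]\s?(.*)$/s', $line, $m)) {
        return $line; // not a timestamped line: pass it through untouched
    }
    $abs = $bootTime + (float) $m[1];
    $out = date('H:i:s', (int) $abs);

    // Show the gap since the previous line, but only when it is non-zero
    // after rounding to hundredths of a second.
    if ($prev !== null && ($diff = round($abs - $prev, 2)) != 0.0) {
        $out .= ' (+' . $diff . ')';
    }
    $prev = $abs;
    return $out . ' ' . $m[2];
}
```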

You can, but you shouldn't

"Here's the mascot," he said, leaning over one of my two half-walls, handing me a file of papers, "the production guys will get you the artwork. The jokes are at the back. Call me when it's ready."

It's early 2000. I'm slaving away in my pseudo-cube in my hometown. I'm a script monkey. My job consists of writing minimal CFML (oh yeah, ColdFusion) wrappers around boring products, like fish hooks.

Denis, the cube-leaning account manager, had tasked me with a project that was mostly impossible (at the time), but moreover, it was a project that simply shouldn't have been done.

The pitch was delivered earlier in the day. "It'll be great! The user will be browsing the website, and the dog (the dalmatian mascot) will walk onto the screen and tell a joke!"

Now, remember, this is 2000--at the end of the first browser war; the peak of browser non-compliance; a time when developers were using IE as their primary browser and cringed when forced to test code in Netscape (v4) (as opposed to today, when many developers are using a Mozilla-based browser (Netscape's evolved grandson) and are disgusted by the thought of testing on IE).

Technically speaking, we probably could have rigged up a solution that might have worked on most IE installs, but this gave us a convenient excuse to overthrow the marketing fools: the idea was horrible. Our answer was that there was no technology that would allow us to implement the absurd joke-telling-mascot idea.

Sometimes you can, but you shouldn't. This proverb leads nicely into one of my latest web annoyances.

Like any good technically-minded person, I hold more than the average share of pet peeves. Many are related to software, many more to computing in general. A few are directly related to my area of expertise: the Web.

So, I ask you, my fellow Web developers: WHY do you find it necessary to create your own widgets, when there's a good (albeit limited) toolkit available? Stop it. It drives me crazy.

Want examples? Here you go:

Note from future Sean: these were embedded html objects that died years ago. I'm sure you can imagine horrific scroll bars invented by people who don't really understand how scroll bars work.

Admittedly, the stock HTML widgets might not be as pretty as these custom ones, but they WORK, they're consistent from site-to-site, and you don't have to worry about javascript bugs.

For example, on the select boxes above (digg.com and saq.com), I can't click the box and press the first letter of my selection (like I can with real HTML select boxes). The radio buttons don't honour keyboard input either; I can't use the arrow keys to advance. Generally, these hacked-together widgets don't respect the tab key, either.

And if you're using a text based browser (for whatever reason), or perhaps screen reading software, you're pretty much out of luck.
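The stock alternative costs almost nothing. As a hypothetical sketch (the html_select() helper is made up for illustration, not from any library), a server-side helper that emits a plain <select> gets type-to-select, arrow keys, tab focus and screen reader support from the browser for free:

```php
<?php
// Hypothetical helper: render a stock <select> instead of a scripted widget.
// All escaping is done with htmlspecialchars(); no JavaScript is involved.
function html_select(string $name, array $options, ?string $selected = null): string
{
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $html = '<select name="' . $esc($name) . '">';
    foreach ($options as $value => $label) {
        $value = (string) $value;
        $sel = ($value === $selected) ? ' selected="selected"' : '';
        $html .= '<option value="' . $esc($value) . '"' . $sel . '>'
            . $esc((string) $label) . '</option>';
    }
    return $html . '</select>';
}
```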

You wouldn't draw each pixel in a line of text would you? Of course not! (unless you're this guy).

In short: your pretty site is no nicer than the joke-telling dalmatian. Cut it out.

Security and... Driving? (and Hiring)

There's been a blip on the PHP blogosphere (think what you will of that word, it's accurate) regarding PHP's "inherent security flaws."

I guess it's time to toss in my 2c (even though I was one of the first to reply to Chris' post on this). Since I like similes, I propose the following: coding is like driving.

What? It's pretty simple, if you think about it.

If you drive, you'll follow. If you don't, but have tried, you'll also follow. If you've never tried it, you should. (-:

Coding is like driving. When you start driving, you're really bad at it. Everyone is horrible, even if they aren't aware.

As time passes, and you gain more experience behind the wheel, you're subjected to different driving conditions and new hazardous situations. These eventually make most of us better drivers.

Take me, for example. I grew up in a relatively small city in New Brunswick. I learned to drive there. At the time, there was very little street parking, and as a result, very little parallel parking. I was really bad at parallel parking for a long time. I first started driving when I was 16. It wasn't until I was 20 that some friends and I took my car to the first (and only?) Geek Pride Festival. Closing in on Boston, the roads got wider and wider. Suddenly, I found myself driving on a road that was 4 lanes in each direction. You laugh, but this is daunting for a guy who'd never driven on anything wider than 2 lanes (in each direction), before. I knew to cruise on the right, and pass on the left, but... how do I use those other two lanes? I now live in Montreal, and feel confined when there are only two lanes. (-:

Another parallel is when I learned to drive stick (manual transmission). My first few weeks were quite jumpy... then, my clutch foot smoothed out, and my passengers were relieved.

More food for thought lies in the insurance industry. Now, I'll keep my feelings towards these racketeering slimeballs (mostly) to myself for the purposes of this entry, but they DO do something right: reward experienced drivers (often at the cost of young males, but I digress).

I have a motorcycle license. I had to pass both written and driven tests to be able to ride. Even then, I only qualified for the lower class of bike ( 550cc).

Alright, so what's my point? Simple: new coders are bad at their jobs. I thought I was good at the time, but I was horrible. I'm better now, but in 2 years, I know I'll look back at this and think about how bad I was 2 years ago. New drivers are also bad.

So, the people who control the roads have put a few safeguards into effect to keep these people from hurting others. First, there's graduated licensing in many parts of the world. When I was 16, I had a 12 month waiting period before I could drive by myself, and even then, I had to maintain a 0.00% blood alcohol level whenever driving.

Insurance companies penalize (or, if you're fluent in marketing, "don't reward") new drivers. My insurance payments are now an order of magnitude lower than when I first started driving.

Trucking companies are likely to hire newgrad drivers, but this is because their workforce is scarce. They put their better, and more experienced drivers on the most complicated routes. And most taxi drivers I see are well over 30.

Getting offtopic again: New coders are bad. They learn. Some quickly, some not so much. They make mistakes.

So, how do you get around this? Two ways. If you run a small shop, you should ONLY have experienced developers on staff. If your shop is a little bigger, then you can afford (ironically) to pay less to inexperienced devs that can do some grunt work, and get a bit of experience under their belts. Make sure that your good devs are reviewing their work, though.

You're effectively enforcing "graduated licensing" on your devs. If they have little experience, give them little power.

That said, I firmly believe (and agree with Marco) that it's not PHP's job to enforce this. Just as I would not expect Plymouth to limit my ability to drive my old Reliant K car. There are rules in place at a higher level, and that's GOOD in my opinion.

PHP is easy, or at least it starts out that way, and then, after a certain threshold, gets more and more complicated, but that's OK. Everything works this way. "Windows" is easy.. but when your registry pukes, it takes guru skills to clean it up (or novice skills to find your XP CD to reinstall). Driving is "easy"... just don't put new drivers in a situation they haven't seen before (whiteout/blizzard, collision, black ice, blinding sun, etc).

The money you save by hiring new grads (without proper mentors/filtering/etc) is often trumped by your exposure to security flaws, bad design, and failure.

A little aside: development shops and otherwise-hiring companies seem to be catching on to this. In the past 3 months, I've had 4 former colleagues come to me asking if I know any advanced PHP devs in Montreal who are looking for work... I've made a few suggestions, but most of the GOOD locals I know are already happily employed. If you live here (or are planning on moving here), and you've got LOTS of PHP experience (more than 3 years), have diverse experience, and are genuinely a good coder, let me know, and I'll try to hook you up.

($var == TRUE) or (TRUE == $var)?

Interesting little trick I picked up a while back, been meaning to blog about it.

Prior to enlightenment, I used to write conditionals something like this:

if ($var == SOME_CONSTANT_CONDITION) {
  // do something
}

... more specifically:

if ($var == TRUE) {
  // do the true thing
}

That's how I'd "say" it, so that's how I wrote it. But is it the best way? I now don't think so. When reviewing other people's code (often from C programmers), I've seen "backwards" conditionals... something like:

if (TRUE == $var) {
  // ...
}

Which just sounds weird. Why would you compare a constant to a variable? (You'd normally compare a variable to a constant.)

So, what's the big deal?

Well, a few months back, I stumbled on an old article about a backdoor almost sneaking into Linux.

Here's the almost-break:

if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
  retval = -EINVAL;

Ignore the constants, I don't know what they mean either. The interesting part is current->uid = 0

See, unless you had your eyes peeled, here, it might look like you're trying to ensure that current->uid is equal to 0 (uid 0 = root on Linux). So, if options blah blah, AND the user is root, then do something.

But wait. There's only a single equals sign. The comparison operator is "=="; "=" is for assignment!

Fortunately, someone with good eyes noticed, and Linux is safe (if this had made it into a release, it would've been trivial to escalate your privileges to the root level).. but how many times have you had this happen to you? I'm guilty of accidentally using "=" when I mean "==". And it's hard to track down this bug.. it doesn't LOOK wrong, and the syntax is right, so...

This is nothing new. Everyone knows the = vs == problem. Everyone is over it (most of the time). But how can we reduce this problem?

A simple coding style adjustment can help enormously here.

Consider changing "$var == TRUE" to "TRUE == $var".

Why? Simple:

sean@iconoclast:~$ php -r '$a = 0; if (FALSE = $a) $b = TRUE;'
Parse error: parse error in Command line code on line 1

Of course, you can't ASSIGN $a to the constant FALSE. The same style applied above would've caused a similar error in the Linux kernel C code:

if ((options == (__WCLONE|__WALL)) && (0 = current->uid))

Obviously, "0" is a constant value--you cannot assign a value to it. The missing "=" would've popped up right away.

Cool. Seems a little awkward at first, but in practice, it makes sense.
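The same trap exists in PHP. Here's a small sketch (both function names are hypothetical) contrasting a PHP rendering of the kernel typo with the constant-first style; the second version also uses === so the comparison doesn't rely on type juggling:

```php
<?php
// PHP rendering of the almost-backdoor: the intended test was "$uid == 0",
// but a single "=" assigns instead of comparing.
function buggy_check(int &$uid): bool
{
    // Assigns 0 to $uid. The expression's value is 0 (false), so the branch
    // never runs, but the caller's uid is now 0, i.e. root.
    if ($uid = 0) {
        return true;
    }
    return false;
}

// Constant-first style: a typo'd "0 = $uid" is a parse error, so this line
// can only ever be a comparison.
function constant_first_check(int $uid): bool
{
    return 0 === $uid;
}
```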

HTH.

mail() replacement -- a better hack

This morning, I read Davey's post about how to compile PHP in a way that allows you to specify your own mail() function. This is kind of a cool hack, but I've been using a different approach for a while, now, that allows much better control. Read on if you're interested.

Davey's hack, if you didn't read his post, yet, centers around defining your OWN mail function, after you have instructed PHP not to build the default one.

My hack doesn't require editing of the PHP source, or even a recompile. It doesn't require an auto-prepend, either, but it does require a small change to php.ini.

So, where's the magic? It lies in the sendmail_path directive.

When it comes to mail() (as well as many other things), PHP prefers to delegate the heavy lifting to another piece of software: sendmail (or a sendmail compatible command-line mail transport agent). By default, PHP will call your sendmail binary, and pass it the entire message, after composing it from the headers and body supplied by the developer.

One of the side-benefits to this system is the ability to override PHP's default, and seamlessly hook in your own sendmailesque binary or script.

Here's an example from one of my development environments:

sendmail_path=/usr/local/bin/logmail
sean@sarcosm:~$ cat /usr/local/bin/logmail
cat >> /tmp/logmail.log

This little bit of config & code is extremely useful in a non-production environment. How many of us have accidentally sent emails to actual customers from the development server? This little bit of trickery avoids this, and instead of sending the email (as PHP normally would), mail is instead logged to the /tmp/logmail.log file. Disaster avoided.

But, that file gets pretty big over time... it becomes unmanageable very quickly. So, in a different environment, I have an alternative:

sendmail_path=/usr/local/bin/trapmail
sean@sarcosm:~$ cat /usr/local/bin/trapmail
formail -R cc X-original-cc \
  -R to X-original-to \
  -R bcc X-original-bcc \
  -f -A"To: devteam@example.com" \
| /usr/sbin/sendmail -t -i

And what does this do? It traps all mail that would normally go OUT (say, to a customer), and instead, delivers it to devteam@example.com (with the original fields renamed for debugging purposes).
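If formail isn't available, the same header shuffle can be sketched in PHP. This is a hypothetical equivalent (trap_headers() is a made-up name), not what trapmail runs: it assumes "\n" line endings and a blank line between headers and body, and it renames only the field names, so folded continuation lines stay attached to their renamed header. Its output would still be piped to /usr/sbin/sendmail -t -i as above.

```php
<?php
// Hypothetical PHP equivalent of the formail pipeline: rename To/Cc/Bcc so
// nothing reaches the real recipients, then add one To: for the dev team.
function trap_headers(string $message, string $devAddr): string
{
    // Split the header block from the body at the first blank line.
    $parts = explode("\n\n", $message, 2);
    $headers = $parts[0];
    $body = $parts[1] ?? '';

    // Only match field names at the start of a line (/m), case-insensitively.
    $headers = preg_replace('/^(to|cc|bcc):/mi', 'X-original-$1:', $headers);

    return $headers . "\nTo: " . $devAddr . "\n\n" . $body;
}
```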

So, how does all of this solve Davey's problem?

This is something I whipped up after work, today, so it's pretty new code that likely has a few bugs lurking in it, but it's a good start:

sendmail_path=/usr/local/bin/mail_proxy.php

#!/usr/bin/php
<?php
// Make this file executable and point the shebang above at your PHP binary.

//---CONFIG
$config = array(
  'host' => 'localhost',
  'port' => 25,
  'auth' => FALSE,
);
$logDir      = '/www/logs/mail';
$logFile     = 'mail_proxy.log';
$failPrefix  = 'fail_';
$EOL         = "\n"; // change to \r\n if you send broken mail
$defaultFrom = '"example.net Webserver" <www@example.net>';
//---END CONFIG

if (!$log = fopen("{$logDir}/{$logFile}", 'a')) {
  fwrite(STDERR, "ERROR: cannot open log file!\n");
  exit(1); // a non-zero exit status makes mail() return FALSE
}

require('Mail.php'); // PEAR::Mail
if (PEAR::isError($Mailer = Mail::factory('smtp', $config))) {
  fwrite($log, ts() . "Failed to create PEAR::Mail object\n");
  fclose($log);
  exit(1);
}

// get headers/body: mail() pipes the complete message to our STDIN
$in = stream_get_contents(STDIN);
list ($headers, $body) = array_pad(explode("$EOL$EOL", $in, 2), 2, '');

// unfold long headers: a line starting with whitespace continues the previous one
$headers = str_replace(array("$EOL ", "$EOL\t"), array(' ', "\t"), $headers);

$recipients = array();
$mailHdrs = array();
$recipFields = array('to','cc','bcc');
foreach (explode($EOL, $headers) AS $h) {
  if (strpos($h, ':') === FALSE) {
    continue; // not a "Field: value" line
  }
  list($field, $val) = explode(':', $h, 2);
  $field = trim($field);
  $val = trim($val);
  if (isset($mailHdrs[$field])) {
    $mailHdrs[$field] = (array) $mailHdrs[$field];
    $mailHdrs[$field][] = $val;
  } else {
    $mailHdrs[$field] = $val;
  }
  if (in_array(strtolower($field), $recipFields)) {
    // bare addresses only: stop at whitespace, separators, <, > and quotes
    if (preg_match_all('/[^\s;,<>"]+@[^\s;,<>"]+/', $val, $m)) {
      $recipients = array_merge($recipients, $m[0]);
    }
  }
}
if (!isset($mailHdrs['From'])) {
  $mailHdrs['From'] = $defaultFrom;
}
// Bcc addresses are already in $recipients; don't reveal them in the headers
foreach (array_keys($mailHdrs) as $field) {
  if (strtolower($field) == 'bcc') {
    unset($mailHdrs[$field]);
  }
}

$recipients = array_unique($recipients); // remove dupes

// send
if (PEAR::isError($send = $Mailer->send($recipients, $mailHdrs, $body))) {
  $fn = uniqid($failPrefix);
  file_put_contents("{$logDir}/{$fn}", $in);
  fwrite($log, ts() ."Error sending mail: $fn (". $send->getMessage() .")\n");
  $ret = 1; // fail
} else {
  fwrite($log, ts() ."Mail sent to ". count($recipients) ." recipients.\n");
  $ret = 0; // success
}
fclose($log);
exit($ret); // a top-level "return" would not set the exit status mail() checks

//////////////////////////////

function ts()
{
  return '['. date('y.m.d H:i:s') .'] ';
}

?>

Voila. SMTP mail from a unix box that may or may not have an MTA (like sendmail) installed.

Don't forget to change the CONFIG block.
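The header parsing is the part most likely to hide bugs, and it's awkward to test inside the proxy because of the PEAR::Mail dependency. Here's a hypothetical standalone sketch of that step (parse_mail_headers() is a made-up name): it unfolds continuation lines, collects every value for each field, and pulls bare addresses out of To/Cc/Bcc.

```php
<?php
// Hypothetical standalone version of the proxy's header parsing.
// Returns [array of field => list of values, unique recipient addresses].
function parse_mail_headers(string $headerBlock): array
{
    // Unfold: a line break followed by whitespace continues the previous line.
    $unfolded = preg_replace('/\r?\n(?=[ \t])/', '', $headerBlock);

    $fields = [];
    $recipients = [];
    foreach (preg_split('/\r?\n/', $unfolded) as $line) {
        if (strpos($line, ':') === false) {
            continue; // not a "Field: value" line
        }
        [$name, $value] = explode(':', $line, 2);
        $name = trim($name);
        $value = trim($value);
        $fields[$name][] = $value; // a field may legitimately repeat

        if (in_array(strtolower($name), ['to', 'cc', 'bcc'], true)
            && preg_match_all('/[^\s;,<>"]+@[^\s;,<>"]+/', $value, $m)) {
            $recipients = array_merge($recipients, $m[0]);
        }
    }
    return [$fields, array_values(array_unique($recipients))];
}
```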

XSS Woes

A prominent PHP developer (whose name I didn't get permission to drop, so I won't, but many of you know who I mean) has been doing a bunch of research related to Cross Site Scripting (XSS), lately. It's really opened my eyes to how much I take user input for granted.

Don't get me wrong. I write by the "never trust users" mantra. The issue, in this case, is something abusable that completely slipped under my radar.

Most developers worth their paycheque, I'm sure, know the common rules of "never trust the user", such as "escape all user-supplied data on output," "always validate user input," and "don't rely on something not in your control to do so (ie. Javascript cannot be trusted)." "Don't output unescaped input" goes without saying, in most cases. Only a fool would "echo $_GET['param'];" (and we're all foolish sometimes, aren't we?).

The problem that was demonstrated to me exploited something I considered to be safe: the filename portion of the request URI. Now I know just how wrong I was.

Consider this: you build a simple script; let's call it simple.php but that doesn't really matter. simple.php looks something like this:

<html>
 <body>
  <?php
  if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
    echo "Form submitted!";
  }
  ?>
  <form action="<?php echo $_SERVER['PHP_SELF']; ?>">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

Alright. Let's put this script at: http://example.com/tests/simple.php. On a properly-configured web server, you would expect the script to always render to this, on request:

<html>
 <body>
  <form action="/tests/simple.php">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

Right? No.

What I forgot about, as I suspect some of you have, too (or maybe I'm the only loser who didn't think of this (-; ), is that $_SERVER['PHP_SELF'] can be manipulated by the user.

How's that? If I put a script at /tests/simple.php, $_SERVER['PHP_SELF'] should always be "/tests/simple.php", right?

Wrong, again.

See, there's a feature of Apache (I think it's Apache, anyway) that you may have used for things like short URLs, or to optimize your query-string-heavy website to make it search-engine friendly. $_SERVER['PATH_INFO']-based URLs.

Quickly, this is when scripts are able to receive data in the URL path, after the script's file name but before the question mark that separates the file name from the query string. In a URL like http://www.example.com/download.php/path/to/file, download.php would be executed, and /path/to/file would (usually, depending on config) be available to the script via $_SERVER['PATH_INFO'].

The quirk is that $_SERVER['PHP_SELF'] contains this extra data, opening up the door to potential attack. Even something as simple as the code above is vulnerable to such exploits.

Let's look at our simple.php script, again, but requested in a slightly different manner: http://example.com/tests/simple.php/extra_data_here

It would still "work"--the output, in this case, would be:

<html>
 <body>
  <form action="/tests/simple.php/extra_data_here">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

I hope that the problem is now obvious. Consider: http://example.com/tests/simple.php/%22%3E%3Cscript%3Ealert('xss')%3C/script%3E%3Cfoo

The output suddenly becomes very alarming:

<html>
 <body>
  <form action="/tests/simple.php/"><script>alert('xss')</script><foo">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

If you ignore the obviously-incorrect <foo"> tag, you'll see what's happening. The would-be attacker has successfully exploited a critical (if you consider XSS critical) flaw in your logic, and, by getting a user to click the link (even through a redirect script), he has executed the Javascript of his choice on your user's client (obviously, this requires the user to have Javascript enabled). My alert() example is non-malicious, but it's trivial to write similarly-invoked Javascript that changes the action of a form, or usurps cookies (and submits them in a hidden iframe, or through an image tag's URL, to a server that records this personal data).

The solution should also be obvious. Convert the user-supplied data to entities. The code becomes:

<html>
 <body>
  <?php
  if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
    echo "Form submitted!";
  }
  ?>
  <form action="<?php echo htmlentities($_SERVER['PHP_SELF']); ?>">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

And an attack, as above, would be rendered:

<html>
 <body>
  <form action="/tests/simple.php/&quot;&gt;&lt;script&gt;alert('xss')&lt;/script&gt;&lt;foo">
   <input type="hidden" name="submitted" value="1" />
   <input type="submit" value="Submit!" />
  </form>
 </body>
</html>

This still violates the assumption that the script name and path are the only data in $_SERVER['PHP_SELF'], but the payload has been neutralized.
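Because PHP_SELF lands inside a quoted attribute, a small helper that always escapes both quote styles with an explicit charset is a reasonable habit. A minimal sketch (attr() is a hypothetical name); htmlspecialchars() is enough here, since only the HTML-special characters matter in this context:

```php
<?php
// Hypothetical helper: escape a value for use inside a single- or
// double-quoted HTML attribute. ENT_QUOTES encodes both quote characters;
// passing the charset explicitly avoids depending on version defaults.
function attr(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}
```

In simple.php, the form tag would then read <form action="<?php echo attr($_SERVER['PHP_SELF']); ?>">.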

Needless to say, I felt silly for not thinking of such a simple exploit, earlier. As the aforementioned PHP developer said, at the time (to paraphrase): if guys who consider themselves experts in PHP development don't notice these things, there's little hope for the unwashed masses who have just written their first 'echo "hello world!\n";'. He's working on a generic user-input filtering mechanism that can be applied globally to all user input. Hopefully we'll see it in PECL, soon. Don't forget about the other data in $_SERVER, either..

... ...

Upon experimenting with this exploit on my own server (and watching the raw data in my superglobals, conveniently, via phpinfo()), I noticed something very interesting that reminded me that even though trusting this data was a stupid mistake on my part, I'm not the only one to do so. A fun (and by fun, I mean nauseating) little game to play: create a file called "info.php" (or whatever name you like). In it, place only "<?php phpinfo(); ?>". Now request it like this: http://your-server/path/to/info.php/%22%3E%3Cimg%20src=http://www.perl.com/images/75-logo.jpg%3E%3Cblah

Nice huh? A little less nauseating: it's fixed in CVS.

Fun with the tokenizer...

I was reminded, this past week, of how cool the tokenizer is.

One of the guys who works in the same office as I do had what seemed to be a simple problem: he had a php file that contained ~50 functions, and wanted to summarize the API without parsing through the file, manually, and cutting out the function declarations.

We introduced him to inline phpDoc blocks (he's a junior-level PHP developer working in the same office, but for a different company, so he doesn't have to follow our coding standards; but I digress), but the 50-function library in question didn't have docblocks.

Sure, he could (and did) pull up a list of function NAMES with get_defined_functions (I assume by using array_diff on before-and-after captures), but that didn't give him the argument names, or even the number of arguments for a given function, so I broke out some old tokenizer code I'd written.
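The approach described above is only a guess at how he did it. Here is a minimal sketch of that before-and-after capture. The helper name list_new_functions() is hypothetical, and the sketch assumes PHP 7 or later. Including the file runs any top-level code it contains, and get_defined_functions() reports user-defined names in lowercase under its 'user' key.

```php
<?php
/**
 * Hypothetical helper: names of the functions declared by including $file.
 * Caveat: this executes the file, and the result has no parameter info.
 */
function list_new_functions(string $file): array
{
    $before = get_defined_functions()['user'];
    require_once $file;
    $after = get_defined_functions()['user'];

    // Anything present now but not before was declared by $file
    return array_values(array_diff($after, $before));
}
```

This produces exactly the result he had: a list of names with no argument lists.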

In case you aren't familiar with the tokenizer, the PHP manual defines it as:

“[an interface to let you write] your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.”

The extension (part of the PHP core distribution since 4.3.0) consists of only two functions, token_get_all and token_name, plus a boatload of constants.
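To see what those two functions return, here is a small sketch. It assumes PHP 7.1 or later for the array destructuring, and describe_tokens() is a hypothetical helper name. token_get_all() returns most tokens as [id, text, line] arrays, and token_name() turns the id into its constant name. Single-character tokens such as '(' come back as plain strings.

```php
<?php
/**
 * Return [token name, source text] pairs for a snippet of PHP code.
 * Plain-string tokens (e.g. '(') are labelled with the character itself.
 */
function describe_tokens(string $code): array
{
    $out = [];
    foreach (token_get_all($code) as $tok) {
        if (is_array($tok)) {
            // [token id, source text, line number]
            $out[] = [token_name($tok[0]), $tok[1]];
        } else {
            $out[] = [$tok, $tok];
        }
    }
    return $out;
}

foreach (describe_tokens('<?php function add($a, $b = 2) {}') as [$name, $text]) {
    printf("%-16s %s\n", $name, var_export($text, true));
}
```

The listing below relies on the same structure. It walks this flat token list looking for T_CLASS, T_FUNCTION and T_VARIABLE tokens.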

Enough babble, though; let's get to the meat. I pulled out this code I'd written for PEARClops (on EFNet #PEAR). It parses a PHP source file and figures out which classes, functions/methods and associated parameters it declares.

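<?php

/**
 * Scan PHP source (a file path or a string of code) and return every
 * named function/method with its enclosing class (if any) and parameters.
 */
function get_protos($in)
{
  if (is_file(realpath($in)))
  {
    $in = file_get_contents($in);
  }
  $tokens = token_get_all($in);
  $count = count($tokens);
  $funcs = array();
  $currClass = '';
  $classDepth = 0;

  for ($i = 0; $i < $count; $i++)
  {
    $id = is_array($tokens[$i]) ? $tokens[$i][0] : NULL;
    $text = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];

    if ($id === T_CLASS)
    {
      // Skip "Foo::class" and "new class": they aren't class declarations
      $j = $i - 1;
      while ($j > 0 && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE)
      {
        --$j;
      }
      if (is_array($tokens[$j])
        && in_array($tokens[$j][0], array(T_PAAMAYIM_NEKUDOTAYIM, T_NEW), TRUE))
      {
        continue;
      }

      // The class name is the next T_STRING token
      while (++$i < $count && !(is_array($tokens[$i]) && $tokens[$i][0] === T_STRING)) {}
      if ($i >= $count)
      {
        break;
      }
      $currClass = $tokens[$i][1];

      // Advance to the '{' opening the class body; the loop's $i++ steps past it
      while (++$i < $count && $tokens[$i] !== '{') {}
      $classDepth = 1;
      continue;
    }
    elseif ($id === T_FUNCTION)
    {
      $nextByRef = FALSE;
      $thisFunc = array();

      // Walk the tokens up to the ')' that closes the parameter list
      while (++$i < $count && $tokens[$i] !== ')' && $tokens[$i] !== ';')
      {
        $tok = $tokens[$i];
        $tokText = is_array($tok) ? $tok[1] : $tok;

        if (is_array($tok) && $tok[0] === T_STRING && !$thisFunc)
        {
          // The first identifier after "function" is its name; a '&' seen
          // before it means "returns by reference", not a by-ref parameter
          $thisFunc = array(
            'name'   => $tok[1],
            'class'  => $currClass,
            'params' => array(),
          );
          $nextByRef = FALSE;
        }
        elseif (is_array($tok) && $tok[0] === T_VARIABLE)
        {
          // Only T_VARIABLE tokens are parameter names (type hints are skipped)
          $thisFunc['params'][] = array(
            'byRef' => $nextByRef,
            'name'  => $tok[1],
          );
          $nextByRef = FALSE;
        }
        elseif ($tokText === '&')
        {
          // '&' is a plain string token before PHP 8.1 and an array token after
          $nextByRef = TRUE;
        }
        elseif ($tokText === '=' && !empty($thisFunc['params']))
        {
          // Capture the default value, including nested parentheses/brackets
          // (e.g. "= array()"), stopping before the next ',' or the closing ')'
          $default = '';
          $depth = 0;
          while ($i + 1 < $count)
          {
            $next = $tokens[$i + 1];
            if ($depth == 0 && ($next === ',' || $next === ')'))
            {
              break;
            }
            ++$i;
            if ($next === '(' || $next === '[')
            {
              ++$depth;
            }
            elseif ($next === ')' || $next === ']')
            {
              --$depth;
            }
            $default .= is_array($next) ? $next[1] : $next;
          }
          $thisFunc['params'][count($thisFunc['params']) - 1]['default'] = trim($default);
        }
      }

      // Anonymous functions have no name and aren't part of the API
      if (isset($thisFunc['name']))
      {
        $funcs[] = $thisFunc;
      }
    }
    elseif ($text === '{' || $id === T_DOLLAR_OPEN_CURLY_BRACES)
    {
      // T_CURLY_OPEN ("{$" inside strings) has the text '{' and, like
      // "${", is closed by a plain '}', so count it as an opening brace
      ++$classDepth;
    }
    elseif ($text === '}')
    {
      --$classDepth;
    }

    if ($classDepth == 0)
    {
      $currClass = '';
    }
  }

  return $funcs;
}

/**
 * Format get_protos() output as "Class::method(&$a, $b)" strings.
 * Default values are collected by get_protos() but not printed here.
 */
function parse_protos($funcs)
{
  $protos = array();
  foreach ($funcs AS $funcData)
  {
    $params = array();
    foreach ($funcData['params'] AS $param)
    {
      $params[] = ($param['byRef'] ? '&' : '') . $param['name'];
    }

    $proto = $funcData['class'] ? $funcData['class'] . '::' : '';
    $proto .= $funcData['name'] . '(' . implode(', ', $params) . ')';
    $protos[] = $proto;
  }
  return $protos;
}

if (!isset($_SERVER['argv'][1]))
{
  fwrite(STDERR, "Usage: php {$_SERVER['argv'][0]} /path/to/php_file\n");
  exit(1);
}

echo "Functions in {$_SERVER['argv'][1]}:\n";
foreach (parse_protos(get_protos($_SERVER['argv'][1])) AS $proto)
{
  echo "  $proto\n";
}
?>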

Save it as "parse_funcs.php" (or whatever you like) and call it like so: php parse_funcs.php /path/to/php_file

For instance (here the script was saved as token_funcs_cli.php):

sean@iconoclast:~/php/scripts$ php token_funcs_cli.php ~/php/cvs/Mail_Mime/mime.php
Functions in /home/sean/php/cvs/Mail_Mime/mime.php:
  Mail_mime::Mail_mime($crlf)
  Mail_mime::__wakeup()
  Mail_mime::setTXTBody($data, $isfile, $append)
  Mail_mime::setHTMLBody($data, $isfile)
  Mail_mime::addHTMLImage($file, $c_type, $name, $isfilename)
  Mail_mime::addAttachment($file, $c_type, $name, $isfilename, $encoding)
  Mail_mime::_file2str(&$file_name)
  Mail_mime::_addTextPart(&$obj, $text)
  Mail_mime::_addHtmlPart(&$obj)
  Mail_mime::_addMixedPart()
  Mail_mime::_addAlternativePart(&$obj)
  Mail_mime::_addRelatedPart(&$obj)
  Mail_mime::_addHtmlImagePart(&$obj, $value)
  Mail_mime::_addAttachmentPart(&$obj, $value)
  Mail_mime::get(&$build_params)
  Mail_mime::headers(&$xtra_headers)
  Mail_mime::txtHeaders($xtra_headers)
  Mail_mime::setSubject($subject)
  Mail_mime::setFrom($email)
  Mail_mime::addCc($email)
  Mail_mime::addBcc($email)
  Mail_mime::_encodeHeaders($input)
  Mail_mime::_setEOL($eol)

Not bad, huh?

There are some not-so-obvious bugs (inheritance, mostly), but for a relatively short script, it does a pretty good job.
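Inheritance is hard to handle here because the tokenizer only sees one file at a time. A parent class's methods usually live somewhere else. One possible first step is to record which class each declaration extends, so a caller could go look up the parent. Here is a sketch under that assumption. get_class_parents() is a hypothetical helper, not part of the script above, and it requires PHP 7.4 or later for the arrow function.

```php
<?php
/**
 * Hypothetical helper: map each class declared in $source to the class it
 * extends (or null). The parent itself may be declared in another file.
 */
function get_class_parents(string $source): array
{
    // Drop whitespace and comments so neighbouring tokens are adjacent
    $tokens = array_values(array_filter(
        token_get_all($source),
        fn($t) => !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)
    ));

    $parents = [];
    for ($i = 0, $n = count($tokens); $i < $n; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }
        // Skip "Foo::class" and "new class", which aren't declarations
        $prev = $tokens[$i - 1] ?? null;
        if (is_array($prev) && in_array($prev[0], [T_PAAMAYIM_NEKUDOTAYIM, T_NEW], true)) {
            continue;
        }
        $name = $tokens[$i + 1][1];
        $parent = null;
        if (isset($tokens[$i + 2]) && is_array($tokens[$i + 2]) && $tokens[$i + 2][0] === T_EXTENDS) {
            $parent = $tokens[$i + 3][1];
        }
        $parents[$name] = $parent;
    }
    return $parents;
}
```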