Regular Expression matching with newlines

I ran across a regular expression modifier today that I had not used before. When matching text that spans multiple lines, you can add the ‘s’ modifier at the end of the regular expression so that the dot (.) also matches newlines, effectively treating the string as a single line.

For example, I was trying to match some HTML that spanned multiple lines, like this:

<td class='something'>  This is the text I want to match
</td>

This expression didn’t match:

preg_match_all("#<td class='someting'>(.+?)</td>#", $source_string, $matches);

But after simply adding the ‘s’ flag after the closing # delimiter, it worked as desired:

preg_match_all("#<td class='someting'>(.+?)</td>#s", $source_string, $matches);

PHP Performance – isset() versus empty() versus PHP Notices

I’m cleaning up a lot of PHP code and always program with PHP error_reporting set to E_ALL and display_errors turned on so that I make sure to catch any PHP messages that come up. Since starting on this site, I have fixed literally hundreds (maybe thousands) of PHP Notices about using uninitialized variables and non-existent array indexes.

I have been fixing problems like this where $somevar is sometimes undefined:

if ($somevar)

by changing it to:

if (isset($somevar) && $somevar)

This successfully gets rid of the NOTICEs, but adds some overhead because PHP has to perform two checks. After fixing a lot of this in this manner, I’ve noticed that the pages seem to be generated a little slower.

So, to provide some conclusive results to myself, I wrote up a quick benchmarking script – available at php_empty_benchmark.php. It goes through 1,000,000 tests using each of these methods:

  1. if ($a) – This generates a notice if $a is not set
  2. if (isset($a)) – A simple clean way to check if the variable is set (note that it is not equivalent to the one above)
  3. if (isset($a) && $a) – The one that I have been using, which is equivalent to if ($a) but doesn’t generate a notice.
  4. if (!empty($a)) – This is functionally equivalent to if($a), but doesn’t generate a notice.

It measures the time to perform 1 million tests using a defined percentage of values that are set.  It then computes the difference as a percentage of the time taken for the original test (the one that generates the notices).   A ‘diff’ of 100 means that the execution time is the same, greater than 100 means that it is faster, and less than 100 means that it is slower. A typical test produced these results:

    With NOTICE: 0.19779300689697
    With isset:  0.19768500328064 / Diff: 100.05463419811
    With both:   0.21704912185669 / Diff: 91.128222590815
    With !empty: 0.19779801368713 / Diff: 99.997468735875
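
For reference, here is a stripped-down sketch of the kind of timing loop involved (my illustration here, not the actual php_empty_benchmark.php script):

<?php
// Compare isset($a) && $a against !empty($a) over 1,000,000 checks
$iterations = 1000000;
$hits = 0;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if ($i % 2) { $a = 1; } else { unset($a); }   // roughly half the values are set
    if (isset($a) && $a) { $hits++; }             // the double-check version
}
$isset_time = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    if ($i % 2) { $a = 1; } else { unset($a); }
    if (!empty($a)) { $hits++; }                  // single built-in check
}
$empty_time = microtime(true) - $start;

printf("isset && : %f seconds\n!empty   : %f seconds\n", $isset_time, $empty_time);
?>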

In summary, using the if (isset($a) && $a) syntax is about 8-10% slower than just generating the PHP Notice. Using !empty() should be a drop-in replacement that doesn’t generate the notice and has virtually no performance impact. Using isset() alone also has no performance impact, but it is not exactly the same as if ($a), since isset() will return true if the variable is set to a false value. I included it here because it often makes the code a little more readable than the !empty($a) syntax. For example:

$myvalue = !empty($_REQUEST['myvalue']) ? $_REQUEST['myvalue'] : '';

Versus

$myvalue = isset($_REQUEST['myvalue']) ? $_REQUEST['myvalue'] : '';

KnitMeter.com Beta

My wife has gotten seriously into knitting in the past year and was recently wondering how much she had knit over that time. I was surprised that there didn’t seem to be a website for tracking that kind of information, so I decided to make one for her (and for anybody else who might want it).

The concept is pretty simple: just enter how much you knit each day, and it will add it up for you and summarize it by project. It also generates a little widget that knitters can put on their blogs to compare with others.

The site still needs a little work here and there, but is pretty functional at this point. Users are free to sign up and try it out – all for free of course. I’m looking for user input to see what still needs some work.

Installing trac with webadmin on CentOS5

I’m not overly familiar with Python applications, so it takes a little while for me to figure it out each time. I need to document it somewhere so I don’t have to reinvent the wheel every time – might as well do it here so that others can find it.

Install the rpmforge repository

wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.3.6-1.el5.rf.i386.rpm

rpm -i rpmforge-release-0.3.6-1.el5.rf.i386.rpm

Install trac from the rpmforge repo

yum install trac

Install ez_setup

wget http://peak.telecommunity.com/dist/ez_setup.py

python ez_setup.py

And install webadmin with easy_install

easy_install http://svn.edgewall.com/repos/trac/sandbox/webadmin/
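
If the Admin tab doesn’t show up after that, the plugin most likely still needs to be enabled in the project’s trac.ini (the exact path depends on where the Trac environment was created), with something roughly like this:

[components]
webadmin.* = enabled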

Poor experience and uptime with rapidvps.com

I heard good things about RapidVPS from several members of my local LUG.  I’d also heard good things about slicehost, but they seem to be perpetually unavailable.  So when I was setting up a new development and testing server, I figured that I’d give RapidVPS a try.  I kind of like seeing how different companies do things, and they have a pretty decent package for $30/month.

It turns out that was a poor choice.   I was unimpressed from the first day.   My new RapidVPS server was a pretty vanilla install of CentOS5.  Not much had been customized for their environment.   The name servers in /etc/resolv.conf didn’t even work and there were a bunch of other little annoyances that just didn’t make sense.  I blew it off at the time since I was able to get them resolved pretty quickly.

Their support staff was fairly responsive, but tended to skirt the direct questions that I asked.  For example, I asked specifically why the name servers were incorrect on a fresh install, and they just replied that they were fixed now.

I primarily use this machine for PHP development and testing.  I spend 6-8 hours a day logged in via SSH editing files directly, so I notice pretty quickly when things go wrong.  Once or twice a week, the IO load got really high and everything slowed to a crawl; a simple directory listing was taking over 30 seconds.  When I sent in a support request about that, their reply was something along the lines of: most customers use them for running LAMP websites, they generally work fine for that purpose, and the high IO wouldn’t be a problem.

On several other occasions, their network became incredibly slow.  Replies from support indicated that one of their customers was getting attacked.  Right now, my server appears to be completely down, and they just replied that the machine is ‘recovering/doing a raid rebuild’ and will be up shortly.

So, I’ve had this machine for almost two months and had all of these problems.    I’d like to just ditch them and sign up for another server at RimuHosting.   But I’ve spent quite a bit of time getting everything configured just right and don’t have time at the moment to move everything somewhere else.

I guess I’ll have to deal with it for another month or so, until development slows down a little bit.  Then I’ll have to spend a few days migrating everything to a new service.  In the meantime, I definitely won’t be recommending RapidVPS to anybody.

Google Maps knows where you are

Google Maps recently introduced a feature that can determine your location when you use the mobile version from a cell phone.  If your phone has GPS available, it can use that to get a pretty precise location; otherwise it can somehow determine which cell tower you are on and give you an approximate location.  That is pretty powerful.

I wonder, though, how Google is able to determine which tower you are using.  That raises some pretty big potential privacy issues if anybody you call or send a text message to, or any website you visit from your phone, can somehow determine which tower you are on.

Mixed experiences with ScanAlert

I’ve been seeing those ‘Hacker Safe’ logos on sites for a while now.  As a consumer, I’ve always figured that they are kind of a joke, and that sites displaying them really aren’t any more secure than any other site.  I’ve recently had some experience logging into a ScanAlert account and seeing what kinds of things they actually do.

Overall, they do the basic kinds of things like telling you what ports are open – stuff that the system administrator should already know and could find out in a minute with nmap.  They also check the banners for each service to tell you what versions you are running, and it produces warnings if you are using software that is more than a couple of months old.  You can also have it send alerts and warnings to other people.

What I found the most useful though, was its attempts to look for SQL Injection and XSS vulnerabilities.   From their FAQ:

ScanAlert audits every publicly available part of the domains Web application. This includes all HTTP services, configuration files, and any scripts (CGI, PHP, etc.). ScanAlert submits all database query parameters for vulnerabilities such as SQL injections and cross-site scripting. Since attacks along these vectors vary, ScanAlert must test each query parameter multiple times.

The website that I was looking at had an XSS vulnerability, and the report goes into detail about which request parameters were used to trigger it.   That is pretty useful information to have, so that you can look into those pages and get them fixed.  It would take a while for me to go through every page by hand and verify that I’m properly sanitizing user input everywhere.
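
As a quick illustration of the kind of fix involved (a generic sketch with a made-up ‘q’ parameter, not something from that site), a reflected XSS usually comes down to echoing a request parameter without escaping it:

<?php
$q = isset($_GET['q']) ? $_GET['q'] : '';

// Vulnerable: prints the raw parameter back into the page, so markup in the URL gets executed
echo "You searched for: " . $q;

// Safer: escape the value before printing it into HTML
echo "You searched for: " . htmlspecialchars($q, ENT_QUOTES);
?>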

However, with the site that this was scanning, I’m almost positive that there are more XSS vulnerabilities than ScanAlert alerted me to.  It basically provided a place to start looking, but it is certainly not an exhaustive test.    Moreover, the XSS vulnerabilities were scored only as a ‘Medium Risk’ – a 2 on a scale of 1 to 5 (1 being information disclosure like a robots.txt file, 5 being something really bad like hosting a virus).

I’m not sure at what point ScanAlert decides that a vulnerability is bad enough to not display their ‘Hacker Safe’ logo on a site.   Evidently it is higher than a 2 though, because this site still qualified for one.   So, ScanAlert is useful to system administrators and programmers to help identify threats.  It’s only useful though if the website owner actually does something about them.

I had several ‘Medium Risk’ vulnerabilities due to running slightly outdated versions of Apache and PHP.   Both were compiled from source and newer than what was available in the distro’s repositories, so I got them recompiled with the latest stable versions.   I doubt that most sites bother to resolve these issues since it takes a significant amount of work and doesn’t affect the ability to have that all-important ‘Hacker Safe’ banner on your site.

I still have the same opinion of it as a consumer, though: the logo basically just means that the website doesn’t have some blatant vulnerability that is easily exploitable.   There are so many other ways that attackers can gain access to information that having a little logo on your site doesn’t instill much confidence in me.