How to clean data



Back in the day, nucleocide.net was hacked. Why? I was putting user data directly into SQL commands without checking it first. This is bad. I'm not going to get into a big discussion about the various kinds of injection, but I will cover the functions that I use to prevent them.

These functions use regular expressions to check the data. What is a RegEx? It is basically a pattern-matching language. You can compare a string against a RegEx to see whether it is valid, or you can use the RegEx to strip invalid characters.
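To make those two uses concrete, here is a minimal sketch with PHP's preg_match and preg_replace. The sample string and the letters-only character class are made up for illustration.

```php
<?php
// Hypothetical sample input; any string would do.
$input = "bob'; DROP TABLE users";

// Use 1: compare. Does the whole string consist only of letters?
// ^ and \z anchor the pattern to the start and the very end of the string.
$isValid = preg_match('/^[a-zA-Z]+\z/', $input) === 1;

// Use 2: strip. Remove every character that is not a letter.
$stripped = preg_replace('/[^a-zA-Z]/', '', $input);

var_dump($isValid);    // bool(false)
echo $stripped, "\n";  // bobDROPTABLEusers
```

The first form rejects the input outright; the second keeps whatever is left after the bad characters are gone.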

Here is my RegEx function to make sure a string only contains letters, numbers, and the underscore character:

function nukeAlphaNum($value) {
    // Strip every character that is not a letter, digit, or underscore.
    // preg_replace is used because ereg_replace was removed in PHP 7.
    return preg_replace('/[^a-zA-Z0-9_]/', '', $value);
}

Simple, huh? You could even put it all on one line if you wanted. The function takes one argument, a string, and returns another string that contains only lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), and the underscore (_). preg_replace takes three arguments: the RegEx (wrapped in / delimiters), the string to replace each match with (in this case nothingness), and the string that it is sifting through (the one we send to the function).

Here are some others that I use:

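function nukeAlpha($value) {
    // Keep letters only. (preg_replace stands in for ereg_replace, removed in PHP 7.)
    return preg_replace('/[^a-zA-Z]/', '', $value);
}

function nukeHex($value) {
    // Keep hexadecimal digits only.
    return preg_replace('/[^0-9a-fA-F]/', '', $value);
}

function nukeNum($value) {
    // Keep decimal digits only.
    return preg_replace('/[^0-9]/', '', $value);
}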

These are all pretty self-explanatory. There is one drawback: RegEx matching isn't the most processor-friendly operation. On small sites like the ones I run it is fine, but on larger sites with a ton of users on a shared hosting plan, it might get out of hand.
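If the RegEx cost ever does become a problem, one option (not something this tutorial relies on) is PHP's ctype functions, which check a string without running a pattern and are usually cheaper. A rough sketch follows. Note that these only tell you whether the string is clean, they don't strip anything, and the isClean* names are made up for this example.

```php
<?php
// Hypothetical helpers: they validate instead of stripping.
// The ctype_* functions return false for an empty string.

function isCleanAlpha($value) {
    return is_string($value) && ctype_alpha($value);   // letters only (default C locale)
}

function isCleanHex($value) {
    return is_string($value) && ctype_xdigit($value);  // 0-9, a-f, A-F
}

function isCleanNum($value) {
    return is_string($value) && ctype_digit($value);   // digits only
}

var_dump(isCleanHex("ff00cc"));  // bool(true)
var_dump(isCleanHex("ff00zz"));  // bool(false)
var_dump(isCleanNum("12345"));   // bool(true)
var_dump(isCleanNum("12 345"));  // bool(false)
```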

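There is one more type of RegEx function that I use. Instead of replacing anything, these use preg_match with the i (case-insensitive) flag to test a string. Here is an example of what I use to validate an email address:

function nukeValidEmail($value) {
    // i = case-insensitive; D = make $ match only at the very end of the string.
    // preg_match replaces eregi, which was removed in PHP 7.
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/iD';
    return preg_match($pattern, $value) === 1;
}

Pretty, isn't it? Instead of removing bad characters, this function returns true if the email is valid and false if it is not (duh). Different applications call for different measures. This RegEx is too complex to explain fully here, but in short it expects a local part, an @, a domain, and a final extension of two or three letters, so addresses ending in something longer, like .info, are rejected.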
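Because the pattern above only accepts two- or three-letter endings, some people prefer to lean on PHP's built-in filter_var with FILTER_VALIDATE_EMAIL (available since PHP 5.2) rather than maintain their own RegEx. This is an alternative technique, not the function above. A minimal sketch, with isValidEmail being a made-up name and the addresses being sample values:

```php
<?php
// Hypothetical wrapper around PHP's built-in email filter.
function isValidEmail($value) {
    // filter_var returns the filtered string on success and false on failure.
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isValidEmail("someone@example.com"));   // bool(true)
var_dump(isValidEmail("someone@example.info"));  // bool(true)
var_dump(isValidEmail("not an email"));          // bool(false)
```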

This last function is what I use to validate a website address. I allow blank strings, a bare http://, and a full address. I allow the first two options because not everyone who creates an account on my various sites has a website:

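function nukeValidWebsite($value) {
    // preg_match with the i flag replaces eregi, which was removed in PHP 7.
    // # is the pattern delimiter, so the slashes need no escaping.
    // The pattern is anchored only at the start; trailing characters are not checked.
    if (preg_match('#^(http|ftp|https)://[-A-Za-z0-9._/]+#i', $value))
        return true;
    // A blank string or a bare "http://" means the user has no website.
    else if (empty($value) || $value == "http://")
        return true;
    else
        return false;
}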
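
Here is how those cases play out. This sketch repeats the function using preg_match so that it runs on PHP 7 and later; the sample values are made up.

```php
<?php
function nukeValidWebsite($value) {
    if (preg_match('#^(http|ftp|https)://[-A-Za-z0-9._/]+#i', $value))
        return true;
    else if (empty($value) || $value == "http://")
        return true;
    else
        return false;
}

var_dump(nukeValidWebsite(""));                        // bool(true)
var_dump(nukeValidWebsite("http://"));                 // bool(true)
var_dump(nukeValidWebsite("http://nucleocide.net/"));  // bool(true)
var_dump(nukeValidWebsite("nucleocide.net"));          // bool(false)
var_dump(nukeValidWebsite("javascript:alert(1)"));     // bool(false)
```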

Have fun and stay clean! If you would like more information on regular expressions, check out Regular-Expressions.info.

If you would like more information about securing your site from injections, check out this article from Penguicon 2006 (I attended, and the author, Flavio daCosta, hooked me up with this presentation).

Tutorial Added Jan 10, 2007 @ 12:24am
