Hackproofing Your Web Applications

Ultimately, no technology can protect your Web site or your applications. Only YOU can prevent hacking. Here's how.


“SSL makes your application secure.” That has to be one of the most popular statements about security today — and also one of the most misleading. A more accurate thing to say is: “SSL helps prevent eavesdropping, but does little to protect either party from each other.” SSL doesn’t prevent valid credit card information from being sent to a malicious Web site, nor does it prevent fraudulent credit card data from being sent to an honest Web site. SSL does nothing to prevent vulnerabilities in a Web application, and flaws are easily found on SSL-enabled servers.

For example, consider a Very Naive Savings and Loan that places the name of its customer in a URL, such as https://www.naivebank.com/display_balance?user=bob. It only takes a couple of seconds for the observant surfer to wonder: “What if I entered Alice’s name instead?” URL tampering is easy to try and is frequently rewarding.

Web applications are at risk whenever there’s a gap between what the user should do and what the user can do. Closing the gap — for example, by ensuring that private URLs cannot be guessed or usurped — is the key to building secure applications. Follow along and you’ll learn how to start worrying and love security.

Mind The Gap

Ultimately, no technology can protect your server. Your scripts, servlets, CGI, HTML — anything that’s reachable from the browser — is public and open to any possible use. You cannot control what is submitted to your Web server, or who is submitting it. The other end of the wire could be your best customer, or it could be a script trying to crack passwords. On a good day, with a nice user, the browser will submit what you expect. But don’t count on it.

While there are technologies to protect against specific kinds of attacks (like SSL prevents eavesdropping), the only way to make applications secure is to prevent abuse. To do that you have to make sure that your applications are used only as you intended: that values that should not be entered cannot be entered, that data that should not be tampered with cannot be tampered with, and that data that should only be read by specific people can only be read by those people.

Preventing abuse requires diligence. Making an application secure is not something to be done just at the beginning or end of a project, nor is it something that someone else can do for you. To be effective, security needs to be an integral and pervasive part of the design, development, and maintenance process.

As we’ll see, security is not difficult or expensive to implement. All that’s required is some incremental effort and a healthy dose of paranoia.

Keep this in mind: never trust the user. Too radical a statement? Consider some of the following exploits and you may think otherwise.


Let’s say you run a community Web site and have deployed an application to create user profiles. The application prompts for name, marital status, ethnicity, and gender. Gender choices are limited to Male, Female, and Prefer Not To Say, and are conveniently presented as preset choices in the form. One day, while browsing the server logs, you find something like this:

Gender=<script>alert("I am now stealing your cookies
for fun and profit, yippee!");</script>

What’s a script doing in the gender field? It was certainly not expected, yet all EvilBastard did to plant his trojan horse was save and edit your HTML page. Because the profile editor did not validate input, it simply saved the malicious JavaScript as EvilBastard’s gender information. With the trojan horse in place, every unsuspecting user that visits the profile page you serve for EvilBastard will lose his or her cookies to a scheming hacker. Even worse, you abetted (albeit unknowingly) the crime.

Yes, the user should submit one of the three pre-set choices, but in this case the user can submit and save any data he wants. Even if you strip or disallow the <script> tag (a common strategy to prevent malicious scripting attacks), there are hundreds of other ways to insert JavaScript into HTML.
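The fix is to compare the submission against an allow list of the choices you actually offered, rather than trying to strip out everything dangerous. Here is a sketch in Python (the field and function names are illustrative, not from a real application):

```python
# Allow-list validation: accept only the exact preset choices the form offers.
ALLOWED_GENDERS = {"Male", "Female", "Prefer Not To Say"}

def validate_gender(value):
    """Return the value only if it is one of the preset choices."""
    if value not in ALLOWED_GENDERS:
        raise ValueError("unexpected gender value")
    return value
```

Anything that is not one of the three preset strings, scripts included, is rejected before it ever reaches your database.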


“But,” you might say, “the values were hard coded in the HTML!” Yeah, right. Nothing in HTML is “hard,” and nothing in HTML is hard to edit. The previous exploit is an excellent example of just how vulnerable your server can be (even if you’ve taken precautions). Here’s another.

Imagine that you manage a Web site that offers a bounty for user referrals. For example, Mora can earn ten dollars for referring her friend Morris to your site. To record the bounty, the referral form has a tag like this:

<INPUT TYPE=HIDDEN NAME="referralbonus" VALUE=10>

Even a novice HTML writer will guess that when the form is submitted, the server records the value of referralbonus in some database. A malicious user can easily save the HTML, change VALUE to, say, 500, and submit the form, earning a $500 bonus instead of a $10 one.

HIDDEN variables are anything but. In fact, the HTML keyword HIDDEN should probably be renamed CONVENIENTLY SET ASIDE BUT EASY TO SEE AND MODIFY (but that’s probably too long to be a descriptive keyword). HIDDEN variables are useful for preserving state between requests, but they are public, as is any data submitted by the client, including cookies, arguments in URLs, agent versions, and referrer information. Anything the client sends you is easy to see, and easy to modify.


Secrets on a Web site rarely stay secrets for long. “Back doors,” such as developer or debug tools posted on your site, are especially dangerous.

For example, let’s say a developer at Very Naive Savings and Loan accesses debug information via a special user name called admin, as in https://www.naivebank.com/display_balance?user=admin. All it takes is a leaked email (accidental or otherwise) or a smart guess, and the system is compromised.

“I know,” you suggest, “I’ll obfuscate the URL. I have a great idea for my own encryption scheme that no one will ever figure out.” Unfortunately, someone will, and probably in less time than it’ll take you to develop and deploy the new code.

This example may be contrived, but not by much. Special path names or variables are often trivial to guess. Home brewed encryption libraries crumble under the simplest analysis. IP access restrictions are exposed by open proxy servers. Sometimes, hackers don’t even have to guess. If you forget to remove all of your debugging code in the rush to deploy your application, you could end up broadcasting your own valuable secrets. (Administration tools need to be approached very carefully and require access controls that are outside of the scope of this article.)

At the end of the day, Web applications are open and public interfaces, and there is little or no accountability for people that abuse them. You can’t tell the difference between an honest user and an attacker, so building something that just works is not enough. After all, malicious hackers don’t possess special knowledge or ability — they simply understand your application better than you do. They find the gaps — programming errors and lapses that permit abuse — no matter how small, and exploit them.

To foil hackers and protect yourself and your users, you have to build an application that can withstand all possible forms of abuse. While that might sound difficult, it’s not.

For example, all of the exploits described above are handily and easily thwarted if a few simple defenses are implemented in the normal course of development.

Validate, Validate, Validate

Error messages are generally useful. Error messages that echo what the user entered are good for usability. However, echoing user input untouched or unfiltered is an enormous security risk. Why? Because the user can insert their own (potentially malicious) HTML or JavaScript on your Web page.

This exploit (a variation of the Web form exploit we saw earlier) is known as cross-site scripting. A malicious hacker simply inserts his own JavaScript or HTML into an unsuspecting Web application and then sits back and waits until the site serves it to an unsuspecting user. Cross-site scripting is dangerous because JavaScript served from a given domain can steal that domain’s cookies. Even more insidious, a simple HTML LAYER tag can overlay a Web page with a trojan horse login page and capture user names and passwords. Clever? Sure. Destructive? Absolutely.
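One practical defense is to escape HTML metacharacters whenever you echo user input back to a page. A sketch in Python, using the standard library’s html.escape (the render_error helper is hypothetical):

```python
import html

def render_error(user_input):
    """Echo user input in an error page with HTML metacharacters escaped,
    so an embedded <script> tag is displayed as text, never executed."""
    return "<p>Invalid entry: " + html.escape(user_input) + "</p>"
```

The escaped version renders harmlessly as literal text in the browser.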

The only way to defeat this kind of exploit is to validate all input. Validating input is the single most important thing you can do to protect your applications. For example:

  • If the user is not supposed to enter HTML into a field, strip everything a browser might interpret as HTML.

  • If the user has to enter HTML — for instance a FONT tag — parse the entire tag and only allow the attributes and values that you choose to support (again, there are many ways to sneak JavaScript into a tag).

  • If you expect a number within a certain range, validate that the input is a number and is in that range.

  • If a text field is 12 characters long, don’t accept a 4K string. Buffer overflows are just another example of input not being validated.
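These checks are cheap to write. Here is a sketch of the number-and-length rules in Python (the function name and limits are illustrative):

```python
def validate_int_field(raw, low, high, max_len=12):
    """Validate that a submitted string is a number within [low, high]."""
    # Reject oversized submissions before doing anything else.
    if len(raw) > max_len:
        raise ValueError("input too long")
    # int() rejects anything that is not a number.
    value = int(raw)
    if not (low <= value <= high):
        raise ValueError("value out of range")
    return value
```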

In addition to validating form fields, you should also validate variables passed in with URLs. For example, if you have a CGI program that uses a variable to pick a file to display, as in:

https://www.naivebank.com/show_file?file=welcome.html

it can easily be edited and changed to:

https://www.naivebank.com/show_file?file=/etc/passwd

If your program fetches any file, there is no setting on Apache that will protect you. You need to ensure that your application only displays files that you want to display. Moreover, you should supply the leading path yourself, and reject input that contains path characters such as “/” and “..”.

Another easy “input” attack is playing with sequential or easy to guess numbers. For example, the value 10 in show_balance?customerNumber=10 can easily be edited into 42, 617, and so on. If you need to index data by a number or string, then keep it private by making it hard to guess. Use 64-bit or longer random numbers. That way an attacker would have to guess an awful lot of URLs to see someone else’s information. Something that’s so expensive to guess that it is not breakable with current technology is known as computationally secure, or in layman’s terms, just plain hard.
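In Python, for example, the standard secrets module produces cryptographically strong randomness suitable for this purpose (the helper name is illustrative):

```python
import secrets

def new_customer_token() -> str:
    # 8 bytes = 64 bits of cryptographically strong randomness,
    # hex-encoded to 16 characters; far too sparse a space to enumerate.
    return secrets.token_hex(8)
```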

In general, restrict your input as much as possible. Validating input is the simplest and most effective defense against abuse. Validating input not only makes your program more secure, it also makes it more reliable.

Sealed For Your Protection

Sometimes error messages are passed in URLs. For example:

https://naivebank.com/error?mesg=Please+try+again

Your average bored Web surfer (or opportunistic hacker) can turn this into:

https://naivebank.com/error?mesg=Please+mail+your+password+to+mallory@evil.example

and mail it out to see who falls for it. Unfortunately, some people will.

Changing the arguments to a URL is a form of tampering, an exploit where application data is captured, modified, and re-used in a furtive attempt to subvert (read: abuse) some site policy or rule. Editing hidden variables (like we saw earlier), changing the values of cookies, and reverse-engineering homebrew encryption schemes are three other kinds of tampering.

Because Web applications are data rich and inherently public, they are highly susceptible to tampering. Whenever data is either public (an HTML page, a URL) or very easy to access (cookies), the only way to protect it is to sign it.


A digital signature is the equivalent of a tamper-proof seal on your data: if someone tampers with your data (just like someone tampering with a bottle of aspirin), the digital signature will show “physical” evidence of malice or mischief. If the seal has been tampered with, the contents cannot be trusted.

Digital signatures are used widely nowadays as guarantors of data origin and integrity. Many kinds of algorithms generate digital signatures, but the concepts behind them are the same:

Figure One: Creating and verifying signatures

  • Whenever you transmit data (a string, a cookie, a serial number, a dollar amount, and so on), compute and transmit its signature, too. To be more specific, transmit the message m, and a fixed-length signature s, derived by applying signature function S() to m. (In other words, transmit m and s = S(m)).

  • Whenever you receive data, capture its original signature and compute a new signature. In other words, record s’ and the message m’ (which may or may not be the same as s and m) and compute a new signature, t using the same signature function S(), or t = S(m’).

  • If the data’s original signature and the new signature do not match, that is, if s’ != t, tampering has occurred.

Figure One illustrates the process of creating and verifying signatures.

Obviously, the signature function S() on the transmitting and receiving machines must be compatible, and S() should be non-trivial. While several kinds of functions, like checksums, generate fixed-length signatures, strong one-way hash functions are preferred because they have two essential properties: it’s hard to figure out the original message from the hash value (hence the name “one-way”), and it’s hard to find two messages that yield the same signature. (If you want to know just how many permutations are possible, see the sidebar “How Hard is Hard in Cryptography?”.)
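A few lines of Python make the behavior concrete. This sketch uses SHA-256 from the standard hashlib module; the same calls work for the md5 and sha1 functions discussed below:

```python
import hashlib

msg = b"Please try again"
sig = hashlib.sha256(msg).hexdigest()

# The digest is fixed-length no matter how large the input is, and even a
# one-character change in the message yields a completely different value.
sig2 = hashlib.sha256(b"Please try again!").hexdigest()
```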

How Hard is Hard in Cryptography?

Modern cryptosystems (encryption and one-way hashes) are based on four components: an algorithm, all the possible plaintexts, ciphertexts, and keys or secrets. Encryption is the process of using an algorithm combined with a secret to encode plaintext into ciphertext. Decryption is the reversal of the encryption process (except strong one-way hashes have no reversal function).

Developing your own encryption algorithm does not work. If someone leaves your development team, or if a machine or code repository is compromised, you have to change the algorithm, test it, and redeploy it. And unless you have a great cryptologist on staff, chances are your algorithm will be weak. Some people believe creative use of XOR makes for good encryption, but it’s obvious to well-trained eyes.

Instead, it’s best to rely on published algorithms that have been analyzed by the world’s cryptologists, where security comes from keeping the encryption secret (sometimes called a key) truly secret. As long as your secret remains secret, the only way for someone to read your messages is to guess which secret you’ve used.

How hard is it for someone to guess your secret? An attacker needs to choose the correct secret out of all the possible secrets of that size. The size or number of bits needed for a strong secret depends on the algorithm. Ultimately, if finding the correct secret is very expensive, then the data that the secret protects should be safe. Here are some examples of expense for symmetric algorithms:




Secret size    Time to guess           Cost
56 bits        16k years               $58,000 USD
128 bits       7.9 × 10^25 years
160 bits       3.4 × 10^35 years
256 bits       2.7 × 10^66 years


Guessing a strong secret is pretty darn hard.


The two most commonly used strong one-way hash functions are MD5 and the stronger SHA-1. Use either of them properly and the integrity of your data is assured.

To protect a URL variable mesg from tampering you can generate an MD5 or SHA-1 hash of it (H()) and include it in your URL:

?mesg=Please+try+again&mesgHash=H(mesg)
However, MD5 and SHA-1 are so well known it would not take long for someone to figure out how you’re computing mesgHash and insert their own evilmesg with a matching H(evilmesg). To prevent this sort of tampering, you need to add a secret to the mix. Here is a good signature function with an embedded secret, in the style of HMAC (this class of function is known as a MAC, or Message Authentication Code):

s = S(m) = H(secret, H(secret, m))

(See the sidebar “Generating and Validating Signatures” for pseudo-code on generating and validating signatures.) So, the URL ends up looking like this:

?mesg=Please+try+again&mesgSig=S(mesg)
Generating and Validating Signatures


// Server side: generate the signature
string mesg = "Please try again";
secret sigSecret; // 128 bits of properly random data
string serverSig;

// HMAC-style signature using SHA-1
serverSig = SHA1(sigSecret, SHA1(sigSecret, mesg));

string baseURL = "https://naivebank.com/error?";
baseURL += "mesg=" + mesg + "&mesgSig=" + serverSig;

// Server side: validate a submitted signature
string clientMesg = request_record.get("mesg");
string clientSig = request_record.get("mesgSig");

// Now compute a signature and see if it matches
// the submission
serverSig = SHA1(sigSecret, SHA1(sigSecret, clientMesg));

if (serverSig == clientSig) {
    // Valid signature, move along
} else {
    // Invalid signature, bogus request
}

mesgSig is our signature. Due to the properties of strong one-way hash functions, it is hard for someone to generate a matching signature if they do not know the secret. Again, to determine if data was corrupted, you must verify the signature when you receive the data. Compute a new signature from mesg, and compare it to mesgSig to determine its integrity.
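If you work in Python, you do not need to hand-roll the nested hashes: the standard hmac module implements the real HMAC construction, and its compare_digest function avoids leaking information through timing differences. A sketch (the secret shown is a placeholder, never a real key):

```python
import hmac
import hashlib

# Placeholder standing in for 128 bits of properly random server-side data.
sigSecret = b"replace-with-128-bits-of-randomness"

def sign(mesg: bytes) -> str:
    """Compute an HMAC-SHA1 signature of the message."""
    return hmac.new(sigSecret, mesg, hashlib.sha1).hexdigest()

def verify(mesg: bytes, client_sig: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(mesg), client_sig)
```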

Always generate a signature that covers all of the data you want to protect. A common mistake is to separate the data into two disjoint sets. For example:

?user=mallory&userSig=S(user)&access=full&accessSig=S(access)

In this case, because two signatures, userSig and accessSig, have been created, Mallory could take the access and accessSig pair for distribution to friends and family.

If you want to make sure that a piece of data is only useful to a specific user, you must include the user information in the signature:

?user=mallory&access=full&sig=S(user, access)

Signatures are one of the most useful cryptographic tools. The use of a secret ensures that only the parties that share the secret can verify the signature. This is a cornerstone of secure authentication.
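Here is a sketch of the single-signature approach in Python (names and the field separator are illustrative; the scheme assumes “|” cannot appear in the values themselves):

```python
import hmac
import hashlib

sigSecret = b"server-side-secret"  # placeholder, not a real key

def sign_fields(user: str, access: str) -> str:
    # One signature over the complete set of fields: an access level
    # signed for one user cannot be replayed under a different user name.
    payload = (user + "|" + access).encode()
    return hmac.new(sigSecret, payload, hashlib.sha1).hexdigest()
```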

Tales From The Crypt-o

Encryption and decryption utilities are fairly common on Linux systems, and you’ve probably used something like des or gpg to protect the contents of personal or sensitive data files. There are many encryption schemes, but all follow the same model.

Given a (plain text) message, m, an encryption secret, s, and an encryption function, E(), we can yield ciphertext, C = E(m, s). Encryption functions always have a matching decryption function D(), such that D(C, t) = m. The security of algorithms E() and D() depends on keeping the key secret. An encryption algorithm is symmetric if the secret used to encrypt a message is the same secret used to decrypt it, in other words s = t or D(E(m, s), s) = m.

Whenever someone asks, “What is the correct encryption function to use?”, the best response is “What for?” Encryption is often thrown into an application to make it secure, but in most cases does little or nothing to improve the situation.

To use encryption effectively, you must understand what you’re protecting against. If you are concerned about eavesdropping between the client and the server, then use SSL. If you are concerned about tampering on the client, then use signatures. If you want to communicate data between two servers through a client, and you do not want the client to read it, then encrypt it. If you want to protect a database from being stolen, then encrypt it, but keep the secret in a separate machine from the data. (If you store the key and the data in the same machine, someone downloading the database has also downloaded the decrypting binary.)

And while encryption provides secrecy, it does not necessarily provide integrity. What most people don’t realize is that decryption functions always “work”: they take any input and always produce output. Tampered input simply decrypts to garbage. To ensure that the decrypted message matches what you encrypted, you also need to use a one-way hash.


For example, let’s say you need to pass around a remaining credit balance in a URL. If you encrypted the data you might try this:

?CreditBalance=E(user_credit, s)&creditSig=H(CreditBalance)

Unless a client knows the secret, a bogus CreditBalance variable will decrypt to garbage, but creditSig is still easy to guess as it is a simple hash. A malicious user can generate a garbage CreditBalance and a matching creditSig, and if your application blindly turns the decrypted data into a number you could get in trouble.

A safer approach is:

?CreditBalance=E(user_credit, s)&creditSig=H(CreditBalance, s)

Or even better:

?CreditBalance=E(H(user_credit) + user_credit, s)

This last example works well because any tampering of the ciphertext will make the verification of the hash fail after decryption. You can take this pattern and implement a tamper-proof encrypted token library.
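Here is a sketch of that pattern in Python. The keystream cipher below is a toy built from SHA-256, for illustration only; a real implementation should use a vetted authenticated cipher. Function names are illustrative:

```python
import hashlib
import hmac

def _keystream(secret: bytes, n: int) -> bytes:
    # Toy keystream for illustration only -- use a vetted cipher library
    # (for example an AES-GCM implementation) in real code.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def seal(data: bytes, secret: bytes) -> bytes:
    # Prepend a hash of the data, then encrypt the whole thing: any
    # tampering with the ciphertext makes the hash check fail on open.
    plain = hashlib.sha256(data).digest() + data
    ks = _keystream(secret, len(plain))
    return bytes(a ^ b for a, b in zip(plain, ks))

def open_sealed(token: bytes, secret: bytes) -> bytes:
    ks = _keystream(secret, len(token))
    plain = bytes(a ^ b for a, b in zip(token, ks))
    digest, data = plain[:32], plain[32:]
    if not hmac.compare_digest(digest, hashlib.sha256(data).digest()):
        raise ValueError("token was tampered with")
    return data
```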


Even if the credit balance example from the previous section uses encryption and signatures correctly, it can still be part of a fundamentally insecure program. Replay, where the user revisits the same URL again and again (for example, a bookmark), can expose weaknesses in your application.

For example, many sites limit the number of failed logins to prevent an attacker from obtaining a password by brute force guessing. If the user does not provide the correct login credentials within, say, three tries, the user’s login is disabled. The most naive implementation of this keeps a count in a HIDDEN variable, but that would be easy to see and tamper with. The next stepwise refinement would be to encrypt and sign it. However, all an attacker needs to do is save the value from the first try and just keep replaying it forever. Just because you can’t decrypt the data doesn’t mean you can’t use it.

Again, you need to understand what you are protecting against. If you need to prevent a brute force attack, data validation, tamper proofing, and secrecy do not solve the problem. In this case, good old fashioned server side state (in the form of a server side number-of-tries counter) fixes the problem.

The only thing a client is good for here is to cache and display data to the user, but the ultimate source of truth lies on the server. If your application requires changing persistent state information, you need to keep it on your Web server or in a database.
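A minimal sketch of such a server side counter in Python (in production the table would live in a database shared by all front-end servers, not in process memory; names are illustrative):

```python
MAX_TRIES = 3
failed_tries = {}  # username -> failed login count, kept server side only

def record_failure(user: str) -> bool:
    """Record a failed login; return True once the account should lock."""
    failed_tries[user] = failed_tries.get(user, 0) + 1
    return failed_tries[user] >= MAX_TRIES

def reset_failures(user: str) -> None:
    # Called after a successful login.
    failed_tries.pop(user, None)
```

Because the count never leaves the server, there is nothing for the client to replay.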


In designing a web application there are some questions you always need to ask:

  • What happens if a user saves and then replays a URL or variable? If you just use client side variables to carry the user’s credit balance, then a bookmark can be used to reset the balance. In this case, relying on server side state makes things easier. (In general, implementing a client side solution to maintain persistent state is difficult. You have to cover all usage scenarios, and be able to deal with the client losing the information due to a crash or malice.)
  • What happens if a user posts your URLs or cookies to a message board? For example, if an account balance is not bound to a specific user, then the balance token can be copied and used multiple times. You can protect against this by grouping user identity and other relevant information into a complete set of data that is signed as a single datum.
  • How long are the URLs, cookies, or variables good for? If your user walks away from the computer and you do not include expiration information in the signatures, then the back button of the browser will keep the session alive. This can be addressed by including a timestamp in the data that you sign. By verifying the signature you know that the timestamp has not been tampered with, allowing you to safely implement a timeout strategy for your application.
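The timestamp idea can be sketched in Python with the standard hmac module (the secret, lifetime, token layout, and names are all illustrative; the layout assumes user names contain no “|”):

```python
import hmac
import hashlib
import time

sigSecret = b"server-secret"      # placeholder, not a real key
SESSION_LIFETIME = 15 * 60        # seconds

def make_token(user: str, now: float = None) -> str:
    """Bundle user and timestamp, signed as a single datum."""
    ts = str(int(now if now is not None else time.time()))
    payload = user + "|" + ts
    sig = hmac.new(sigSecret, payload.encode(), hashlib.sha1).hexdigest()
    return payload + "|" + sig

def check_token(token: str, now: float = None) -> bool:
    try:
        user, ts, sig = token.split("|")
    except ValueError:
        return False
    payload = user + "|" + ts
    expected = hmac.new(sigSecret, payload.encode(), hashlib.sha1).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False  # user or timestamp was tampered with
    current = now if now is not None else time.time()
    return current - int(ts) <= SESSION_LIFETIME
```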

The solutions to these problems lie in the application’s design and the details of the implementation. These are not problems that can be easily fixed at the end of a project, or that someone other than the developer can fix.

To build a secure web application you need to be asking these kinds of questions at every stage of the development process. Sometimes the answers will not be ideal, but you’ll understand your risk and know how your application will behave under duress.

Of Strip Searches and Metal Detectors

Firewalls, encryption, hardware tokens, biometrics, metal detectors, and strip searches are all tools designed to solve specific security problems. However, without some expertise and an overarching plan, all of these tools can ultimately be ineffective.

Similarly, if you want to build secure Web applications, you need to understand what problem you are trying to solve. There is nothing you can do to prevent people from trying to hack your application, but you have total control over how your application will respond. If you always validate input, sign what you need to tamper-proof, and encrypt what you need to hide, you can prevent almost all forms of abuse almost all of the time.

And remember, no matter what you do, never trust the user.

Ten Rules to Secure By

Preventing all forms of abuse might sound difficult, but it’s easier than you think. Just keep the right mindset (never trust the user), and remember the following guidelines.

  1. If you don’t need it, don’t build it. Any feature you build will be abused in every possible way. One of the best ways to prevent abuse is to simply limit your exposure. Take a minimalist approach to building your application. If you don’t need it, don’t build it.

  2. If you build it, someone will find it and figure it out. Everything you deploy is part of a public interface. Someone will eventually find every file, CGI, and URL on your web server, then dissect it, edit it, break it, send it to their friends, and post it on message boards. Build your application so that the only way it can be used is the way it should be used.

  3. Always validate all input. Just because you expect a number under 10 in a form, it doesn’t prevent someone from submitting “War and Peace”. Validating all input protects your program from malicious or confused users, and makes it more reliable. Make sure that what a user can enter is what they should enter.

  4. If you need to tamper-proof data, then sign it and validate the signature. If you need the client to hold state on behalf of the server, there is no guarantee that what gets submitted back to the server is what your server generated in the first place. Since you have no control over what the client submits, all you can do is detect whether the data you sent was harmed. Sign the data you send and validate its signature when you get it back. Make sure that only data that your server can accept is the data it should accept.

  5. If you need to hide data, encrypt it. Sometimes people try to hide data by simply scrambling it. This doesn’t work and is trivial to break. If you want to hide data, you need to encrypt it using a proven strong encryption function, and treat the encryption key as a secret that no one should ever know. This way only the people or machines that should read the data can read the data.

  6. In building restrictions, always use allow instead of disallow. If you forget to put something in an allow list, then you restrict functionality. If you forget to put something in a disallow list, then you cause a problem. With a disallow list the attacker can abuse anything you forgot to disallow or anything that is made possible by a change in the browser. With an allow list, the only features that can be used are those that should be used.

  7. For persistent information, server side state is sometimes best. Any information that you send to the client, such as number-of-tries counters or account balances, can be replayed by anyone at any point in the future even if it’s signed and encrypted. For state that you care about preserving across requests or sessions, sometimes keeping a server side table or database is the most effective way to go.

  8. Make users accountable by keeping activity logs. Once you’ve secured your application, it’s essential to keep a good log of activities. A log helps you identify the who and the how of new types of abuse, and helps you iterate to better security. When you find your application is being abused, log forensics are your key to solving the crime.

  9. Always understand what you’re protecting against. It is easy and enticing to throw security tools at a problem, but to use them effectively you need to understand what you are protecting against. Put yourself in the shoes of a malicious user that is trying to abuse your application, and apply one or more of the other eight rules above to protect against abuse. In other words, make sure that your application can only be used as you intended.

  10. Never trust the user. Or, to be more precise, be very explicit about what you trust the user with. (See 1, 2, 3, 4, 5, 6, 7, 8, and 9).

Arturo Bejar earns his living being paranoid and making other people paranoid. You can reach Arturo at hackproofing@ yahoo.com.
