LAN Based Web Caching
Moore's law (note 1) states that the number of transistors on an integrated circuit, and therefore performance, will double every 18 months. However, it takes fewer than 18 months for the Internet to double in size. This means that the increased processing power of the latest networking hardware cannot keep up with the increased burden on Internet backbones, routers and servers brought about by the rapid growth of the Internet. Problems of congestion on the Internet will therefore not be solved by improved hardware technology and increased bandwidth alone. Higher bandwidth Internet connections such as cable modem or ADSL only improve delivery of content from the ISP to the end user. They will not solve problems caused by overburdened servers or congested Internet routers. LAN based web caching, however, can help improve this situation both for the user of the caching server and for the Internet at large.
We are presenting this information in a Q&A (Questions and Answers) format that we hope will be useful.
We welcome feedback and comments from any readers on the content of this document or its usefulness.
We are providing the best information available to us as of the date of writing and intend to update it at frequent intervals as things change or more information becomes available. However, we intend this Q&A as a guide only and recommend that users obtain specific information to determine applicability to their specific requirements. (This is another way of saying that we can't be held liable or responsible for the content.)
The full Q&A is divided into two parts. Part One is general in nature and less technical; Part Two deals with more technical matters.
This KnowledgeShare Document addresses the different types of LAN based web caching techniques in use today. Vicomsoft have gained significant experience in the area of web caching and would like to make this information available to those interested in this subject. For those who would like to study this subject in more detail useful links are listed at the end of this document.
Part One: Questions
- What is Caching?
- What are the main benefits and trade-offs of caching?
- What different types of caching exist?
- How does LAN based web caching work?
- How is the freshness of content controlled?
- What are the advantages of LAN based web caching?
- What is the difference between a proxy server and a caching server?
- What exactly does a web caching server store copies of?
- Does LAN based caching improve DNS lookup response times?
- Do I still need caching if I have a high bandwidth connection such as cable modem, ADSL or T1?
- Is web caching useful on a small LAN, or only on large corporate networks?
- Is there any real cost justification for a web caching server?
- Don't caching servers encourage copyright violation?
- Are there any limitations that I should be aware of?
- What is the bottom line? What does Vicomsoft recommend?
1. What is Caching?
Caching is a technique that is used to temporarily store a copy of information that has been requested by a piece of software or hardware. Cached information is generally stored closer to the requester than the permanent information is. It is also generally stored on a physical device capable of more rapid delivery than the originating device. For example, computers have both disk and memory caches. Frequently used information can be retrieved much more rapidly from the memory cache than from the computer's main memory. Frequently accessed data can be retrieved from a disk cache much more quickly than if they were repeatedly retrieved from disk. This is because the disk cache is in RAM, which is several orders of magnitude faster to access than a hard disk. Web browsers also cache web pages on disk and in memory. Retrieving web pages from either the disk or memory cache is much faster than returning to the original site and retrieving them over the Internet.
2. What are the main benefits and trade-offs of caching?
Caching in all of its various forms offers faster access to cached content. Further, when cached resources are copied from remote locations, caching reduces the use of wide area bandwidth. There is a risk, however, that in some cases cached content may not be up to date. The means of dealing with this situation are examined below.
3. What different types of caching exist?
The focus of this Q&A will be on LAN based web caching. There are, however, a number of different types of caching in common use today:
- Carrier-class caching servers on the Internet store temporary copies of highly popular Web content for delivery to thousands of ISPs, reducing congestion on the Internet backbone.
- Local Area Network (LAN) based caches are used in corporations, small businesses, schools and universities to improve network efficiency.
- Individual Web browsers use disk and memory caching to store local copies of web pages.
- Hard disk drives cache frequently accessed information to improve access times.
- Computers have memory caching systems which are faster to access than regular RAM.
4. How does LAN based web caching work?
A software program known as a web caching server is located on the LAN. All requests for web content are directed at this server. When it receives a request, it first looks in its own disk cache to see if the content is present and has not expired. If so, it delivers the content to the requester. If not, it fetches the content from the source, stores a copy of it on the local disk and delivers it to the requester. Some high performance caching servers store and deliver the content concurrently.
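The lookup sequence described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of any particular product: the class name `WebCache`, the in-memory dictionary standing in for the disk cache, the fixed one hour lifetime and the injectable `fetch` function are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class WebCache:
    """Toy cache: a dictionary stands in for the caching server's disk cache."""

    def __init__(self, fetch: Callable[[str], bytes],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch          # hypothetical function that retrieves a URL from its source
        self._ttl = ttl_seconds      # assumed fixed lifetime for every cached item
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        entry = self._store.get(url)
        if entry is not None:
            body, expires_at = entry
            if self._clock() < expires_at:
                # Content is present and has not expired: serve it from the cache.
                self.hits += 1
                return body
        # Content is absent or expired: fetch from the source, store a copy, deliver it.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (body, self._clock() + self._ttl)
        return body
```

The `hits` and `misses` counters make it easy to measure what proportion of requests is served locally.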
5. How is the freshness of content controlled?
Web servers may deliver a caching directive with their web pages. The directive may instruct the web caching server not to cache the content. A frequently updated news page, or a page displaying stock quotes, for example, should not be cached at all. In this example the caching directive instructs the caching server whether or not to cache certain content, but not how long to keep it.
The caching directive may also contain additional information called "time-to-live". Some content can be cached indefinitely, while other content may only be valid for a month or an hour. A professional caching server will follow the originating server's caching directive when setting the time-to-live attribute of cached content. If the originating server instructs the caching server to store the content for a limited time, the caching server will stamp the content with an expiry date and time. Up to this date and time, the caching server will fulfil requests for the content from its own cache. After expiry, it will check to see if the cached content has been superseded before fulfilling the request.
The end result is that the user should always receive content identical to what would be delivered had the request been forwarded to the originating server.
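As an illustration of how a caching directive might be interpreted, the sketch below reads the standard HTTP `Cache-Control` response header. The text above does not name a specific header, so the choice of `Cache-Control` and of the directives handled here (`no-store`, `private`, `no-cache`, `max-age`) is an assumption, and the function names are hypothetical. It also assumes the header is supplied under exactly the key `Cache-Control`.

```python
from typing import Mapping, Optional, Tuple


def caching_decision(headers: Mapping[str, str],
                     now: float) -> Tuple[bool, Optional[float]]:
    """Return (may_cache, expires_at) for a response, based on Cache-Control."""
    directives = {}
    for part in headers.get("Cache-Control", "").split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip()] = value.strip().strip('"')

    # The origin server forbids a shared cache from keeping this content.
    if "no-store" in directives or "private" in directives:
        return False, None
    # A copy may be kept, but it must be checked with the origin before every reuse,
    # so it is treated as already expired.
    if "no-cache" in directives:
        return True, now
    # An explicit time-to-live in seconds: stamp the content with an expiry time.
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return True, now + int(max_age)
    # No explicit lifetime: the caller falls back to its own default policy.
    return True, None


def is_fresh(expires_at: Optional[float], now: float) -> bool:
    """True while the cached copy may be served without checking the origin."""
    return expires_at is not None and now < expires_at
```

Once `is_fresh` returns false, a caching server would check with the originating server whether the content has been superseded before serving it again.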
6. What are the advantages of LAN based web caching?
The purpose of LAN based caching is to improve network efficiency by reducing the amount of traffic between the LAN and the Internet. The most obvious and most frequently cited benefit is the shorter time required for the caching server to deliver cached content. Delivery time, and therefore the end user experience, is enhanced dramatically. For example, delivery of a 100 KB web page from the originating server to the end user would take about 17 seconds over a 56 Kbps modem, or 7.8 seconds over a dual channel Multilink PPP ISDN connection, assuming that there was no additional traffic congestion at the ISP or on the Internet backbone (which is not necessarily the case). The same page would take one second to deliver over a T1 connection, again assuming perfect Internet traffic conditions. However, this same page would be delivered from the caching server to the end user in about one tenth of a second regardless of Internet traffic conditions. This is the first and most obvious benefit of caching.
Some will point out that this is only a benefit if the same content is viewed a second time by a different user. If this does not occur, it may be argued, caching would be of no benefit. However, repeat visits to the same Web sites are more frequent than one may think. A recent test (note 2) representing a small business setup with a LAN comprising about 30 computers revealed that up to 70% of content delivered during any one hour period came from the cache, and as little as 30% came from the Internet. An independent laboratory study (note 3) recently showed average response times reduced by 87% with the use of a caching server.
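The delivery times quoted above can be sanity-checked with simple arithmetic. The sketch below computes the ideal transfer time of a 100 KB page, assuming 1 KB = 1024 bytes and nominal line rates (128 Kbps for dual channel ISDN, 1.544 Mbps for T1, and a 10 Mb/s Ethernet LAN). It ignores protocol overhead, latency and server load, so its results come out somewhat lower than the figures in the text, which presumably allow for such overhead.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link, ignoring overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


PAGE_BYTES = 100 * 1024  # assumption: 100 KB means 100 * 1024 bytes

# Nominal line rates in bits per second (assumed values for illustration).
LINKS = {
    "56 Kbps modem": 56_000,
    "Dual channel ISDN": 128_000,
    "T1": 1_544_000,
    "10 Mb/s Ethernet LAN": 10_000_000,
}

for name, rate in LINKS.items():
    print(f"{name}: {transfer_seconds(PAGE_BYTES, rate):.2f} s")
```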
There are additional benefits as well. By delivering content from its own cache, the caching server reduces bandwidth use between the LAN and the Internet. This means that more bandwidth becomes available for users requesting fresh content directly from the Internet. These users experience improved response times even if they request content that is not stored in the cache.
Ironically, there are cases where a browser may display web content from the LAN based web cache faster than from its own disk cache. Because browsers are optimized for content delivered over the network, some may actually display a page delivered over the LAN more quickly than if the same page were read from their own computer's disk.
7. What is the difference between a proxy server and a caching server?
Most proxy servers are also caching servers, and many caching servers are also proxy servers. However, the term "proxy" and the term "caching" indicate two entirely different concepts.
A proxy is an agent that acts on behalf of another. In Internet technology this simply means that the proxy server is connected to the Internet on behalf of the end user's computer. The end user's computer is not in fact connected to the Internet at all, but rather to the proxy server. The distinction may appear trivial, but the implications are not. If the users on a LAN are not directly connected to the Internet, then they do not need public IP addresses. Further, they are isolated from intentional hostile attacks from the Internet, as the proxy server acts like a firewall. A whole LAN can use certain Internet services with only one official public address. The disadvantage is that all of the end user, or "client", computers must be individually and specifically configured to use the proxy server. If the proxy server is moved or renamed, all of the client computers must be reconfigured. Fortunately, the use of a proxy server is not the only way to achieve the benefits of sharing an IP address or firewall protection. The Vicomsoft KnowledgeShare document on Network Address Translation addresses this subject in greater detail.
A cache on the other hand is a device that temporarily stores copies of information. Most proxy servers are also caching servers since this is a logical place to put a cache. However, a cache need not be a proxy, and there are products on the market that give end users the benefits of LAN based caching, Internet connection sharing and firewall protection without the inconveniences of proxy client configuration. If one is seeking the benefits of caching, as opposed to the benefits of proxy, it is important to remember not to restrict one's research to proxy servers. The results of such research may be disappointing.
8. What exactly does a web caching server store copies of?
This question is not as simple as one may think. A web caching server, unsurprisingly, stores copies of web pages. A web page, however, is made up of many elements. First, there is the HTML code that describes the page. If you want to know what HTML code looks like, go to a web page with your browser and choose "View Page Source" from the appropriate menu. This will open a text window with the HTML that describes the page. But this is only a very small part of what makes up a page. There may be many graphics, photos and illustrations, each of which may occupy kilobytes or megabytes of disk space. There may also be animations and JavaScript that instruct the browser to perform specific operations. Each one of these is a separate element and must be requested separately from the hosting server. Viewing the source code of a popular news site revealed 36 graphical elements from five different servers in addition to the server hosting the web page itself. For the technically minded, it may be of interest that if that particular news page were cached, each of its individual elements would be stored separately as a disk file or as an object in a database.
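To see how many separate elements a page refers to, and how many servers they come from, the HTML can be scanned for embedded resources. The sketch below uses Python's standard `html.parser` module and looks only at the `src` attribute of `img` and `script` tags, which is a simplification. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin, urlparse


class ElementCollector(HTMLParser):
    """Collects the absolute URLs of images and scripts referenced by a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script") and attr_map.get("src"):
            # Resolve relative references against the address of the page itself.
            self.urls.append(urljoin(self.base_url, attr_map["src"]))


def page_elements(html: str, base_url: str) -> Tuple[List[str], List[str]]:
    """Return (element URLs, sorted list of distinct host names serving them)."""
    collector = ElementCollector(base_url)
    collector.feed(html)
    hosts = sorted({urlparse(url).netloc for url in collector.urls})
    return collector.urls, hosts
```

Each URL returned would be requested, and cached, as a separate object.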
9. Does LAN based caching improve DNS lookup response times?
Before a web browser can fetch a page, it must perform a DNS lookup for each domain referenced on the web page. Not all of the content on a web page necessarily comes from the same server. This is obviously the case for banner advertising that comes from an ad agency. In the news site example above, the web browser would need to fetch information from six different servers. This would require six DNS lookups. In addition to the time it takes for a browser to fetch the page content, there is an additional delay for each DNS lookup. Some advanced web caching servers will also cache DNS information. This effectively eliminates delays due to repeated DNS lookups.
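The idea of DNS caching can be sketched as a small wrapper around a resolver. The example below is an assumption-laden illustration: the class name `DnsCache` is hypothetical, and it keeps every answer for a fixed period because `socket.gethostbyname` does not expose the lifetime published with the DNS record, which a real DNS cache would normally honour.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Remembers host name to address answers for a fixed period."""

    def __init__(self, resolve: Callable[[str], str] = socket.gethostbyname,
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._ttl = ttl_seconds      # assumed fixed lifetime, not the record's own TTL
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, hostname: str) -> str:
        key = hostname.lower()       # host names are case-insensitive
        cached = self._entries.get(key)
        if cached is not None and self._clock() < cached[1]:
            return cached[0]         # answered locally, no network delay
        address = self._resolve(key)
        self._entries[key] = (address, self._clock() + self._ttl)
        return address
```

Only the first lookup for a given host within the lifetime incurs the network delay; later lookups are answered from memory.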
10. Do I still need caching if I have a high bandwidth connection such as cable modem, ADSL or T1?
It is a commonly held belief among users of modem Internet connections that by upgrading to a cable modem or ADSL connection, all of their bandwidth worries will disappear. Nothing could be further from the truth. Let us consider what happens when a user clicks on a hyperlink or types a URL in their browser:
1. The browser performs a DNS lookup which in and of itself can take several seconds.
2. The DNS lookup returns a numerical IP address that the browser requires in order to find the web page indicated by the hyperlink.
3. The browser then sends a request to its own gateway or router.
4. The gateway or router forwards the request to another router. This is repeated for as many routers as there are between the client and the server. When the request is forwarded, this is called a "hop". There can be up to ten or more hops in a connection. Internet congestion can cause the delay in a single hop to last for more than one second.
5. The server that is hosting the web page receives the request and responds by returning the object requested by the browser.
6. Step 4 is repeated in the opposite direction.
7. The browser receives the object, which is essentially a list of all of the other objects that make up the web page. It then goes through the list of items comprising the web page, and for each item (there may be 10, 20, 30 or more), steps 3 to 6 are repeated, including any delays between hops. Steps 1 and 2 may also be repeated for items to be retrieved from different servers.
A sample traceroute from a user's computer to a popular web site (www.ebay.com) illustrates the number of hops involved.
To get from the user's browser to the eBay web site, a request had to travel across 15 hops. This took approximately three seconds to complete.
This eBay page consists of 25 images, in addition to the actual page. Each image has to follow the same (or a similar) route as the original request for the home page shown in the traceroute.
This is the list of images that make up the page:
- Image: http://pics.ebay.com/aw/pics/logo_home_tb.gif
- Image: http://pics.ebay.com/aw/pics/h_category.gif
- Image: http://pics.ebay.com/aw/pics/new.gif
- Image: http://pics.ebay.com/aw/pics/home/spacer.gif
- Image: http://pics.ebay.com/aw/pics/h_stats3.gif
- Image: http://pics.ebay.com/aw/pics/navbar/home-top.gif
- Image: http://pics.ebay.com/aw/pics/home/home_myebay_map_425.gif
- Image: http://pics.ebay.com/aw/pics/yptc_mid.gif
- Image: http://pics.ebay.com/aw/pics/sellyouritem.gif
- Image: http://pics.ebay.com/aw/pics/news_chat.gif
- Image: http://pics.ebay.com/aw/pics/ww-homepage.gif
- Image: http://pics.ebay.com/aw/pics/p_register_tb.gif
- Image: http://pics.ebay.com/aw/pics/h_feature2.gif
- Image: http://pics.ebay.com/aw/pics/more.gif
- Image: http://pics.ebay.com/aw/pics/promo/ebayvisa1.gif
- Image: http://pics.ebay.com/aw/pics/home/welcome_widget.gif
- Image: http://pics.ebay.com/aw/pics/funstuff.gif
- Image: http://pics.ebay.com/aw/pics/rosie1.gif
- Image: http://pics.ebay.com/aw/pics/promo/Cover1.jpg
- Image: http://pics.ebay.com/aw/pics/home/jpn-help.gif
- Image: http://pics.ebay.com/aw/pics/88X31INF.gif
- Image: http://pics.ebay.com/aw/pics/photonnet.gif
- Image: http://pics.ebay.com/aw/pics/usps8ac2.gif
- Image: http://pics.ebay.com/aw/pics/bbb.gif
- Image: http://pics.ebay.com/aw/pics/truste_button.gif
Please note that this is a well designed page that makes moderate use of graphics. Many popular sites have twice this many elements.
Changing from a modem to a high bandwidth connection will only improve the time taken for the first hop or two. The remaining 13 hops will take exactly the same amount of time. Consider, however, what happens when a user requests cached content from a LAN based caching server:
- The browser performs a DNS lookup which in many cases is managed through DNS caching and takes a fraction of a second.
- The browser sends the request to the caching server.
- The caching server immediately returns the list of page items with no hops and no delays.
- The browser goes through the list and requests all of the items from the caching server, which are in turn delivered immediately at LAN speeds.
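The difference between the two sequences can be made concrete with a rough model. The sketch below is a deliberate simplification: it counts one round trip for the page itself, then fetches the remaining objects in batches of parallel connections, and it ignores the time needed to transfer the data. The 25 images and the three second round trip come from the eBay example above; the DNS times, the LAN round trip and the four parallel connections are assumptions chosen for illustration.

```python
import math


def page_load_seconds(objects: int, round_trip: float,
                      dns_lookup: float, parallel: int = 4) -> float:
    """Rough page load time: one DNS lookup, one round trip for the page,
    then the embedded objects fetched in batches of `parallel` requests."""
    batches = math.ceil(objects / parallel)
    return dns_lookup + round_trip * (1 + batches)


# Direct from the Internet: 25 images, about 3 s per round trip (from the example),
# with an assumed 1 s DNS lookup.
direct = page_load_seconds(objects=25, round_trip=3.0, dns_lookup=1.0)

# From a LAN cache: assumed 10 ms round trip and 10 ms cached DNS answer.
cached = page_load_seconds(objects=25, round_trip=0.01, dns_lookup=0.01)

print(f"direct: {direct:.2f} s, cached: {cached:.2f} s")
```

The absolute numbers depend entirely on the assumed values; the point is that the per-request delay is multiplied by the number of round trips, so removing the Internet hops from each request has a cumulative effect.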
Many servers deliberately throttle back their throughput to prevent individual users from monopolizing their bandwidth when viewing cumbersome graphics. This contributes to additional delays, and happens most frequently on sites with pages containing large elements. The larger the elements, and the more of them there are, the greater the benefit of caching. Thus, the performance gains experienced by users of web caching servers are potentially as substantial for T1 connections as they are for modem connections.
In our calculations we have not yet taken into account the distance travelled by the information. If the server is thousands of miles away and the page requested is made up of many components, the accumulated distance travelled by all of the requests and responses may add up to hundreds of thousands of miles. Signals travel at approximately 60% of the speed of light. Even with ideal bandwidth and traffic conditions, it may take several seconds for content to travel such distances, as opposed to near instantaneous LAN delivery.
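The effect of distance can be estimated directly. The sketch below uses the 60% figure from the text and the speed of light in a vacuum (about 186,282 miles per second). The page of 26 objects corresponds to the eBay example (25 images plus the page itself); the distance of 3,000 miles to the server and the assumption of one sequential round trip per object are hypothetical.

```python
SPEED_OF_LIGHT_MILES_PER_SECOND = 186_282  # in a vacuum, rounded


def propagation_seconds(total_miles: float,
                        fraction_of_light_speed: float = 0.6) -> float:
    """Time for a signal to cover total_miles at a fraction of light speed."""
    return total_miles / (SPEED_OF_LIGHT_MILES_PER_SECOND * fraction_of_light_speed)


def page_distance_miles(objects: int, one_way_miles: float) -> float:
    """Accumulated distance if each object needs one request and one response."""
    return objects * 2 * one_way_miles


distance = page_distance_miles(objects=26, one_way_miles=3_000)
delay = propagation_seconds(distance)
print(f"{distance:,.0f} miles travelled, {delay:.2f} s of pure propagation delay")
```

In practice each object usually requires more than one round trip (for example to open the connection), so the accumulated distance and delay are higher than this minimum.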
Whatever the type of Internet connection, there are benefits to optimizing bandwidth usage, particularly as Internet content volumes increase.
11. Is web caching useful on a small LAN, or only on large corporate networks?
While the benefits of caching are more evident where a large number of users will potentially view the same content, studies indicate that even on LANs with as few as five users, performance gains can be impressive. This is particularly so in small focused workgroups where multiple users tend to use the same information sources.
12. Is there any real cost justification for a web caching server?
Time spent waiting for web pages to load is not used productively. If this waiting time is reduced, there are demonstrable cost savings. In addition, the bandwidth economy realized through the use of a caching server will in many cases allow a business to avoid upgrading to a higher bandwidth Internet connection. Cost savings achieved in this way can be considerable.
13. Don't caching servers encourage copyright violation?
Each time a user visits a web site, they are potentially viewing material that is protected by copyright. There is concern among certain providers of information, and among some politicians, that caching web content is tantamount to making illicit copies, and therefore a violation of copyright. It is not the purpose of this document to answer questions of a legal nature, yet users of caching servers have a legitimate concern in wishing to do the right thing. Caching servers are not intended to assist or encourage users to misuse or abuse copyrighted material. By complying with the caching directives that accompany HTML pages and the 'Conditions of Use' published by the Internet content providers, users can be sure that they are not infringing on copyrights in any way.
14. Are there any limitations that I should be aware of?
Some caching servers only cache web content. For many users this is not an issue, as the Web represents their primary if not unique use of the Internet. Some users however may require FTP or Gopher caching as well. This should be taken into account when choosing a caching server.
While web caching systems improve response times, they may also delay the retrieval of non-cached pages. This is due to the fact that the fresh content must first be stored in the disk cache and then delivered to the requester. Vicomsoft solves this problem through a technique known as concurrent caching and delivery (note 4). Other vendors may also be using similar techniques, so potential users may wish to take this into account in their evaluation criteria.
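One general way to avoid the store-then-deliver delay is to pass each chunk of data on to the requester as it arrives while writing the same chunk to the cache. The sketch below illustrates that idea with a Python generator. It is not a description of Vicomsoft's implementation, and the function name `tee_to_cache` is hypothetical.

```python
from typing import BinaryIO, Iterable, Iterator


def tee_to_cache(chunks: Iterable[bytes], cache_file: BinaryIO) -> Iterator[bytes]:
    """Yield each chunk to the requester as it arrives, writing a copy to the cache.

    The requester starts receiving data after the first chunk, rather than
    waiting for the whole object to be stored.
    """
    for chunk in chunks:
        cache_file.write(chunk)   # keep a copy for later requests
        yield chunk               # deliver to the requester without further delay
```

A caller would iterate over the generator and send each chunk to the client. A production server would also need to discard the partial cache entry if the download fails part way through.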
Streaming media by their nature cannot be cached. Some caching servers will not allow streaming media to pass through. Users behind such servers may find they do not have access to any streaming audio or video. Other servers will allow streaming media to pass through, but only to a single client. As soon as one user opens a connection to a streaming server, other users would be denied access to any streaming media. There are some products however that allow multiple users to access streaming media through a caching server. Vicomsoft RapidCache is among them.
Many proxy caching servers may not allow access to any type of media other than web content. Again, users behind such proxy servers may find they do not have access to email, FTP or other Internet resources. Be certain to verify the support of any and all services you require before making a decision.
15. What is the bottom line? What does Vicomsoft recommend?
The advantages of using a caching server on a LAN to optimize network efficiency are significant:
- By using a caching server, many businesses or schools may find they do not need to upgrade their Internet connection.
- Improved content delivery can add up to hours per month of time saved by knowledge workers.
- The Internet is far more pleasant to use when response times are prompt.
- By using caching servers, enterprises and schools are being good Internet citizens, and avoiding wasteful bandwidth usage.
Internet congestion is going to get worse, not better. Even the most optimistic forecasts for improvements in server hardware, router hardware and bandwidth give no reason to believe that these alone will solve the global Internet traffic jam. Intelligent resource allocation is necessary. Web caching is an essential part of any such strategy.
Most vendors of web caching server software offer free trial versions. We strongly recommend you try these for yourself before making a decision.
Notes to the text
1. Moore's Law, postulated in 1965 by Gordon Moore, co-founder of Intel Corp.
2. Study performed using Vicomsoft InterGate on an Ethernet 10 Mb/s LAN comprising 32 computers, all of which were connected to the Internet through a 256 Kb/s leased line.
3. Source: ZD Labs.