Web application performance and scalability (2005)
A web application's limited resources
No matter what programming language or platform you choose to develop web applications, once placed in production, a web application needs to work with a series of resources that will dictate its performance, the main ones being:
- Bandwidth
- CPU
- Memory
- I/O capacity
Like any other resource, you can buy these resources at different price points and use them effectively or squander them. Irrespective of cost, you may also be able to acquire them relatively easily, or be limited by constraints imposed by the platform or server on which a web application is running.
You can think of these resources in the same way you assign resources to something as ordinary as an event. You plan an event (web application) that could equally be enjoyed by an unlimited number of people. However, since an event's resources are limited, you need to decide beforehand on an initial number of tables, chairs, drinks and food (bandwidth, CPU, memory and I/O capacity) to accommodate the initial guest list.
As with most events, settling on the exact number of resources is a best estimate, and so too with web applications. If an event (web application) is not that sought after, you will have unused tables, chairs, drinks and food (bandwidth, CPU, memory and I/O capacity), incurring an unnecessary cost, albeit one that is often warranted in case more guests arrive.
If the event (web application) is a runaway success, it will need more tables, chairs, drinks and food (bandwidth, CPU, memory and I/O capacity) to accommodate the unforeseen demand, or risk a lower-quality experience for all guests (web application users). The process of accommodating more guests, though, can range from something as simple as setting more tables, chairs, drinks and food into empty spaces (clicking on an administrative console) to moving the event (web application) to a new venue, revamping the entertainment stage or making other changes that are not so quickly attained.
As you can attest, allocating resources for a web application represents a dilemma. Allocate too much and these resources will go unused as a sunk cost; allocate too little and application users will suffer from low performance or inaccessibility. By the same token, if left unchecked, a web application's resources can easily be consumed by a few dozen users, resources that might otherwise serve hundreds of users if you take the necessary performance steps.
There are however a series of steps you can take in a web application's design and production environment to make sure these resources aren't squandered. In the following sections, I will describe these resources and their characteristics, as well as high-level descriptions of the strategies used to optimize their use.
Cloud computing services & resources | |
---|---|
One of the main advantages of using cloud computing services -- such as Amazon's EC2, Google's App Engine or Microsoft's Azure -- is that they offer one of the most flexible and cost-effective choices to provision a web application with these resources. Adding bandwidth, CPU or memory is as simple as a mouse click and as cost-effective as paying only for what your web application consumes (i.e. no plans or subscriptions that go unused). I/O quotas, though, are less distinctive than those of traditional web application hosting providers. |
Bandwidth
Bandwidth is a term used to describe the data transmission rate between two end points. As it affects a web application's performance, bandwidth is critical in two areas: one between a web application's server and its upper-tier provider, and the other between a web application user's PC and his Internet Service Provider (ISP).
As a web application designer, there is little you can do about the available bandwidth between a user's PC and his ISP. After all, this is what an end-user, either in a residential area or corporate office has contracted. It's what he is willing to pay his service provider to experience applications on the web.
Under such circumstances, you are left with performing a bandwidth test when a user visits a web application. The results can be used either to warn a user that his bandwidth does not meet the threshold needed to experience a web application the way it was designed, or to offer him an alternate application (a.k.a. limited bandwidth application version) that performs appropriately even with limited bandwidth availability.
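As a rough illustration of the bandwidth test just described, the following Python sketch picks an application version from a measured transfer. The function names and the 256 KiB per second threshold are hypothetical; a real threshold depends on how heavy the full application is.

```python
def measured_kib_per_second(bytes_received, elapsed_seconds):
    """Throughput observed while a user downloaded a test payload."""
    return (bytes_received / 1024) / elapsed_seconds


def choose_version(bytes_received, elapsed_seconds, threshold_kib_per_second=256):
    """Return 'limited' when the measured bandwidth falls below the threshold.

    The threshold is an assumed value used only for illustration.
    """
    rate = measured_kib_per_second(bytes_received, elapsed_seconds)
    return "limited" if rate < threshold_kib_per_second else "full"


# A 512 KiB test payload that took 4 seconds (128 KiB per second)
print(choose_version(512 * 1024, 4.0))
# The same payload received in 1 second (512 KiB per second)
print(choose_version(512 * 1024, 1.0))
```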
What is a limited bandwidth application version ? | |
---|---|
Many web applications that need ample bandwidth are also offered in limited bandwidth versions. Web applications that contain many media elements like videos, Adobe Flash or high-resolution images are often considered high bandwidth applications. To limit bandwidth consumption and allow users a better navigation experience, you remove these types of media elements, and even other non-essential elements like Cascading Style Sheets (CSS) or JavaScript, from a web application, presenting a bare-bones version of the application in simple HTML to an end-user. One of the drawbacks of having a limited bandwidth application is that you have to keep it in sync with the content of the original (high bandwidth) application, a process that can turn impractical for many projects. |
Another kind of bandwidth that is important is the one between a web application's server and upper-tier provider. Unlike the bandwidth between a user's PC and ISP, the bandwidth between a server and upper-tier provider is something a web application designer can control, since it forms part of the data center or provider's terms of service where a web application resides.
This type of bandwidth is important in two forms: its peak availability and the amount consumed.
Peak availability is the amount of bandwidth available at a certain point in time. This figure varies by the data center and network topology used by each service provider. Typical figures range from 10 MiBps (mebibytes per second) to 10 GiBps (gibibytes per second), and are often even greater.
There is an important distinction in terms of bandwidth peak availability. One bandwidth peak is that of the server node on which a web application resides, and the other is that of the service provider's data center.
For example, if a server node resides on a 10 MiBps network, it doesn't matter if the service provider's data center has capacities at or over the 10 GiBps range. A web application's performance will suffer upon reaching the 10 MiBps bandwidth peak, since the server node is not able to accommodate greater levels -- even though the data center might. Here it's important to get a clear-cut answer from your service provider on the type of network node used to host a web application, since many service providers just advertise their overall data center capacity.
In case a web application does reach bandwidth peaks, you will either need to split it into multiple server nodes or move it to a larger server node to accommodate larger bandwidth peaks -- or even to another service provider if the current one is not capable of accommodating a web application's bandwidth peaks. Using a larger server node or multiple server nodes are topics covered in greater detail in the chapter Performance and scalability techniques 101, in the sections on vertical and horizontal scaling.
Generally speaking, it's rare for a web application to hit bandwidth peaks, but it can happen if a web application gets a sudden influx of visitors. Table 2-1 has a comparative list of bandwidth peaks, application sizes and the number of visitors it would take at any given second to saturate them.
Table 2-1 - Bandwidth peak, application size and approximate visitors per second to saturate.
Bandwidth peak | Page size (e.g. home page) | Approximate visitors per second to saturate |
---|---|---|
10 MiBps | 50 Kibibytes | ~204 visitors per second |
10 MiBps | 512 Kibibytes | ~20 visitors per second |
1 GiBps | 50 Kibibytes | ~20,971 visitors per second |
1 GiBps | 512 Kibibytes | ~2,048 visitors per second |
10 GiBps | 50 Kibibytes | ~209,715 visitors per second |
10 GiBps | 512 Kibibytes | ~20,480 visitors per second |
As you can see, even a moderately large web application home page of 512 KiB (Kibibytes) -- which could be media of some kind -- hosted on the lowest 10 MiBps bandwidth peak can still accommodate about 20 visitors per second, which for many web applications is a large number.
Nevertheless, if a web application attracts enough attention it can easily surpass these bandwidth peaks. Sudden attention could be due to coverage of a web application on a highly visited website (e.g. Google, Yahoo, Slashdot), a televised event (e.g. national news or a sporting event) or possibly a Denial of Service (DoS) attack, the last of which is a malicious effort to mimic visitors with the intent to make a web application inaccessible.
It's not strange for highly visited websites to refer hundreds of visitors per second to a web application, and it's also not strange for DoS attacks to reach thousands of hits per second.
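The figures in Table 2-1 come from dividing a bandwidth peak by a page size. The short Python sketch below reproduces that arithmetic. It assumes every visitor downloads the full page exactly once and ignores protocol overhead, and the function name is made up for illustration.

```python
KIB = 1024          # bytes in one kibibyte
MIB = 1024 * KIB    # bytes in one mebibyte
GIB = 1024 * MIB    # bytes in one gibibyte


def visitors_per_second_to_saturate(peak_bytes_per_second, page_size_bytes):
    """Whole page downloads per second that fit into a bandwidth peak."""
    return peak_bytes_per_second // page_size_bytes


peaks = (("10 MiBps", 10 * MIB), ("1 GiBps", GIB), ("10 GiBps", 10 * GIB))
for peak_label, peak in peaks:
    for page_kib in (50, 512):
        visitors = visitors_per_second_to_saturate(peak, page_kib * KIB)
        print(f"{peak_label:>8} | {page_kib:>3} KiB | ~{visitors:,} visitors per second")
```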
However, don't worry too much about bandwidth peaks yet. It could be 'cold comfort' to know that before a web application's performance suffers from reaching a server's or data center bandwidth peak, it's likely to suffer first from a lack of other resources.
Take the earlier 20 visitor per second scenario. Even peaking at such a number of visits, it's still assumed that a web application will be able to process that many visits and have enough memory, CPU and I/O capacity to do so. It can be the case that a web application's performance starts suffering at 5 visits per second, well before the bandwidth peak of 20 visitors per second, due to a lack of some of these other resources on a server.
Throttling is one technique used to limit bandwidth peaks, though it's only controllable at the network level and hence by a service provider. Throttling is further discussed at the end of this section.
In addition to bandwidth peaks, bandwidth consumption is another important topic. Unlike the PC to ISP bandwidth scenario charged on a flat-fee basis irrespective of the bandwidth consumed, server to upper-tier bandwidth consumption is tallied and charged in units.
Bandwidth consumption can become a performance concern, since exceeding a certain threshold can either shut down your service or cost you more than your first estimates. Since even a mildly visited application can be prone to bandwidth consumption limits, Table 2-2 has a comparative list of application sizes and the number of visitors it would take to reach a certain bandwidth quota.
Table 2-2 - Bandwidth quotas, application size and approximate visitors per month to saturate.
Bandwidth quota | Average visitor consumption(e.g. Home Page & 3 additional page average) | Approximate visitors to reach quota |
---|---|---|
200 GiB per month | 512 Kibibytes | ~409,600 visitors per month |
200 GiB per month | 10,240 Kibibytes | ~20,480 visitors per month |
500 GiB per month | 512 Kibibytes | ~1,024,000 visitors per month |
500 GiB per month | 10,240 Kibibytes | ~51,200 visitors per month |
1000 GiB per month | 512 Kibibytes | ~2,048,000 visitors per month |
1000 GiB per month | 10,240 Kibibytes | ~102,400 visitors per month |
Bandwidth consumption is cumulative, which is why it can grow gradually without any warning or apparent lack of other resources. For example, the lowest quota of 200 GiB per month with an average visitor consumption of 10,240 KiB takes about 20,480 visitors per month to consume in full, which is about 660 visitors a day. Just to give you an idea, a 35 second video clip can be approximately 3,072 KiB or 3 MiB, so a web application that relies on media can easily reach this average visitor consumption.
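The quota figures in Table 2-2 follow the same kind of division. The Python sketch below shows it for the 200 GiB example. The function name is hypothetical, and the per-day figure assumes a 31-day month.

```python
KIB = 1024
GIB = 1024 ** 3


def visitors_to_reach_quota(quota_gib, avg_visitor_kib):
    """Visitors needed to consume a monthly quota at an average per-visitor size."""
    return (quota_gib * GIB) // (avg_visitor_kib * KIB)


monthly = visitors_to_reach_quota(200, 10_240)
daily = monthly // 31  # assumes a 31-day month
print(f"{monthly:,} visitors per month, roughly {daily} visitors per day")
```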
Bear in mind this is average bandwidth consumption. Visitors might like your application a lot and exceed this average, lowering the number of visitors even further before reaching the bandwidth quota. On the other hand, visitors may not consume this much bandwidth and leave you with a higher number of visitors before reaching the bandwidth quota.
Similar to bandwidth peaks, a web application can reach bandwidth quotas due to growing attention on a highly visited website (e.g. Google, Yahoo, Slashdot), a televised event (e.g. national news or a sporting event) or possibly search engine crawlers, the last of which are programs designed to retrieve a web application's data to classify and rank it on search engines.
It's not strange for highly visited websites to refer hundreds of visitors per month to a web application, and it's also not strange for web crawlers to visit a web application daily for possible updates.
Depending on your provider's terms of service, bandwidth consumption is charged in one of three forms:
- Fixed rate.- A service plan is given a fixed bandwidth quota (e.g. 200 GiB per month) and any consumption beyond this limit is charged at a fixed rate. This is the most common method used by service providers.
- Burstable.- A service plan is given a fixed bandwidth quota (e.g. 5,000 GiB per month); however, bandwidth excesses are permitted without incurring any financial penalty if they occur within certain time periods. This method is primarily offered to large customers as an added incentive. The mechanism for determining what makes up burstable can vary by provider. See: http://en.wikipedia.org/wiki/Burstable_billing
- Pay as you use.- A service plan's bandwidth is charged per unit (e.g. 1 GiB) irrespective of time span. This is the newest method offered by most service providers, especially those in the cloud computing landscape. Note that even though this type of plan is apparently more cost-efficient, most service providers in this space also charge for other resources in this way (e.g. server uptime per hour), which may or may not result in a cost-effective solution for your application. (Individual prices for all resources vs. averaged prices for all resources).
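To compare the fixed rate and pay as you use forms, a small Python sketch can help. All prices below are invented for illustration and do not reflect any provider's tariffs. Burstable billing is left out because, as noted, its mechanism varies by provider.

```python
def fixed_rate_cost(consumed_gib, plan_price, quota_gib, overage_price_per_gib):
    """Plan price plus a fixed charge for every GiB consumed beyond the quota."""
    overage_gib = max(0, consumed_gib - quota_gib)
    return plan_price + overage_gib * overage_price_per_gib


def pay_as_you_use_cost(consumed_gib, price_per_gib):
    """Every consumed GiB is charged at the same unit price."""
    return consumed_gib * price_per_gib


# Hypothetical month: 250 GiB consumed
consumed = 250
print(fixed_rate_cost(consumed, plan_price=20.0, quota_gib=200, overage_price_per_gib=0.5))
print(pay_as_you_use_cost(consumed, price_per_gib=0.25))
```

With these made-up prices the fixed rate plan is cheaper for this month, but the result flips easily with different prices or consumption, which is why it's worth running the numbers for your own application.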
Besides the previous bandwidth criteria you should take into account in web applications, there is also another set of bandwidth issues you need to be aware of, albeit these are further from your control.
Even though bandwidth can have healthy measurements at a PC to ISP end point, as well as from server to upper-tier provider, other issues can affect a web application's bandwidth in between such points, given that the route traveled by a web application's data can go through various networks, or 'hops' as they are often called in network lingo.
One of the first conditions is caching, which involves maintaining a copy of a web application's data instead of fetching it from its source, thus avoiding bandwidth consumption between networks. This is important since all service providers -- with the rare exception of end-user subscribers (PC to ISP) -- are subject to bandwidth quotas. This means caching could be used as a strategy to cut down costs. So besides your caching strategies to save server resources -- many of which will be explored throughout the book -- caching can also be used at many levels in a network.
Service providers often rely on gateway proxies to execute their caching strategy. A gateway proxy serves as a placeholder for a web application's data, effectively appearing as a web application's origin for multiple users, in turn reducing bandwidth consumption. Given the multiple players involved in delivering web applications, gateway proxies could be found at several points along a web application's delivery path.
In the case of a local ISP, its bandwidth costs are dependent on a regional service provider, and a regional service provider's costs are dependent on a national service provider. Similarly, a hosting provider is subject to a regional provider's bandwidth tariffs, just as the regional provider is subject to a national provider's quotas. Each of these providers can make use of bandwidth saving strategies on their own -- via gateway proxies -- irrespective of those you can use on your applications.
Web applications containing time sensitive data can include information to avoid being cached in gateway proxies. Even so, it's not unknown for certain service providers to engage in surreptitious caching to cut down bandwidth costs, with web application providers and end users being unaware of stale data delivery.
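The information used to avoid caching in gateway proxies travels in HTTP response headers. The Python sketch below shows one way a web application might choose a Cache-Control header value. The function name and the one hour default are assumptions, while the `no-store`, `public` and `max-age` directives are standard HTTP.

```python
def cache_headers(time_sensitive, max_age_seconds=3600):
    """Return response headers that either forbid or allow intermediate caching."""
    if time_sensitive:
        # Ask browsers and gateway proxies not to keep any copy of the response.
        return {"Cache-Control": "no-store"}
    # Allow shared caches (e.g. gateway proxies) to reuse the response for a while.
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


print(cache_headers(time_sensitive=True))
print(cache_headers(time_sensitive=False, max_age_seconds=600))
```

A well-behaved proxy honors these headers, though as noted above a provider that caches surreptitiously may not.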
Another strategy related to bandwidth is called mirroring. Similar in principle to gateway proxies, mirrors function as copies of a web application placed in strategic geographical points. However, unlike gateway proxies, mirrors are under the control of a web application creator (i.e. you).
If a web application has a large user base on a worldwide scale, mirroring can also help reduce a web application's latency by having multiple copies in data centers around the globe, with user requests originating in Europe processed by a mirror in Europe, user requests originating in Asia processed by a mirror in Asia and so on.
Mirrors offer a 'quick and dirty' way of increasing a web application's scalability, since various locations attend to an application's load. However, if a web application has dynamic content, mirrors can quickly become problematic to support. Web applications based on static content are generally the best candidates for mirroring. It's also worth mentioning that some companies specialize in mirroring more sophisticated applications and are often classified as content delivery networks; one of the leaders in this segment is Akamai.
Another technique related to bandwidth and used by some service providers is called throttling. Like caching, it can have beneficial or adverse consequences. The purpose of throttling is to limit the number of requests crossing a network.
A service provider may decide to 'throttle' requests for a number of reasons, which can range from reducing bandwidth consumption to simply limiting network load. Thus if a web application crosses a network that is employing throttling, performance is likely to be affected, without web application providers or end users realizing this is the cause.
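The text does not say how providers implement throttling. One common way to limit a request rate is a token bucket, sketched below in Python purely as an illustration. The class name, rate and capacity are hypothetical and not taken from any particular provider's equipment.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = capacity      # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Return True if a request may pass, False if it is throttled."""
        now = self.clock()
        # Refill tokens for the time elapsed, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=2, capacity=2)
# Three requests arriving back to back: the third exceeds the burst allowance.
print([bucket.allow() for _ in range(3)])
```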
As you will realize, when it comes to caching via gateway proxies or throttling by third-party providers, there is little you can do. However, many of these issues are at the heart of many net neutrality topics, which is why there is ongoing work to avoid such practices.
Measurement Lab is one initiative on this front, providing a series of resources and tools to help you determine if a web application's performance suffers from broader network issues which are beyond your control. You can find an introduction to the purpose of MLab by Vint Cerf -- Google's Chief Internet Evangelist -- here: http://googleblog.blogspot.com/2009/01/introducing-measurement-lab.html
Why is bandwidth more important at server and PC points ? | |
---|---|
Bandwidth is generally aggregated from low data transmission rates or 'pipes' into higher data transmission rates or 'pipes'. The often used term 'pipe' -- an analogy to water or gas pipes -- is descriptive of why bandwidth availability is more critical at server and PC end points. Water in your home comes from a small pipe, which in turn comes from a bigger pipe in your neighborhood, which in turn comes from a county pipe, and so on until you reach a reservoir. Similarly, water reaches a reservoir in the same escalating fashion, from individual rain drops, which make up streams, which then accumulate in a reservoir. Therefore, it's both a web application's originating end point (server) and final delivery end point (PC) which are the biggest bandwidth bottlenecks, since once data crosses these points, higher data transmission rates are almost always the norm. Following the water analogy, a reservoir is analogous to the Internet's backbone, a moniker used to describe the interconnections of the largest Internet providers. In case a failure occurs in a provider's delivery route, other interconnected providers can fulfill delivery so the data continues on its path without failed delivery. See the previous paragraphs on throttling and caching via gateway proxies for exceptions to this behavior. |
Bandwidth bottlenecks and performance strategies
Web applications will always make use of bandwidth, but there are certain bottlenecks that consume large amounts relative to other parts of a web application. The following is a list of bandwidth bottlenecks:
- Videos/Media.- Web applications that rely on dynamic content like videos or media such as Adobe Flash are often bandwidth bottlenecks. Though they offer an excellent means of 'eye-candy' for applications, if bandwidth is a concern you should use them sparingly.
- High resolution images.- Though images are now a common occurrence in web applications, high-resolution images or 'out-of-the-box' images created on a desktop can become major bandwidth consumers.
- SSL/HTTPS.- Heightening security on a web application using Secure Sockets Layer (SSL) will always represent an administrative cost. But the use of SSL/HTTPS can also increase bandwidth consumption versus the standard non-secure HTTP protocol.
- Web crawlers/DoS attacks.- Search engines are integral to the web and often a necessary ingredient for a greater audience to discover a web application. However, they also have inherent bandwidth costs. Since search engines can occasionally do 'deep' crawls -- reading every single page composing a web application -- they can represent the bandwidth consumption of dozens or hundreds of users. DoS attacks are another factor that can take a heavy toll on bandwidth consumption, albeit this is a rogue type of event.
Some strategies to solve these bandwidth bottlenecks are the following:
- Application HTTP Headers.- Incorporated into a web application's response, HTTP headers are small data fragments used by end-user browsers or intermediate service providers to avoid constantly requesting a web application's data. They offer a series of flags for transmitting a web application's data in an efficient way.
- Video/Image compression.- Though you can use a large variety of video and image formats in web applications, there are some better suited to reduce bandwidth consumption, with the added benefit of faster navigation times for end users.
- Caching.- If a web application is relatively static and is requested on a constant basis, you may decide to use a caching strategy to minimize bandwidth consumption. Caching for the purposes of bandwidth consumption is achieved through HTTP headers.
- Mirroring.- Can be used to distribute copies of a web application across geographical points to reduce latency and large server loads. This strategy is more effective for applications consisting of static content.
- Throttling.- Can be used to limit the number of incoming requests on a web application, thus minimizing bandwidth consumption and bandwidth peaks within certain ranges. This strategy can only be achieved with cooperation from your service provider.
- robots.txt.- Used to indicate access instructions for web crawlers. This can reduce bandwidth consumption by a considerable amount, albeit with the drawback of certain sections of a web application not being classified by search engines.
- Blocking Internet Addresses(IPs) per country/region.- Not an entirely accurate approach, but a web application could be blocked from being accessed in certain countries/regions based on a user's IP address, as a result reducing the amount of consumed bandwidth. This strategy is often used by providers of applications that rely on large amounts of Video/Media, albeit with the drawback of not being able to reach a worldwide audience.
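As an illustration of the robots.txt strategy listed above, the following Python sketch uses the standard library's `urllib.robotparser` to check which URLs a well-behaved crawler would skip. The robots.txt contents, the crawler name and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt keeping crawlers away from bandwidth-heavy sections
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL before requesting it
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/media/video.mp4")
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
print(blocked, allowed)
```

Keep in mind robots.txt is only an instruction. It saves bandwidth with crawlers that honor it and does nothing against rogue clients.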
In the next section, I will describe another resource that also plays an important role in web application performance.
Memory
In the most general terms, memory is a medium that allows data to be maintained in a place other than its origin and in closer proximity to where it's used. In a computer's architecture there are many types of memory, but for our discussion I will concentrate on one type of memory first: physical memory, also commonly called Random Access Memory(RAM).
Every piece of data used in an operating system (OS) relies on RAM: the OS itself, a web application's core packages (e.g. Java, .NET, PHP, Ruby, Python), the permanent storage system, the web server, application data and anything else that resides on a hard drive.
Because a hard drive is a slower medium to get access to than RAM, an OS by design places data in RAM to enhance access speeds and thus performance. I won't dive into the particularities of motherboards, buses and other hardware related topics, but data residing in RAM is in closer proximity to things like a CPU for processing and a web server for dispatching applications out to end users, than it is residing on a hard drive.
This makes physical memory or RAM a highly coveted resource in any system. At any given moment, tens or hundreds of applications and their data are vying for a piece of RAM on a system. The issue is that the amount of data that can live in RAM is minuscule compared to the amount of data that is often needed to run applications. For example, Table 2-3 has a list of OSs and their RAM limits.
Table 2-3 - Operating systems and RAM-physical memory limits.
Operating System | OS type | Physical memory limits |
---|---|---|
Windows Server 2008 Standard | 32-bits | 4GB |
Windows Server 2008 Standard | 64-bits | 32GB |
Linux | 32-bits | 1GB~4GB |
Linux | 64-bits | 4GB~32GB |
Sources physical memory - Memory Limits for Windows Releases & |
Although there are operating systems that can handle more RAM, the information in Table 2-3 is telling in two forms. First, it shows a relatively low limit on the amount of data that can live in RAM. As a web application grows, a limit like 4GB can easily be reached. A relatively large database, or a large user base that accesses a web application simultaneously, can make it close to impossible to hold a web application and its data entirely in only 4GB of RAM.
Similar to bandwidth peaks, when a web application exceeds an OS's physical memory limits, it's necessary to use other performance strategies such as vertical or horizontal scaling, topics discussed in an upcoming chapter.
Is the 4GB physical memory limit on 32-bit OSs real ? | |
---|---|
The 4GB physical memory limit on 32-bit OSs comes from the 32-bit state limit of 4,294,967,296 or 2^32 records capable of being addressed by an OS. See table 1-1 bit ranges and states in Chapter 1 for more on this subject. Even though 4GB is widely thought of as the limit for a 32-bit OS, there are workarounds for effectively using more than 4GB of physical memory on a 32-bit OS, all of which depend on the OS, motherboard, chipset and other factors like support for Physical Address Extension. Since these topics are beyond the scope of our discussion, I will point you to two good sources on the matter : |
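The 32-bit limit mentioned in the sidebar is easy to check with a couple of lines of Python, assuming each address refers to one byte of memory:

```python
ADDRESS_BITS = 32

# Number of distinct addresses (states) that 32 bits can represent
addressable_bytes = 2 ** ADDRESS_BITS
addressable_gib = addressable_bytes // 1024 ** 3

print(f"{addressable_bytes:,} addressable bytes = {addressable_gib} GiB")
```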
However, unlike bandwidth limits, an OS can deal with physical memory exhaustion or shortages by making use of a strategy called 'virtual memory'. As you can probably infer from its name, virtual memory is an attempt to mimic the properties of a system's physical memory (RAM). However, it's just that -- an attempt -- for it doesn't have the speed or efficiency of physical memory and it actually resides on a hard drive, hence its virtual prefix.
On Windows OSs, virtual memory is a file named WIN386.SWP or PAGEFILE.SYS -- depending on the Windows version -- located on the first hard drive partition (C:\) of the OS. On Linux/Unix OSs, virtual memory consists of hard-drive partitions designated with a swap type file-system.
Some Linux/Unix OSs can also use a file for virtual memory purposes just like Windows. Nevertheless, swap partitions are known to have better performance. See http://www.go2linux.org/swap-file-vs-swap-partition for more details. |
Though an OS's virtual memory size and certain behaviours (e.g. dynamically resizing on Windows OSs) are configurable, an OS still determines what applications make use of virtual memory. On most OSs virtual memory is an integral part of a system just like physical memory (RAM). However, in determining an OS's virtual memory size it's important you familiarize yourself with the concepts of paging and thrashing.
When a web application receives a user request, it's loaded by the OS into physical memory in the form of pages. As more applications or data are requested by a user, the OS will continue to load pages into physical memory. Eventually, there will come a point where no more pages can be loaded, due to physical memory being exhausted.
At this juncture, the OS will use an algorithm to find and supplant a pre-existing page with a new page. These algorithms can include: First-in-First-out (FIFO), meaning the first page loaded into physical memory (the oldest) is discarded; Not Recently Used (NRU), meaning a page that has not been referenced recently is discarded; among other algorithms predetermined by the OS.
However, the pre-existing page that is making way for a new page is not entirely discarded; it's placed into virtual memory, where it can later be retrieved by the OS. This process of transferring a page from physical memory to virtual memory is paging. On Linux/Unix systems the term swapping is often used interchangeably with paging.
When a page selected for replacement is again referenced by a user request, it's paged back into physical memory. At the same time, another page in physical memory is paged out to virtual memory to make way for the newly re-referenced page. And so this process repeats itself as applications and data are requested by a user.
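To make the paging cycle concrete, the Python sketch below simulates the FIFO replacement algorithm over a sequence of page requests. It's a toy model with made-up page numbers and a made-up function name, not how any particular OS implements paging.

```python
from collections import deque


def simulate_fifo_paging(page_requests, physical_frames):
    """Return (page_faults, page_outs) for a FIFO page replacement policy."""
    resident = deque()       # pages in physical memory, oldest first
    resident_set = set()     # same pages, for fast membership checks
    virtual_memory = set()   # pages that were paged out to disk
    page_faults = 0
    page_outs = 0
    for page in page_requests:
        if page in resident_set:
            continue                        # already in physical memory
        page_faults += 1
        if len(resident) == physical_frames:
            victim = resident.popleft()     # the oldest page makes way
            resident_set.remove(victim)
            virtual_memory.add(victim)      # paged out, not discarded
            page_outs += 1
        virtual_memory.discard(page)        # paged back in if it was on disk
        resident.append(page)
        resident_set.add(page)
    return page_faults, page_outs


faults, outs = simulate_fifo_paging([1, 2, 3, 1, 4, 1, 2], physical_frames=3)
print(f"{faults} page faults, {outs} pages paged out")
```

Note how page 1 is paged out to make room for page 4 and then immediately paged back in, pushing out page 2, which is requested right after.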
It's already been mentioned that virtual memory is on a hard drive, thus the process of paging involves a certain amount of resources (I/O). If the process of paging requires an increasing number of resources and produces decreasing results, an OS is said to be thrashing.
For example, running a web application with less physical memory than recommended by its vendor is sure to result in thrashing. Under such circumstances, upon loading a web application an OS will immediately start paging, since it isn't able to keep the minimum set of data (pages) recommended by a vendor in physical memory. An OS will then page out and page back in pages as they are needed, but since the minimum set of pages to run a web application never fits in physical memory, paging will occur to the point of taking an increasing number of resources with an overall decrease in performance (i.e. thrashing occurs).
A similar situation can occur if many applications run simultaneously on an OS. Each application requires a certain amount of physical memory to run at an acceptable level, but if the sum of physical memory required by all applications surpasses the amount available on an OS, applications start paging, resulting in reduced performance (i.e. thrashing occurs).
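A toy Python model shows how abruptly thrashing sets in. It assumes an application that touches its pages in a repeating cycle and an OS that replaces pages in FIFO order. Both are simplifications and the function name is hypothetical.

```python
from collections import deque


def fault_rate(working_set_pages, physical_frames, rounds=10):
    """Fraction of page accesses that need paging, with FIFO replacement."""
    resident = deque()   # pages currently in physical memory, oldest first
    faults = 0
    accesses = 0
    for _ in range(rounds):
        for page in range(working_set_pages):   # cyclic access pattern
            accesses += 1
            if page in resident:
                continue
            faults += 1
            if len(resident) == physical_frames:
                resident.popleft()               # page out the oldest page
            resident.append(page)
    return faults / accesses


# Working set fits in physical memory: only the initial loads cause faults
print(fault_rate(working_set_pages=3, physical_frames=3, rounds=4))
# One page too many: every single access now requires paging
print(fault_rate(working_set_pages=4, physical_frames=3, rounds=4))
```

In this model, needing just one page more than physical memory can hold turns every access into a paging operation, which is the kind of diminishing return that characterizes thrashing.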
Ultimately, the best solution to avoid thrashing is to reduce an OS's reliance on virtual memory, which is done by either running fewer applications on an OS or installing more physical memory. Both these approaches are described in the chapter Performance and scalability techniques 101.
Can web applications be forced to use a certain amount of memory ? | |
---|---|
Certain applications such as databases and platforms like Java have configuration options to use specific amounts of memory for performance purposes. This gives them assignment priority to physical memory, with the OS first attempting to assign physical memory to these applications instead of others. However, even with explicit configuration options, if physical memory is still exhausted a web application will fall back to using virtual memory. |
Memory bottlenecks and performance strategies
Just like bandwidth, there are certain conditions that, if present in a web application, can exacerbate the use of memory, degrading the overall performance of both the web application and the OS. Some of these conditions are the following:
- Memory leaks.- Certain platforms/languages (e.g. C and C++) require that memory be allocated and released manually in a web application's code. This can lead to memory that is allocated but never released, and which therefore can never be re-used. Memory leaks can appear in a wide spectrum of applications that rely on such platforms/languages, including web servers, core platform interpreters (e.g. Ruby, Python) or custom-made applications written in unmanaged memory platforms/languages (e.g. C and C++).
- Circular references.- Similar in principle to memory leaks, circular references [1] can leave a platform/language unable to deallocate memory for re-use. The term differs from memory leaks in that circular references occur in modern languages (e.g. Java and .NET) said to have memory management or garbage collection. Strictly speaking, classic memory leaks cannot occur in these languages, since the language itself allocates and deallocates memory; nevertheless, other patterns (e.g. circular references) can hinder the deallocation of memory, particularly in runtimes that rely on reference counting.
- Hot spots.- Hot spots are places where a web application requires an excessive amount of memory to run. They are due to memory leaks, circular references or other faulty structures in a web application's source code.
- Platform/language optimized memory algorithms.- Every platform/language generally has many ways of performing a particular task, which can range from the trivial (e.g. concatenating a string) to the elaborate (e.g. processing an image or text file). Not using recommended best practices or algorithms can quickly add up in terms of performance and memory consumption.
- Ad-hoc application installations.- Applications are generally designed with a single purpose and optimized for a particular OS and set of resources. When possible, you should choose builds and configurations targeted at your platform to improve memory usage. For example, web servers like Apache have multiple ad-hoc modules that are not ideal for high-traffic web applications -- an in-depth discussion on Apache's ad-hoc modules is found in the web servers section of the Key performance and scalability technologies chapter. Another case is that of binaries (e.g. Java) targeted for certain OSs and CPUs. The more targeted a web application's underlying parts are for a certain OS and set of resources, the better its memory performance.
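To make the circular reference condition listed above more concrete, here is a minimal sketch in Python. It assumes the CPython interpreter, which frees most objects through reference counting and relies on a separate cycle collector for circular references. The `Node` class and both function names are hypothetical, made up for this illustration.

```python
import gc
import weakref


class Node:
    """A hypothetical object that can point to a peer object."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def cycle_survives_without_collector():
    """Show that reference counting alone cannot free a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # switch off the automatic cycle collector for the demo
    try:
        a, b = Node("a"), Node("b")
        a.peer, b.peer = b, a          # a and b now reference each other
        probe = weakref.ref(a)         # lets us observe a without keeping it alive
        del a, b                       # drop our own references
        alive_before = probe() is not None
        gc.collect()                   # explicitly run the cycle collector
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


def weak_link_frees_immediately():
    """Break the cycle with a weak reference so no collector is needed."""
    a, b = Node("a"), Node("b")
    a.peer = b
    b.peer = weakref.ref(a)            # weak back-reference, no cycle is formed
    probe = weakref.ref(a)
    del a
    return probe() is None


print(cycle_survives_without_collector())  # on CPython: (True, False)
print(weak_link_frees_immediately())       # on CPython: True
```

The first function shows the two objects staying in memory after the program has lost every reference to them, until the cycle collector runs. The second shows one common way to avoid the problem at design time. Other runtimes manage memory differently, so the exact behavior is specific to CPython.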
Memory usage in applications can also be alleviated with the use of certain performance techniques mentioned next:
- Profilers.- Depending on the platform/language used for your applications, there is almost certainly a profiler that allows you to pinpoint the code sections that are causing memory leaks, circular references or hot spots. A profiler thus allows you to refactor code for better memory performance.
- Platform/language best practices.- Each platform/language has best practices to ensure better memory performance. Following these practices -- even as insignificant as they can seem individually -- can have a compounded effect in the overall memory usage of a web application.
- Optimized/targeted applications for OS.- Applications are designed for a series of OSs, CPUs and other resource combinations. You should make sure this is also the case for your applications. Equally, if a web application relies on compiled code, you should strive to compile with options designed for the target platform to ensure better performance.
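As an illustration of the profiler technique listed above, the following sketch uses `tracemalloc`, a memory tracing module in Python's standard library. The `build_report` function is a hypothetical, deliberately wasteful piece of application code, and `top_allocations` is a helper name invented for this example.

```python
import tracemalloc


def build_report(rows):
    """Hypothetical hot spot: builds and keeps many large strings in memory."""
    return [("row-%d" % i) * 50 for i in range(rows)]


def top_allocations(func, *args, limit=3):
    """Run func and return its result plus the source lines that allocated most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(*args)
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Differences are grouped by source line and sorted by size, largest first.
    stats = after.compare_to(before, "lineno")
    return result, stats[:limit]


report, stats = top_allocations(build_report, 10000)
for stat in stats:
    print(stat)
```

Each printed line names a file and line number together with the memory it allocated, which is usually enough to decide where refactoring is worth the effort. Other platforms offer equivalent tools, so treat this as one possible approach rather than the only one.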
Central Processing Unit (CPU)
The CPU is often considered the heart of a computer, but in recent times its underlying structure has undergone significant advances, requiring applications to adapt their design to take advantage of such changes. Before describing such changes, I will start off with a brief overview of a CPU's main performance metrics.
By far the most common way to classify a CPU is by its clock-speed. Nowadays commonly in the gigahertz (GHz) range, a higher clock-speed generally means faster processing and thus better performance, all other things being equal. For most CPU manufacturers, clock-speed has plateaued in the range of 3.0 to 3.2 GHz, a limit which is mostly due to power dissipation [2] in CPUs: higher clock-speeds become unfeasible due to their higher power and cooling requirements.
In addition to clock-speed, another performance metric often used for CPUs is cache memory size. Cache memory in a CPU serves as a faster medium than a system's physical memory (RAM).
The first time a CPU needs data, it's fetched as a block (often called a cache line) from a system's physical memory and, when the data arrives, a copy is saved in the CPU's cache memory. If the CPU then needs related data from the same block, the data is already in the cache memory, which means no delay reading it into the CPU. It's worth mentioning that the rate at which data is read from or stored into a CPU's cache memory is called memory bandwidth.
A CPU's cache memory is often classified in two forms: L1 -- or Level 1 -- which is a small cache that runs at full CPU speed with low latency, and a secondary L2 -- or Level 2 -- which has a longer latency. L1 cache sizes are typically in the order of 8-64 KiB, while L2 cache sizes range from 128 KiB to 6 MiB or more.
As with physical memory and the data it can accommodate, the larger a CPU's cache memory, the better the performance a web application can display, since a web application's data moves closer to where it's processed. However, in the same fashion, a CPU's cache memory cannot accommodate the entire data set that would typically be stored in a system's physical memory, since such a CPU would be extremely expensive.
A CPU's clock-speed and cache memory are simple metrics to size up: just like physical memory, the more the better. But what happens if a single CPU's resources are not enough to serve the needs of a web application?
On the face of it, the answer to this question seems straightforward: 'Add another CPU', just like you would add more physical memory. Though this answer is correct, it's only half the answer. The entire answer would be 'Add another CPU and design the web application to take advantage of it'.
The ability to run multiple CPUs on a single OS is not new. However, given that multiple CPUs until recently required special hardware with extra sockets -- hence an extra cost -- most applications requiring this processing power were designed to take advantage of it. This has changed with the advent of multi-core CPUs.
A multi-core CPU is a CPU with the processing capacity of two, four, eight or often more processors in the same chip. This type of CPU has become standard in almost every type of hardware on the market, a phenomenon which is often attributed to the limits of CPU clock-speeds. With clock-speeds hitting a limit due to power dissipation, advances in multi-core design were the next target for CPU manufacturers.
To better understand the implications of having multiple CPUs or a multi-core CPU at the service of a web application, it's fundamental to understand parallelism and threads, concepts introduced in the Fundamental performance and scalability concepts chapter.
Under most circumstances, a web application runs in a serial way. To use an analogy, think of a CPU as a bank teller and yourself as the customer with multiple operations that need attending. Suddenly, another bank teller (CPU or core) becomes available, but you obviously can't just rush over to the new teller to do the rest of your operations while the first one finishes its tasks -- you can't split up in two.
The natural result is that the next person in line will step up to this new bank teller. But in the same way, that person won't do his operations any faster, since he can't split up in two either. A corollary to this is that multiple tellers will be able to offer better service to multiple people. Hence, multiple CPUs or a multi-core CPU will perform better when executing multiple applications. But what happens if you are running a single application and have two, four or even eight CPUs at your disposal?
Turning back to the bank teller analogy, you arrive at the line with over eight operations to do, no one else is in line, and there are four teller windows available. Three tellers sit idle, wasted. Indeed, this is the same issue faced when running serialized applications on multiple CPUs or a multi-core CPU: idle CPU cycles.
The solution to this problem is to split your operations so they are performed in parallel, at multiple teller windows instead of one. So you gather different people and hand each one two operations. However, you suddenly notice that one person comes back with an overdraft notice: an operation didn't go through. What happened?
It turns out that when you did the bank teller operations by yourself -- in serialized fashion -- and made operations on the same account, you handed operations to the teller in a specific order to avoid this problem. In this case, you used parallelization, but you didn't take the necessary precautions. In fact, you could also question whether this parallelization process -- getting more people and the overhead of designing around conflicts -- really made the entire process faster, or perhaps just more convoluted.
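The bank teller scenario can be sketched with threads sharing one account. This is a minimal illustration that assumes a shared memory design protected by a lock; the `Account` class and the `run_in_parallel` helper are hypothetical names created for the example.

```python
import threading


class Account:
    """A hypothetical bank account shared by several tellers (threads)."""

    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def withdraw(self, amount):
        # The balance check and the update happen under the same lock,
        # so no other teller can slip in between the two steps.
        with self._lock:
            if amount > self.balance:
                return False  # refused, the account would be overdrawn
            self.balance -= amount
            return True


def run_in_parallel(account, operations):
    """Hand each (kind, amount) operation to its own thread."""
    results = [None] * len(operations)

    def worker(index, kind, amount):
        if kind == "deposit":
            account.deposit(amount)
            results[index] = True
        else:
            results[index] = account.withdraw(amount)

    threads = [
        threading.Thread(target=worker, args=(i, kind, amount))
        for i, (kind, amount) in enumerate(operations)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


account = Account(100)
results = run_in_parallel(account, [("withdraw", 30)] * 10)
print(results.count(True), account.balance)  # 3 withdrawals succeed, 10 is left
```

The lock guarantees the account is never overdrawn, no matter which thread runs first. It does not decide which operations succeed when their order matters (e.g. a deposit that must precede a withdrawal), so order-dependent operations still need to be sequenced by design.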
The same thing happens with applications and multiple CPUs or multi-core CPUs. You need to ask yourself three basic questions before parallelizing a web application to exploit multiple CPUs or multi-core CPUs:
- What section of a web application's business logic is best suited for parallelization?
- What design technique will you use to avoid conflicts if you do decide to parallelize a web application?
- Will a web application even benefit from the parallelization process? Or will the difference be unnoticeable?
Certain applications lend themselves to parallelization in order to gain from multiple CPUs or multi-core CPUs. Even more importantly, some parts of a web application are often better suited for parallelization than others.
You will know best to what extent a web application's code is a prime candidate for parallelization, since you will be most familiar with the business problem and understand whether certain sections are CPU-intensive and would thus benefit from harnessing the power of multiple CPUs or multi-core CPUs in parallel. Toward the end of this section I will describe a few scenarios involving CPU-intensive web applications, but first I will describe a more obvious case involving parallelization and CPU-intensive applications.
DNA sequences are successions of letters representative of a DNA strand, which in order to get a more comprehensive picture of living organisms are combined in the millions. Take the case of a web application designed to do such a task, that will potentially need to analyze and combine millions of records.
The process of analyzing and combining this amount of data requires large amounts of CPU cycles. So what happens when this type of application is run on an OS with a single CPU? It will take a lot of time, at least more than if it were run on an OS equipped with two or three CPUs.
But here lies the issue: merely having two or three available CPUs is not enough for the application to finish its processing any faster. The way to exploit multiple CPUs is to use threads -- discussed in Chapter 1. In this manner, separate threads are delegated to whatever CPUs are available at any given time.
Up until a few years ago, having an OS with multiple CPUs was a luxury reserved for the highest-end servers and workstations, which ran applications of this type. However, this has changed with the advent of multi-core processors, which offer the CPU power of two, four and often eight CPUs in a single chip. This capacity is no longer reserved for these types of scenarios: all but the simplest of web applications can now leverage multiple CPUs by using the proper threading techniques.
Once you've decided to incorporate parallelization in a web application, you need to decide which technique you will use to avoid conflicts between the tasks (i.e. threads) performing in parallel. Depending on a web application's programming language, the techniques used for this process will vary. As described in Chapter 1, you can use shared memory and non shared memory techniques.
As you will now realize, the advent of multi-core CPUs has brought the use of threads in many programming languages to the forefront. Libraries, frameworks and entire languages have become popular on the back of this CPU technological change. Part III of the book will explore a series of specific programming languages, including the use of threads in shared memory and non shared memory design, as well as how best to achieve parallelism in web applications.
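A minimal sketch of this idea follows, using the `concurrent.futures` module from Python's standard library. The task (counting the letters in a DNA sequence), the function names and the chunking strategy are all assumptions chosen for illustration, not a real sequence analysis algorithm.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_bases(chunk):
    """Count how many times each letter appears in one piece of the sequence."""
    return Counter(chunk)


def split(sequence, parts):
    """Cut the sequence into roughly equal chunks, one per worker."""
    size = max(1, -(-len(sequence) // parts))  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def parallel_base_count(sequence, workers=4, executor_cls=ThreadPoolExecutor):
    """Count letters by delegating each chunk to a separate worker."""
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        for partial in pool.map(count_bases, split(sequence, workers)):
            total += partial  # combine the partial results
    return total


print(parallel_base_count("ACGTTGCA" * 1000))
```

One caveat specific to this language: in the standard CPython interpreter, threads take turns executing Python code, so CPU-intensive work like this typically only spreads across cores if `ProcessPoolExecutor` (from the same module) is passed as `executor_cls`. The overall shape, split the work, delegate it and combine the results, stays the same either way.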
Hyper-threading - The earliest CPU expansion strategy | |
---|---|
Prior to the wide availability of multi-core processors, a technique called hyper-threading emerged to offer some of the same benefits. A CPU with hyper-threading enabled is treated by the OS as two processors instead of one. This means that even though a single processor is physically present, the OS sees two virtual processors and shares the workload between them. The use of hyper-threading is highly specific to the OS running applications. Some OSs do not support hyper-threading, and some studies have shown hyper-threading to have minimal and even detrimental impact. If enabled on multi-core CPU systems, hyper-threading can potentially create an even greater number of CPUs available for processing threads. For example, a quad-core system with hyper-threading enabled would have 8 CPUs (4 cores x 2 hyper-threading = 8 CPUs); similarly, a system with four octa-core CPUs and hyper-threading enabled would double its capacity from 32 CPUs (4 CPUs x 8 cores) to 64 CPUs (4 CPUs x 8 cores x 2 hyper-threading). Note that, unlike physical memory or even CPU clock-speed, where having more directly results in better performance, adding more CPUs or cores doesn't necessarily result in better performance, even if a web application uses perfectly sound parallelism design. Amdahl's Law and Gustafson's Law, mentioned next, support this. |
Finally, no discussion on the efficient use of CPUs would be complete without mentioning Amdahl's law and Gustafson's law.
Amdahl's law models the extent to which an application can benefit from being parallelized and having multiple CPUs at its disposal. As mentioned earlier, parallelizing an application and giving it access to multiple CPUs doesn't necessarily mean it will perform better. Amdahl's law establishes the diminishing returns of parallelizing an application across multiple CPUs: the portion that must remain serial limits the maximum speedup.
Gustafson's law is related to Amdahl's law in the sense that it also addresses efficiently parallelizing an application. However, whereas Amdahl's law is founded on the premise of a fixed problem size, Gustafson's law is founded on the premise of a fixed execution time, in which the problem size grows with the number of CPUs.
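Both laws reduce to short formulas, sketched below in their usual textbook form. If p is the fraction of the work that can be parallelized and n is the number of CPUs, Amdahl's law gives a speedup of 1 / ((1 - p) + p / n), while Gustafson's law gives a scaled speedup of (1 - p) + p * n. The function names and the sample value of p are assumptions for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup for a fixed problem size (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction, processors):
    """Scaled speedup for a fixed execution time (Gustafson's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * processors


# Hypothetical application in which 90% of the work can run in parallel.
for cpus in (2, 4, 8, 64):
    print(cpus,
          round(amdahl_speedup(0.9, cpus), 2),
          round(gustafson_speedup(0.9, cpus), 2))
```

Under Amdahl's law, the hypothetical application above can never run more than 10 times faster regardless of how many CPUs are added, because the 10% serial portion always remains. Gustafson's law is more optimistic because it assumes the extra CPUs are used to tackle a proportionally larger problem in the same amount of time.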
CPU bottlenecks and performance strategies
Just like bandwidth and memory, CPU utilization can increase drastically if you undertake certain tasks. Some of these tasks include:
- Parsing.- Consists of analyzing or extracting elements from structured data. Parsing has become more common in recent times with the emergence of web services, in which requests and responses are exchanged in XML type formats. Both extracting the results of a web service and attending a web service's request require parsing, as can accessing XML or text files.
- Sorting or filtering.- Users often need data sets ordered or filtered in certain ways. Nearly all programming languages have some type of API or library for performing such operations; however, such actions can be very CPU intensive given a large data set.
- Searching.- Performing any type of search is generally a CPU expensive operation, especially if the data in question is not structured. Though searching is generally reserved to an RDBMS engine, it can equally be a CPU bottleneck if performed in some other tier of a web application.
- Image processing or manipulation.- Modifying or generating image type files (e.g. GIF, JPEG, TIF, etc.) is a CPU intensive activity. You can also add PDF type files to this category, which -- though not properly images -- also require vast amounts of CPU cycles to be generated.
- Video editing or generation.- Modifying or putting together the image frames belonging to a video is also a CPU intensive operation.
- Speech recognition.- Analyzing speech patterns is a fairly CPU intensive process, albeit one not widely used in web applications.
- Web page serving (Web servers).- The act of dispatching a web application's contents to end users also requires a considerable amount of CPU cycles, which in the aggregate -- hundreds or thousands of users -- can become a bottleneck. That said, most web servers are already designed with multi-threading to take advantage of whatever CPU processing cycles are available to an OS.
Similarly, just like you can use certain strategies to limit bandwidth and memory consumption, CPU consumption can also be reduced by relying on the following set of strategies:
- Memory.- Increasing physical memory (RAM). With a larger amount of physical memory, more processed data can stay in a faster medium (i.e. RAM), in turn sparing CPU cycles that might otherwise be needed to reprocess the same data. A limitation to this strategy is if the CPU intensive logic is time sensitive (e.g. data from logic that changes every few seconds; the less time sensitive the data, the more effective this strategy is).
- Caching.- If certain logic starts to consume an excessive amount of CPU cycles, another alternative is caching, that is, 'pinning' data to memory for a specific amount of time using a web framework's caching strategy. By having processed data in memory (RAM), CPU cycles are spared. A limitation to this strategy is if the CPU intensive logic is time sensitive (e.g. results from data that changes every few seconds; the less time sensitive the data, the more effective this strategy can be).
- Parallelize algorithms.- Refactoring a web application's code for increased parallelism can be another avenue to pursue. Once parallel algorithms have been incorporated, you can also add more processors or cores, or activate hyper-threading.
- Serializing.- If you face a lack of physical memory and also excessive demand for CPU cycles, you can consider saving the results of CPU intensive logic as a serialized object in the file system. By doing so, CPU cycles are spared since a web application can read the processed data from the serialized object in the file system. A limitation to this strategy is if the CPU intensive logic is time sensitive (e.g. results from data which changes every few seconds; the less time sensitive the data, the more effective this strategy can be).
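The caching strategy listed above, keeping a processed result in memory for a specific amount of time, can be sketched as a small decorator. This is a simplified stand-in for what a web framework's caching layer would provide; the names `cached_for` and `monthly_totals` are hypothetical, and the sketch is not thread-safe.

```python
import time
from functools import wraps


def cached_for(seconds, clock=time.monotonic):
    """Keep each result in memory and reuse it until it is `seconds` old."""

    def decorator(func):
        store = {}  # maps arguments to (time stored, result)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]           # still fresh, no CPU cycles spent
            value = func(*args)         # missing or expired, do the work again
            store[args] = (now, value)
            return value

        return wrapper

    return decorator


@cached_for(30)
def monthly_totals(year, month):
    """Hypothetical CPU intensive calculation."""
    return sum(day * day for day in range(1, 2000000)) + year + month


print(monthly_totals(2005, 6))  # computed
print(monthly_totals(2005, 6))  # served from memory for the next 30 seconds
```

The expiry time is where the time sensitivity limitation shows up: the shorter the period for which results remain valid, the more often the expensive logic has to run again and the smaller the savings.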
Next, I will describe the last resource that is likely to be a matter of concern in a web application's performance and scalability.
I/O Capacity
I/O or Input/Output refers to the communication capacity between two systems or devices. In technology, the term I/O applies to a broad array of areas to indicate operations pertaining to input, output or both input & output. As such, the term I/O applies to things like network cards, keyboards, mice, monitors, printers and programming language APIs, among many other areas. For the discussion that pertains to this book though, I/O capacity will be the input & output capacity of hard drives, with the unit of measurement being Input/Output operations per second (IOPS).
So far we've discussed that having large amounts of physical memory is an excellent way of maintaining performance. However, before an OS can make use of data in physical memory it needs to read it from a hard drive. Equally, a web application is likely to write data to a hard drive for future usage. Both these read and write tasks represent I/O operations on a hard drive.
I/O plays an important role in web applications since their information is constantly saved and retrieved from hard drives. In this sense, I'm not just referring to the more obvious permanent storage systems (e.g. RDBMS) used by most web applications, but also static files (e.g. images or videos) that accompany web applications. Hence in web applications, I/O capacity is problematic in one of two areas: static files or the permanent storage system used by a web application, with the second one being of greatest concern.
Static files don't take up much CPU or memory resources in a web application, however, they can be a drain in terms of I/O capacity. Consider a logo used for a web application (e.g. Google's home page logo image). If you use a logo throughout a web application's pages for branding purposes, it will constantly be read and dispatched by a web server. Depending on demand, this can translate into hundreds or thousands of I/O operations. Similar scenarios can occur for videos or HTML files read multiple times per second.
Even though static files are a potential I/O bottleneck, from the vantage point of an application they just represent an output (i.e. read) operation on a hard drive. A greater I/O bottleneck can occur if both output (i.e. read) and input (i.e. write) operations take place, which is precisely what permanent storage systems do.
There are many choices of permanent storage systems, each with differing characteristics depending on the purpose of an application. But one characteristic every permanent storage system has in common is that of reading and writing information to ensure the posterity of an application's data.
The reading and writing of information which takes place between an application and a permanent storage system is so common that the four letter acronym CRUD -- which stands for Create-Read-Update-Delete -- is commonly used to refer to the full set of possible interactions. The ensuing problem with I/O capacity as it relates to permanent storage systems is due to the complexity and amount of CRUD operations.
Let's assume a permanent storage system resides on a single hard drive. If an application needs to persist data (i.e. write, or the C in CRUD) as well as read data (i.e. the R in CRUD), instructions are forwarded to the permanent storage system, which then executes both the input and output operations against the hard drive. As the amount of these operations grows, latency increases, because the hard drive has to be re-positioned on each occasion to read or write data at the proper location (i.e. physical position) on the drive. Another scenario can occur if a particular piece of data is highly dynamic and sought after by several parts of an application. This can create a concurrency problem, since the same piece of data is constantly read from and written to a hard drive.
In order to reduce the complexity and quantity of CRUD operations, which in turn aids in keeping an application's I/O capacity in check, you can use several strategies. These strategies range from application level performance techniques to permanent storage technologies specifically designed for high volume CRUD operations, many of which will be described as the book progresses.
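To get a feel for how much re-positioning limits a mechanical hard drive, a common rule of thumb estimates IOPS as one second divided by the sum of the average seek time and the average rotational latency (half a revolution). The sketch below applies it. The function name and the drive figures are assumptions for illustration, and the estimate ignores transfer time, on-drive caches and request queueing.

```python
def estimated_iops(avg_seek_ms, rpm):
    """Rough random I/O estimate for a mechanical hard drive."""
    # On average the platter must turn half a revolution to reach the data.
    rotational_latency_ms = 0.5 * 60000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


# Hypothetical drives: (average seek time in ms, rotation speed in RPM).
for seek_ms, rpm in ((8.5, 7200), (3.5, 15000)):
    print(rpm, "RPM:", round(estimated_iops(seek_ms, rpm)), "IOPS")
```

With these made-up figures the estimates come out at roughly 79 and 182 operations per second. Numbers of this size help explain why a permanent storage system on a single hard drive can become a bottleneck once an application issues hundreds of CRUD operations per second.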
RAM Disks - Increasing I/O capacity | |
---|---|
I/O capacity has its biggest limitations on hard drives because this medium possesses mechanical parts (e.g. spindle & platters) that are constantly re-positioned in order to read or write the appropriate data. Once data is read from a hard drive, or prior to it being written, data resides in physical memory (RAM). As was pointed out earlier, physical memory is a faster medium than hard drives, which also makes physical memory I/O capacity vastly superior to hard drive I/O capacity. This re-confirms what was previously said: physical memory is a highly coveted resource for any application. However, memory is an expensive resource. Maintaining an entire application's data in physical memory is an expensive proposition, that is, until RAM disks appeared in the market. RAM disks offer storage capacities similar to those of regular hard drives, with performance matching that of physical memory and yet at a fraction of the cost of regular physical memory. In essence, they represent a hybrid between hard drives and physical memory technology. However, don't think RAM disks are a panacea for increasing an application's performance. For starters, RAM disks are more expensive than regular hard drives. Also, they offer similar performance levels to using physical memory, which is not the same as saying they can supplant physical memory altogether. In addition, since RAM disks are specialized hardware, you also need control over your application's hardware. Finally and most importantly, RAM disks are volatile just like regular physical memory. This last characteristic means that if you install a permanent storage system on a RAM disk and power is lost, all the data is lost along with it. But even with these caveats, some organizations now rely on RAM disks to enhance their application's performance. This is especially the case for an application's permanent storage system, which can greatly benefit from having its data in a higher speed access medium like physical memory (vs. a hard disk), while not 'breaking the bank' in the process. The following article has a broader overview of RAM disk technology: Ramdisks - Now We Are Talking |
I/O capacity bottlenecks and performance strategies
Unlike bandwidth, memory and CPU consumption, which are influenced by a series of factors, I/O capacity is mainly influenced by the following two issues:
- High access patterns.- If a certain piece of data is constantly needed by a web application, this can place a severe strain on I/O capacity. This scenario can involve static data (e.g. images, plain HTML files) or CRUD operations performed against a permanent storage system (e.g. RDBMS), in which case a hard drive is constantly 'hit'.
- Cheap hardware.- I/O capacity, more than any of the previous resources, can vary highly depending on the type and manufacturer of hardware. Generally speaking, the lower the investment in storage hardware (i.e. hard drives), the lower its performance in terms of I/O.
You can ease these I/O capacity problems in web applications using some of the following techniques:
- Caching/Memory.- By avoiding the need to constantly access stored data on hard drives and instead relying on cached data (i.e. in memory), I/O operations are considerably reduced.
- High-performance hardware.- Generally speaking, the higher the investment in storage hardware (i.e. hard drives), the higher its performance in terms of I/O. The next chapter on key performance and scalability technologies discusses a few of these hardware related choices.
- Read/Write strategies.- Since permanent storage systems generally do both input (i.e. write) and output (i.e. read) operations, I/O capacity can be improved by limiting certain hard drives to input operations (i.e. write) and others to output (i.e. read) operations. Though this process also requires the use of replication and synchronization to avoid data conflicts, it's a proven process. This strategy is often called limiting I/O contention.
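The Caching/Memory technique listed above can be sketched for the static file case, such as a logo requested on every page. This is a minimal illustration under stated assumptions: the `StaticFileCache` class is a hypothetical name, and real web servers and operating systems already apply their own, more sophisticated caching.

```python
import os


class StaticFileCache:
    """Keep the contents of static files in memory between requests."""

    def __init__(self):
        self._entries = {}   # maps path to (modification time, contents)
        self.disk_reads = 0  # counts how many times contents were read from disk

    def read(self, path):
        # A metadata lookup is still needed to notice when the file changes.
        modified = os.stat(path).st_mtime_ns
        entry = self._entries.get(path)
        if entry is not None and entry[0] == modified:
            return entry[1]              # served from memory
        with open(path, "rb") as handle:
            data = handle.read()         # missing or stale, read from disk
        self.disk_reads += 1
        self._entries[path] = (modified, data)
        return data
```

A web application would create one `StaticFileCache` and call `read()` with the path of each static file it serves. After the first request, the file's contents come from memory until the file is modified, so the number of content reads against the hard drive stays flat as demand grows.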