Category Archives: Information security strategy

Layer 7 DDoS Attacks: A Web Architect's Perspective

The arms race in cyber security makes protecting Internet resources harder and harder. In the past, DDoS attacks mostly hit Layer 3 and Layer 4, but reports from various sources now identify Layer 7 DDoS as the prevalent threat. The slide below from Radware explains this shift: while protection against network traffic flooding is mature, attackers have moved their target to the application layer.

[Slide: Radware on the Layer 7 DDoS trend]

As DDoS attacks evolve and now target the application layer, the responsibility to protect web applications no longer rests only on the shoulders of the CISO or the network security team. It is time for web application architects to go to the front line. This article analyses Layer 7 DDoS attacks and discusses some measures web application architects can deploy to mitigate their impact.

Types of Layer 7 DDoS Attacks

A study conducted by Arbor Networks showed that while attacks on DNS and SMTP exist, the most prevalent attack vector is still HTTP.

[Chart: DDoS attack types]

Layer 7 DDoS attacks usually target the landing page of a website or a specific URL. If an attacker successfully consumes either the bandwidth or the OS resources (such as sockets) of the target, normal users are no longer able to access these resources.

A closer look at the targeted web resources

When developing a protection strategy against Layer 7 DDoS attacks, we need to understand that not all web pages are created equal. Diagram 1 shows the different types of web pages. There are two ways to classify web server resources: by access rights and by content type.

When content is for registered users, there is usually a user authentication process that blocks unauthenticated HTTP requests. Pages accessible only to authenticated users are usually not the target of a DDoS attack, unless the attacker has pre-registered a large number of user accounts and automated the login process. The login page, which usually uses HTTPS, is another web server resource that can be exploited by DDoS attackers, since the HTTPS handshake places a high load on the web server.

Publicly accessible content carries a higher risk of HTTP flooding, and the impact of a flood differs by content type. A DDoS attack on static web pages usually hits the outbound bandwidth and web server resources (such as the HTTP connection pool, sockets, memory and CPU). Dynamic pages generate backend database queries, so an attack on them also hits the application server and database server. Overloading any of these resources for a long period of time means the DDoS attack has succeeded. The types marked in red in the diagram face a higher risk of DDoS attack.

[Diagram 1: types of web server resources]

In the paragraphs above, we established a general understanding of Layer 7 DDoS attacks on different types of web resources. Below is a discussion of how to minimize the impact of a DDoS attack on website users. These steps do not replace Layer 3 and 4 DDoS defenses and traffic filtering; however, unless your website is designed with these principles in mind, a carefully crafted Layer 7 DDoS attack could still bring it down.

Protecting the Static Page

The most effective, and also the most expensive, way to protect static pages against DDoS attacks is to buy services from a reputable Content Delivery Network (CDN). However, the cost of running the whole website on a CDN can add up to a large sum. An alternative is to make use of cloud platforms and distribute graphics, Flash and JS files to one or more web servers located in another network. This method is already practiced by most high-traffic websites, which often have dedicated servers used only for delivering images.

Nowadays, most web pages exceed 100 KB once all graphics, CSS and JS are loaded. Using the LOIC DDoS tool, an attacker can easily consume all of the target's bandwidth and web server CPU resources. In an HTTP GET flood attack, both the inbound link and the outbound link can become bottlenecks. A web developer cannot do much about managing inbound traffic; however, there are ways to lower the risk of the outbound link becoming a bottleneck under DDoS attack. Web architects should monitor the inbound/outbound traffic ratio of the web server's network interfaces.
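As a hedged illustration of such monitoring, the script below samples the byte counters of a Linux network interface from /proc/net/dev (the interface name eth0 and the 10-second window are assumptions, not from the article):

    # Sample the inbound/outbound byte counters twice and print the ratio;
    # a sudden jump in outbound/inbound can indicate a GET flood saturating
    # the outbound link with large responses.
    import time

    def read_bytes(iface: str = "eth0"):
        with open("/proc/net/dev") as f:
            for line in f:
                name, _, counters = line.partition(":")
                if name.strip() == iface:
                    fields = counters.split()
                    return int(fields[0]), int(fields[8])  # rx_bytes, tx_bytes
        raise ValueError(f"interface {iface!r} not found")

    rx1, tx1 = read_bytes()
    time.sleep(10)
    rx2, tx2 = read_bytes()
    ratio = (tx2 - tx1) / max(1, rx2 - rx1)
    print(f"outbound/inbound byte ratio over 10s: {ratio:.1f}")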

One simple way to strengthen a website against an HTTP GET flood attack is to store images and other large media files on another server (either within the same data centre or in another one). Using a separate web server to deliver non-HTML static content lowers both the CPU load and the bandwidth consumption of the main website; this method amounts to a DIY CDN. Cloud platforms with on-demand charging schemes are an excellent resource for deploying this solution, and since the graphic files are publicly accessible, placing them on a public cloud platform does not increase data security risk. Although this is a simple solution for static pages, there are a few points to note. First, the “height” and “width” attributes of each image should be defined in the HTML code; this ensures users see a proper screen layout even when the image server is unreachable. Secondly, when choosing a cloud platform, it is best to find one that does not share the same upstream connectivity provider as your primary data centre. When a DDoS attack happens, a good architecture should eliminate performance bottlenecks as much as possible.
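As a small sketch of both points (the asset host name below is made up), the HTML for each image can be generated so that it points at the separate image server and pins the display dimensions:

    # Build an <img> tag that loads from a separate asset host and defines
    # width/height, so the page layout survives if that host is unreachable.
    ASSET_HOST = "https://img.example-asset-host.net"  # hypothetical second server

    def img_tag(path: str, width: int, height: int, alt: str = "") -> str:
        return (f'<img src="{ASSET_HOST}/{path.lstrip("/")}" '
                f'width="{width}" height="{height}" alt="{alt}">')

    print(img_tag("banner.jpg", 728, 90, alt="banner"))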

Protecting the Dynamic Page

Dynamic pages are generated based on user input; they involve business logic on the application server and data queries on the database server. If the web application displays large amounts of data or accepts user-uploaded media files, an attacker can easily write a script that generates a large number of legitimate-looking requests and consumes both bandwidth and CPU power. DDoS defense mechanisms that rely on filtering out malicious traffic via pattern recognition will not be of much use if the attack targets this weakness of dynamic pages. It is the web application developer's responsibility to identify high-risk web pages and develop mitigation measures against such DDoS attacks.

As displaying large amounts of data based on user input becomes more popular, web architects should develop strategies to prevent misuse of high-load web pages, particularly when they are publicly available.

One way is to make use of a form-specific cookie and verify that each HTTP request is submitted from a valid HTML form; a code sketch follows the flow diagram below. The steps would be:

  1. When the browser loads an HTML form, the web server sets a cookie specific to this form. When the user clicks the submit button, the cookie is sent along with the HTTP request.
  2. When the web server processes the user request, the server-side logic first checks whether the cookie is properly set before processing the request.
  3. If the cookie does not exist or its value is incorrect, the server-side logic should forward the request to a landing page that displays a URL link pointing back to the HTML form.

The diagram below shows the logical flow.

[Diagram: logical flow of the form-specific cookie check]
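A minimal sketch of these three steps, assuming a Python/Flask stack (the route, cookie name and token scheme are illustrative, not a prescribed implementation):

    # Form-specific cookie check: set the cookie when serving the form,
    # verify it before running any expensive server-side logic.
    import hashlib
    import hmac
    import os

    from flask import Flask, make_response, request

    app = Flask(__name__)
    SECRET = os.urandom(32)            # signing key, regenerated on restart
    FORM_COOKIE = "search_form_token"  # cookie specific to this one form

    def form_token() -> str:
        # A fixed label is signed here; rotation is sketched further below.
        return hmac.new(SECRET, b"search-form", hashlib.sha256).hexdigest()

    @app.route("/search", methods=["GET"])
    def show_form():
        # Step 1: serve the HTML form and set the form-specific cookie.
        resp = make_response('<form method="post" action="/search">'
                             '<input name="q"><input type="submit"></form>')
        resp.set_cookie(FORM_COOKIE, form_token(), httponly=True)
        return resp

    @app.route("/search", methods=["POST"])
    def handle_form():
        # Step 2: check the cookie before any expensive work.
        token = request.cookies.get(FORM_COOKIE, "")
        if not hmac.compare_digest(token, form_token()):
            # Step 3: no valid cookie, return a cheap landing page instead.
            return '<p>Please start from the <a href="/search">search page</a>.</p>', 403
        # Only now run the business logic and database queries.
        return "Results for: " + request.form.get("q", "")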

The principle of this method is to establish an extra check before executing the server-side logic behind the dynamic page, thus preserving CPU resources on the application server and database server. Most DDoS attack tools have only simple HTTP functions and are not able to process cookies. Capturing the HTTP requests of the LOIC DDoS attack tool with Wireshark shows that LOIC does not accept cookies from websites; a similar result is obtained with the HOIC tool.

[Wireshark capture of LOIC traffic]

A DDoS attacker may record this cookie and replay it within the attack traffic, which could easily be done with a script. To compensate, the form-specific cookie should be changed regularly or rotated by some triggering logic.
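One hedged way to implement that rotation, continuing the Flask sketch above, is to derive the token from a coarse time bucket so a recorded cookie stops working within minutes (the five-minute interval is an assumed value):

    # Rotating form token: an HMAC over a time bucket instead of a fixed label.
    import hashlib
    import hmac
    import os
    import time

    SECRET = os.urandom(32)
    ROTATION_SECONDS = 300  # rotate every 5 minutes (illustrative)

    def form_token(bucket_offset: int = 0) -> str:
        bucket = int(time.time()) // ROTATION_SECONDS + bucket_offset
        msg = f"search-form:{bucket}".encode()
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

    def token_is_valid(token: str) -> bool:
        # Accept the current and previous bucket so a form loaded just
        # before rotation can still be submitted.
        return any(hmac.compare_digest(token, form_token(o)) for o in (0, -1))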

Unfinished Story

This article shows only some of the ways web architects can build DDoS defense into the DNA of a web application. The methods described will increase a web application's ability to survive HTTP flooding attacks. As attacks become more application specific, web architects should take on DDoS defense responsibility and design systems that are both secure and scalable.

[1] http://www.owasp.org/index.php/OWASP_HTTP_Post_Tool
[2] http://www.securelist.com/en/analysis/204792126/Black_DDoS
[3] http://www.ihteam.net/advisory/make-requests-through-google-servers-ddos/

Cloud Computing in Singapore Financial Industry

The cloud computing industry is well developed in Singapore, so it is no big surprise that the MAS TRM guideline has a section dedicated to cloud computing. Reading the document as a whole, it seems MAS is accepting the fact that cloud computing is, or will be, part of the financial industry's development.

Section 5.2, Cloud Computing, is grouped under a bigger topic, IT Outsourcing. For banks, the use of third-party computing resources is indeed a form of outsourcing; operationally and legally, the relationship between banks and cloud service providers is not much different.

From the text “Outsourcing can involve the provision of IT capabilities and facilities by a single third party or multiple vendors located in Singapore or abroad,” one can assume that outsourcing to overseas cloud computing is possible. The statement does not restrict Singapore data from being stored or processed abroad. This is important, as most international organisations host their applications centrally in regional hubs. However, it does have some catches.

The TRM guideline (5.2.3 and 5.2.4) does not discuss much of the technical side of cloud computing; rather, it stresses the importance of data governance, which includes data segregation and removal of data on exit. I believe this is due to the enforcement of the banking secrecy principle (more details are available on the MAS website).

In a cloud computing setup, deleting all information related to one entity is tricky and costly. It would be feasible for an IaaS deployment, where the data are stored in disk images. For SaaS or other data services, identifying every piece of data owned by the exiting entity is a daunting task! The data schema must be able to cater for this unique requirement; unless it is considered while the cloud service provider is building the system, the cost of manually deleting data will escalate.
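As a hedged sketch of such a schema (table and column names are invented), tagging every record with the owning entity's identifier turns “remove all data of an exiting entity” into a single indexed operation:

    # Multi-tenant schema sketch: owner_id on every row makes data removal
    # on exit a one-statement delete instead of a manual hunt.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE documents (
            id       INTEGER PRIMARY KEY,
            owner_id TEXT NOT NULL,  -- the entity (e.g. a bank) owning this row
            payload  BLOB
        )
    """)
    conn.execute("CREATE INDEX idx_documents_owner ON documents(owner_id)")

    def remove_entity_data(owner_id: str) -> int:
        """Delete everything owned by an exiting entity; returns rows removed."""
        cur = conn.execute("DELETE FROM documents WHERE owner_id = ?", (owner_id,))
        conn.commit()
        return cur.rowcount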


Car Rental is more promising than ever

When capturing and storing technology is so cheap, it is tempting for governments to store everything. In this case, car plate images.

I guess the car rental business has another marketing theme to explore! Soon we will see computer rental and mobile phone rental. When trust is gone, people are willing to try extreme measures.

There is a book that offers a critical review of the abundance of surveillance technology:

Critical Issues in Crime and Society: Surveillance in the Time of Insecurity.
New Brunswick, NJ, USA: Rutgers University Press.

You are being tracked. How license plate readers are being used to record Americans’ movements (ACLU, July 2013) – A little noticed surveillance technology, designed to track the movements of every passing driver, is fast proliferating on America’s streets. Automatic license plate readers, mounted on police cars or on objects like road signs and bridges, use small, high-speed cameras to photograph thousands of plates per minute. The information captured by the readers – including the license plate number, and the date, time, and location of every scan – is being collected and sometimes pooled into regional sharing systems. As a result, enormous databases of innocent motorists’ location information are growing rapidly. This information is often retained for years or even indefinitely, with few or no restrictions to protect privacy rights.

What a 4-hour RTO means

In the last post I mentioned an analysis done by a group of VCPs. One slide in their deck is worth further discussion: the 4-hour RTO defined in the MAS notice to banks.

Recovery time objective is a well-established concept, and I have been seeing it in large-scale project design documents and procurement RFPs. Wikipedia has this definition: “The recovery time objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.”

The reader has to distinguish between recovery to full service and recovery to a service level. When disaster happens, everything has to be prioritized; not all programs are equal when you have limited resources and time. We may not expect to pay a telephone bill via ATM during serious flooding, but we do expect the ATM to still let us withdraw money.

The slide (shown below) highlights the time difference between the event happening and the disaster being declared. Due to the complexity of current systems and networks, fully assessing a system malfunction may take hours. Usually the incident handling procedure requires a few rounds of clarification (if not finger pointing) before senior staff are informed about a major outage. How a bank responds to an outage is now a critical element in meeting the MAS requirement on RTO. The authors of this slide contend that the window left for actual recovery is far less than four hours, and that manual steps are not going to meet the requirement. I believe they have a point.

Will the MAS TRM requirements and notice make 24×7 internet banking a white elephant? Let us wait for the 2014 DBS annual report and find out their cost ratio.

[Slide: time from event to disaster declaration]

VCPs' technical analysis of the MAS Technology Risk Management guidelines

Since the Singapore MAS released the TRM guideline last month, I believe many people have been studying it (including me). The Big Four accounting firms are usually the most active in publishing explanatory reports and articles, with the purpose of generating more business leads.

However, a group of VMware certified professionals have taken the lead this time. They worked together and published a MAS TRM analysis report focusing on DR and virtualization. Some of the observations are valid. The document can be found on the VMware website.

A few observations I would like to share:

  • Process and committee oriented. No agile and rapid innovation.
  • All social media sites, cloud-based storage and web-based email are classified as “unsafe internet services”. No technical facts are given to support why they are all insecure.
  • Trust no employee: sysadmins must be tracked.


Singapore MAS Tech Risk Guideline (TRM) – Incident Reporting – SLA

The last post discussed the complications of running multiple bank applications on the same computing platform and the need to decide when to report “a relevant incident” within one hour of discovery.

This post discusses how this requirement is going to affect Service Level Agreements in Singapore banking IT operations. Before this MAS notice came into effect, IT operations usually designed system uptime or availability requirements according to business needs. Systems supporting real-time financial transactions have the highest uptime requirements; even market data feeds and AML systems, which do not directly process financial transactions, require high availability. Infrastructure systems and monitoring services are usually regarded as secondary where availability is concerned, since failure of a network monitoring system will not directly impact users or cause direct financial loss.

The MAS requirement to report incidents within one hour of discovery will change the importance of infrastructure systems and monitoring services. Although it is possible for a bank to discover a data breach or system malfunction weeks after the actual event, that is not what this MAS notice is designed for. The one-hour-upon-discovery requirement assumes the bank has a sound and robust monitoring infrastructure. Monitoring systems will need to run with availability requirements similar to those of the core financial systems they monitor. Log aggregation systems like ArcSight and Splunk are important tools for discovering network attacks and system malfunctions. If a bank relies on these systems to detect attacks and provide real-time intelligence, their uptime will directly impact the bank's ability to fulfill the one-hour reporting requirement. For example, suppose ArcSight is used to monitor 200 servers and is down due to an error when an SQL injection attack happens. The DB server log will still record the event at the correct time. When the ArcSight error is fixed, it will start processing the server logs and the SQL injection attack will be identified, but the time of discovery will be much later than what the server log recorded. Could the bank claim the discovery occurred at the later time, when ArcSight recovered from the error? Or will MAS deem the discovery to have happened when the DB server recorded the attack?
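The timing question can be made concrete with a small illustration (the timestamps follow the scenario above; the helper function is hypothetical):

    # Compare when an event was recorded at the source with when the
    # monitoring tier ingested it, against the one-hour reporting window.
    from datetime import datetime, timedelta

    REPORTING_WINDOW = timedelta(hours=1)

    def window_already_blown(event_time: datetime, ingest_time: datetime) -> bool:
        """True if the ingest lag alone exceeds the reporting window."""
        return ingest_time - event_time > REPORTING_WINDOW

    # DB server logs an SQL injection at 02:00; the log aggregator is down
    # and only processes the entry at 05:30, a lag of 3.5 hours.
    print(window_already_blown(datetime(2013, 7, 1, 2, 0),
                               datetime(2013, 7, 1, 5, 30)))  # True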

The actual response and judgement will need to consider the specific details of each case. However, the SLAs of monitoring systems will need to improve in order to show that the bank is committed to meeting the MAS notice.


Singapore MAS Tech Risk Guideline (TRM) – Incident Reporting

While attending a PwC Singapore meeting on the new MAS guideline, I had many questions in my head about how the one-hour incident reporting requirement could be fulfilled.

The notice requires banks operating in Singapore to report to MAS within one hour of discovering a relevant incident (security breach or system malfunction).

There are a few levels of complexity. One is the boundary-of-application issue; the other is the SLA issue.

Most international banks' systems are located in multiple time zones. A trading system may be in London and centrally managed, with the Singapore application running side by side with other regions' applications. If only the Japan application is under attack, should MAS be informed, considering that the affected JP application runs on the same hardware platform as the SG one? If yes, MAS becomes a central information hub for security incidents globally. Time zones add to the problem: international banks in Singapore will need to respond to global incidents and be able to decide whether an incident happening in London should be reported to MAS, not to mention doing so within the one-hour window.

Systems no longer run as localized instances; virtualization and cost saving have already turned the old setup into centralized, shared platforms. A clear boundary cannot easily be drawn when a component is affected.

I believe this question has already been considered by the relevant parties and MAS. One possible approach is to focus on whether the remote incident materially impacts the Singapore operation. There should be some mutual understanding between the regulator and the banks on how to limit the catch-all scope of the incident reporting requirement. I will discuss the SLA aspect in a later post.

Singapore MAS new TRM guideline

The Monetary Authority of Singapore, after a one-year consultation, released a new Technology Risk Management Guideline. It is a major overhaul of the previous version, which was published in 2008. For sure, the banking industry and banking technology have changed a lot with the omnipresence of 3G networks and mobile devices.

As part of my job is to implement the TRM in an FI, I will write out my comments and observations in the coming posts. But first, let us take a 3,000-foot view of this document. A few text analysis tools and visualization graphics will do the job.
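A minimal sketch of this kind of keyword-frequency analysis, assuming the guideline has been saved as plain text (the file name and stopword list are placeholders):

    # Count keyword frequencies in the TRM guideline text; the output is the
    # raw input for a word cloud or a per-chapter keyword chart.
    import re
    from collections import Counter

    STOPWORDS = {"the", "and", "of", "to", "a", "in", "for", "is", "be",
                 "or", "that", "on", "with", "by", "as", "are", "should"}

    with open("mas_trm_guideline.txt", encoding="utf-8") as f:
        words = re.findall(r"[a-z]+", f.read().lower())

    freq = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    for word, count in freq.most_common(20):
        print(f"{word:15s} {count}")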

The first graph is a word cloud showing the high-frequency keywords (FI means Financial Institution). Most of the words are general IT terms like “data” and “systems”, but notice that “ensure” appears in a relatively big size!

The second graph shows three selected keywords: ensure, access and recovery. The peaks for “access” and “recovery” show that although these keywords are used often, each is mainly used in one particular chapter.

[Word cloud: TRM keyword frequencies]

[Chart: TRM keyword frequency by chapter for “ensure”, “access” and “recovery”]