In the last post I mentioned an analysis done by a group of VCPs (VMware Certified Professionals). One slide in their presentation deserves more discussion: the four-hour RTO defined in the MAS notice to banks.
Recovery time objective is a well-established concept, and I have been seeing it in large-scale project design documents and procurement RFPs. Wikipedia has this definition: “The recovery time objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.”
The reader has to distinguish between recovering to full service and recovering to a service level. When a disaster happens, everything has to be prioritized. Not all programs are equal when resources and time are limited. You may not expect to pay a telephone bill via an ATM during serious flooding, but you do expect the ATM to still let you draw money.
The slide (shown below) highlighted the time difference between when an event happens and when a disaster is declared. Given the complexity of current systems and networks, fully assessing a system malfunction may take hours. The incident handling procedure will usually require a few rounds of clarification (if not finger pointing) before senior staff are informed about the major outage. How a bank responds to an outage is now a critical element in meeting the MAS requirement on RTO. The authors of this slide contended that the time actually left for recovery is far less than four hours, and that manual steps are not going to meet this requirement. I believe they do have a point.
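The slide's argument can be put into numbers with a minimal sketch. It assumes the four-hour clock starts when the event happens rather than when the disaster is declared, and the durations for detection, assessment and escalation are hypothetical values chosen only for illustration.

```python
from datetime import timedelta

# Four-hour RTO from the MAS notice, as discussed above.
RTO = timedelta(hours=4)


def remaining_recovery_time(detect, assess, escalate, rto=RTO):
    """Return the time left for actual recovery once a disaster is declared.

    Assumption: the RTO clock starts at the event, so detection,
    assessment and escalation all consume part of the budget.
    """
    time_to_declare = detect + assess + escalate
    return rto - time_to_declare


# Hypothetical durations, for illustration only.
left = remaining_recovery_time(
    detect=timedelta(minutes=30),
    assess=timedelta(minutes=90),
    escalate=timedelta(minutes=45),
)
print(left)  # 1:15:00
```

With these made-up figures, less than a third of the four hours is left for the recovery itself, which is the point the slide was making about manual steps.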
Will the MAS TRM requirements and notice make 24×7 internet banking a white elephant? Let us wait for the 2014 DBS annual report and find out their cost ratio.
Since Singapore MAS released the TRM guideline last month, I believe many people (including me) are studying it. The Big Four accounting firms are usually the most active in publishing explanatory reports and articles, with the purpose of generating more business leads.
However, a group of VMware certified professionals is taking the lead this time. They worked together and published a MAS TRM analysis report focusing on DR and virtualization. Some of the observations are valid. The document can be found on the VMware website.
A few I would like to share:
- Process and committee oriented. No agile or rapid innovation.
- All social media sites, cloud-based storage and web-based email are classified as “unsafe internet services”. No technical facts are given to support why they are all insecure.
- Trust no employee: sysadmins must be tracked.
The last post discussed the complications of running multiple bank applications on the same computing platform while needing to decide when to report “a relevant incident” within one hour of discovery.
This part discusses how this requirement is going to affect Service Level Agreements in Singapore banking IT operations. Before this MAS notice comes into effect, IT operations usually design system uptime or availability requirements according to business needs. Systems supporting real-time financial transactions have the highest uptime requirements. Even market data feeds and AML systems, which do not process financial transactions themselves, require high availability. Infrastructure systems and monitoring services are usually regarded as secondary where availability is concerned. Failure of a network monitoring system will not directly impact users or cause direct financial loss.
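To make the availability tiers concrete, here is a small sketch that converts an availability percentage into the downtime it allows per year. The tier names and percentages are hypothetical examples, not figures from the MAS notice or from any bank's SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct):
    """Return the yearly downtime (in hours) allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Hypothetical tiers, for illustration only.
tiers = {
    "core transaction system": 99.95,
    "market data feed / AML": 99.9,
    "network monitoring": 99.5,
}

for name, pct in tiers.items():
    hours = allowed_downtime_hours(pct)
    print(f"{name}: {pct}% allows about {hours:.1f} hours of downtime per year")
```

The gap between tiers looks small as a percentage but is large in hours, which is why treating monitoring as a secondary tier matters for what follows.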
The MAS requirement to report incidents within one hour of discovery will change the importance of infrastructure systems and monitoring services. Although it is possible for a bank to discover a data breach or system malfunction weeks after the actual event, that is not the situation this MAS notice is designed for. The one-hour requirement assumes the bank has a sound and robust monitoring infrastructure. Monitoring systems will need to run with availability similar to that of the core financial systems they monitor. Real-time log aggregation systems like ArcSight and Splunk are important tools for discovering network attacks and system malfunctions. If a bank relies on these systems to detect attacks and provide real-time intelligence, their uptime directly affects the bank’s ability to meet the one-hour reporting requirement. For example, suppose ArcSight is used to monitor 200 servers and it is down because of an error when an SQL injection attack happens. The DB server log will still record the event at the correct time. When the ArcSight error is fixed, it will start processing the server logs and identify the SQL injection attack. The attack will therefore be discovered much later than the time recorded in the server log. Could the bank claim that discovery happened at the later time, when ArcSight recovered from the error? Or will MAS deem that discovery happened when the DB server recorded the attack?
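The scenario above can be sketched as a simple timeline calculation. The function name, the timestamps and the two "interpretations" are assumptions made for illustration; which reading MAS would actually take is exactly the open question raised above.

```python
from datetime import datetime, timedelta

REPORTING_WINDOW = timedelta(hours=1)  # one hour upon discovery


def discovery_timeline(event_logged_at, monitor_recovered_at):
    """Compare reporting deadlines when the monitoring system was down.

    Assumption: the attack is only discovered once the monitoring
    system is back and has processed the backlog of server logs.
    """
    discovered_at = max(event_logged_at, monitor_recovered_at)
    return {
        "discovery_lag": discovered_at - event_logged_at,
        # Reading 1: the clock starts when the monitoring tool flags the attack.
        "deadline_if_discovery_counts": discovered_at + REPORTING_WINDOW,
        # Reading 2: the clock starts when the DB server logged the attack.
        "deadline_if_log_time_counts": event_logged_at + REPORTING_WINDOW,
    }


# Hypothetical timestamps, for illustration only.
timeline = discovery_timeline(
    event_logged_at=datetime(2013, 7, 1, 2, 10),
    monitor_recovered_at=datetime(2013, 7, 1, 5, 40),
)
for key, value in timeline.items():
    print(key, value)
```

In this made-up case the monitoring outage adds three and a half hours of lag, so under the second reading the deadline has already passed before anyone knows there is something to report.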
The actual response and judgement will need to consider the specific details of each case. However, the SLAs of monitoring systems will need to improve in order to show that the bank is committed to meeting the MAS notice.
While attending a PwC Singapore meeting on the new MAS guideline, I had many questions in my head about how the one-hour incident reporting requirement could be fulfilled.
The requirement obliges banks operating in Singapore to report to MAS within one hour when a relevant incident (security breach or malfunction) is discovered.
There are a few levels of complexity. One is the application boundary issue. The other is the SLA issue.
Most international banks' systems are spread across multiple time zones. A trading system may be located in London and centrally managed. The Singapore application runs side by side with other regions' applications. If only the Japan application is under attack, should MAS be informed, considering that the affected JP application runs on the same hardware platform as the SG one? If yes, MAS will become a central information hub for security incidents globally. With the time zone issue added, international banks in Singapore will need to respond to global incidents and be able to decide whether an incident happening in London should be reported to MAS, not to mention doing so within the one-hour requirement.
Systems no longer run as localized versions. Virtualization and cost saving have already moved the old systems to centralized and shared platforms. A clear boundary cannot easily be drawn when a component is affected.
I believe this question has already been considered by the relevant parties and MAS. One possible solution is to focus on whether the remote incident materially impacts Singapore operations. There should be some mutual understanding between the regulator and banks on how to limit the catch-all potential of the incident reporting requirement. I will talk about the SLA issue later.
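The boundary question can be expressed as a rough triage rule. This is only a sketch of the "material impact on Singapore" idea suggested above; the fields, platform names and outcomes are hypothetical and are not criteria published by MAS.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    region: str                 # where the incident was raised, e.g. "JP"
    platform: str               # shared platform identifier (hypothetical)
    sg_services_degraded: bool  # is any Singapore service actually affected?


def triage(incident, sg_platforms):
    """Return a first-pass decision for a Singapore operations team.

    Assumption: only incidents that materially affect Singapore
    are reported; a shared platform alone triggers an assessment.
    """
    if incident.region == "SG" or incident.sg_services_degraded:
        return "report"
    if incident.platform in sg_platforms:
        return "assess"  # shared platform: needs a materiality judgement
    return "no-sg-impact"


# Hypothetical platforms hosting Singapore applications.
sg_platforms = {"trading-ldn-01"}

print(triage(Incident("JP", "trading-ldn-01", False), sg_platforms))  # assess
print(triage(Incident("JP", "trading-ldn-01", True), sg_platforms))   # report
print(triage(Incident("JP", "retail-tyo-02", False), sg_platforms))   # no-sg-impact
```

The "assess" branch is where the one-hour clock becomes painful: someone in Singapore has to make the materiality call for an incident that may have started in another time zone.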
Readers of the new TRM guideline from the Monetary Authority of Singapore (MAS) will be surprised by the changes it makes. It is not a simple update but a major rewrite of some of the sections. It also incorporates key and fundamental changes in financial technology.
In the introduction section, the author sets the tone for the whole document by stating that IT is no longer only a cost center and should be integrated with business strategies. Vendors have advocated this type of statement for a long time, but I believe it is the first time a banking regulator has made the same statement in a TRM guideline. From here, the reader can expect that the TRM function is not only about system vulnerabilities or malware; project risk, governance and outsourcing are also important.
Para 1.0.1 “IT is no longer a support function within a financial institution (“FI”) but a key enabler for business strategies”
The author also states that users are more IT-savvy. From my experience, the more accurate adjective would be IT-demanding: users are asking for more features. The usability of non-financial internet and mobile applications has been revolutionized by the use of HTML5, AJAX and even 3D graphics. Users are demanding that the old HTML-only internet banking follow suit. MAS also senses these changes and urges banks to fully understand the risks before bending over backwards to please users.
Para 1.0.3 “FIs are also faced with the challenge of keeping pace with the needs and preferences of consumers who are getting more IT-savvy and switching to internet and mobile devices for financial services, given their speed, convenience and ease of use.”
After a one-year consultation, the Monetary Authority of Singapore (MAS) released a Technology Risk Management (TRM) Guideline. It is a major overhaul of the last version, which was published in 2008. For sure, the banking industry and banking technology have changed a lot with the omnipresence of 3G networks and mobile devices.
As part of my job is to implement TRM in FIs, I will write up my comments and observations in the coming posts. But first let us take a 3,000-foot view of this document. A few text analysis tools and visualization graphics will do the job.
The first graph is a word cloud which shows high-frequency keywords. FI means Financial Institution. Most of the words are general IT terms like data and systems. But notice that “ensure” appears in a relatively large size!
The left graph shows the three selected keywords: Ensure, Access and Recovery. The peaks of access and recovery show that although these keywords are used often, each is mainly used in one particular chapter.
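For readers who want to reproduce this kind of overview, here is a minimal sketch of the two counts behind the graphs: overall word frequency (what a word cloud visualizes) and the frequency of selected keywords per chapter. It is not the tool used for the graphs in this post, and the chapter titles and text are made-up placeholders rather than excerpts from the guideline.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase alphabetic words in a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def keyword_by_chapter(chapters, keywords):
    """Return {keyword: {chapter title: count}} for the selected keywords."""
    result = {k: {} for k in keywords}
    for title, text in chapters.items():
        freq = word_frequencies(text)
        for k in keywords:
            result[k][title] = freq[k.lower()]
    return result


# Placeholder text, NOT taken from the TRM guideline.
chapters = {
    "Chapter A": "The FI should ensure access is restricted. Ensure access logs are kept.",
    "Chapter B": "The FI should ensure recovery plans are tested. Recovery time matters.",
}

print(word_frequencies(" ".join(chapters.values())).most_common(3))
print(keyword_by_chapter(chapters, ["ensure", "access", "recovery"]))
```

A keyword like "ensure" that shows up in every chapter gives a flat line, while "access" or "recovery" concentrated in one chapter gives the kind of peak described above.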
If you are the CISO of your organization and are implementing a security programme, what questions should you ask yourself to help realize a successful programme rollout? No, it is not about what software to use, what hardware to install, what processes to put in place or even what vulnerabilities you are going to remediate or mitigate. In fact, the questions are:
- Are we doing the right things?
- Are we doing them the right way?
- Are we getting them done well?
- Are we getting the benefits?
Four simple questions about your security programme, all about the business results and not about technology, schedule or resources. Four questions about reality, so that your company can make informed decisions. In addition, each of the four questions can be further elaborated, for example:
Are we doing the right things?
- What technology and processes are proposed?
- For what business outcome?
- How do the deliverables within the programme contribute?
Are we doing them the right way?
- How will it be done?
- What is being done to ensure that it will fit with other current or future capabilities? (e.g. Business / Operational / Technical capabilities)
Are we getting them done well?
- What is the plan for doing the work?
- What resources and funds are needed?
Are we getting the benefits?
- How will the benefits be delivered?
- What is the value of the security programme?
You should answer all the questions based on relevant, current and accurate business-focussed information. By then, I am sure, you will find that a successful security programme no longer depends on technology, process and policy only; it is also an investment that has an enormous impact on creating and sustaining business value.