The cloud computing industry is well developed in Singapore, so it is no big surprise that the MAS TRM guideline has a section dedicated to cloud computing. Reading the document as a whole, it seems MAS accepts that cloud computing is, or will be, part of the financial industry's development.
Section 5.2, Cloud Computing, is grouped under a bigger topic, IT Outsourcing. For banks, the use of third-party computing resources is indeed a form of outsourcing. Operationally and legally, the relationship between banks and cloud service providers is not much different.
From the text “Outsourcing can involve the provision of IT capabilities and facilities by a single third party or multiple vendors located in Singapore or abroad.” one can assume that outsourcing to overseas cloud computing providers is possible. The statement does not restrict Singapore data from being stored or processed abroad. This is important, as most international organisations host their applications centrally in regional hubs. However, there are some catches.
The TRM guideline (5.2.3 and 5.2.4) does not discuss much of the technical side of cloud computing; rather, it stresses the importance of data governance, which includes data segregation and removal of data on exit. I believe this is due to the enforcement of the banking secrecy principle (more details are available on the MAS website).
In a cloud computing setup, deleting all information related to one entity is tricky and costly. It is feasible for IaaS deployments, where the data are stored in disk images. For SaaS or other data services, identifying every piece of data owned by the exiting entity will be a daunting task! The data schema must be able to cater for this unique requirement. Unless it is considered while the cloud service provider is developing the system, the cost of manually deleting data is going to escalate.
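To make the schema point concrete, here is a minimal sketch of one way a SaaS data model could cater for removal of data on exit. It assumes a hypothetical design in which every table carries a `tenant_id` column; the table names, the `purge_tenant` helper and the sample rows are all invented for illustration and are not taken from the TRM guideline or from any real provider.

```python
import sqlite3

# Hypothetical multi-tenant schema: every row carries the owning tenant's id,
# so an exiting customer's data can be located with a simple filter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        holder TEXT NOT NULL
    );
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        account_id INTEGER NOT NULL REFERENCES accounts(id),
        amount REAL NOT NULL
    );
""")

# Tables that hold tenant data, child tables first and parent tables last.
TENANT_TABLES = ["transactions", "accounts"]


def purge_tenant(conn, tenant_id):
    """Delete every row owned by tenant_id and return rows removed per table."""
    removed = {}
    with conn:  # single transaction: all tables are purged or none are
        for table in TENANT_TABLES:
            cur = conn.execute(
                f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,)
            )
            removed[table] = cur.rowcount
    return removed


# Sample data for two hypothetical tenants.
conn.executemany(
    "INSERT INTO accounts (id, tenant_id, holder) VALUES (?, ?, ?)",
    [(1, "bank_a", "Alice"), (2, "bank_b", "Bob")],
)
conn.executemany(
    "INSERT INTO transactions (tenant_id, account_id, amount) VALUES (?, ?, ?)",
    [("bank_a", 1, 100.0), ("bank_a", 1, -20.0), ("bank_b", 2, 50.0)],
)
conn.commit()

removed = purge_tenant(conn, "bank_a")
remaining = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

With the owner recorded on every row, the exit procedure becomes a loop over a known list of tables. A real deployment would also have to deal with backups, logs, caches and replicas, which this sketch does not touch.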
In my last post I mentioned an analysis done by a group of VCPs. One slide in their presentation deserves more discussion: the four-hour RTO defined in the MAS notice to banks.
Recovery time objective is a well-established concept, and I have been seeing it in large-scale project design documents and procurement RFPs. Wiki has this definition: “The recovery time objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.”
The reader has to distinguish between recovering to full service and recovering to a service level. When disaster happens, everything has to be prioritized. Not all programs are equal when you have limited resources and time. You may not expect to pay your telephone bill via an ATM during serious flooding, but you do expect the ATM to still let you draw money.
The slide highlights the time difference between when an event happens and when a disaster is declared. Given the complexity of current systems and networks, fully assessing a system malfunction may take hours. Usually the incident handling procedure requires a few rounds of clarification (if not finger pointing) before senior staff are informed about the major outage. How a bank responds to an outage is now a critical element in meeting the MAS requirement on RTO. The authors of the slide contend that the time actually left for recovery is far less than four hours and that manual steps are not going to meet this requirement. I believe they do have a point.
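The arithmetic behind that argument can be sketched in a few lines. The four-hour RTO comes from the discussion above; the assessment and escalation durations are hypothetical numbers chosen only for illustration, and the sketch assumes the RTO clock starts when the event happens, which is how I read the slide.

```python
from datetime import timedelta

RTO = timedelta(hours=4)  # the four-hour RTO discussed above


def recovery_budget(assessment, escalation, rto=RTO):
    """Time left for actual recovery work once the disaster is declared.

    Assumes the RTO clock starts at the event, not at the declaration.
    """
    return rto - (assessment + escalation)


# Hypothetical durations, for illustration only.
budget = recovery_budget(
    assessment=timedelta(minutes=90),  # diagnosing the malfunction
    escalation=timedelta(minutes=45),  # clarifications until senior staff declare
)
print(budget)  # 1:45:00
```

Under these assumed numbers, more than half of the four hours is gone before any recovery step begins.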
Will the MAS TRM requirements and notice make 24×7 internet banking a white elephant? Let us wait for the 2014 DBS annual report and find out their cost ratio.
Since MAS released the TRM guideline last month, I believe many people have been studying it (including me). The Big Four accounting firms are usually the most active in publishing explanatory reports and articles, with the purpose of generating more business leads.
However, a group of VMware certified professionals is taking the lead this time. They worked together and published a MAS TRM analysis report focusing on DR and virtualization. Some of the observations are valid. The document can be found on the VMware website.
A few I would like to share:
- Process and committee oriented. No Agile or rapid innovation.
- All social media sites, cloud-based storage and web-based emails are classified as “unsafe internet services”. No technical facts are given to support why they are all insecure.
- Trust no employee: sys admins must be tracked.
The last post discussed the complications that arise when multiple bank applications run on the same computing platform and the bank needs to decide whether to report “a relevant incident” within one hour upon discovery.
This part discusses how this requirement is going to affect service level agreements in Singapore banking IT operations. Before this MAS notice came into effect, IT operations usually designed system uptime or availability requirements according to business needs. Systems supporting real-time financial transactions have the highest uptime requirements. Even market data feeds and AML systems, which are not auxiliary to financial transactions, require high availability. Infrastructure systems and monitoring services are usually regarded as secondary where availability is concerned. Failure of a network monitoring system will not directly impact users or cause direct financial loss.
The MAS requirement on incident reporting within one hour upon discovery will change the importance of infrastructure systems and monitoring services. Although it is possible for a bank to discover a data breach or system malfunction weeks after the actual event happened, that is not what this MAS notice is designed for. The one-hour-upon-discovery requirement assumes that the bank has a sound and robust monitoring infrastructure. Monitoring systems will need to run with availability requirements similar to those of the core financial systems they monitor. Real-time log aggregation systems like ArcSight and Splunk are important tools for discovering network attacks and system malfunctions. If a bank relies on these systems to detect attacks and provide real-time intelligence, their uptime will directly affect the bank’s ability to meet the one-hour reporting requirement. For example, suppose ArcSight is used to monitor 200 servers and it is down due to an error when an SQL injection attack happens. The DB server log will still record the event at the correct time. When the ArcSight error is fixed, it will start processing the server logs and the SQL injection attack will be identified. The attack will therefore be discovered much later than the time recorded in the server log. Could the bank claim that the discovery happened at the later time, when ArcSight recovered from the error? Or will MAS deem the discovery to have happened when the DB server recorded the attack?
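The timing question in the monitoring outage example above can be laid out as a small calculation. All timestamps below are hypothetical, and the sketch takes no position on which reading MAS would adopt; it only shows how far apart the two candidate deadlines can be.

```python
from datetime import datetime, timedelta

REPORT_WINDOW = timedelta(hours=1)  # one hour upon discovery

# Hypothetical timeline, for illustration only.
event_logged = datetime(2013, 7, 1, 9, 0)        # DB server log records the attack
monitor_restored = datetime(2013, 7, 1, 13, 30)  # monitoring tool is back online
alert_raised = monitor_restored + timedelta(minutes=10)  # log backlog processed

# How long the attack went unnoticed because monitoring was down.
detection_lag = alert_raised - event_logged

# Two candidate reporting deadlines, depending on what counts as "discovery".
deadline_if_logged = event_logged + REPORT_WINDOW
deadline_if_alerted = alert_raised + REPORT_WINDOW
```

In this made-up timeline the two readings differ by 4 hours 40 minutes, and under the stricter reading the deadline has already passed by the time the alert is raised.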
The actual response and judgement will need to consider the specific details of each case. However, the SLAs of monitoring systems will need to improve in order to show that the bank is committed to meeting the MAS notice.
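To see what a tighter SLA means in practice, an availability percentage can be converted into the downtime it still allows per year. The percentages below are generic illustrative tiers, not figures from the MAS notice or from any bank.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_pct):
    """Hours of downtime per year permitted by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Illustrative tiers only.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% availability -> {allowed_downtime_hours(pct):.2f} hours/year")
```

A monitoring system run at 99% availability may be down for roughly 87.6 hours a year, and any attack during those hours is only discovered after the fact.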
While attending a PwC Singapore meeting on the new MAS guideline, I had many questions in my head about how the one-hour incident reporting requirement could be fulfilled.
The requirement obliges banks operating in Singapore to report to MAS within one hour of discovering a relevant incident (a security breach or malfunction).
There are a few levels of complexity. One is the application boundary issue. The other is the SLA issue.
Most international bank systems are spread across multiple time zones. A trading system may be located in London and centrally managed. The Singapore application runs side by side with other regions' applications. If only the Japan application is under attack, should MAS be informed, considering that the affected JP application runs on the same hardware platform as the SG one? If yes, MAS will become a central information hub for security incidents globally. With the time zone issue as well, international banks in Singapore will need to respond to global incidents and be able to decide whether an incident happening in London should be reported to MAS, not to mention doing so within the one-hour requirement.
Systems no longer run as localized versions. Virtualization and cost saving have already moved the old systems onto centralized and shared platforms. A clear boundary cannot easily be drawn when a component is affected.
I believe this question has already been considered by the relevant parties and MAS. One possible solution is to focus on whether the remote incident materially impacts Singapore operations. There should be some mutual understanding between the regulator and banks on how to limit the catch-all potential of the incident reporting requirement. I will talk about SLAs later.
Readers of the new TRM guideline from the Monetary Authority of Singapore will be surprised by the changes it makes. It is not a simple update but a major rewrite of some of the sections. It also incorporates key and fundamental changes in financial technology.
In the introduction section, the author sets the tone for the whole document by stating that IT is no longer only a cost center and should be integrated with business strategies. This type of statement has been advocated by vendors for a long time, but I believe it is the first time a banking regulator has made the same statement in a TRM guideline. From here, the reader can expect that the TRM function is not only about system vulnerabilities or malware; project risk, governance and outsourcing are also important.
Para 1.0.1 “IT is no longer a support function within a financial institution (“FI”) but a key enabler for business strategies”
The author also states that users are more IT-savvy. From my experience, the more accurate description is that users are getting more IT-demanding and require more features. The usability of non-financial internet and mobile applications has been revolutionized by the use of HTML5, AJAX and even 3D graphics. Users are demanding that the old HTML-only internet banking follow. MAS also senses these changes and urges banks to fully understand the risks before bending over backwards to please users.
Para 1.0.3 “FIs are also faced with the challenge of keeping pace with the needs and preferences of consumers who are getting more IT-savvy and switching to internet and mobile devices for financial services, given their speed, convenience and ease of use.”
After a one-year consultation, the Monetary Authority of Singapore released a Technology Risk Management Guideline. It is a major overhaul of the last version, which was published in 2008. For sure, the banking industry and banking technology have changed a lot with the omnipresence of 3G networks and mobile devices.
As part of my job is to implement TRM in FIs, I will write up my comments and observations in the coming posts. But first let us take a 3,000-foot view of this document. A few text analysis tools and visualization graphics will do the job.
The first graph is a word cloud which shows high-frequency keywords. FI means Financial Institution. Most of the words are general IT terms like data and systems. But notice that “ensure” appears in a relatively big size!
The left graph shows the three selected keywords: Ensure, Access and Recovery. The peaks of Access and Recovery show that although these keywords are used often, they are mainly used in one particular chapter.
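For readers who want to try this kind of analysis themselves, a per-chapter keyword count can be sketched with the Python standard library. This is not the tool used to produce the graphs; the function name and the two sample chapters are placeholders invented for illustration, and the sample sentences are not quotations from the guideline.

```python
import re
from collections import Counter


def keyword_counts_by_chapter(chapters, keywords):
    """Count case-insensitive whole-word occurrences of each keyword per chapter."""
    result = {}
    for title, text in chapters.items():
        words = Counter(re.findall(r"[a-z]+", text.lower()))
        result[title] = {kw: words[kw] for kw in keywords}
    return result


# Placeholder text, not quotations from the guideline.
chapters = {
    "chapter_a": "Ensure access is reviewed. Access rights must be logged.",
    "chapter_b": "Ensure recovery plans are tested. Recovery sites need drills.",
}
counts = keyword_counts_by_chapter(chapters, ["ensure", "access", "recovery"])
```

A keyword that is spread evenly, like "ensure" in the placeholder text, gets similar counts in every chapter, while a keyword tied to one topic peaks in a single chapter.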