
Bringing FM and IT together Volume II

A series of articles brought together from Quocirca's writings for SDC during 2013
January 2014

Quocirca continued to write articles for SearchDataCenter throughout 2013, looking at how facilities management (FM) and information technology (IT) professionals need to work together more closely than ever. This report pulls those articles together as a two-volume set for ease of access.

Clive Longbottom, Quocirca Ltd
Tel: +44 118 9483360
Email: Clive.Longbottom@Quocirca.com

Copyright Quocirca 2014



A series of articles brought together from Quocirca's writings for SDC during 2013

Designing the data centre for tomorrow
Starting a new data centre design now is likely to be different to any data centre that you may have been involved with in the past. A new way of thinking is required; one where the facility can be just as flexible as the IT equipment held within it.

DCIM - a story of growing, overlapping functionality
Data centre infrastructure management (DCIM) software has come a long way. It is now beginning to threaten existing IT systems management software in many ways, and this could be where it runs into trouble. Just how far can, and should, DCIM go?

The data centre for intellectual property management
It may have cost millions to build your latest data centre, and it may house some expensive equipment. However, none of this really matters to the organisation. It is the intellectual property that matters, and the IT platform and the facility it sits in must be architected to reflect this.

The data centre and the remote office
Organisations are increasingly diffuse, decentralised across remote offices, mobile workers and home workers. Yet all of these need to be well served by the organisation's technology. How can this best be provided without breaking the bank or compromising security?

Cascading data centre design
Are you a server hugger? If so, it is probably time to review your position and your worth to the organisation. The future will be built around hybrid IT, with different data centre facilities playing their part. Some of these may be under your control; others will be under various levels of control by others.

Disaster recovery and business continuity - chalk and cheese?
Disaster recovery and business continuity should not be treated as a single entity. These two distinct capabilities need their own teams working on them to ensure that the organisation gets an overall approach that meets its own risk profile and is managed within the cost parameters the business can operate in.

Managing IT - converged infrastructure, private cloud or public cloud?
IT can be architected in many different ways. In some cases, a physical, one-application-per-(clustered)-server approach may still be the way forward, whereas for others it may be virtualisation or cloud. The hardware that underpins these different approaches may also be changing, from rack-mount, self-built systems to modular converged systems. Cloud computing throws more variables into the mix: just what is an organisation to do?

The Tracks of my Tiers
There is a concept of the tiering of data centres which can be used by an organisation to see whether an external facility will offer the levels of availability that it requires for certain IT workloads. What are these tiers, and what do they mean to an organisation?

What to look for from a Tier III data centre provider
If you decide to go for an Uptime Institute Tier III data centre provider, what should you look out for? As accredited Tier III facilities are few and far between, are there other things that can be looked for that will enable a non-Uptime Institute facility to be chosen instead?


Designing the data centre for tomorrow


Since the dawning of the computer era in the 1960s, data centre design has essentially been evolutionary. Sure, there have been moves from mainframes to distributed computers; from water cooling to air cooling; from monolithic UPS and power systems to more modular approaches. Yet the main evolution has been from small data centres to large data centres. Even where an organisation concludes that the cost of the next data centre is too large for it, the move has been to a co-location facility where future growth can be allowed for.

Now, the world is changing. Application rationalisation, hardware virtualisation and consolidation have left organisations with a large facility and a need to house only 50% or less of what they were running previously. New, high-density server, storage and network equipment, along with highly engineered systems such as VCE VBlocks, Cisco UCSs, IBM Pure Systems and Dell Active Systems, mean that less space is required to deliver more effective business compute power.

And then cloud computing comes in. Suddenly, data centre managers and systems architects no longer just have to decide how best to support a workload, but also through what means. A workload that would normally sit on a stack totally owned by the organisation may now be put into co-location, or be outsourced through infrastructure, platform or software as a service (IaaS, PaaS or SaaS). Even where decisions are made to keep specific workloads in-house, it makes no sense to design a data centre to house that workload in the long term. Cloud is still an immature platform, but within the next few years it is likely to become the platform of choice for the mainstream, and those organisations that have built a data centre for hosting a specific application over a long period of time could find themselves at a disadvantage.

To design a data centre for the future, there are two parts to consider: the facility itself and the IT hardware that it houses. From a hardware point of view, a full IT lifecycle management (ITLM) approach can ensure that a dynamic infrastructure is maintained (see the previous article on ITLM). Use of the hardware assets can grow and shrink as needs change, with excess equipment being sold to recoup some cost. Through the use of subscription pricing, software licences can also be controlled by signing up for or shutting down subscriptions as required.

The main issues revolve around the facility. A data centre is a pretty fixed asset: if it is built to house 10,000 square feet of space and the business finds that it only needs 5,000 square feet, the walls cannot easily be moved to serve only this area. Even where new walls can be put in, for example to create new office space, this solves only a small part of the problem. A data centre facility is often built with a designed and relatively fixed layout for the services offered. Power distribution units will be hard-fitted to the walls and other areas of the data centre; CRAC-based cooling systems will be fixed to the roof in specific places; and UPSs and auxiliary generators will be sized and sited to suit the original data centre design.

So, a new approach to the facility is the key to designing and building tomorrow's data centre. The first place to start is with the physical design.
If sloping sub-floors and raised data centre floors are preferred to deal with any flooding issues (whether natural or through the use of liquid-based fire suppressant), then make this a set of multiple gullies rather than a single V-shaped system. This way, if downsizing is required, the raised walls marking off each gully can be used as the base for new walls without impacting the capability of the sub-floor to provide drainage for the data centre itself.



Next is the cabling within the data centre. This will need to be fully structured, with data and power being carried through separate paths and with an easy means of re-laying any cables should the layout of the data centre change.

Then there is power distribution itself. Rather than build power distribution units against walls or pillars, it may be better to make them free-standing, with power feeds coming through structured cabling from the roof. This way, should a redesign be required, the power distribution is as flexible as the rest of the IT equipment and can be easily relocated.

With cooling systems, a move to free-air cooling or other low-need systems will mean that less impact will be felt in redesigning the cooling when the data centre changes size. If combined with effective hot and cold aisle approaches with ducted cooling, the cooling system can be sized more appropriately and placement becomes less of an issue. Even where a CRAC-based system is perceived to be needed, a move to a more modular system with multiple, balanced, variable-speed CRAC units will make life easier if the data centre needs to be resized.

The same goes for UPSs and auxiliary generators: a monolithic system could leave an organisation needing to buy a completely new unit if the needs of the data centre change, or running a massively over-engineered system if it carries on using the same old UPS or generator when the data centre shrinks. As most UPS systems used these days will be in-line, every percentage point of efficiency loss could be incurred against the rating of the UPS, not against the power actually drawn by the equipment in the data centre. With a generator, fuel usage will be pretty much in line with its rating, so even when running below its rated power it will use a lot more fuel than one that is correctly engineered for the task.

If your organisation is reaching the point where a new data centre is seen to be a necessity, bear in mind that the IT world is going through a bigger change than it has ever done before. Planning to embrace this change will save money in the mid to long term, and it will provide a far more flexible platform for the business.
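As a back-of-envelope illustration of the UPS sizing point above, the sketch below compares the annual energy lost by an oversized monolithic in-line UPS (where, as discussed, losses can track the unit's rating) with a modular unit sized close to the actual IT load. The ratings, load and efficiency figures are purely illustrative assumptions, not measurements from any particular product.

```python
# Illustrative only: rough comparison of UPS losses when incurred against
# rating versus against something close to the actual IT load.

HOURS_PER_YEAR = 8760

def annual_loss_kwh(loss_basis_kw: float, efficiency: float) -> float:
    """Energy lost per year if (1 - efficiency) is incurred against loss_basis_kw."""
    return loss_basis_kw * (1 - efficiency) * HOURS_PER_YEAR

it_load_kw = 150.0             # assumed actual IT draw after consolidation
monolithic_rating_kw = 500.0   # assumed legacy UPS sized for the original, larger facility
efficiency = 0.94              # assumed in-line (double-conversion) efficiency

# Monolithic unit: losses tracked against its rating (the concern raised above)
oversized_loss = annual_loss_kwh(monolithic_rating_kw, efficiency)

# Modular unit: losses tracked against the real load plus some headroom
rightsized_loss = annual_loss_kwh(it_load_kw * 1.2, efficiency)  # 20% headroom

print(f"Oversized monolithic UPS loss: {oversized_loss:,.0f} kWh/year")
print(f"Right-sized modular UPS loss:  {rightsized_loss:,.0f} kWh/year")
print(f"Avoidable loss:                {oversized_loss - rightsized_loss:,.0f} kWh/year")
```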

DCIM - a story of growing, overlapping functionality


An organisation's technical environment can be seen as having two distinct parts: the main IT components of servers, storage and networking, along with the facility or facilities that are required to house the IT. Historically, these two areas have fallen under the ownership and management of two different groups: IT has fallen under the IT department, while the facility has fallen under the facilities management (FM) group. This leads to problems, as FM tends to see the data centre as just another building to be managed alongside all the other office and warehouse buildings, whereas IT tends to see the data centre as the be-all and end-all of its purpose in life. One group's priorities may not match the other group's, and the language that each group speaks can be subtly (or not so subtly) different.

Another problem is emerging due to cloud. In the past, the general direction for a data centre has been for it to grow as the business grows; cloud can now mean that the IT equipment required within the data centre shrinks rapidly as workloads are pushed out to public cloud, and this is hard to manage where the facilities equipment (such as UPS, CRAC units and power distribution systems) consists of monolithic items.

In order to ensure that everything runs optimally and supports the business in the way that is required, a single form of design, maintenance and management is needed that pulls FM and IT together, and that also enables what-if scenarios to be run so that future planning can be carried out effectively. This has been emerging over the past few years as data centre infrastructure management (DCIM).



DCIM systems started off far more as a tool for the FM team, as part of a building information modelling (BIM) tool. BIM software enables a building to be mapped out and the major equipment to be placed within a physical representation, or schematic, of the facility. DCIM made this specific to the needs of a data centre, holding information about power distribution, UPS and cooling systems, along with power cabling, environmental monitoring sensors and so on. The diagrams could be printed out when maintenance was required, or given to the IT team so that they could then draw in the IT equipment knowing where the facilities items were.

It soon became apparent that allowing the IT equipment to be placed directly in the schematic was useful for both IT and FM. This led to a need for DCIM systems to bring in asset discovery systems alongside databases of the physical appearance and the technical description of the IT equipment, so that existing data centre layouts could be more easily created. This brought DCIM systems into competition with the asset discovery and management systems that were part of the IT systems management software. Interoperability between the two systems is not always available, yet a common database, along the lines of a change management database (CMDB), makes sense to provide a single true view of what is in a data centre.

A differentiator between DCIM systems is often how good their databases of equipment are: some will not be updated with new equipment details on a regular basis; others will use plate values for areas such as power usage. The difference between using a plate value (just taking the rated power usage) and the actual energy usage measured in real time can be almost an order of magnitude, which can lead to over-engineering of power, backup and cooling systems.

2D schematics have given way in many DCIM systems to 3D, so that rack-based systems can be engineered in situ and viewed from multiple directions to make sure that pathways for humans remain traversable. 3D schematics also allow checks on whether new equipment can be brought directly into a spot in the data centre, or whether there are too many existing objects in the way. From this came the capability to deal with "what if?" scenarios. For example, would placing this server in this rack here cause an overload on this power distribution block? Would placing these power transformers here cause a hot spot that could not be cooled through existing systems? Again, such capabilities help FM and IT work together to ensure that the data centre is optimally designed and gives the best support to the business.

With 3D visual representations and granular details of the systems involved, along with real-time data from environmental sensors, the use of computational fluid dynamics then comes into play. Using empirical data from the DCIM system to see what happens to cooling flows as systems are changed and new equipment added ensures that hot spots are avoided right from the start.

The problems for DCIM lie mainly in trying to provide a single tool that covers two different groups. The FM team will often have their own BIM systems in place, and see the data centre as just another building with a few additional needs. To the IT team, the data centre is the centre of their universe, but they tend to see it as a load of interesting bits surrounded by a building.
The need for the two teams to not only talk, but work from common data sources to create an optimal solution, is not always seen as a priority. Even where DCIM is seen as a suitable way forward, there will be a need to integrate it into existing systems so as not to replicate too much and create a whole new set of data silos. Vendors have also been part of the problem: the main IT vendors have been poor at covering the facility, preferring to stick with archetypal systems management tools that just look at the IT equipment. It has been down to the vendors of UPSs and other facilities equipment, alongside smaller new-to-market vendors, to come up with full-service DCIM tools and try to create a market.
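To make the earlier plate-value and what-if points concrete, here is a minimal sketch of the kind of check a DCIM tool automates: would adding a server to a rack overload its power feed, and how different the answer can look when nameplate ratings are used instead of measured draw. The rack budget, server names and power figures are illustrative assumptions only.

```python
# Illustrative only: a simplified DCIM-style "what if?" power check.
# All equipment names and power figures are assumptions for the example.

rack_feed_budget_w = 3500  # assumed capacity of the rack's power distribution feed

# (nameplate watts, measured watts) for servers already in the rack
installed = {
    "srv-01": (750, 310),
    "srv-02": (750, 295),
    "srv-03": (1100, 420),
}

new_server = ("srv-new", 1100, 450)  # name, nameplate, expected measured draw

def would_overload(budget_w, existing_draws_w, extra_w):
    """Return True if adding extra_w to the existing draw exceeds the feed budget."""
    return sum(existing_draws_w) + extra_w > budget_w

nameplate_draws = [plate for plate, _ in installed.values()]
measured_draws = [meas for _, meas in installed.values()]

print("Using nameplate values:",
      "OVERLOAD" if would_overload(rack_feed_budget_w, nameplate_draws, new_server[1]) else "OK")
print("Using measured values: ",
      "OVERLOAD" if would_overload(rack_feed_budget_w, measured_draws, new_server[2]) else "OK")
```

With these assumed figures, plate values suggest the rack is already at risk while measured values show plenty of headroom, which is exactly the over-engineering gap described above.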



However, those who have dipped their toes in the DCIM water seem to like it. Ticketmaster in Europe uses nlyte's DCIM tools to track data centre assets down to a fine granular level, gaining better insights into energy efficiency and individual asset utilisation at an individual customer level. The Lawrence Livermore National Laboratory (LLNL) in the US has used Romonet, a system for carrying out data centre what-if scenarios and costing, to give a bird's eye view of its data centre and to gain decision-making support in short timescales.

Other vendors in the DCIM space include Siemens, Emerson Network Power, Raritan and Cormant. CA has moved into the DCIM arena, and Digital Realty Trust (DLR) brought out its own offering, EnVision, earlier in 2013.

Is DCIM for you? If you are looking at change within your data centre, growing or shrinking the amount of IT equipment in it to an appreciable level, then DCIM should be in place. If you have just carried out significant change, predict that the data centre will be in a stable state for a while, or already have full asset management, systems management and BIM tooling in place, DCIM may not be for you just now. However, for a full view of exactly what is happening in the data centre and the capability to plan for the future, a full DCIM solution will be hard to beat.

The data centre for intellectual property management


Just what is a data centre for? Think about it, and then continue reading to see if your first thoughts were correct.

Did you think that it was there to house IT equipment in the manner that it needs? I would say that you are wrong. How about to provide an environment where a platform for running the organisation's applications can be implemented and managed? Again, I don't think this hits the mark. How about a place which enables corporate intellectual property to be created and managed to the best financial benefit of the organisation?

Hang on, we're talking about a load of servers, storage and network equipment supported by UPSs, environmental systems, cooling systems and auxiliary generators, aren't we? Sure, but if we regard the data centre purely in this light, we will fail to support the business in the way it needs. No, today's data centre ecosystem has to be built and managed around the data and information it is looking after, and this is leading to some pretty major differences in how the data centre should be designed and implemented.

Firstly, it is unlikely that we will now be looking at a single data centre. It is far more likely that we will be using a hybrid mix consisting of some or all of the following:

- Privately owned facility: the existing data centre, with owned equipment managed by dedicated staff.
- Co-location facility: someone else's facility, managed by them, with the organisation's IT equipment operated by the organisation's staff.
- Hosted systems: dedicated hardware platforms where the platform is managed by a third party, with the various aspects of the software stack being managed either by the service provider's staff or the organisation's staff.
- Public commercial cloud: paid-for platforms ranging across infrastructure, platform and software as a service (IaaS, PaaS and SaaS), with differing levels of control over the technical aspects of the services by the organisation.
- Public free cloud: SaaS or function as a service (FaaS), such as Google or Bing Maps, where a function is taken on a best-efforts support basis.



This mix of data centres also leads to a mix of places where data and information will lie. No longer can an organisation simply centralise all its data into a single storage area network (SAN). On top of this is the lack of capability for the organisation to draw a line around a specific group of people and say "this is the organisation". The need for organisations to work across an extended value chain of contractors, consultants, suppliers (and their suppliers), logistics companies, customers (and often their customers) means that data and information flows are often moving into areas where the organisation has less control.

This is all made more complex through the impact of bring your own device (BYOD). The unstoppable tide of end users expensing their own devices and expecting them to work with the enterprise's own systems, and then downloading consumer apps from app stores and so creating data and information in extra places unknown to the IT department, means that the value of data and information is being increasingly diluted.

IT now has to accept that the data centre itself is just part of the equation, and start to move to a model that pays far more attention to the data and information the organisation depends upon. To manage this, it is a waste of time looking at how firewalls should be deployed: after all, just where should this wall be positioned along the extended value chain? It is equally wrong to apply security just at the application or hardware levels, as, as soon as someone manages to breach that security, they will have free rein to roam around the rest of the information held in that information store.

No, data and information now have to be secured and managed at a far more granular level, with users being identified at different levels: as an individual, through their role within a team, to their level of corporate security clearance. On top of this needs to be contextual knowledge, such as where the person is accessing the data from and from what sort of device. Then the data itself needs to be classified against an agreed security taxonomy, which could be as simple as tagging data and information as Public, Commercial in confidence or For your eyes only.

Touchpoints need to be implemented such that the organisation can see who is attempting to access information assets. This is best done through virtual private networks (VPNs) and hybrid virtual desktops, which can enforce the means by which corporate assets are accessed. Through these touchpoints, information security such as encryption of data at rest and on the move, along with data leak prevention (DLP) and digital rights management (DRM), can be applied alongside information rules based on access rights for the person and their context. Mobile device management (MDM) can help to keep an eye on which devices are attaching to the network, and can help to air-lock them from full access to systems until appropriate identification of the individual using the device has been made. This may require multi-level identification going well beyond the normal challenge-and-response username/password pair, perhaps including single-use access codes or biometrics. All of this then means that information assets are only accessible by the right people in the right place.
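As a minimal sketch of the kind of granular, context-aware access decision described above, the fragment below combines a data classification tag with the user's clearance level and access context (managed device, trusted network). The taxonomy labels come from this article; the clearance levels and rules are illustrative assumptions, not a reference implementation.

```python
# Illustrative only: a toy policy check combining data classification,
# user clearance and context. Anything beyond the article's taxonomy is an assumption.

from dataclasses import dataclass

# Classification taxonomy from the article, ordered least to most sensitive
CLASSIFICATION_RANK = {"Public": 0, "Commercial in confidence": 1, "For your eyes only": 2}

@dataclass
class AccessRequest:
    user_clearance: int        # assumed numeric clearance, 0-2, mapped to the taxonomy
    device_managed: bool       # e.g. enrolled in MDM and sandboxed
    on_trusted_network: bool   # e.g. coming in over the corporate VPN
    classification: str        # tag on the data asset being requested

def allow(req: AccessRequest) -> bool:
    """Grant access only if clearance covers the classification and context is acceptable."""
    rank = CLASSIFICATION_RANK[req.classification]
    if req.user_clearance < rank:
        return False                      # clearance too low for this data
    if rank >= 1 and not req.device_managed:
        return False                      # confidential data only on managed devices
    if rank == 2 and not req.on_trusted_network:
        return False                      # most sensitive data only via the VPN touchpoint
    return True

# Example: a contractor on an unmanaged tablet asking for confidential material
print(allow(AccessRequest(user_clearance=1, device_managed=False,
                          on_trusted_network=True,
                          classification="Commercial in confidence")))  # False
```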
Even if someone else gets hold of the digital representation of the asset, it will still be useless to them, as it will be encrypted and controlled by a DRM certificate where necessary.

All of this requires changes in how the data centre operates: each aspect of the above will require new systems, new applications and agreements with the business on what information security means to them. Much of this can now be done outside of the corporately owned data centre; managed security providers are appearing which can provide the functions required on a subscription basis, without the need for massive capital investment by your organisation.

The heading to this article was "The data centre for intellectual property management". As such, the title is completely wrong. What needs to be put in place is an architectural platform for intellectual property management, and this will transcend the single facility, extending across a hybrid mix of needs spanning a range of different facilities.

The data centre and the remote office


The remote office has always been a bit of a problem when it comes to technology. The employees in these offices are just as dependent on technology as their counterparts in the main offices, but they have few or no qualifications to look after any technology that is co-located with them. Therefore, the aim has tended to be to centralise the technology and provide access to the remote employees as required.

This has not tended to work well. Slow response and poor connectivity availability have pushed users away from sticking with the preferred centralised solution, instead working around the systems with processes and solutions they have chosen themselves. As bring your own device (BYOD) has become more widespread, each individual has become their own IT expert; unfortunately, a little knowledge is a dangerous thing. The choice of consumer apps to carry out enterprise tasks is leading to a new set of information silos: ones that central IT has no capability of managing; ones where pulling together the disparate data for corporate analysis and reporting is impossible.

Architecting a new platform that meets everyone's needs should now be possible; it just requires a little bit of give and take. Each individual has to accept that what pays their salary is a much greater entity: the organisation. If they do not work in a manner that helps the organisation, the capability for the salary to be paid could be impacted. Therefore, working in a manner that is organisation-centric is a basic requirement of having a job, and I don't care that the millennials scream that they won't work for any organisation that doesn't allow them seven hours a day for posting on Facebook.

What IT has to look at is how best to put in place the right platform to meet the organisation's and the individuals' needs. This should start with the centralisation of data: as long as the organisation can access all data and information assets, it can analyse these and provide the findings to those in the organisation who can then make better-informed decisions against all the available information. Therefore, data and files should eventually be stored in a single place. This does not stop enterprise file sharing systems, such as Huddle, Box or Citrix ShareFile, from being used; it just means that the information held within these repositories needs to be integrated into the rest of the platform.

Capturing the individual's application usage is important: being able to steer them towards corporate equivalents of consumer applications can help minimise problems at a later date, when security is found to be below the organisation's needs or the lack of the capability to centralise data leads to a poor decision being made.

It may well be that remote users would be best served through server-based computing approaches such as virtual desktop infrastructure (VDI). Using modern acceleration technologies such as application streaming or Numecent's Cloudpaging will provide very fast response for the remote user, while allowing them to travel between remote offices and larger offices and still have full access to their specific desktop. Citrix, Centrix Software and RES Software also provide the capabilities for these desktops to be accessible from the user's BYOD devices and apply excellent levels of enterprise security to the system as well. What an organisation should be looking for is the capability to sandbox the device: creating an area within any device which is completely separate from the rest of it.
Through this means, any security issues with the device can be kept at bay; enterprise information can be maintained within the sandbox, with no capability for the user to cut and paste from the corporate part to the consumer part of the device. Should the user leave the organisation, the sandbox can be remotely wiped without impacting the user's device itself.



For remote offices of a certain size, or which are in a geographic location where connectivity may be too slow for a good end-user experience, a server room may be warranted to hold specific applications that the office needs, and maybe to run desktop images for them locally. Data and information created can be replicated in the background, using WAN acceleration from the likes of Veeam, Symantec, Silver Peak or Riverbed, ensuring that it is still all available centrally.

Where such a server room is put in place, it is important to ensure that it can be run lights-out from a more central place. Depending on the person at the remote office who happens to have the biggest PC at home is no way to support a mission-critical, or even business-important, environment. Dedicated staff with the right qualifications must be able to log in remotely and carry out actions on the systems as required. Wherever possible, patching and updating should be automated, with full capability to identify which systems may not be able to take an upgrade (for example due to a lack of disk space or an old device driver) and either remediate the issue or roll back any updates as required. Here, the likes of CA and BMC offer good software management systems built around solid configuration management databases (CMDBs).

The increasing answer for many organisations, however, is to outsource the issue. As systems such as Microsoft's Office 365 become more capable, many service providers are offering fully managed desktops that provide a full office suite, along with Lync telephony, alongside other software. Some offer the capability for organisations to install their own software on these desktops, enabling any highly specific applications, such as old accountancy or engineering software packages, to be maintained for an individual's usage. Cloud-based service providers should be able to provide greater levels of systems availability and better response times and SLAs through their scalability, and should be better positioned to keep their platforms up to date.

With connectivity speed and availability improvements continuing, a centralised approach to remote offices should be back on the data centre manager's agenda. However, the choice has to be made as to how that centralisation takes place. For the majority, the use of cloud-based service provision of a suitable platform will probably be better than the use of a server room or centralisation directly to an existing owned data centre. Quocirca's recommendation is to look to outsourcing wherever possible: use the existing data centre for differentiated, core, mission-critical activities only.
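As a minimal sketch of the automated pre-flight check described above, the fragment below decides whether a remote system can safely take an update by checking free disk space and driver version, and flags it for remediation otherwise. The thresholds and inventory fields are illustrative assumptions, not a description of any particular management product.

```python
# Illustrative only: a pre-update eligibility check for remote-office systems.
# Thresholds and inventory fields are assumptions for the example.

MIN_FREE_DISK_GB = 10        # assumed space needed to stage the update safely
MIN_DRIVER_VERSION = (2, 0)  # assumed oldest driver version the update supports

systems = [
    {"name": "branch-01", "free_disk_gb": 42, "driver_version": (2, 3)},
    {"name": "branch-02", "free_disk_gb": 4,  "driver_version": (2, 1)},
    {"name": "branch-03", "free_disk_gb": 55, "driver_version": (1, 8)},
]

def check_system(sys_info):
    """Return a list of reasons the system cannot take the update (empty if eligible)."""
    issues = []
    if sys_info["free_disk_gb"] < MIN_FREE_DISK_GB:
        issues.append("insufficient disk space")
    if sys_info["driver_version"] < MIN_DRIVER_VERSION:
        issues.append("device driver too old")
    return issues

for s in systems:
    problems = check_system(s)
    if problems:
        print(f"{s['name']}: hold update, remediate first ({', '.join(problems)})")
    else:
        print(f"{s['name']}: eligible for automated update")
```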

Cascading data centre design


Back in the early days of computing, the concept of time sharing was common. Few organisations could afford the cost, or had the skills, to build their own data centre facility, so they shared someone else's computer in that organisation's facility through the use of terminals and modems. As computing became more widespread, the use of self-owned, central data centre facilities became the norm.

The emergence of small distributed servers led to server rooms in branch offices and even to servers under individuals' desks. Control of systems suffered; departments started to buy their own computer equipment and applications. The move to a fully distributed platform was soon being pulled back together into a more centralised approach, but often with a belt-and-braces, sticking-plaster result. The end result for many was a combination of multiple different facilities, each running to different service levels, with poor overall systems utilisation and a lack of overall systems availability.

Virtualisation, touted as the ultimate answer, may just have made things worse, as the number of virtual machines (VMs) and live applications not being used has spiralled out of control. Cloud computing, another silver bullet, means that the organisation now has to deal not only with its own issues around multiple facilities, but also with other organisations'.



Increased mobility of the workforce, both through home working and the needs of the road warrior, has led to a need for always-on access to enterprise applications, and also to a bring your own device (BYOD) appetite for using apps from other environments. It's all a bit of a mess. Just what can be done to ensure that things get better, not worse?

The first thing that has to be done is a full audit of your own environment. Identify every single connected server within your network, and every single application running on them; there are plenty of tools out there to do this. Once you have this audit, you will need to identify the physical location of each server. This may be slightly more difficult, but there is one way that is pretty effective where you cannot identify exactly where a server is: deny it access to the main network. Within a few minutes, there will be a call to the help desk from a user complaining; they will know where it is.

Now you have a physical asset map of where the servers are, and you know what applications are running on them. First, identify all the different applications that are doing the same job. You may find that you have three or four completely different customer relationship management (CRM) systems. Make sure that you identify your strategic choice, and arrange with those using the non-strategic systems to migrate over as soon as possible.

Next, identify all the different instances of the same application that are running. Consolidate these down as far as possible; there may be five different instances of the same enterprise resource planning (ERP) application in place. Such functional redundancy is not just bad for IT, in the cost of servers, operating system licences, maintenance and power required to keep them running, but also for the business. These systems will generally be running completely separately from each other, which means that the business does not have a single view of the truth. Consolidation has to be carried out for everyone's sake.

At this point, you have a more consolidated environment, but there will still be lots of applications being run by the organisation that could be better sourced through a SaaS model. Software that provides functionality highly dependent on domain expertise, for example payroll and expense management, is much better outsourced to a third party, as they can ensure that all the legal aspects of the process are covered.

This then leads to dealing with your organisation's overall IT platform in a more controlled yet flexible manner. The overall internal IT platform for the organisation should be smaller than it was previously. Consolidation, particularly when carried out with a fully planned virtualisation strategy, should reduce the amount of IT equipment required by up to 80%. All the equipment can now be placed where you want it. But should this all be in an owned facility? Probably not.

There are problems in building and managing a highly flexible data centre. Power distribution and cooling tend to be designed and implemented to meet specific needs. Further shrinkage of the internal platform can lead to the facility's power usage effectiveness (PUE) score growing, rather than shrinking. The always-on requirement means that multiple different connections from the facility to the outside world will be required. No, a cascade design of data centres is what is required.
There may be applications that, for any reason (long-term investment in the application and/or IT equipment, fears over data security), will be required to remain in an owned facility. There will be many more applications that can be placed into a co-location facility. Here, someone else is providing and managing the facility; they have the responsibility for connectivity, cooling, power distribution and so on. You just have to manage the hardware and software in your part of the facility. Should your needs grow, the facility owner can give you more space, power and cooling. Should your needs shrink, you can negotiate a smaller part of the facility. SaaS-based solutions take this even further: you have no responsibility whatsoever for the facility, hardware or software. This is all someone else's problem: you can concentrate on the business needs.



Ensuring that a cascaded data centre design works, consisting of an owned facility in conjunction with a co-location facility and public cloud functionality, means having in place the right tools to manage the movement of workloads from one environment to another. It also requires effective monitoring of application performance, with the capability to switch workloads around to maintain service levels. The more that is kept within an owned facility, the more availability becomes an issue, and multiple connections to the outside world will be required. However, getting it right will provide far greater flexibility at both a technical and a business level. Quocirca strongly recommends that IT departments start on this process: carry out a complete and effective audit now, and plan how your IT platform will be housed and managed in the years to come.
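To illustrate the audit-then-cascade approach above, here is a minimal sketch that takes a hypothetical application inventory and suggests a target environment for each workload: owned facility, co-location or SaaS. The inventory fields and decision rules are illustrative assumptions, simplified from the criteria discussed in this article.

```python
# Illustrative only: mapping audited workloads onto a cascaded design.
# Inventory fields and placement rules are assumptions for the example.

inventory = [
    {"app": "core-trading",   "data_sensitivity": "high", "differentiating": True,  "commodity_function": False},
    {"app": "erp-instance-3", "data_sensitivity": "med",  "differentiating": False, "commodity_function": False},
    {"app": "payroll",        "data_sensitivity": "med",  "differentiating": False, "commodity_function": True},
    {"app": "expenses",       "data_sensitivity": "low",  "differentiating": False, "commodity_function": True},
]

def suggest_placement(app):
    """Very rough cascade: keep differentiating or highly sensitive work in-house,
    push commodity, domain-expertise functions to SaaS, and co-locate the rest."""
    if app["differentiating"] or app["data_sensitivity"] == "high":
        return "owned facility"
    if app["commodity_function"]:
        return "SaaS"
    return "co-location"

for app in inventory:
    print(f"{app['app']:15s} -> {suggest_placement(app)}")
```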

Disaster recovery and business continuity - chalk and cheese?


Most organisations will have an IT disaster recovery (DR) plan in place. However, it was probably created some time back and will, in many cases, be unfit for purpose. The problem is that DR plans have to deal with the capabilities and constraints of a given IT environment at any one time, so a DR plan created in 2005 would hardly be designed for virtualisation, cloud computing and fabric networks. The good thing is that the relentless improvements in IT have created a much better environment, one where the focus should now really be moving away from DR towards business continuity (BC).

At this stage, it is probably best to come up with a basic definition of both terms so as to show how they differ.

Business continuity: a plan that attempts to deal with the failure of any aspect of an IT platform in a manner that still retains some capability for the organisation to carry on working.

Disaster recovery: a plan that attempts to get an organisation back up and working again after the failure of any aspect of an IT platform.

Hopefully, you see the major difference here: BC is all about an IT platform coping with a problem; DR is all about bringing things back when the IT platform hasn't coped. Historically, the costs and complexities of putting in place technical capabilities for BC meant that only the richest organisations with the strongest needs for continuous operation could afford it: now, it should be within the reach of most organisations, at least to a reasonable extent.

Business continuity is based around the need for a high-availability platform, something that was covered in an earlier article (see Uptime - the heart of the matter). By the correct use of N+M equipment alongside well-architected and implemented virtualisation, cloud and mirroring, an organisation should be able to ensure that some level of BC can be put in place for the majority of cases.

Note the use of the word majority here. Creating a fully BC-capable IT platform is not a low-cost project. The organisation must be fully involved in how far the BC approach goes: by balancing its own risk profile against the costs involved, it can decide at what point a BC strategy becomes too expensive for the business to fund.

This is where DR still comes in. Let's assume that the business has agreed that the IT platform must be able to survive the failure of any single item of equipment in the data centre itself. It has authorised the investment of funds for an N+1 architecture at the IT equipment level, and as such the IT team now has one more server, storage system and network path per system than is needed. However, as the data centre is based on monolithic technologies, the costs of implementing an N+1 architecture around the UPS, the cooling system and the auxiliary generation systems were deemed too high. Therefore, the DR team has to look at what will be needed should there be a failure of any of these items, as well as what happens if N+1 is not good enough.

The first areas that have to be agreed with the business are how long it will take to get to a specified level of recovery of function, and what that level of function is. These two points are known as the recovery time objective (RTO) and the recovery point objective (RPO). This is not something that an IT team should be defining on its own: the business has to be involved and must fully understand what the RTO and RPO mean. In particular, the RPO defines how much data has to be accepted as being lost, and this could have a knock-on impact on how the business views its BC investment.

For example, in an N+1 architecture, the failure of a single item will have no direct impact on the business, as there is still enough capacity for everything to keep running. Should a second item fail, the best that can happen is that the speed of response to the business for the workload or workloads on that set of equipment will be slower. The worst that can happen is that the workload or workloads will fail to work.

In the former case, the RPO will be to regain the full speed of response within a stated RTO, which would generally be defined as the time taken for replacement equipment to be obtained, installed and fully implemented. Therefore, the DR plan may state that a certain amount of spares inventory has to be held, or that agreements with suppliers have to be in place for same-day delivery of replacements, particularly for the large monolithic items such as UPSs. The plan must also include all the steps that will be required to install and implement the new equipment, and the timescales that are acceptable to ensure that the RTO is met.

In the latter case, where the workload has been stopped, the RPO has to include a definition of the amount of data that could be lost over specified periods. In most cases this will be per hour or per quarter-hour; in high-transaction systems, it could be per minute or per second. The impact on the RTO is therefore dependent on the business view of how many chunks of data loss it believes it can afford. The DR team has to be able to provide a fully quantified plan as to how to meet the RPO within the constraints of the business-defined RTO, and if it is a physical impossibility to balance these two, then it has to go back to the business, which will have to decide whether to invest in a BC strategy for this area or to lower its expectations on the RPO so that a reasonable RTO can be agreed.

In essence, BC has to be the main focus for a business: it is far more important to create and manage an IT platform in a manner that allows the organisation to maintain a business capability. The DR plan is essentially a safety net: it is there for when the BC plan fails. BC ensures that business continues, even if process flows (and therefore cash flows) are impacted to an extent. DR is there to try to stop a business from failing once a workload has, or workloads have, been stopped and the process flows are no longer there. The two elements of BC and DR are critical to have within an organisation; the key is to make sure that each complements and feeds into the other to ensure that there are no holes in the overall strategy.
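As a simple worked illustration of how an RPO can be quantified, the sketch below estimates worst-case data loss for a given replication interval and transaction rate, and checks it against a business-agreed limit. The transaction rate, interval and tolerance are illustrative assumptions only.

```python
# Illustrative only: quantifying worst-case data loss against an agreed RPO.
# Transaction rate, replication interval and tolerance are assumptions.

transactions_per_minute = 400      # assumed average rate for the workload
replication_interval_min = 15      # data copied off-site every quarter hour
business_rpo_transactions = 5000   # the most transactions the business will accept losing

# Worst case: the failure happens just before the next replication cycle
worst_case_loss = transactions_per_minute * replication_interval_min

print(f"Worst-case loss: {worst_case_loss} transactions")
if worst_case_loss > business_rpo_transactions:
    print("RPO not met: shorten the replication interval or revisit the business's expectations")
else:
    print("RPO met with the current replication interval")
```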

Managing IT - converged infrastructure, private cloud or public cloud?


The days of taking servers, storage and network components, putting them together and running applications on them on an essentially physical, one-to-one basis are rapidly passing. The uptake of virtualisation means that workloads share many resources, and the emergence of "as a service" means that the underlying resources have to be more flexible and easier to implement and use than ever before.



However, this still leaves a lot of choice for an end-user organisation. Should it go for a converged or engineered system, such as a Cisco UCS, an IBM PureSystem or a Dell VRTX, or should it go for a more cloud-like scale-out model based on commodity servers? Should it go the whole way, forget about the physical platform itself and just go for a public cloud based service? Each approach has its own strengths and its own weaknesses.

Converged systems take away a lot of the technical issues that organisations run up against when attempting to use a pure scale-out approach. By engineering from the outset how the servers, storage and networking equipment within a system will work together, the requirements for management are simplified. However, expansion is not always easy, and in many cases may well require over-engineering through implementing another converged system alongside the existing one just to gain the desired headroom. There is also the issue of managing across multiple systems: this may not be much of a problem if a homogeneous approach is taken, but if the fabric network consists of multiple different vendors, or if there are converged systems from more than one vendor in place, it may be difficult to ensure that everything is managed as expected.

A private cloud environment may then be seen as a better option. Although private cloud can (and generally should) be implemented on converged systems, the majority of implementations Quocirca sees are based around the use of standard high volume (SHV) servers built into racks and rows with separate storage and network systems. Adding incremental resources is far simpler in this approach: new servers, storage and network resources can be plugged in and embraced by the rest of the platform in a reasonably simple manner, provided that the management software has the capabilities needed. Provided that the right management software is in place, this can work. However, skills will be required that cover not only the technical aspects of how such a platform works, but also areas such as how to populate a rack or a row in a manner that does not cause issues through hot spots or a requirement to draw too much power through any one spot.

Those choosing either of these paths must also make sure that any management software chosen does not just focus on one aspect of the platform: the virtual environment has dependencies on the physical, and these must be understood by the software. For example, the failure of a physical disk system will impact any virtual data stores that sit on that system: the management software must be able to understand this and ensure that backups or mirrors are stored on a different physical system. It must also ensure that on any failure the aim is business continuity, minimising any downtime and automating recovery as much as possible through the use of hot images, data mirroring and network path virtualisation across multiple physical network interface cards (NICs) and connections.

Public cloud, whether infrastructure, platform or software as a service (I/P/SaaS), would seem to offer the means of removing all the issues around needing to manage the platform. However, ensuring that you have visibility at the technical level can help in spotting any trends that your provider has missed (for example, are storage resources running low, is end-user response suffering?) and is needed to support the what-if scenarios that organisations need to be able to run these days.
In reality, the majority of organisations will end up with a hybrid mix of the above options. This brings in further issues: whereas a single converged system may be pretty much capable of looking after itself, once it needs to interact with a public cloud system, extra management services will be required.

Whatever platform an organisation goes for, the software really should be capable of looking at the system from end to end. To the end user, one major issue will always be the response of a system. A converged system will report that everything is running at hyper-speed, as it tends to look inwardly and will be monitoring performance at internal connection speeds. The end user may be coming in from a hand-held device over a public network and using a mix of functions from the converged system and a public cloud: the management software must be able to monitor all of this and understand what is causing any problems. It must then be able to try to remediate the problem: for example, by using a less congested network, by offloading workload to a different virtual machine or by applying more storage. It must understand that providing more network capacity could mean that the higher IOPS require a different tier of storage to be used, or more virtual cores to be thrown at the server. All of this needs to be carried out in essentially real time, or as near to it as makes no difference to the end user.

The existing systems management vendors of CA, IBM and BMC are getting there with their propositions, with HP lagging behind. The data centre infrastructure management (DCIM) vendors, including nlyte and Emerson Network Power, are making great strides in adding to existing systems management tools by including monitoring and management of the data centre facility and its equipment in the mix. EMC is making a play for the market through its software defined data centre (SDDC) strategy, but may need to be bolstered by a better understanding of the physical side as well as the virtual.

One thing is for sure: the continued move to a mix of platforms for supporting an organisation's needs will continue to drive innovation in the systems management space. For an IT or data centre manager, now is the time to ensure that what is put in place is fit for purpose and will support the organisation going forward, no matter how the mix of platforms evolves.
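As a minimal sketch of the end-to-end view argued for above, the fragment below compares the response time measured at the user's end with the time reported inside the platform, attributes the difference to the network path, and suggests where remediation should start. The measurements and thresholds are illustrative assumptions, not output from any real monitoring tool.

```python
# Illustrative only: locating the bottleneck in an end-to-end response path.
# All timings and thresholds are assumptions for the example.

measurements_ms = {
    "end_user_total": 2400,     # measured from the user's device
    "internal_platform": 180,   # reported by the converged system itself
    "public_cloud_calls": 650,  # time spent in calls to the public cloud service
}

TARGET_TOTAL_MS = 1000  # assumed service level target for end-user response

def diagnose(m):
    """Attribute whatever the internal metrics cannot explain to the network path."""
    unexplained = m["end_user_total"] - m["internal_platform"] - m["public_cloud_calls"]
    breakdown = {
        "platform": m["internal_platform"],
        "public cloud": m["public_cloud_calls"],
        "network / other": max(unexplained, 0),
    }
    worst = max(breakdown, key=breakdown.get)
    return breakdown, worst

breakdown, worst = diagnose(measurements_ms)
if measurements_ms["end_user_total"] > TARGET_TOTAL_MS:
    print(f"Response target missed; largest contributor: {worst} ({breakdown[worst]} ms)")
else:
    print("Response within target")
```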

The Tracks of my Tiers


It's time for change. Your old data centre has reached the end of the road, and you need to decide whether to build a new one or to move to a co-location partner. What should you be looking for in how the data centre is put together?

Luckily, a lot of the work has already been done for you. The Uptime Institute (uptimeinstitute.com) has created a simple set of tiers for data centres that describes what should be provided in the areas of overall availability through a particular technical design of a facility. There are four tiers, with Tier I being the simplest and least available, and Tier IV being the most complex and most available. The Institute uses Roman numerals to try to stop facility owners claiming that they exceed one tier but aren't quite the next, using nomenclature such as "Tier 3.5". However, Quocirca has seen instances of facility owners saying that they are "Tier III+", so the plan hasn't quite worked.

It would be fair to say that in most cases, costs also reflect the tiering: Tier I should be the cheapest, with Tier IV being the most expensive. However, this is not always the case, and a well-implemented, well-run Tier III or IV facility could have costs that are comparable to a badly run lower-tier facility.

A quick look at the tiers gives the following as basic descriptors, with each tier having to meet or exceed the capabilities of the previous tier:

- Tier I: single, non-redundant power distribution paths serving the IT equipment, with non-redundant capacity components, leading to an availability target of 99.671%. Capacity components are items such as UPSs, cooling systems, auxiliary generators and so on. Any failure of a capacity component will result in downtime, and scheduled maintenance will also require downtime.

- Tier II: a redundant site infrastructure with redundant capacity components, leading to an availability target of 99.741%. The failure of any capacity component can be manually managed by switching over to a redundant item with a short period of downtime, and scheduled maintenance will still require downtime.

- Tier III: multiple independent distribution paths serving the IT equipment, and at least dual power supplies for all IT equipment, leading to an availability target of 99.982%. Planned maintenance can be carried out without downtime. However, a capacity component failure still requires manual switching to a redundant component and will result in downtime.

- Tier IV: all cooling equipment dual-powered and a completely fault-tolerant architecture, leading to an availability target of 99.995%. Planned maintenance and the failure of a capacity component are dealt with through automated switching to redundant components. Downtime should not occur.

Bear in mind that these availability targets are for the facility, not necessarily for the IT equipment within it. Organisations must ensure that the architecture of the servers, storage and networking equipment, along with external network connectivity, provides similar or greater levels of redundancy to ensure that the whole platform meets the business needs.

The percentage facility availabilities may seem very close and very precise. However, a Tier I facility allows for the best part of 30 hours of downtime per annum, whereas a Tier IV facility allows for under half an hour. The majority of Tier III and IV facilities will have their own internal targets of zero unplanned downtime, however, and this should be an area of discussion when talking with possible providers or when designing your own facility.

It is tempting to look at the tiers as a range of worst-to-best facilities. However, it really comes down to the business requirements that drive the need. For example, for a sub-office using a central data centre for the majority of its critical needs, but having a small on-site server room for non-critical workloads, a Tier III data centre could be overly expensive, and a Tier I or Tier II facility could be highly cost-effective. Although Tier I and Tier II facilities are not generally suitable for mission-critical workloads, if there are over-riding business reasons, the risks are fully understood and plans are in place to manage how the business continues during downtime, then Tier I could still be a solution.

It is Tiers III and IV where organisations should be looking to place their more critical workloads. Tier III facilities will still require a solid set of procedures for dealing effectively with capacity component failures, and these plans will need to be tested on a regular basis. Even with Tier IV, there is no case for assuming that everything will always go according to plan. A simple single-redundancy architecture (each capacity component being backed up by one more) can still lead to non-availability. If a single capacity component fails, the facility is back down to a non-redundant configuration. If the failed component cannot be replaced rapidly, then a failure of the active component will result in downtime. Therefore, plans have to be in place as to whether replacement components are held in inventory, or whether there is an agreement with a supplier to get a replacement on site, and probably installed by them, within a reasonable amount of time. For a Tier IV facility, this should be measured in hours, not days.

If designing your own facility, the Uptime Institute's facility tiers give a good basis for what is required to create a suitable data centre facility with the requisite levels of availability around the capacity components. They will not provide you with any reference designs: areas such as raised versus solid floors, in-row versus hot/cold aisle cooling and so on are not part of the Institute's remit. If you are looking for a co-location partner, then the Institute runs a facility validation and certification process. Watch out for co-location vendors who say that their facility is Tier III or Tier IV "compliant": this is meaningless.
If they want to use the tier nomenclature, then they should have gone through the Institute and become certified. A full list of facilities that have been certified can be seen on the Institute's site here: http://uptimeinstitute.com/TierCertification/certMaps.php
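The availability targets quoted above translate directly into allowable downtime per annum; a quick sketch of the arithmetic:

```python
# Converting the facility availability targets above into hours of downtime per year.

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

tiers = {
    "Tier I": 99.671,
    "Tier II": 99.741,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}

for tier, availability_pct in tiers.items():
    downtime_hours = HOURS_PER_YEAR * (1 - availability_pct / 100)
    print(f"{tier}: {downtime_hours:.1f} hours of downtime per annum")
```

This gives roughly 28.8 hours for Tier I, 22.7 for Tier II, 1.6 for Tier III and 0.4 for Tier IV, matching the "best part of 30 hours" and "under half an hour" figures above.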


What to look for from a Tier III data centre provider


The Uptime Institute provides a set of criteria for the tiering of data centre facilities that can help when looking to use either a co-location facility or an infrastructure, platform or software as a service (I/P/SaaS) offering. The idea of the tiers is to provide an indication of the overall availability of the facility: a Tier I facility is engineered to have no more than 28.8 hours of unplanned downtime per annum, a Tier II 22 hours, a Tier III 1.6 hours and a Tier IV 0.4 hours. As can be seen, there is a big jump from Tier II to Tier III, and this is why organisations should look for a Tier III facility when looking for a new facility to house their IT.

A Tier III facility offers equipment redundancy in core areas, such that planned maintenance can be carried out while workloads are still online, and the failure of a single item will not cause the failure of a complete area. Tier IV takes this further to provide multi-redundancy, but will only be required by those who need maximum availability of the facility and the IT platform within it. For most, Tier III will be sufficient.

However, there are lots of co-location, hosting and cloud vendors out there who indicate that they are Tier III (or, more often, "Tier 3", which the Uptime Institute does not like), many of which are not fully compliant with the guidelines. It is a case of caveat emptor, buyer beware, but there are certain steps that can be taken to ensure that what you are getting is fit for purpose.

If you really and truly require an Uptime Institute Tier III facility, then it is really quite simple: a facility can only call itself Uptime Institute Tier III if it is certified accordingly. The Uptime Institute provides three different types of certification, and these require expense by the facility owner. The only way to become certified is to go to the Uptime Institute's professional services company and get them to audit your plans, your operational approach or your physical data centre.

Having just the plans audited is the quicker route, and results in a Tier Certification of Design Documents. This gives the facility owner a certificate, and they can be listed on the Uptime Institute's site as a certified facility. The certification of the physical data centre can only be obtained after the data centre plans have been certified. The Uptime Institute's professional services company will then carry out a site visit and a full audit of the physical facility to ensure that the build is in line with the plans. If this is the case, the facility owner will get a Tier Certification of Constructed Facility, with a plaque to go on the vendor's offices or wherever, as well as a listing on the Institute's site. With the Operational Sustainability Certification, an on-site visit is made to evaluate the effectiveness of the management and operations and the building characteristics. These are compared to the specific requirements outlined in the Institute's document, Tier Standard: Operational Sustainability. Once validated, the facility owner gets a certificate, plaque and listing on the Institute's site.

Therefore, the first place to start when looking for a Tier III facility is the Uptime Institute's site, as all certificate holders will be listed there. Does this mean that all of those who are not on the Institute's site should be avoided? By no means. There are those who believe that the Uptime Institute is too self-centred and that its certification process is not open enough.
Be aware, however, that there are plenty of co-location, hosting and cloud vendors out there who indicate that they are Tier III (or, more often, "Tier 3", a usage the Uptime Institute does not like), many of which are not fully compliant with the guidelines. It is a case of caveat emptor (buyer beware), but there are certain steps that can be taken to ensure that what you are getting is fit for purpose.

If you really do require an Uptime Institute Tier III facility, then it is quite simple: a facility can only call itself Uptime Institute Tier III if it is certificated accordingly. The Uptime Institute provides three different types of certification, each of which involves expense for the facility owner, and the only way to become certified is to have the Uptime Institute's professional services arm audit your plans, your operational approach or your physical data centre.

Having just the plans audited is the quickest route and results in a Tier Certification of Design Documents. This gives the facility owner a certificate, and they can be listed on the Uptime Institute's site as a certificated member. Certification of the physical data centre can only be obtained after the design documents have been certificated: the Uptime Institute's professional services arm will then carry out a site visit and a full audit of the physical facility to ensure that the build is in line with the plans. If it is, the facility owner receives a Tier Certification of Constructed Facility, with a plaque to go on the vendor's offices (or wherever) as well as a listing on the Institute's site. With the Operational Sustainability Certification, an on-site visit is made to evaluate the effectiveness of the management and operations components and the building characteristics, compared against the specific requirements outlined in the Institute's document, Tier Standard: Operational Sustainability. Once validated, the facility owner again gets a certificate, plaque and listing on the Institute's site.

Therefore, the first place to start when looking for a Tier III facility is the Uptime Institute's site, as all certificate owners are listed there. Does this mean that all of those who are not on the Institute's site should be avoided? By no means. There are those who believe that the Uptime Institute is too self-centred and that its certification process is not open enough, those who object to having to pay for the certification process, and others who simply do not see the point of an Uptime Institute tiering at all.

The Telecommunications Industry Association (TIA) came up with a similar four-level facility tiering (Tiers 1 to 4) in 2005, under the requirements set out in document ANSI/TIA-942.

These requirements were revised in 2008, 2010 and 2013 to reflect changes and advancements in data centre design. The TIA's tiers roughly equate to the Uptime Institute's, and as such, anyone using the TIA's system should also be looking for a Tier 3 facility.

For those facilities that have neither an Uptime Institute nor a TIA tiering, it is down to the buyer to carry out due diligence. Quocirca recommends that the buyer uses either the Uptime Institute's or the TIA's documents to pull out the areas of greatest concern to them, and insists that the facility owner shows how it meets each of these needs. Don't be fobbed off with responses such as "Of course, but we do it differently": challenge them; get them to quantify risks and show how they will ensure defined availability targets; get them to attach financial or other penalty clauses to a service level agreement (SLA) so that they are more bought in to the need to manage availability successfully. When you carry out your own site visit, ask questions: where is the second generator? What happens if that item fails? How do multiple power distribution feeds come into the facility, and how are they distributed around it? Only by satisfying yourself will you be able to rest easy. Taking responses at face value could work out very expensive, and it is in the nature of many facility owners to promise almost anything to drive up occupancy; they know that once you are in the facility, it is difficult to move out again.

Certainly, the Uptime Institute's certification is the gold standard, as it is based on a rigorous evaluation of plans, facility and operational processes against a set of solid requirements. The TIA's is a more open approach, which puts more of the weight of due diligence on the buyer to ensure that the requirements have been fully followed. A facility merely stating that it is "built to Tier III standards" requires yet more diligence and an understanding of the requirements.

Lastly, remember that these tiers apply only to the facility itself: they do not define how the IT equipment within it needs to be put together to give the same or higher levels of availability. Ensuring that overall availability is high requires yet more work to cover how the IT equipment is configured.
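To make that last point concrete, the facility and the IT platform sitting inside it are effectively in series: the service is only available when both are. A minimal sketch, using assumed availability figures purely for illustration, shows how quickly the combined number falls below the facility's own rating.

    # Minimal sketch with assumed (hypothetical) availability figures, showing
    # how facility and IT platform availability combine when both must be up.
    HOURS_PER_YEAR = 24 * 365  # 8,760 hours

    facility_availability = 0.99982   # roughly Tier III-class (assumed)
    it_platform_availability = 0.999  # hypothetical figure for the IT stack

    service_availability = facility_availability * it_platform_availability
    downtime_hours = (1 - service_availability) * HOURS_PER_YEAR

    print(f"Overall service availability: {service_availability * 100:.3f}%")
    print(f"Expected downtime: {downtime_hours:.1f} hours per year")

In this example, pairing a Tier III-class facility with a three-nines IT platform leaves roughly ten hours of expected downtime a year, far more than the 1.6 hours the facility rating alone might suggest, which is why the IT layer needs the same attention to redundancy as the building.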


REPORT NOTE: This report has been written independently by Quocirca Ltd to provide an overview of the issues facing organisations seeking to bring facilities management (FM) and information technology (IT) closer together. The report draws on Quocirca's extensive knowledge of the technology and business arenas, and provides advice on the approach that organisations should take to create a more effective and efficient environment for future growth.

About Quocirca
Quocirca is a primary research and analysis company specialising in the business impact of information technology and communications (ITC). With world-wide, native language reach, Quocirca provides in-depth insights into the views of buyers and influencers in large, mid-sized and small organisations. Its analyst team is made up of real-world practitioners with first-hand experience of ITC delivery who continuously research and track the industry and its real usage in the markets. Through researching perceptions, Quocirca uncovers the real hurdles to technology adoption (the personal and political aspects of an organisation's environment) and the pressures of the need for demonstrable business value in any implementation. This capability to uncover and report back on end-user perceptions in the market enables Quocirca to provide advice on the realities of technology adoption, not the promises.

Quocirca research is always pragmatic, business orientated and conducted in the context of the bigger picture. ITC has the ability to transform businesses and the processes that drive them, but often fails to do so. Quocirca's mission is to help organisations improve their success rate in process enablement through better levels of understanding and the adoption of the correct technologies at the correct time.

Quocirca has a pro-active primary research programme, regularly surveying users, purchasers and resellers of ITC products and services on emerging, evolving and maturing technologies. Over time, Quocirca has built a picture of long-term investment trends, providing invaluable information for the whole of the ITC community.

Quocirca works with global and local providers of ITC products and services to help them deliver on the promise that ITC holds for business. Quocirca's clients include Oracle, IBM, CA, O2, T-Mobile, HP, Xerox, Ricoh and Symantec, along with other large and medium-sized vendors, service providers and more specialist firms. Details of Quocirca's work and the services it offers can be found at http://www.quocirca.com

Disclaimer: This report has been written independently by Quocirca Ltd. During the preparation of this report, Quocirca may have used a number of sources for the information and views provided. Although Quocirca has attempted wherever possible to validate the information received from each vendor, Quocirca cannot be held responsible for any errors in information received in this manner. Although Quocirca has taken what steps it can to ensure that the information provided in this report is true and reflects real market conditions, Quocirca cannot take any responsibility for the ultimate reliability of the details presented. Therefore, Quocirca expressly disclaims all warranties and claims as to the validity of the data presented here, including any and all consequential losses incurred by any organisation or individual taking any action based on such data and advice. All brand and product names are recognised and acknowledged as trademarks or service marks of their respective holders.
