On March 23, 2005, an explosion at BP’s Texas City refinery killed 15 people, injured 180 and led to $3 billion in damages and legal settlements. It remains one of the worst industrial disasters to date.

The explosion was caused by the overfilling of the raffinate splitter tower and a blowdown drum releasing hot hydrocarbons. The resulting vapour cloud ignited, destroying the ISOM unit.

This article describes the events and associated failures which led to this incident, and explores how effective Process Safety Management could have readily prevented the tragedy.

How the incident occurred

At 2.05am on March 23, 2005, a hydrocarbon isomerisation (ISOM) unit at the Texas City refinery was restarted after being down for maintenance. During start-up, operators followed an unofficial procedure and filled the tower above the level specified in the official procedure.

The raffinate level was solely determined using the tower’s torque-tube displacer type level transmitter, which was not calibrated for the specific gravity of the fluid(1).

The operator stopped the filling process after the transmitter indicated a level of 2.7m, when the true level was 4m. The informal procedure meant that the high-level alarm (2.3m) was ignored.

Additionally, the secondary high-level alarm (2.4m) was faulty, which was not known to the operators. The restart was stopped for the day at that point, which was very unusual.

During the day shift briefing, it was decided that the restart would not continue because the heavy raffinate product tanks were full. The day supervisor, who arrived late and missed the briefing, nevertheless gave the instruction to resume the start-up.

The feed into the tower and the recirculation pumps were restarted. However, miscommunication between operators meant that the heavy raffinate outlet was left closed. As a non-continuous restart was abnormal, the day board operator had to rely heavily on the experienced supervisor.

Furnaces used to heat the fresh and recirculated feed into the tower were lit as per procedure. Shortly after this, the supervisor left the plant due to a family emergency without assigning a replacement, which was a deviation from protocol.

This left the board operator alone to oversee the restart of the ISOM unit and monitor other units. By this point, the raffinate level had reached 20m. At 11.56am, fuel to the burners was increased further and the raffinate level reached 30m. This was 15 times the normal level, yet the level transmitter indicated a safe level of 2.64m and decreasing.
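
A reading that falls while the tower fills is consistent with how a displacer transmitter can behave when it is calibrated for the wrong specific gravity. The sketch below is illustrative only: it assumes an idealised displacer whose output is proportional to buoyant force, and the function name, the span and the specific gravity values are hypothetical rather than taken from the investigation.

```python
def indicated_level_percent(true_level_m: float, span_m: float,
                            sg_actual: float, sg_calibrated: float) -> float:
    """Indicated level (% of span) for an idealised displacer transmitter.

    Assumption: the output is proportional to buoyant force and is scaled so
    that a displacer fully submerged in liquid of sg_calibrated reads 100 %.
    true_level_m is measured from the bottom of the displacer.
    """
    if span_m <= 0 or sg_calibrated <= 0:
        raise ValueError("span_m and sg_calibrated must be positive")
    # Only the submerged part of the displacer contributes to buoyancy.
    submerged_m = min(max(true_level_m, 0.0), span_m)
    return 100.0 * (submerged_m / span_m) * (sg_actual / sg_calibrated)
```

Under these assumptions, any level above the top of the displacer gives the same reading, and that reading drops as the liquid heats up and its specific gravity falls. A rising level can therefore appear as a steady or falling one.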

It was not until 12.49pm that the level control valve was fully opened, but the level kept increasing. By then, the level in the tower had reached 48m, causing liquid to flow into the vapour line.

The resulting hydrostatic head activated the pressure relief valves near the base of the column, which redirected the flow into a blowdown drum. A total of 195,600 litres of flammable fluid entered the blowdown drum, which quickly overfilled and discharged to the atmosphere.
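
Hydrostatic head is the pressure exerted by a column of liquid, given by P = ρgh. The short sketch below shows the size of the effect for a tall column. The density and column height used in the note that follows are assumed values for illustration, not figures from the incident, and the function name is hypothetical.

```python
STANDARD_GRAVITY_M_S2 = 9.80665  # standard acceleration due to gravity


def hydrostatic_pressure_pa(density_kg_m3: float, column_height_m: float) -> float:
    """Pressure added at the bottom of a liquid column: P = rho * g * h."""
    if density_kg_m3 < 0 or column_height_m < 0:
        raise ValueError("density and height must be non-negative")
    return density_kg_m3 * STANDARD_GRAVITY_M_S2 * column_height_m
```

For an assumed liquid density of 700 kg/m3 and an assumed 45m column, the function returns roughly 309 kPa (about 3.1 bar) in addition to whatever pressure already exists above the liquid.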

The discharged fluid quickly formed a flammable vapour cloud, which was ignited by the backfiring of a nearby diesel truck. The resulting explosion killed 15 people, 12 of whom were in a trailer 37m from the blowdown drum, and injured 180(1).

A summary of the analysis of events that led up to the disaster is shown in Figure 1.

Shortcomings in Process Safety Management (PSM)

PSM is a systematic framework for managing the integrity of hazardous processes. The ultimate goal of a PSM system is to prevent unwanted releases of chemicals or energy that could harm people, the environment or the business.

Every company operates a PSM system tailored to its operations, hazards and business strategy. Therefore, it is important to review it regularly for gaps and shortcomings, especially in the context of past disasters such as Texas City(2).

Safety leadership

Without effective safety leadership, the emphasis on and resources for safety within an organisation deteriorate(3). In BP’s case, the lack of a specific person responsible for safety at the executive level meant that there was inadequate oversight and resources to develop an effective PSM system.

In addition, the impact of strategic decisions on process safety was not considered at an executive level(4). Consequently, staffing was insufficient, training was inadequate and critical safety upgrades were ignored.

Ultimately this created enough critical vulnerabilities in the Texas City PSM system for the disaster to occur. It is argued that this is one of the major root causes of the accident(1).

A lack of safety leadership meant the company focused primarily on financial performance, leading to reductions in staffing, training and safety control systems.

Instituting effective safety leadership at an executive level creates a safety mindset throughout the organisation and would have removed the 'tick-the-box' mentality which existed at the Texas City refinery(1).

PSM requires continuous commitment from the entire organisation but it must start from the top. Senior leaders must maintain responsibility and accountability for safety for the company as a whole(5).

Without management endorsement and continued engagement, as well as the availability of adequate resources, the adoption of PSM in an organisation is either difficult or unsustainable(3).

Process safety metrics

Safety metrics are vital in assessing the overall safety performance of a plant, provided the correct metrics are monitored. At Texas City, only personal safety was measured using lagging indicators, such as lost time incident rate, while process safety metrics were not measured.

Management believed that the plant was safe as personal safety metrics improved, despite having three fatalities from three major accidents, of which two were process safety related(6).

Both leading and lagging indicators should be used to effectively measure the health of a plant with respect to PSM. Lagging indicators, such as the monthly loss of containment incidents, use historical data to highlight areas for improvement(7).

Leading indicators use routine checks or audits to improve safety performance, therefore creating an early barrier against critical failures. A leading indicator that could have been useful in Texas City is the measure of the length of time equipment was left in a failed state as a percentage of the plant’s uptime(8).

Had both been used, they could have exposed the latent failures at the refinery, such as the faulty alarms. Such indicators point to critical safety issues and can be used to drive and direct proactive, remedial actions(9).
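
As a minimal sketch of the two kinds of indicator described above, the functions below compute the leading indicator (time spent in a failed state as a percentage of plant uptime) and the lagging indicator (loss of containment incidents per month). The function names, the equipment tags and the choice to report the leading indicator per item of equipment are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def failed_state_percent(failed_hours_by_tag: dict[str, float],
                         plant_uptime_hours: float) -> dict[str, float]:
    """Leading indicator: time each item spent failed, as % of plant uptime."""
    if plant_uptime_hours <= 0:
        raise ValueError("plant_uptime_hours must be positive")
    return {tag: 100.0 * hours / plant_uptime_hours
            for tag, hours in failed_hours_by_tag.items()}


def monthly_incident_counts(incident_dates: list[date]) -> dict[tuple[int, int], int]:
    """Lagging indicator: loss of containment incidents per (year, month)."""
    return dict(Counter((d.year, d.month) for d in incident_dates))
```

A hypothetical alarm that spent 420 hours failed during 8,400 hours of uptime would score 5%, which makes a long-standing fault visible before it contributes to an incident.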

Mechanical integrity

Mechanical integrity refers to the management of critical process equipment and instrumentation such that it is correctly designed, installed, operated and maintained(10). There were multiple mechanical failures at the Texas City refinery(1):

  • The raffinate splitter tower level indicator was incorrectly calibrated and did not display the true raffinate level;
  • The redundant high-level alarm of the raffinate splitter tower level was non-functional and did not sound;
  • The level sight glass was dirty and therefore prevented manual verification of the raffinate level;
  • The manual vent valve that allowed operators to vent vapour from the raffinate splitter tower was non-functional during start-up testing;
  • The high-level alarm on the blowdown drum was non-functional and did not sound.

A good mechanical integrity programme will ensure that the process equipment and its instrumentation meet all safety requirements throughout their operational lifetime. It also requires that all relevant employees have sufficient training and tools to maintain the equipment and that the equipment is inspected regularly(11).

Learning culture

An effective PSM requires the investigation of each incident that resulted in, or could reasonably have resulted in, a catastrophic release of a highly hazardous chemical(11).

Prior to the disaster, both Texas City refinery and the wider organisation lacked a learning culture. Accidents, such as a previous flammable release from the blowdown, were not adequately investigated. In addition, findings from past reports were often not acted upon.

Had a learning culture been in place, repeated common issues would have been resolved and the plant would have been better prepared(1).

Learning means identifying the root causes of incidents, communicating lessons and implementing new control measures. Additionally, similar root causes can be present on other sites, so it is important to share the learnings company-wide.

In this instance, lessons from incidents at BP Grangemouth, which highlighted similar deficiencies in process safety metrics and communication, could have been shared and the tragedy avoided(1).

Prior to the accident, some investigations were stopped at 'operator error' as the root cause(12). However, operator (human) error is not a true root cause – human error can be divided into three broad categories: skill-based errors, mistakes and violations(13). It is possible to create safeguards to reduce the risk of human error, for example process automation.

Operating procedures

Operating procedures are developed alongside the process and describe safe methods of operation, which take into consideration different operational, regulatory and safety requirements. They are essential to minimise errors, standardise operations and protect against unsafe operations(14).

Poor operational discipline was commonplace and informal procedures were frequently used, a problem never addressed by the management. Operators relied on knowledge of past start-up experiences and developed informal work practices.

In most ISOM start-ups, the tower was filled above the range of the level transmitter, against official procedures. In addition, the level control was often left in manual mode instead of automatic. The first deviation effectively bypassed the level indicator during start-up, while the second introduced the risk of human error to the level control.

Operating procedures must reflect current plant practice and involve operators in their development, otherwise there is a risk of introducing new process hazards. Instead of using informal procedures, a formal request for change must be raised. This should include an analysis of why the current procedure is inadequate, a risk assessment and integration of the changes into the process and procedure.

Management of change (MOC)

Management of change (MOC) is a key element of PSM. Chemical plants are dynamic and changes to processes, management and organisational structures happen regularly. Even seemingly small changes can introduce new hazards or disrupt the control of existing ones.

At Texas City refinery, an MOC was required for all changes except those of an organisational nature, such as budget cuts. This exemption allowed new hazards to be introduced without assessment, for example through cutbacks to the mechanical integrity programme.

Furthermore, 20% of actions were overdue with several key changes made before or even without the final MOC approval, such as the placement of the contractor trailers(1). A good MOC system must include(15):

  • Anything that is not a like-for-like replacement must be assessed;
  • MOCs must be regularly reviewed to ensure that they are assessed and that actions are progressed;
  • Leading and lagging KPIs must be used to assess MOC effectiveness;
  • Temporary changes must be included and should be periodically reviewed and reassessed for continuation;
  • Changes must be reviewed, peer-reviewed and authorised before being implemented.
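
One simple KPI of the kind listed above is the share of MOC actions that are overdue. The sketch below is a minimal illustration: the `MocAction` record and its fields are hypothetical and do not represent any particular MOC system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MocAction:
    """One action raised by a management of change review (hypothetical record)."""
    description: str
    due: date
    closed: Optional[date] = None  # None means the action is still open


def overdue_percent(actions: list[MocAction], today: date) -> float:
    """Share of actions that are still open past their due date, in percent."""
    if not actions:
        return 0.0
    overdue = sum(1 for a in actions if a.closed is None and a.due < today)
    return 100.0 * overdue / len(actions)
```

Tracked over time, a figure such as this shows whether actions are being progressed or merely recorded.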

It is important to remember that any process is only as strong as the commitment of those involved. All changes must follow the approved MOC procedure and completely adhere to it.

Pre-Startup Safety Review (PSSR)

PSSR is a systematic and thorough check of a process before a highly hazardous chemical is introduced to it. Although BP required all start-ups to go through a PSSR, none was completed and non-essential personnel remained in the hazardous area(1).

PSSR is a necessary action before starting operation to ensure that all hazards are identified and managed. It should cover all aspects of operation and must confirm(7):

  • Construction and equipment are in accordance with design specifications, including instrument calibration and alarm functionality;
  • Safety, operation, maintenance, and emergency procedures are in place and are adequate;
  • All operators involved with the process are adequately trained;
  • Relevant MOC requirements have been met.
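
The four confirmations above can be treated as a gate that must be fully satisfied before start-up. The following is a minimal sketch of that idea; the item names and function names are hypothetical, and a real PSSR would record evidence and sign-off for each item rather than a simple true or false.

```python
# Hypothetical keys, one per confirmation in the list above.
PSSR_CONFIRMATIONS = (
    "equipment_matches_design",  # construction, calibration and alarms
    "procedures_adequate",       # safety, operation, maintenance, emergency
    "operators_trained",
    "moc_requirements_met",
)


def unmet_pssr_items(review: dict[str, bool]) -> list[str]:
    """Return the confirmations that are missing or not signed off."""
    return [item for item in PSSR_CONFIRMATIONS if not review.get(item, False)]


def startup_permitted(review: dict[str, bool]) -> bool:
    """Start-up is permitted only when every confirmation is signed off."""
    return not unmet_pssr_items(review)
```

Treating a missing item as unmet means that a review which was never carried out blocks start-up by default.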

Conclusion

The analysis of the disaster highlights several long-term PSM deficiencies that led to devastating safety failures. The deficiencies stem from the lack of commitment from senior leaders to safety, which was identified as a key root cause of the accident.

Even though many of the shortcomings were identified previously, no significant improvements to PSM were made, allowing the disaster to claim 15 lives and injure 180 people.

Authors: Kalokson Gurung, Laya Jayadeep, Janusz Siwek, Satyam Vora and David Zhou, University of Bradford, UK.

References

1.) F. Lees, Lees’ Loss Prevention in the Process Industries: Hazard Identification, Assessment and Control, Oxford: Elsevier Science & Technology, 2012.
2.) U.S. CSB, 'Investigation Report - Refinery Explosion and Fire', U.S. CSB, 2007.
3.) IChemE Safety Centre, 'Process Safety and the ISC', 2014. [Online]. Available: https://www.icheme.org/media. [Accessed 20 April 2020].
4.) A. Hale, 'Why safety performance indicators?', Safety Science, vol. 47, no. 4, pp. 479-480, 2009.
5.) American Petroleum Institute, Mechanical Integrity: Fixed Equipment Standards & Recommended Practices, Washington: American Petroleum Institute, 2018.
6.) U.S. OSHA, Process Safety Management, U.S. OSHA, 2000.
7.) The Telos Group, 'BP Texas City Report of Findings', The Daily News, 2005.
8.) HSE, HSG254 - Developing Process Safety Indicators: A Step-By-Step Guide For Chemical And Major Hazard Industries, HSE Books, 2006.
9.) HSE, HSG245 - Investigating Accidents and Incidents, HSE Books, 2004.
10.) K. Patterson and G. Wigham, 'Management of Change – what does a ‘good’ system look like?', Loss Prevention Bulletin, no. 267, p. 7, 2019.
11.) The BP U.S. Refineries Independent Safety Review Panel, 'The Report of The BP US Refineries Independent Safety Review Panel', 2007.
12.) HSE, L111 - The Control of Major Accident Hazards Regulations 2015, HSE Books, 2015.
13.) A. Hopkins, Failure to Learn: The BP Texas City Refinery Disaster, Sydney: CCH Australia, 2008.
14.) OECD Environment, Health and Safety, 'Corporate Governance for Process Safety - Guidance for Senior Leaders in High Hazard Industries', OECD, 2012.
15.) CCPS, 'Process Safety Leading and Lagging Metrics - You Don’t Know What You Don’t Measure', CCPS, 2011.

The authors, a team of engineering students at the University of Bradford in the UK, were awarded the 2020 SIESO Medal by the Institution of Chemical Engineers (IChemE) for their innovative pop-up book depicting the lessons learned in the 2005 Texas City refinery disaster. This article was featured in the latest issue of the Loss Prevention Bulletin published on October 5 and is free to download on IChemE’s website. The SIESO Medal is an annual competition open to all students around the world, and the medal and £750 are awarded for the most novel and innovative multimedia presentation of a major incident and the learning outcomes. This is the second year the competition has been run and all entries are reviewed by chemical engineers of IChemE’s Loss Prevention Bulletin Editorial Panel.
