  • Research article
  • Open access

Cryogenic control system operational experience at SNS

Abstract

The helium cryogenic system at the Spallation Neutron Source (SNS) provides cooling to 81 superconducting radio frequency cavities. To support the operation of the cryogenic facility, a highly reliable control system consisting of software, hardware, and a Human Machine Interface (HMI) has been developed and improved during the first fifteen years of operation. Integrating the cryogenic control system with the other subsystems of the SNS complex has been an important aspect of the success of the operation. This paper details the operating experience, lessons learned, and recommendations for future facilities.

Introduction

SNS cryogenic system

The design of the SNS cryogenic system is similar to that of the system deployed at the Thomas Jefferson National Accelerator Facility (TJNAF), with some modifications. The SNS system is designed with about 60% of the refrigeration capacity of the original TJNAF system [1]. Table 1 details the system specifications, and Figure 1 is a simplified diagram of the system. The major components of the system include a purifier, helium gas storage, warm compressors, a 4.5-K cold box, liquid helium storage, a 2-K cold box, the linear accelerator (LINAC) distribution system, the control system, and additional ancillary systems.

Table 1 SNS cryogenic system specifications
Fig. 1 Diagram of the CHL

SNS cryogenic control system

The Central Helium Liquefier (CHL) at SNS is a highly automated and highly reliable machine with an exceptional performance record. The control system was designed in a modular fashion within the Experimental Physics and Industrial Control System (EPICS) framework, which allows it to integrate with the other controls in the SNS accelerator complex. The EPICS-based design includes a total of 14 Versa Module Eurocard (VME) Input Output Controllers (IOCs) and 23 Allen-Bradley ControlLogix™ Programmable Logic Controllers (PLCs). Each subsystem has its own dedicated pair consisting of a ControlLogix™ PLC and a VME IOC. In this implementation, the lower-level controls, the equipment and instrumentation interface, and the interlocks reside in the PLC, while the higher-level controls, Proportional Integral Derivative (PID) loops, diode temperature sensor modules, and Linear Variable Differential Transformer (LVDT) modules are handled in the VME IOCs. A block flow diagram of the cryogenic control system is depicted in Fig. 2.
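
To make this division of labor more concrete, a PID loop of the kind assigned to the IOCs can be pictured as the generic discrete algorithm below. This is a minimal Python sketch under stated assumptions, not the SNS code: the actual loops run as EPICS records on the VME IOCs, and the class name, gains, and 0 to 100% output limits are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PidLoop:
    """Discrete positional PID loop with output clamping (illustrative sketch)."""
    kp: float
    ki: float
    kd: float
    out_min: float = 0.0    # hypothetical lower output limit, e.g. valve 0 % open
    out_max: float = 100.0  # hypothetical upper output limit, e.g. valve 100 % open
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        """Compute one control output; call once per scan period of dt seconds."""
        error = setpoint - measurement
        # No derivative term on the first scan, when there is no previous error yet.
        derivative = 0.0 if self._prev_error is None else (error - self._prev_error) / dt
        self._prev_error = error
        candidate_integral = self._integral + error * dt
        output = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        # Clamp to the output limits; the integral is only committed while the
        # output is not saturated (conditional-integration anti-windup).
        if output > self.out_max:
            return self.out_max
        if output < self.out_min:
            return self.out_min
        self._integral = candidate_integral
        return output
```

In use, the hosting IOC would call `update()` once per scan with the setpoint, the latest sensor reading, and the scan interval, and write the result to the actuator, while the PLC keeps the interlocks that must act regardless of the loop output.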

Fig. 2 Diagram of the Cryogenic Control System

In addition to the ControlLogix PLCs and VME IOCs, the cryogenic control system uses EPICS “soft” IOCs to implement the cryogenic alarm handler and upper-level control sequences. An EPICS “soft” IOC is a program that runs on a host computer without any I/O hardware connected directly to that host; it performs its input/output (I/O) operations on other devices and executes sequences, open- or closed-loop control, and other computations.

The system is equipped with multiple sources of electrical power to maintain high system reliability and ensure uninterrupted control system operation. The primary power source is susceptible to interruptions caused by external factors; hence, the control system devices are set up with a secondary, or emergency, power source delivered by an Uninterruptible Power Supply (UPS) with a diesel generator backup. Maintaining uninterrupted power to the control system through automatic transfer switches (ATS) is critical to the reliability and availability of the facility.

For the operator interface, SNS selected the Extensible Display Manager (EDM), which is maintained at ORNL. A script was created to translate the JLab operator screens, which had been implemented using the Motif Editor and Display Manager (MEDM), into EDM. The SNS color and font standards were applied after the screens were translated, and the screens were updated to reflect the SNS plant hardware design and to incorporate improvements suggested by JLab [2]. The initial controls development effort resulted in control screens for each of the subsystems and for several individual pieces of equipment. An example of a control screen is provided in Fig. 3. As time progressed, additional screens were added for diagnostic purposes: screens capturing important trip data or information about equipment health assist with system troubleshooting, and summary screens for the valves and instruments of each subsystem aid in the calibration and initial setup of the system. The control screens closely mimic the Piping and Instrumentation Diagrams (P&IDs) so that operators become familiar with both the drawings and the screens, which aids system operation and troubleshooting.

Fig. 3 Main Compressor Overview Control Screen

Integration of cryogenic control system with accelerator controls

The integration of the cryogenic control system with the rest of the accelerator complex is of key importance. The cryogenic control system provides data to the Radio Frequency (RF) control system to determine whether it is acceptable to apply RF power to the superconducting cavities. Additionally, the cryogenic control system controls the liquid level, pressure, temperature, and amount of electric heat in the cryomodules. It performs these functions both in normal operating conditions and in transitional phases of operation, and several control sequences are in place to integrate them.
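
The RF permit interface can be thought of as a fail-safe boolean function of cryogenic conditions. The Python sketch below assumes that helium liquid level, helium vessel pressure, and a data-validity flag are the inputs; the actual signals, limits, and logic exchanged between the SNS cryogenic and RF systems are not given here, so every threshold and name is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CryomoduleState:
    liquid_level_pct: float  # helium liquid level in the cryomodule
    pressure_mbar: float     # helium vessel pressure
    data_valid: bool         # False if the reading is stale or communication is lost


# Placeholder limits for illustration only; they are NOT SNS operating values.
MIN_LEVEL_PCT = 85.0
PRESSURE_BAND_MBAR = (25.0, 45.0)


def rf_permit(state: CryomoduleState) -> Tuple[bool, List[str]]:
    """Return (permit, reasons); RF is permitted only when no reason to deny exists."""
    reasons = []
    if not state.data_valid:
        reasons.append("cryogenic data invalid")  # fail safe: unknown means no permit
    if state.liquid_level_pct < MIN_LEVEL_PCT:
        reasons.append("liquid level low")
    low, high = PRESSURE_BAND_MBAR
    if not low <= state.pressure_mbar <= high:
        reasons.append("pressure out of band")
    return (not reasons, reasons)
```

Returning the list of reasons alongside the permit lets the RF operators see why a cavity is held off rather than just that it is.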

The cryogenic control system resides on its own network, separate from the accelerator network. The cryogenic plant has its own control room and is controlled from a location separate from the rest of the accelerator. During off shifts, when the cryogenic system is unstaffed, the control system uses an auto-dialer that calls the staff in the event of an alarm. The accelerator Central Control Room (CCR) has read access to the cryogenic system and monitors the cryogenic facility during off shifts, which provides redundancy to the auto-dialer. Over time, certain control functions were identified that the CCR operators could usefully perform themselves, avoiding excess call-ins of the cryogenic staff. To enable this, the chief operator in the CCR can log in to the cryogenic network, where the EPICS access security configuration grants the chief operator a limited amount of write access to the cryogenic controls. The chief operator can make select modifications, such as adjusting the electric heat in the cryomodules to stabilize pressure.
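
The limited write access granted to the chief operator follows the EPICS access security model, in which records are assigned to access security groups (ASGs) whose rules grant READ or WRITE to user access groups (UAGs). The Python sketch below only mimics that structure to illustrate the idea; in EPICS the rules are defined in an access security configuration file rather than in code, and every user, group, and PV name here is hypothetical.

```python
# Hypothetical user access groups (UAGs): group name -> user names.
USER_GROUPS = {
    "cryo_staff": {"alice", "bob"},
    "ccr_chief": {"chief_op"},
}

# Hypothetical access security groups (ASGs): ASG -> UAGs granted WRITE.
# Everyone on the network is assumed to have READ.
ASG_WRITE_GROUPS = {
    "DEFAULT": {"cryo_staff"},
    "CCR_ADJUSTABLE": {"cryo_staff", "ccr_chief"},
}

# Hypothetical PV -> ASG assignments (in EPICS, each record's ASG field).
PV_ASG = {
    "CM05:Heater:Setpoint": "CCR_ADJUSTABLE",
    "CB4K:Turbine1:SpeedSetpoint": "DEFAULT",
}


def access_level(user: str, pv: str) -> str:
    """Return "WRITE" if any UAG allowed to write this PV's ASG contains the user."""
    asg = PV_ASG.get(pv, "DEFAULT")
    for group in ASG_WRITE_GROUPS[asg]:
        if user in USER_GROUPS[group]:
            return "WRITE"
    return "READ"
```

With this layout, granting the CCR an additional adjustment is a matter of moving one PV into the `CCR_ADJUSTABLE` group rather than changing user permissions.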

An added challenge to integrating the controls of the cryogenic system is interfacing with the multiple vendors and contributors to the design and fabrication of the system. Most large-scale cryogenic systems are built by multiple vendors and institutes. For the SNS cryogenic system, TJNAF, Oak Ridge National Laboratory (ORNL), Linde, Air Liquide, S2M, PHPK, and several other vendors were involved. SNS partnered with TJNAF personnel to lead the controls effort; however, several of the vendors provided PLC code for the operation of their own components. A functional description was developed to guide the staff in coalescing these different code components into one cohesive control system.

Control system standards

Implementation and enforcement of standards in several areas, including software, hardware, screen design, device naming, and signal naming were recognized as early linchpins of the integration approach for the SNS controls system implementation. To ensure uniformity across all developed software, the SNS project negotiated project-wide licensing agreements. The most important aspect of control system standardization was the uniform use of the EPICS framework for all subsystem controls. EPICS provides tools for developing and executing control algorithms, a common communication protocol, Channel Access, and a set of configurable tools for graphical user interfaces (GUI). Contrary to tradition even in other EPICS laboratories, this includes both the conventional facilities and the target control systems, where integration was deemed important from the outset. Training was required for both commercial firms and partner laboratories not familiar with EPICS. The GUI tools available in EPICS include the EDM, developed for EPICS at the Oak Ridge Holifield facility and further enhanced in collaboration with the SNS controls group. EDM was chosen for easier maintenance and extensibility than competing EPICS display managers, and tools were developed to translate screens developed in two of these: MEDM and DM2K (European version of MEDM). Working with the operations team, SNS standardized layouts and color schemes used for operator screens. EDM facilitates consistent use of color rules by allowing selectable pre-defined configurations for similar types of screens. Linux was chosen as the operating system for control system development, as well as operator console, file management and high-level server applications [3].

SNS facilitated the use of hardware standards by establishing Basic Ordering Agreements (BOAs), which allowed all partners, subcontractors, and vendors to purchase selected standard items at project-negotiated prices. The SNS control system makes far greater use of commercial PLCs than was traditional in EPICS-based systems. PLCs were used for subsystems that must be kept operating whether or not the rest of the control system is available. SNS selected the Allen-Bradley ControlLogix™ family of PLCs for these applications. SNS originally standardized on the Motorola 2100 PowerPC series of processors for its distributed IOCs; however, limitations of this model led to many IOCs being upgraded to Motorola 5500s after some years of operational experience. An adapter card allows the same processor to be used for both VME and VME eXtensions for Instrumentation (VXI) applications. BOAs were established for VME and VXI crates: Dawn for 7-slot VME crates, Wiener for 21-slot crates, and Racal for VXI. A BOA was also completed for standard 19″ equipment racks, which were configured as required with doors, side panels, and other accessories.

One of the first and most important standards agreed upon by the partner laboratories was for signal and device naming. Despite this standard, the application of the naming convention broke down during the development of the control system with multiple partners. Names based on several different interpretations of the original standards document appeared on drawings and screens, in documents, and in prototype databases. In addition, special characters defined in the naming standard were not accepted by other commercial software products subsequently used for non-controls applications.
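
One way to keep many partners on a single interpretation of a naming standard is to check names automatically before they reach drawings, screens, or databases. The sketch below assumes a hypothetical `System_Subsystem:Device_Instance:Signal` format and a hypothetical set of characters accepted by downstream tools; it is not the actual SNS naming standard.

```python
import re
from typing import List

# Hypothetical convention for illustration, e.g. "CHL_Cryo:PT_1234:P".
# This is NOT the actual SNS naming standard.
NAME_PATTERN = re.compile(
    r"^(?P<system>[A-Z][A-Za-z0-9]*)_(?P<subsystem>[A-Za-z0-9]+)"
    r":(?P<device>[A-Za-z]+)_(?P<instance>[A-Za-z0-9]+)"
    r":(?P<signal>[A-Za-z0-9_]+)$"
)

# Characters assumed (hypothetically) to be safe in non-controls software.
PORTABLE_FIELD = re.compile(r"^[A-Za-z0-9_]+$")


def check_name(name: str) -> List[str]:
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if NAME_PATTERN.match(name) is None:
        problems.append("does not match System_Subsystem:Device_Instance:Signal")
    # Colons are the field separators, so portability is checked per field.
    for field_text in name.split(":"):
        if field_text and PORTABLE_FIELD.match(field_text) is None:
            problems.append(f"non-portable characters in '{field_text}'")
    return problems
```

Running such a check in the database build and drawing-release process would catch divergent interpretations (for example, a hyphen used where the standard specifies an underscore) before they spread.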

Results

Reliability

Over the last ten years, operating on average 5000 h per year, the SNS cryogenic system has been 99.7% reliable. This equates to approximately 14 h of down time per year, which means that each subcomponent of the system must greatly exceed 99.7% reliability to sustain this record. Figure 4 depicts the reliability data of the cryogenic system. Down time is measured from the time the beam goes off to the time the beam returns. The most important aspects of recovery are response time, diagnostics and evaluation, and implementation of the repair. When possible, the underlying issues are corrected on scheduled maintenance days or during maintenance outages to avoid operational risk.
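
The arithmetic behind these figures can be checked directly. The sketch below assumes, as a simplification, that the system behaves like independent subcomponents in series with equal availability, to show why each subcomponent must exceed the system figure; the function names are illustrative.

```python
import math
from typing import List


def availability(downtime_h: float, operating_h: float) -> float:
    """Fraction of scheduled operating hours during which the system was available."""
    return 1.0 - downtime_h / operating_h


def series_availability(component_availabilities: List[float]) -> float:
    """Availability of independent components in series: all must be up at once."""
    return math.prod(component_availabilities)


def required_component_availability(system_target: float, n_components: int) -> float:
    """Equal per-component availability needed for n series components to meet the target."""
    return system_target ** (1.0 / n_components)


print(f"{availability(14.0, 5000.0):.2%}")  # 99.72%, reported as 99.7%
print(f"{required_component_availability(0.997, 10):.5f}")
```

For ten equal series components, the second line gives roughly 0.9997: each component may be unavailable only about a tenth as often as the system as a whole.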

Fig. 4 Reliability data of the SNS cryogenic system for the last ten years of operation

The down time experienced by the cryogenic system over the last ten years of operation can be classified into six categories: sinus filter, output module, capacitor, PLC fault, heater power supply, and JT valve motor (Fig. 5). Of these six categories, the sinus filter was by far the biggest contributor, responsible for over 50% of the total down time of the cryogenic system. Although this sounds like a large number, it corresponds to only four down time events in the last ten years. The SNS 2-K cold box is equipped with four cold compressors, each powered by a Variable Frequency Drive (VFD). Each VFD is equipped with a sinus filter, an LC filter containing multiple inductors and capacitors. The sinus filter in VFD3 had been wired incorrectly since the VFD cabinet was first installed. Despite this issue, the system operated reliably for approximately ten years. At that point, the wiring within the sinus filter burned and an inductor in the filter failed. Only then was it realized that the miswired filter had been the cause of a significant portion of the historical down time; previously, the VFD had been suspected and had been replaced multiple times. Since the sinus filter was replaced, the system has been more stable, making it easier to execute the 2-K pump-down sequence.

Fig. 5 Downtime distribution of the SNS cryogenic system for the last ten years of operation

The next largest category is output module failures. Each PLC within the control system is equipped with multiple input and output modules. The input modules read signals from devices such as temperature, pressure, and flow sensors, whereas the output modules control devices such as valves, relays, and heaters. There were only two output module failures during the last ten years; however, any failure resulting in a 2-K cold box trip usually causes at least 8 hours of down time, so it is imperative that the control system have very high reliability. One of these failures occurred on the output module for the VFD of one of the cold compressors, and the other was related to the helium Dewar. Both resulted in a trip of the 2-K cold box.

A single capacitor failure on the power supply card in a magnetic bearing cabinet is the next most significant cause of down time in the CHL. Each of the four cold compressors within the 2-K cold box has a magnetic bearing that levitates the cold compressor wheel during operation. As a backup in case the magnetic bearing fails, each cold compressor is equipped with ball bearings. When the magnetic bearing fails and the spinning cold compressor lands on the back-up bearings, this is referred to as a “hard landing”. The cold compressors are designed to withstand a small number of hard landings. In this event, the capacitor failed on one of the three phases of power, and as a result it could not be determined whether the system had transferred from line power to back-up battery power. At a similar installation at another institute, hard landings have resulted in the failure of a cold compressor. Therefore, when this issue occurred, the emphasis was placed on the health of the system rather than on minimizing down time. A “wobble” test was performed to evaluate the health of the back-up bearing. In this test, a power supply is used to tilt the cold compressor, and the resulting voltage readings are used to determine the air gap between the shaft of the cold compressor and the bearing. These values were compared to the original values measured several years earlier, and it was determined that the system was healthy enough to restart.

The PLC that controls the 4-K cold box had a major fault that resulted in a complete memory loss. The PLC was restarted and reloaded with the program, and the system was brought back into operation; however, the root cause of the failure was not identified. Because the root cause remained unknown, a spare PLC was loaded with the latest version of the code and swapped in for the existing PLC. This resulted in approximately two shifts of down time. Identifying the latest revision of the code and keeping it easily accessible is an important consideration for minimizing down time in a situation such as this.

Two additional, smaller contributors to down time are displayed in Fig. 5; each represents a single event. The first was a motor failure on a Joule-Thomson (JT) valve on a cryomodule, and the second was a heater power supply failure. The motor failure was quick to repair but required the beam to be shut off to allow access to the LINAC tunnel. The power supply failure fortunately coincided with a period when SNS was not producing neutrons, which allowed the recovery to be done during non-production time with minimal interruption (approximately half an hour) to neutron production.

Additional issues experienced not affecting reliability

The most important aspect of preventing down time is the awareness of the operations personnel. Many activities help make a system more reliable, such as preventative maintenance plans, calibration programs, and system alarms; however, they cannot replace the people who walk through the plant every day, looking, listening, and smelling the operation. In 2019, an abnormal noise was heard on the main warm helium gas valve to the 4-K cold box. This valve is located outside of the CHL building. Because the control system was not instrumented to read this valve's position, the control screen indicated only the last commanded position, which was 100% open. When the operator took a closer look to find the source of the abnormal noise, it was noticed that the valve was almost closed. If the valve had closed, the cold box would have tripped, resulting in eight to ten hours of down time. The operations and maintenance crew formulated a plan to remove the pneumatic actuator while holding the valve open with a mechanical device. After the pneumatic actuator was removed, a manual actuator was installed, holding the valve in its current position. Once the manual actuator was in place, the valve was slowly opened to restore it to 100% open. Upon inspection of the valve actuator, it was determined that the seals of the positioner had failed and the positioner had filled with water.

Also in 2019, an abnormal noise was detected coming from one of the turbines in the 4-K cold box. The inlet valve to the turbine is controlled from the operator screens, but once again the displayed valve position did not represent the actual valve position, since the valve is not equipped with instrumentation for position read back. For valves with no read back, the valve position command output is used to indicate the position on the operator screens. When the valve was observed in the field, it was found to be oscillating from full open to almost closed. This is a major concern because it can cause turbine damage. Investigation determined that a pneumatic control module in the actuator of the turbine inlet valve had failed, and the control module was replaced. Over the next several months, multiple failures occurred in the pneumatic control modules of the 4-K cold box valves, leading to the conclusion that the part was at end of life after approximately 15 years of service. During the next two maintenance outages, all thirty of the control modules were replaced, and an adequate supply of spares is now maintained in inventory.

Another issue that arose in the 4-K cold box was a glitch in the reading of a turbine speed sensor [4]. In this case, the speed sensors were outputting a very low voltage signal to the tachometer, causing the turbines to trip on loss of speed signal. An oscilloscope was installed to read both the output of the speed sensor and the output of the tachometer, and it was discovered that the tachometer output signal would intermittently drop to zero. Figure 6 shows a screenshot of the oscilloscope reading at the output of the tachometer; at the start of the trace, the output is zero before the signal returns. To rectify this, the speed sensor was positioned closer to the target on the turbine, which increased the voltage signal, and filters were added to the PLC logic to minimize the impact of a temporary signal glitch. For future installations, dual speed sensors should be considered [4].
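
The PLC filtering described above amounts to requiring that a loss-of-speed condition persist before acting on it. The Python sketch below shows one common form of such a filter, a consecutive-scan debounce with a latched trip; the threshold, scan count, and class name are assumptions, and the actual SNS PLC filter logic is not described in detail here.

```python
class LossOfSignalFilter:
    """Trip only if speed stays below a threshold for `required_scans` consecutive scans."""

    def __init__(self, min_valid_speed: float, required_scans: int):
        self.min_valid_speed = min_valid_speed  # hypothetical "signal present" threshold
        self.required_scans = required_scans    # hypothetical debounce length
        self._low_count = 0
        self.tripped = False

    def scan(self, speed: float) -> bool:
        """Call once per PLC scan with the tachometer reading; returns the trip state."""
        if speed < self.min_valid_speed:
            self._low_count += 1
        else:
            self._low_count = 0  # any valid reading clears a momentary dropout
        if self._low_count >= self.required_scans:
            self.tripped = True  # latched until an operator reset
        return self.tripped

    def reset(self) -> None:
        """Operator reset after the cause of the trip has been investigated."""
        self._low_count = 0
        self.tripped = False
```

Requiring several consecutive scans trades a short delay in detecting a real loss of speed signal for immunity to single-scan dropouts; the count must stay small enough that a genuine loss of signal still trips the turbine promptly.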

Fig. 6 Screenshot of speed sensor tachometer output

Another issue that surfaced over the years of operation was related to the network routers. The cryogenic network was originally intended to have redundant routers, and because two routers were installed, it was assumed that they were fully redundant. During a power outage affecting one of the routers, it became clear that they were not, and control and monitoring of certain parts of the cryogenic system were lost. The old routers, which were approximately ten years old, did not support redundancy as originally thought; they were replaced with a newer model capable of supporting redundancy, dual power supplies, and automatic failover. The two core switches of the cryogenic control system were upgraded to Cisco Catalyst 3850 switches. As shown in Fig. 7, the two switches were configured to use the Multiple Hot Standby Router Protocol (MHSRP) to provide routing redundancy. Each router is in its own HSRP group supporting redundancy for IP traffic. Router A is the active router for group 1 and serves as the standby router for group 2, while Router B is the active router for group 2 and serves as the standby router for group 1. When both routers are available, they share the IP traffic load. Should either router fail, the operational router becomes the active router of the group serviced by the failed router. When the failed router returns to service, preemption restores load sharing between the two routers [5].
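
The active/standby assignment described above can be modeled in a few lines. The Python sketch below is a simplified, hypothetical model of two HSRP groups with preemption; it ignores hello and hold timers, numeric priorities, and interface tracking, and only the router names A and B come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class HsrpGroup:
    number: int
    primary: str    # router configured with higher priority and preemption
    secondary: str  # standby router for this group


@dataclass
class RouterPair:
    groups: List[HsrpGroup]
    up: Set[str] = field(default_factory=set)  # routers currently operational

    def active_router(self, group: HsrpGroup) -> Optional[str]:
        # Preemption: the higher-priority router reclaims the active role whenever it is up.
        if group.primary in self.up:
            return group.primary
        if group.secondary in self.up:
            return group.secondary
        return None  # both routers down: the group's virtual gateway is unreachable

    def status(self) -> Dict[int, Optional[str]]:
        """Map each group number to its current active router."""
        return {g.number: self.active_router(g) for g in self.groups}
```

With `RouterPair([HsrpGroup(1, "A", "B"), HsrpGroup(2, "B", "A")], {"A", "B"})`, each router is active for one group, so hosts split between the two groups' virtual gateways share the load; removing "A" from `up` moves group 1 to B, and adding it back returns group 1 to A through preemption.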

Fig. 7 MHSRP load sharing

In the early part of 2019, the cavity heaters of cryomodules 5 through 9 tripped multiple times. Since the electric heat in the helium vessels controls the operating pressure of the LINAC, disruption of the heat can cause problems: depending on the amount of heat lost or gained, pressure and flow abnormalities can adversely affect the operation of the 2-K cold box. Further inspection of the AC power distribution revealed that the isobar power strips supplying AC power to the PLC and power supplies showed a faint ‘Fault’ light. Since the isobar power strip was a component common to the earlier trips, it was replaced; however, the new one also indicated a fault. The AC phases and the ground were checked. Since the isobars were fed from the ATS, the ATS was suspected as the cause of the problem. The old ATS was replaced with a newer model, and the isobars' ‘Fault’ lights went out. Analysis of the removed ATS indicated that 6 V was present from neutral to ground. Since the problem has not resurfaced, the remaining older transfer switches were replaced with the newer model as a preventive measure [6].

Discussion

FMEA

One of the most important lessons learned for the SNS cryogenic control system was the need for a structured method of prioritizing work, drawing the proper attention to necessary work, and securing the funding and resources to perform it. A Failure Modes and Effects Analysis (FMEA) was performed for the entire cryogenic system in 2009. To perform such an analysis, evaluation matrices are created to score potential failure events in terms of probability, severity, and detection. These three scores are multiplied to give a risk priority number (RPN), and in principle items with higher RPNs are prioritized over those with lower RPNs. However, no system is perfect, and there are times when the judgment of the people conducting the work takes precedence over the FMEA result. The FMEA does yield a product that defines weaknesses in the process and ranks the items in need of focus, and it gives a team both an opportunity to focus on a process and a driving force to produce action [4].
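
The RPN calculation itself is simple. The sketch below assumes 1 to 10 scoring scales, which are common in FMEA practice but not necessarily the scales used in the SNS analysis, and breaks RPN ties by severity; both choices are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FailureMode:
    description: str
    probability: int  # assumed scale: 1 (rare) to 10 (frequent)
    severity: int     # assumed scale: 1 (negligible) to 10 (extended beam loss)
    detection: int    # assumed scale: 1 (certain to be detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk priority number: probability x severity x detection."""
        return self.probability * self.severity * self.detection


def rank(modes: Iterable[FailureMode]) -> List[FailureMode]:
    """Highest RPN first; equal RPNs are ordered by severity so severe items surface first."""
    return sorted(modes, key=lambda m: (m.rpn, m.severity), reverse=True)
```

For example, a hypothetical "PLC firmware with known defect" scored 6, 8, 7 (RPN 336) would rank above an "output module failure" scored 3, 9, 4 (RPN 108). As noted above, the ranking is a starting point for judgment rather than a substitute for it.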

During the FMEA, it became clear that the probability evaluation matrix used for the equipment was not applicable to controls. As a result, new tables were generated for controls hardware and software, and the analysis was performed with them; for this analysis, firmware was considered software. This effort provided the driving force that secured funding to update PLCs and IOCs running firmware with known defects, ultimately improving the long-term reliability of the SNS cryogenic system. See Table 2 for the FMEA controls probability evaluation matrix.

Table 2 FMEA probability evaluation matrix for controls.

Calibration

The calibration effort conducted during the initial installation of the system was invaluable. The calibration data sheets were used multiple times during start-up and commissioning to verify proper system operation. Not only were they consulted in the early phases of operation; years later, they are still used to quickly determine measurement ranges and to compare current performance with the original calibration. Some difficulties were encountered in conducting the calibrations. Stainless steel devices installed in stainless steel wells tended to gall, and a cheater bar was required to remove these instruments. Some instruments were not designed to be calibrated while the system is operating and require a plant shutdown for calibration or maintenance. Because the SNS cryogenic system has not had a sustained shutdown in the last fifteen years, many of these instruments are not calibrated routinely. As an alternative to routine calibration, comparison screens were created to compare instruments that read similar values [7]. For example, all pressure transmitters that indicate low header pressure are placed on one screen, as seen in Fig. 8. If one value differs substantially from the others, its calibration can be prioritized, or control loops can be reconfigured to use a different transmitter.
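
The comparison screens could be complemented by a simple automated check. The sketch below flags any transmitter in a group whose reading departs from the group median by more than a tolerance; the tag names and tolerance are hypothetical, and the median is chosen so that a single drifting instrument does not pull the reference value toward itself.

```python
from statistics import median
from typing import Dict, List


def flag_outliers(readings: Dict[str, float], tolerance: float) -> List[str]:
    """Return the (sorted) tags whose reading differs from the group median by more than tolerance."""
    center = median(readings.values())
    return sorted(tag for tag, value in readings.items() if abs(value - center) > tolerance)
```

For example, with hypothetical low-header transmitters reading 1.20, 1.21, 1.19, and 1.45 bar and a 0.05 bar tolerance, only the 1.45 bar transmitter is flagged. With fewer than three transmitters in a group, the median cannot tell which of two disagreeing instruments is wrong.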

Fig. 8 Comparison control screen

Understanding cryogenic system operating requirements

In the design and implementation of a cryogenic control system, it is important to understand the cryogenic system operating requirements. Using a modular PLC/IOC pair for each subsystem has simplified troubleshooting and is a good practice for future installations. Consideration must be given to whether the system will operate continuously for years or will have routine shutdown periods. Including test, calibration, and validation points and signals will facilitate maintenance and troubleshooting.

Having the control system monitor its own health is another key aspect of the design of a highly reliable and available system. It was through such monitoring that a problem was detected with the SNS purifier temperature read backs. The system was displaying values that looked reasonable; however, the readings were holding their last value rather than reporting accurate real-time information. The control system detected the lack of variability in the readings, which revealed the problem; once it was corrected, real-time readings resumed. In monitoring system health, communication errors, module status, and signal status should be evaluated, and the appropriate action to take upon detection of each error should be defined. Operators must then be alerted to these off-normal conditions through alarms. These characteristics have been essential to the SNS cryogenic control system.
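
The lack-of-variability check can be sketched as a rolling window: if the spread of recent samples falls below a minimum, the signal is flagged as possibly frozen. This is a hypothetical Python illustration, not the SNS implementation; the window length and minimum span are assumptions and must be set above the sensor's normal noise, or a genuinely steady process would raise false alarms.

```python
from collections import deque


class StaleValueDetector:
    """Flag a signal as stale when the last `window` samples show (almost) no variation."""

    def __init__(self, window: int, min_span: float):
        self.samples = deque(maxlen=window)  # rolling history of recent readings
        self.min_span = min_span             # smallest max-min spread considered "live"

    def add(self, value: float) -> bool:
        """Add a sample; return True if the signal currently looks frozen."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet to judge
        return max(self.samples) - min(self.samples) < self.min_span
```

A True result would typically drive an alarm rather than an automatic action, since the operator must decide whether the sensor, its module, or the communication path is at fault.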

Loss of communication and alarming

Communication between IOCs and PLCs is essential to the operation of the control system. In practice, there will be losses of communication. It is important to prepare for this event ahead of time in the design and commissioning phases of the system development. All the PLCs and IOCs must take the proper action in the event of a loss of communication. For example, if the signal from a sensor is not valid, the PLC must perform predetermined actions to mitigate this situation. If communication is lost from a particular PLC, the IOC should perform predetermined actions to mitigate that situation. These events and the corresponding actions to take can be evaluated during the FMEA process.
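
A common way to detect loss of communication is a heartbeat counter that the sending controller increments and the receiver watches. The Python sketch below assumes such a heartbeat exists and that the predetermined mitigating action is supplied as a callback; the class name, timeout, and action are hypothetical.

```python
import time
from typing import Callable, Optional


class CommWatchdog:
    """Detect loss of communication from a heartbeat counter incremented by the peer."""

    def __init__(self, timeout_s: float, on_loss: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.on_loss = on_loss  # predetermined mitigating action (hypothetical)
        self.clock = clock
        self._last_value: Optional[int] = None
        self._last_change = clock()
        self.lost = False

    def check(self, heartbeat: int) -> bool:
        """Call every scan with the latest heartbeat value; returns True while comms are lost."""
        now = self.clock()
        if heartbeat != self._last_value:
            # The peer is alive: record the change and clear any previous loss.
            self._last_value = heartbeat
            self._last_change = now
            self.lost = False
        elif not self.lost and now - self._last_change > self.timeout_s:
            self.lost = True
            self.on_loss()  # fire once per loss event, not on every scan
        return self.lost
```

Watching a changing counter rather than a status bit also catches the case where the link delivers a frozen last value, which otherwise looks healthy.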

The auto-dialer has been a crucial piece of equipment for the SNS cryogenic control system. Since determining an automated response to every event is impossible, human intervention is a necessity in a cryogenic system. Selecting the correct alarms and values of those alarms is a very important aspect of a high reliability system. At SNS when an alarm occurs during unstaffed periods, the auto-dialer calls a Subject Matter Expert (SME). Three people are always on call to respond to such alarms. Notifying the proper people at the time of alarm provides the best chance of responding to a situation while minimizing down time.
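
The call-out behavior can be summarized as an ordered escalation list. The sketch below is a hypothetical illustration of that logic only: the SNS auto-dialer is a dedicated device, the contact names and numbers are placeholders, the number of rounds is an assumption, and `place_call` stands in for the dialer hardware.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class OnCallContact:
    name: str
    phone: str


def notify(alarm: str,
           contacts: List[OnCallContact],
           place_call: Callable[[OnCallContact, str], bool],
           rounds: int = 2) -> Optional[str]:
    """Call contacts in order until one acknowledges the alarm.

    place_call should return True when the person called acknowledges.
    Returns the acknowledging contact's name, or None if nobody
    acknowledged within the given number of passes through the list.
    """
    for _ in range(rounds):
        for contact in contacts:
            if place_call(contact, alarm):
                return contact.name
    return None
```

A None result would itself need a fallback, for example notifying the CCR, which already monitors the cryogenic system during off shifts.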

Control screens

The display of information on a control screen can have an impact on troubleshooting. For example, when a valve is displayed on a control screen, multiple indicators can be shown, such as the fail state, the percent open, the type of value (read back or command), the raw and converted values of the signal it is controlling, and whether the valve is in automatic or manual mode. Displaying command values without annotating the type of value has caused confusion, leading operations staff to interpret a commanded position as a read back position. This can be avoided by carefully designing how the information is presented on the control screens.
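
One way to prevent a command from being mistaken for a position read back is to make the value type part of every rendered label. The Python sketch below is hypothetical (the tag, field names, and text format are placeholders); it annotates valves without feedback explicitly and flags command/read back disagreement when feedback exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValveDisplay:
    tag: str
    command_pct: float
    readback_pct: Optional[float]  # None when the valve has no position feedback
    auto_mode: bool
    fail_state: str                # e.g. "FC" (fail closed) or "FO" (fail open)

    def label(self) -> str:
        """Render a screen label that always states whether a value is a command or a read back."""
        mode = "AUTO" if self.auto_mode else "MAN"
        if self.readback_pct is None:
            position = f"CMD {self.command_pct:.0f}% (no readback)"
        else:
            position = f"POS {self.readback_pct:.0f}% / CMD {self.command_pct:.0f}%"
        return f"{self.tag} {self.fail_state} {mode} {position}"

    def mismatch(self, tolerance_pct: float = 5.0) -> bool:
        """True when feedback exists and disagrees with the command by more than the tolerance."""
        return self.readback_pct is not None and abs(self.readback_pct - self.command_pct) > tolerance_pct
```

For a valve like the warm helium gas valve described earlier, the label would read "CMD 100% (no readback)" rather than a bare "100%", making it clear that the field position is unknown.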

It can be difficult to determine the cause of a trip of a complex piece of equipment without a trip capture program. Screens have been developed for the SNS cryogenic control system to capture the cause of component trips; they aid in troubleshooting and help minimize down time. In developing these screens, care should be taken to ensure they are easily understood. Standardizing the nomenclature and color scheme of the displays makes the information easier to comprehend. For example, a numerical zero can represent an “OK” condition and be displayed as green, while a numerical one can represent a “bad” condition and be displayed as red. The date and time of the condition should be readily displayed to assist the operator in troubleshooting. An example of one of the screens developed for this purpose is depicted in Fig. 9.
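
A trip capture screen is typically fed by first-out logic that latches the first abnormal condition together with its time. The Python sketch below illustrates that idea using the 0 = OK (green), 1 = bad (red) convention described above; the condition names are hypothetical, and when two conditions appear in the same scan the list order decides, which real first-out logic may resolve differently.

```python
from datetime import datetime
from typing import List, Optional


class FirstOutCapture:
    """Latch the first trip condition seen, with a timestamp, until reset."""

    def __init__(self, condition_names: List[str]):
        self.condition_names = condition_names
        self.first_out: Optional[str] = None
        self.first_out_time: Optional[datetime] = None

    def scan(self, bits: List[int], now: datetime) -> None:
        """Evaluate one scan of condition bits (0 = OK, 1 = bad)."""
        if self.first_out is not None:
            return  # already latched; later alarms are likely consequences of the first
        for name, bit in zip(self.condition_names, bits):
            if bit == 1:
                self.first_out = name
                self.first_out_time = now
                return

    def reset(self) -> None:
        """Clear the latch after the trip has been diagnosed."""
        self.first_out = None
        self.first_out_time = None

    @staticmethod
    def color(bit: int) -> str:
        """Display color for a condition bit, following the 0 = green, 1 = red convention."""
        return "green" if bit == 0 else "red"
```

Latching only the first condition matters because a compressor trip usually sets several downstream alarms within seconds, and the screen should point the operator at the initiating cause.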

Fig. 9 Trip capture screen for warm compressors

Redundancy

Redundancy is a critical component of any control system, and several lessons were learned on this topic. Some critical instruments were installed with spares and some were not, so it is important to review the system design to ensure that spares are installed in the proper locations. Particular attention should be paid to instruments installed in high-radiation environments: the life expectancy of temperature diodes and pressure transmitters is greatly reduced in the SNS LINAC tunnel. As a result, the control pressure transmitter for the 2-K cold box has been changed to a transmitter in the CHL just upstream of the 2-K cold box.

The network components require redundancy as part of the design to ensure continuous operation of the system. In the SNS system, redundancy is provided in the core and aggregate switches as described in section 2.2 of this paper. Redundant links are provided from these switches to the edge switches. Each spare edge switch is installed adjacent to the operating switch. If an edge switch failure occurs, the patch cable can quickly be moved physically from the failed switch to the installed spare.

Redundancy in power supply is provided to the SNS cryogenic control system. Line power is provided to the control equipment through an ATS. If power is lost, the ATS switches power to a UPS which is backed up by a diesel generator that automatically starts when power is lost. The control system remains powered even in sustained power outages, which has been an important aspect of maintaining high system availability.

Additional candidates for redundancy are the PLCs and the communication to the input/output chassis. Maintaining a hot spare of critical PLCs would allow system updates during operation of the equipment; this was not included in the SNS cryogenic control system. With redundancy in the communication path to the input/output chassis, the system can continue to run after a single communication failure until a maintenance day, when the problem can be addressed without affecting beam production.

Future considerations

Electrical design considerations

There has been an emphasis on electrical safety at national laboratories in the United States over the last several years, and it is recommended that electrical safety be incorporated into the cryogenic control system design. First, use equipment listed by a Nationally Recognized Testing Laboratory (NRTL), or equivalent, where it is available. Equipment that has undergone third-party testing and certification for product safety reduces the chances of unforeseen issues. Second, using low-voltage sensors and power supplies makes maintenance activities inherently safer. It is recommended to select instrumentation with 24 VDC signal and heater power supplies, and actuators operating below 50 VDC, where possible.

Many issues and equipment failures over the years of operation could be attributed to loose wires. Screw terminals are easy to over- or under-torque, resulting in intermittent connections that are typically very difficult to identify and resolve. As a result of this experience, many of the connections at SNS were changed to spring clamp terminals. These have been very reliable, with almost no intermittent connections.

Ethernet

It is also recommended that Ethernet be used for communication across PLC devices where possible. Issues have occurred at SNS with both DeviceNet and ControlNet for PLC communication, which have complicated operations and maintenance. Ethernet also makes it easier to add new control system equipment to the system. As control systems develop further, it is preferable to push more control down to the PLC and to have more robust communication, and Ethernet can support this development. The recently installed control system architecture for the Facility for Rare Isotope Beams (FRIB) cryogenic system uses Ethernet with redundant communication paths [8]. Continuing with this strategy is encouraged.

Standardization

The successful implementation of a cryogenic control system is greatly enhanced by a proper standardization plan covering hardware, software, and naming conventions. It is suggested that a facility select a single PLC vendor and stay with it, even if the standard hardware is more capable than a particular function requires. Standard programming and readily available spare parts have proven valuable at SNS. As part of this effort, it is recommended that a standard I/O module be selected for each signal type, including analog input, analog output, binary input, and binary output. Similarly, there are many good instrumentation companies, and if standards are not applied, a facility can be overwhelmed by many different instrument vendors. It is recommended that a facility standardize on two or three instrument vendors.

Software is much more difficult to standardize. However, a detailed specification requiring functional description documents and a guide for commenting code can save time and money over the long-term operation and maintenance of the facility. Standardizing the code is easier when standard hardware is used. In a distributed development model, it is recommended that the facility controls team oversee all other control efforts so that the entire effort is integrated.

Naming conventions should be developed, communicated to the entire collaboration, and managed by the facility controls team. At SNS, this proved difficult, and naming conventions vary between some of the vendors and the partner laboratories. This variability makes troubleshooting the system and maintaining consistent documentation more difficult.
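One way a facility controls team could enforce a naming convention across vendors and partner laboratories is to check every proposed signal name automatically before it is accepted. The sketch below assumes a hypothetical colon-separated `<System>_<Subsystem>:<Device>:<Signal>` pattern, in the style often used for EPICS process variable names, and a hypothetical list of approved system codes; neither is the actual SNS convention.

```python
import re
from typing import Iterable, List

# Hypothetical convention for illustration only (not the SNS standard):
#   <System>_<Subsystem>:<Device>:<Signal>, e.g. "CHL_WCS:PT_101:Pressure"
PV_NAME = re.compile(
    r"^(?P<system>[A-Z][A-Za-z0-9]*)_(?P<subsystem>[A-Za-z0-9]+)"
    r":(?P<device>[A-Za-z0-9_]+):(?P<signal>[A-Za-z0-9_]+)$"
)

# Hypothetical list of system codes approved by the facility controls team.
APPROVED_SYSTEMS = {"CHL", "LINAC"}


def nonconforming(names: Iterable[str]) -> List[str]:
    """Return the names that do not follow the convention above."""
    bad = []
    for name in names:
        match = PV_NAME.match(name)
        if match is None or match.group("system") not in APPROVED_SYSTEMS:
            bad.append(name)
    return bad
```

Running a check like this on each vendor or partner-laboratory deliverable before integration would flag inconsistent names early, rather than leaving them to be discovered during troubleshooting.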

Conclusion

Maintaining the reliability of a cryogenic control system requires continuous long-term effort. Much has been learned about the system at SNS in the last fifteen years of operation. The system has proven to be robust and reliable, but there are opportunities for improvement. The primary components of maintaining control system reliability at SNS are a preventative maintenance program, an FMEA, and the incorporation of lessons learned to continuously improve the system. As part of preventative maintenance, consideration should be given to periodically updating control system hardware, firmware, and software to correct bugs and to prevent systems from becoming unsustainable due to obsolescence. This is challenging when upgrades require the plant to be warmed up or shut down, as these opportunities are rare. When calibration or maintenance is not possible, alternative comparisons can be used to assess the accuracy of signals. An FMEA was completed to help prioritize efforts and to provide a driving force for required funding and resource allocation. An environment of continuous improvement has been encouraged, and lessons learned continue to be applied to the system.
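An FMEA is commonly used to rank failure modes by a Risk Priority Number (RPN). In conventional FMEA practice, RPN = severity × occurrence × detection, with each factor rated on a 1 to 10 scale. The sketch below assumes that convention; the rating scales, failure modes, and ratings are hypothetical and are not taken from the SNS analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    component: str
    description: str
    severity: int    # assumed scale: 1 (negligible) to 10 (hazardous)
    occurrence: int  # assumed scale: 1 (remote) to 10 (very frequent)
    detection: int   # assumed scale: 1 (almost certainly detected) to 10 (undetectable)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: List[FailureMode]) -> List[FailureMode]:
    """Order failure modes from highest to lowest RPN (ties keep input order)."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)
```

For instance, a hypothetical loose-wire failure mode rated 5, 6, and 8 (RPN 240) would rank ahead of radiation damage to a tunnel transmitter rated 7, 5, and 4 (RPN 140). Because the product can hide a high-severity but rare failure, severity is often reviewed on its own as well.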

For future installations, consideration should be given to the lessons learned from the SNS cryogenic control system. All modes of operation should be considered when developing the control system, as this facilitates maintenance and calibration activities. Redundancy and standardization are key characteristics of a control system and should be integrated into the design. Careful consideration should also be given to electrical safety, communication reliability, and driving more control down to the PLC level in future installations.

Availability of data and materials

The datasets are available from the corresponding author upon request.

Abbreviations

SNS:

Spallation Neutron Source

HMI:

Human Machine Interface

TJNAF:

Thomas Jefferson National Accelerator Facility

LINAC:

Linear accelerator

CHL:

Central Helium Liquefier

EPICS:

Experimental Physics and Industrial Control System

VME:

Versa Module European

IOC:

Input Output Controller

PLC:

Programmable Logic Controller

PID:

Proportional Integral Derivative

LVDT:

Linear Variable Differential Transformer

I/O:

Input/Output

UPS:

Uninterruptible Power Supply

ATS:

Automatic Transfer Switch

EDM:

Extensible Display Manager

MEDM:

Motif Editor and Display Manager

P&ID:

Piping and Instrumentation Diagram

RF:

Radiofrequency

CCR:

Central Control Room

GUI:

Graphical User Interface

BOA:

Basic Ordering Agreement

VXI:

VME eXtension for Instrumentation

VFD:

Variable Frequency Drive

JT:

Joule-Thomson

MHSRP:

Multiple Hot Standby Router Protocol

FMEA:

Failure Modes and Effects Analysis

RPN:

Risk Priority Number

SME:

Subject Matter Expert

NRTL:

Nationally Recognized Testing Laboratory

References

  1. Casagrande F, et al. Status of the cryogenic system commissioning at SNS. Particle Accelerator Conference. Piscataway: IEEE; 2005. p. 970–2.

  2. Strong H, et al. The SNS cryogenic control system: experiences in collaboration. San Jose: 8th International Conference on Accelerator and Large Experimental Physics Control Systems; 2001.


  3. Gurd D. Management of a Large Distributed Control System Development Project. San Jose: International Conference on Accelerator and Large Experimental Physics Control Systems; 2001.


  4. Howell M, et al. Cryogenic system operational experience at SNS. IOP Conf. Series: Materials Science and Engineering 101; 2015. p. 012127.

  5. Cisco Systems, Inc. Software configuration guide, Cisco IOS XE Denali 16.3.X catalyst 3850 switches. San Jose; 2018. https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3850/software/release/16-3/configuration_guide/b_163_consolidated_3850_cg.pdf. Accessed 7 Oct 2020.

  6. Harrell P. Cryomodule cavity heater trips. Oak Ridge: SNS Technical Note; 2019.


  7. Howell M. SNS helium cryogenic plant instrument and controls experience and future considerations. Presentation at LCLS2 instrument and controls workshop; 2016.

  8. Joseph N. FRIB cryogenic control system. IOP Conf. Series: Materials Science and Engineering 755; 2020. p. 012093.


Acknowledgments

This research used resources at the Spallation Neutron Source, a DOE Office of Science User Facility operated by the Oak Ridge National Laboratory. The authors would like to thank Herb Strong for his efforts while working at SNS. Much of this paper was influenced by working with him for many years.

Funding

This work was supported by SNS through UT-Battelle, LLC, under contract DE-AC05-00OR22725 for the U.S. DOE.

Author information


Contributions

MH wrote the manuscript and coordinated the work. SK provided the reliability data and analysis. MM provided information regarding specific controls equipment and infrastructure. KW provided control standards and historical controls information. All authors read and approved the final manuscript.

Corresponding author

Correspondence to M. Howell.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information


This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Howell, M., Kim, SH., Martinez, M. et al. Cryogenic control system operational experience at SNS. EPJ Techn Instrum 8, 4 (2021). https://doi.org/10.1140/epjti/s40485-020-00059-y



  • DOI: https://doi.org/10.1140/epjti/s40485-020-00059-y

Keywords