Narrative:

The ZNY oceanic system (ATOP, Advanced Technologies and Oceanic Procedures) experienced a dual FPAP (Flight Plan Approval Package) processor failure, resulting in a dual channel failure operation. The sectors were split out, which is our typical operation for a Saturday morning. Traffic volume was moderate when the outage began. I was assigned to the ATOP refresher class for the day, and an announcement was made for all oceanic controllers to return to the areas immediately.

When I returned to the area there were 15 controllers in the area, plus some management and tech operations personnel running around in different directions. By my own observation, close to 10 people were huddled around one sector controller and the rest were working other sectors. We were all asking how to help, and no one seemed to have a clue about what to do next. I inquired whether we had begun the ATOP failure checklist, and no one had done anything regarding it. I then observed that they had taken out a few of the strip bay holders and one toolbox, but no bay headers, checklists, or other materials were posted. I inquired about the strips and was told that one set was incomplete and that the other printer was misaligned, which cut off the entire top line and made those strips useless. One combined sector had a set of strips that was good enough to clear the airspace, and that controller had already posted them and was coordinating. Another sector was trying to hand write and figure out strips to clear the airspace. The controller in charge and the controllers had already shut down our outer boundaries and spun or turned away any aircraft coming into the airspace. This also included the adjacent facility airspace, which was holding approximately 12 aircraft within radar coverage. The two controllers working continued to clear the airspace they had, but we also had a push of aircraft tracking westbound from the North Atlantic, which was the next issue to deal with. We had no strips at the time, so we could only work what we knew about. The system ASDs (Aircraft Situational Displays) still showed data blocks for anyone that was within the system (active or pre-active), and the data blocks moved. We were unable to use the probe, clearance, or coordination functions at all, which rendered both channels unusable. The controllers first relied on these displays to help orient themselves, identify the flights, and ensure we had strips or information on them.

The two controllers seemed to have a handle on clearing their traffic at this point, but now the conversation moved to how to handle the aircraft the adjacent sector was holding. Immediately, multiple controllers said that we needed to force them to land at a diversion airport since we were ATC zero. Their belief was that we were ATC zero and unable to provide any oceanic services; hence they were unable to allow them back into the non-radar airspace. However, we were told the airport could only accept 4 more flights, and we had about 12. At that point we discussed taking traffic at low altitude, since we knew there was no traffic below 31000 ft. An argument began between controllers, and once again the ATC zero argument continued. I became quite vocal and stated that we could not just look the other way if the aircraft couldn't land at the only usable airport; they must continue in some capacity. The controllers argued back again, saying we will not clear anyone because that's not our decision. The problem at this point was that there were too many 'cooks in the kitchen' and there was no direction from above. Our area was operating with a controller in charge, and the OM (Operation Manager) on duty had zero oceanic experience or knowledge, so they basically were just asking us what we needed or wanted to do and going along with it. My area continued to stand its ground, saying we shouldn't accept any flights without approval, so I went up to the OM on duty and explained the situation. He immediately agreed and went to the area supervisor to have him coordinate the plan to get the aircraft out of the adjacent facility's radar airspace. Each aircraft was asked whether it had the capacity to continue at 30000 ft or below until the SJU CERAP boundary. A few aircraft elected to continue and a few diverted. The aircraft were all cleared on the normal routings with altitude separation. These flights all made it through without any issue. After this issue was resolved we continued working on the long term plan. We had data from the crash point, so we started hand writing strips for sectors and for the additional traffic coming. This was very time consuming, and as we started to do it the tech operations staff advised they were restarting a channel. It crashed two more times immediately afterward, and then they finally got one started. Much of the original data was still there, just not up to date. We worked quickly to update the information on the few flights remaining and checked the accuracy of the flights. The system stayed up at that point and we returned to normal operations shortly after.

There were multiple problems with how this failure was handled, which I have been told about or seen happen with past failures. The first big issue is that we never get a complete set of strips like we are supposed to. The strips are supposed to be generated in both areas; one set is the 'working' set and the other is held by the supervisor or controller in charge for reference or to find missing strips. The one set we had was incomplete and the other one was misaligned and unusable. We need to fix this issue and figure out why the strip printing never seems to work properly. In addition, the printers should be checked regularly for alignment and functionality.

When we recalled all of the area personnel, it became organized mayhem, with 15 different controllers all arguing or debating how to proceed. We need to have established procedures or escape routes for clearing the airspace during an outage. We cannot sit there and debate the procedure every time. The other issue is that we also need to have an established, designated person in charge during an outage. We had multiple controllers in charge in the area and multiple controllers who all acted like they were in charge. The actual controller in charge of record at the time did not get very involved and basically let the other controllers figure it out. The controller in charge in our case has to take the management role and make decisions; that's their function, along with the OM and the Area F supervisor. I think it's also important to minimize the bodies crowding around the controllers working. Keep people nearby, but let the controllers working and their management or controller in charge make decisions without being distracted. When the ATOP system fails, the contingency plan should have an immediate set of actions to take and a long term contingency. According to others in my area, we only have a long term plan, which can be used only after the entire airspace has been cleared. It can take 5 or 6 hours before we could even start to consider using it. The toolboxes and enclosed materials were never fully utilized. They only took one box out and it was barely touched. No one talked to ARINC during the outage. ARINC provides our position reporting, request, and miscellaneous communications to non-CPDLC aircraft. During an outage, all aircraft must use HF since we have no way to get CPDLC messages. Each area should delegate additional personnel to work with ARINC and copy position reports, requests, etc. to be passed to the controllers. In our current setup, the only way I can see to do this would be to phone patch a controller onto each frequency to listen and copy information. Depending on ARINC's setup, you might need more than one person to do this. Then this information can be passed along, and in the event of an emergency or other important information, we would know about it. During this outage, we wouldn't have known anything unless ARINC called over saying it was an emergency.


Original NASA ASRS Text

Title: Oceanic Controller reported an ATOPS failure caused aircraft to hold and divert and ATC confusion due to lack of supervision; procedures not being followed and a malfunctioning strip printer.

Narrative: ZNY Oceanic System (ATOP) experienced a dual FPAP (Flight Plan Approval Package) processor failure resulting in a dual channel failure operation. The sectors were split out which is our typical operation for a Saturday morning. Traffic volume was moderate when the outage began. I was assigned to the ATOP (Advanced Technologies and Oceanic Procedures) refresher class for the day and an announcement was made for all Oceanic Controllers to return to the areas immediately. When I returned to the area there were 15 Controllers in the area plus some management and tech operator's presence running around in different directions. My personal observations were close to 10 people huddled around one Sector Controller and the rest were working other sectors. We all were asking how to help and no one seemed to have a clue about what to do next at this time. I inquired if we had begun the ATOP failure checklist and no one had done anything regarding it. I then observed that they had taken out a few of the strip bay holders and one toolbox; but no bay headers; checklists; or other materials were posted. I inquired about the strips and was told that one set was incomplete; and the other printer was misaligned which cutoff the entire top line; making them useless. One combined sector had a set of strips that was good enough to clear the airspace and that Controller had already posted them and was coordinating. Another sector was trying to hand write and figure out strips to clear the airspace. The CIC (Controller in Charge) and Controllers had already shut down our outer boundaries and spun or turned away any aircraft coming into the airspace. This also included the adjacent facility airspace which was holding approximately 12 aircraft within radar coverage. The two Controllers working continued working to clear the airspace which they had; but we also had a push of aircraft tracking westbound from the North Atlantic which was the next issue to deal with.
We had no strips at the time so we could only work what we knew about. The system ASD's (Aircraft Situational Display) still showed data blocks for anyone that was within the system (Active or Pre-active) and the data blocks moved. We were unable to utilize the probe; clearance; or coordination at all which rendered the channels both unusable. The Controllers first relied on these displays to help orientate and identify the flights and ensure we had strips or information on them. The two Controllers seemed to have a handle on clearing their traffic at this point but now the conversation moved to how to handle the sector holding aircraft. Immediately multiple Controllers said that we needed to force them to land at a diversion airport since we are ATC zero. Their belief was that we were ATC zero and unable to provide any oceanic services; hence they were unable to allow them back into the non-radar airspace. However we were told they could only accept 4 more flights there; we had about 12. At that point we then discussed taking traffic at low altitude since we knew there was no traffic below 31000 ft. An argument began between Controllers and once again the ATC zero argument continued. I became quite vocal and stated that we cannot just look the other way if they couldn't land at the only usable airport; they must continue in some capacity. The Controllers then argued back again saying we will not clear anyone because that's not our decision. The problem at this point was you had too many 'cooks in the kitchen' and there was no direction from above. Our area was operating with a CIC and the OM (Operation Manager) on duty had zero oceanic experience/knowledge; so they basically were just asking us what we needed or wanted to do and going along with it. My area continued to stand their ground saying they shouldn't accept any flights without approval; so I went up to the OM on duty and explained the situation.
He immediately agreed and went to the Area Supervisor to facilitate him coordinating the plan to get the aircraft out of the adjacent facility Radar airspace. Each aircraft was solicited on the capacity to continue at 30000 ft. or below until the SJU CERAP boundary. A few aircraft elected to continue and a few diverted. The aircraft were all cleared on the normal routings with altitude separation. These flights all made it through without any issue. After this issue was resolved we continued working on the long term plan. We had data from the crash point; so we started hand writing strips for sectors and for the additional traffic coming. This was very time consuming and as we started to do this; the Tech Operations staff advised they were restarting a channel. It crashed two more times after immediately and then they got one started up finally. Much of the original data was still there and just not up to date. We worked to quickly update the information on the few flights remaining and checked the accuracy of the flights. The system stayed up at that point and we returned to normal operations shortly after. There were multiple problems with how this failure was handled which I have been told or seen happen with past failures. The first big issue is we never get a complete set of strips like we are supposed to. The strips are supposed to be generated in both areas and one set is the 'working' set and the other one is held by the supervisor or CIC for reference or to find missing strips. The one set we had was incomplete and the other one was misaligned and unusable. We need to fix this issue and figure out why the strip printing never seems to work properly. As well; the printers should be checked regularly for alignment and functionality. When we recalled all of the area personnel; it became organized mayhem as you had 15 different Controllers all arguing or debating how to proceed.
We needed to have established procedures or escape routes for us to clear the airspace during an outage. We cannot sit there and debate the procedure every time. The other issue was we also need to have an established and set person in charge during this. We had multiple CIC's in the area and multiple Controllers who all acted like they were in charge. The actual CIC of record at the time did not get super involved and basically let the other Controllers figure it out. The CIC in our case has to take the management role and make decisions; that's their function along with the OM and Area F Supervisor. I think it's also important to minimize the bodies herding around the Controllers working. Keep people nearby but let the Controllers working and their management/CIC make decisions and not be distracted. When the ATOP system fails; the contingency plan should have an immediate set of actions to take and a long term contingency. According to others in my area; we only have a long term plan which can be used after the entire airspace has been cleared. This can take 5 or 6 hours before we could even start to consider to use. The toolboxes and enclosed materials were never fully utilized. They only took one box out and it was barely touched. No one talked to ARINC during the outage. ARINC provides our position reporting; request; and miscellaneous communications to non CPDLC aircraft. During an outage; all aircraft must utilize HF since we have no way to get CPDLC messages. Each area should delegate additional personnel to work with ARINC and copy position reports; requests; etc. to be passed to the Controllers. In our current setup; the only way I can see to do this would be to phone patch a Controller onto each frequency to listen and copy information. Depending on ARINC's setup; you might need more than one person to do this. Then this information can be passed along and in the event there are any emergency or other information; we know about it. 
During this outage; we wouldn't have known anything unless ARINC called over saying it was an emergency.

Data retrieved from NASA's ASRS site and automatically converted to unabbreviated mixed upper/lowercase text. This report is for informational purposes with no guarantee of accuracy. See NASA's ASRS site for official report.