By bert hubert / bert@hubertnet.nl
This post is an excerpt of a far longer post on Galileo, its structures and
the cause of the
outage.
Here we’ll only focus on the outage – the potential underlying reasons
behind it are described in the full article.
Since the week-long outage in July I’ve been fascinated by Galileo and,
together with a wonderful crew of developers, experts and receiver
operators, have learned so much about what I now know are called ‘Global
Navigation Satellite Systems’ or GNSS. This has led to the
galmon.eu project, which monitors the health and vital
statistics of GPS, Galileo, BeiDou and GLONASS. More about the project can
be read in the full
article.
Galmon.eu map of Galileo Service Definition instantaneous violations
Earlier I wrote a technical summary in a post called GPS, Galileo & More:
How do they work & what happened during the big
outage? which
dwells more on what the outage looked like. This article is on what went
wrong and the problems behind the outage.
The daily work of running a GNSS constellation
Running Galileo is 24/7 work. As explained in my earlier technical
article, for
a GNSS to function, the performance of the atomic clocks in space must be
modeled and tracked closely. Simultaneously, the precision on earth will
never be better than how precisely we know where the satellites in space
are.
Extremely simplified, a GNSS satellite is a machine that beeps exactly once
a second, while broadcasting its orbit and atomic clock to decimeter &
nanosecond precision.
Achieving such precision is hard work – if the atomic clock of a satellite
is behind by 3 nanoseconds, this can look like it being 1 meter further away
from the receiver. So which is it? Is the orbit not known precisely enough
or is the clock off? Or, heaven forbid, has the timing facility on earth
lost the plot – making everything look wrong?
Note: this is what actually happened during the week-long outage; more
details below.
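To make the 3 nanoseconds versus 1 meter relation concrete, here is a minimal sketch of the arithmetic. It only multiplies a clock error by the speed of light; the function name is made up for this illustration and nothing here comes from actual Galileo software.

```python
# Speed of light in vacuum, in metres per second (exact by SI definition).
C = 299_792_458.0


def clock_error_to_range_error(clock_error_ns: float) -> float:
    """Apparent range error in metres caused by a clock error in nanoseconds.

    Hypothetical helper for illustration: a receiver measures distance as
    (signal travel time) * c, so any clock error maps directly to distance.
    """
    return C * clock_error_ns * 1e-9


if __name__ == "__main__":
    for ns in (1, 3, 10):
        metres = clock_error_to_range_error(ns)
        print(f"{ns:>3} ns clock error looks like {metres:.2f} m of range error")
```

A 3 ns error works out to roughly 0.9 m, which is where the "about 1 meter" rule of thumb comes from.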
Galileo L1 station at Redu, Belgium (Credit: Wikipedia)
Determining the interlocking behaviour of constellation orbits, space clocks
and ground clocks requires iteration & convergence – after an extended bit
of the orbit it becomes clear which is off more, the clock or the
‘ephemeris’ that describes the orbit.
To achieve the very high accuracy Galileo has promised requires uploading a
new ephemeris frequently, as often as a few times per hour.
Galmon.eu analysis of clock bias during NAPA event, no impact visible. SP3 data: ESA
Currently we often see Galileo satellites broadcast that their position is
not known precisely. When we later compare the orbit and clock performance
against what independent observers measured, we find
that the satellites were mostly just fine – it was something on the ground
that lost track of what was going on, leading the satellite to be declared
‘No Accuracy Prediction Available’ (NAPA), effectively disabling its use by receivers
(like phones and cars).
Current status
The current status of Galileo is not that great. There are 26 satellites in
space, 2 of them in botched (elliptical) orbits, 2 of them unavailable, 1
unavailable until further notice. This makes for 21 functional satellites,
which is less than the “healthy” number of 24. Anything below 24 causes
gaps in coverage, or at least, areas where the positioning accuracy will not
be good enough.
As NASA, ESA and others are fond of saying, “space is hard”, and they are
not wrong. Space is a harsh environment. GNSS vehicles consist of very
delicate components that all have to work just right, and from time to time
there are problems. This means that lately, on average, around 5% of the
Galileo satellites are unavailable because of minor problems (as described
above). Often these problems appear to be ground based and have to do with
software on earth and not with things being broken in space.
Uncorrected positioning error during a NAPA period – no change visible
It appears this software was originally procured by ESA and that any changes
have to flow from the original developer, to ESA, to the GSA and thence to
Spaceopal for deployment. This may mean that fixing software will take some
time, but I am sure they’ll eventually get there.
The week-long outage
During and after the outage, the GSA and others referred to the Service
Definition
Document (SDD), which specifies the Minimum Performance Levels
(MPLs) we can expect from Galileo during the period of initial services.
As noted in the article
“Lessons to be learned from Galileo Signal Outage”
on the Inside GNSS website, this rings hollow:
Eventually, some days into the event, the GSA did post a statement on its
website, taking responsibility and even apologizing for the failure. But
then it attempted to minimize its import by arguing that, after all,
Galileo is still in its initial services phase and therefore should not
be expected to work without interruption. Among many Galileo
supporters, that message fell flat.
A major problem during the outage was in fact the communication of what was
(not) going on. As the Inside GNSS article correctly notes, no one within
Galileo was allowed to communicate anything except the European Commission
itself. And the EC is in effect in the worst position to communicate because
it is so far removed from operations.
The defense that Galileo was operating as specified in the
SDD
may in fact be correct. Within any 30 day period, there can be a full
6.9 day outage and Minimum Performance Levels can still be met.
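The 6.9 day figure follows from simple arithmetic, sketched below. It assumes the "better than 77%" availability level that the initial services status specifies, and it assumes that availability is simply averaged over a 30 day window; the SDD's precise definition may be more nuanced than this. The function name is hypothetical.

```python
def max_outage_days(availability: float, window_days: float = 30.0) -> float:
    """Longest total downtime (in days) that still meets a given availability.

    Assumption for this sketch: availability is the plain fraction of time
    the service is usable within the window.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * window_days


if __name__ == "__main__":
    days = max_outage_days(0.77)
    print(f"Allowed downtime in a 30 day window: about {days:.1f} days")
```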
In September an Independent Inquiry Board was
formed to study the week-long
Galileo outage. The plan was to release preliminary findings in October with
final recommendations by the end of 2019.
From rumors, we know a report has been written and that it has reached some
firm conclusions. It appears a lot was wrong, but the findings are
currently EU classified.
Although October has come and gone, the independent board has not published
any initial findings so far. As an aside, an industry insider has told me
that the best way to tell how well Galileo is actually doing is to see how
late the performance reports are.
Meanwhile, earlier this week Pierre Delsaux, European Commission Deputy
Director General in charge of
Galileo,
reportedly blamed the
outage on the mistakes of a single
person.
Mr Delsaux then went on to compound this terrible statement by adding that the EC
had been transparent because the reasons for the outage had been discussed
at a conference earlier this year.
Accident details
It is indeed true that a presentation was given in Florida where details
were shared with that audience, and by paying $24 we can download that
presentation. From
the slides, we learn that the outage stemmed from a failure in the
system that determines the satellite orbits and clock parameters, which are
normally uploaded to the satellites many times per day.
The outage in the ephemeris provisioning happened because simultaneously:

  • The backup system was not available
  • New equipment was being deployed and mishandled during an upgrade
    exercise
  • There was an anomaly in the Galileo system reference time system
  • Which was then also in a non-normal configuration

In a way this is very good news – if a major outage needed many things to go
wrong at the same time, that suggests it was not simply an accident
waiting to happen. We can of course wonder why upgrade work was happening
while the backup site was not available. It should be noted that in
general, it often happens that rarely used backup facilities are
not immediately able to take over in the face of failure.
After the incident started, it took a while to determine what was going on
before operations could be restarted, but by that time, the constellation
had already drifted too far from a known state for the status of the orbits
and clocks to be converged upon quickly. If the backup site had been
live, it would have been a great place to restart from, since it presumably
would have been in a converged state already.
As noted earlier in this page, determining the orbits and clock
configurations is a complex task, where errors in the clocks, or the
individual clock drift rates, can look just like errors in the orbit
details. Over time, by tracking orbits and clocks, such ambiguities can be
resolved, and the system then converges on a known state. From that
point on, that state can then be tracked closely.
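The ambiguity can be illustrated with a toy model. This is only a sketch under strong assumptions and not how Galileo's ground segment actually works: it considers a single satellite with a radial orbit error `dr` and a clock error `b` (expressed in meters), and assumes each range residual equals `dr * cos(angle) + b`, where `angle` is the angle at the satellite between the line to the receiver and the line to the Earth's centre. All names are made up for the example.

```python
import math


def solve_radial_and_clock(angles_deg, residuals_m):
    """Least-squares fit of residual = dr * cos(angle) + b.

    Toy model (assumption): dr is a radial orbit error in metres, b is a
    clock error expressed in metres. Returns the tuple (dr, b).
    """
    cosines = [math.cos(math.radians(a)) for a in angles_deg]
    n = len(cosines)
    s_cc = sum(c * c for c in cosines)
    s_c = sum(cosines)
    s_cr = sum(c * r for c, r in zip(cosines, residuals_m))
    s_r = sum(residuals_m)
    # Determinant of the 2x2 normal-equation matrix [[s_cc, s_c], [s_c, n]].
    det = s_cc * n - s_c * s_c
    if abs(det) < 1e-12:
        raise ValueError("geometry cannot separate orbit error from clock error")
    dr = (s_cr * n - s_c * s_r) / det
    b = (s_cc * s_r - s_c * s_cr) / det
    return dr, b


if __name__ == "__main__":
    true_dr, true_b = 0.5, 0.9  # made-up errors, in metres
    # Several viewing geometries: the two error sources can be separated.
    angles = [0.0, 4.0, 8.0, 12.0]
    residuals = [true_dr * math.cos(math.radians(a)) + true_b for a in angles]
    print(solve_radial_and_clock(angles, residuals))
    # One geometry only: orbit and clock errors are indistinguishable.
    try:
        solve_radial_and_clock([0.0, 0.0], [1.4, 1.4])
    except ValueError as err:
        print("single geometry:", err)
```

The angles are kept small on purpose, since a navigation satellite sees the whole Earth within a narrow cone. That makes the cosine terms nearly identical, which is one way to see why separating the two error sources needs many observations over time.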
Slide from the 2019 Galileo Update presentation at the ION Conference
After sufficient downtime however, such convergence is not a given, and it
appears the Galileo constellation had to perform a ‘cold boot’ from first
principles. The presentation at the ION conference also notes
that due to quality concerns, no shortcuts are taken in such a cold boot.
GNSS ephemeris and clock convergence is a dark art only practiced by a few
systems on the planet. Despite the fact that this should all never have
happened, I am quite impressed that it was possible to restart the system all
over again in only a few days.
Also please note that while this presentation does confirm an operator error
was involved, the reality is that Pierre Delsaux’s reported claim
that the incident was due to a single person making a mistake vastly
understates the many reasons for the prolonged outage.
So why did this happen?
On one hand, anyone can have a failure. However, seen another way, Galileo
as a project does have problems.
The operation of Galileo is spread out over a large number of organizations
and companies.
Partial schematic of Galileo operations & control
There is a complicated arrangement between the EU, the European Space
Agency and industry to keep Galileo going. Much of this is due to the
complicated history of the project.
It seems likely that the reliability of the project is not improved by
having such a complicated and spread out constellation of companies and
agencies – and that this may especially slow down the resolution of the
noted ground-based software problems.
To more fully understand what is going on, I heartily recommend reading the
longer version of this article, but here are some highlights:

  • The Galileo satellite constellation is underpowered to deliver the
    current performance claims (‘When close isn’t enough, use
    Galileo’).
  • The official status of the project (‘initial services’) says that
    availability will be “better than 77%”, which belies the ‘better than
    GPS’ claims.
  • GPS and Galileo are engineered to be used in combination, and this
    combination is indeed superb.
  • The EU GNSS Agency (GSA), which is in charge of operating Galileo, relies heavily
    on industry. The GSA itself is a hub of contracts, but is not itself a
    powerhouse of Galileo operational expertise – since all this has been
    outsourced to large defense contractors.
  • Within the Galileo project there is the Galileo Reference Centre, which
    was designed to be an independent monitor of Galileo performance. This GRC
    is in reality run by the Spanish space, IT and defense company GMV.
  • Currently, around 5% of the Galileo capacity is lost to software problems
    likely in the Orbitography and Synchronisation Processing Facility (OSPF), run by
    GMV.
  • There is a culture of secrecy around Galileo which is not productive.
  • The EU and ESA are currently battling it out over the creation of an
    additional EU space agency. Brexit also has an impact on Galileo.
  • The Galileo operational constellation will likely only be expanded
    after 2020, and may in fact decrease in size before that time.

To read on, head to the full “State of
Galileo” article.