Identifying seismic signals in GNSS reference stations using machine learning.
TIM DITTMANN, UNIVERSITY OF COLORADO, BOULDER/EARTHSCOPE
JADE MORTON, UNIVERSITY OF COLORADO, BOULDER
Continuous GNSS reference stations represent stable benchmarks for unsung but critical roles in the broader infrastructure: defining reference frames and providing relative corrections, to name a few. But, what if the stable reference station is shaking? A large earthquake will release sufficient energy to permanently deform the earth and vibrate its crust and a coupled GNSS reference antenna1. Relatively weaker seismic signals at or below the perceived GNSS noise floor still can be problematic for reference products. However, these GNSS seismic ground motions identified amongst GNSS ambient noise are valuable records for seismic monitoring and research.
In this article, we provide some of our motivation with respect to the scientific utility of ground motion observations, the benefits of using GNSS as a source of these measurements, and the current role of GNSS in seismic monitoring. We then present our work selecting an optimal processing method to pair with a machine learning algorithm. This approach builds on existing stand-alone GNSS seismic awareness to enhance GNSS’ contribution to seismic hazard operations.
Informed Infrastructure Seismic Preparation
Data-driven research allows seismologists to use extensive data archives to account for the complexity of geophysical sources and signal paths of past, present and future events. Catalogs of historical earthquake ground motion data inform models of rupture and energy propagation for informed infrastructure seismic preparation. Real-time ground motion data enable a class of warning called earthquake early warning (EEW) . A successful EEW system detects the earthquake and decides its extent AFTER the earthquake rupture to then alert a population BEFORE peak ground shaking travels to a populated area. This provides those in the impacted area maybe tens of seconds warning to take life-saving actions, whether that’s to duck and cover or to stop a train or medical procedure. Finally, near real-time ground motion data informs maps of shaking intensity for targeted post-earthquake response, and are appended to the historical catalogs as the newest data point for improved preparedness.
These ground motion measurements are the lens into the earthquake system and are traditionally sourced from dedicated, long-standing inertial seismic monitoring infrastructure. The use of higher rate GNSS for seismology, or GNSS seismology , was born out of the precision achieved through the seminal engineering of GPS/GNSS and progressed over the last two decades of GNSS seismic research.
Two reasons for including GNSS as a source of seismic observations emphasized in our analysis are:
Increased spatial availability: Existing inertial and geodetic networks were largely built and continue to operate independently. Inclusion of both sensor types increases the density of ground motion observations. Such a densification is particularly valuable in relatively sparser regions , such as Alaska, but also adds redundancy and resilience to all existing overlapped networks.
Dynamic range: Inertial instruments are engineered with specific signal spectral characteristics of interest. As a result, inertial instruments are orders of magnitude more sensitive to weaker signals, including p-waves, the earliest smaller amplitude waves of earthquakes, and surface waves from events halfway around the globe. However, seismologists identified that in the nearfield of the largest events (M7.0+) that information encoded in the slower, longer period, large amplitude displacement signals is required to differentiate the magnitudes of these largest events. Traditional inertial sensors struggle to capture this information due to instrumental reasons. GNSS, with no geophysical upper bound, readily provides either direct or single integration large displacement or velocity measurements necessary for this magnitude differentiation at frequencies down to their permanent offsets.
One important distinction: In this article, we discuss GNSS seismology and present the complementary nature of inertial and geodetic sensors as stand-alone instruments, as this is currently the primary global infrastructure status quo. However, another closely related area of development and promise is seismogeodesy, or tight local integration of these sensors into a single measurement .
The USGS ShakeAlert, the operational EEW system in the United States, ingests GNSS data from the western United States geodetic reference networks through a multi-agency and university collaboration to complement inertial data ingestion. Residents of the western U.S. will benefit from this current culmination of nearly two decades of multi-national GNSS seismology research and engineering. GNSS displacements have been included in USGS post-process fault models , but not yet included in shaking intensity products or operational ground motion models.
The potential measurement range of GNSS seismology has not yet been realized operationally in part because of inherent GNSS noise characteristics. GNSS ambient position noise is predominantly the aggregate of timing effects of GNSS radio signal propagation through the atmosphere, satellite and receiver oscillators and the antenna radio frequency environment. Each are location- and time-varying influences, and distinct from the zero-baseline inertial sensor noise seismologists are most accustomed to. Current methods for discriminating signal from noise adopt variations on low-pass filters or static or dynamic thresholds from seismology. To gain the desired sensitivity to signals, high-levels of false alerts from these methods are mitigated through correlating with the inertial system as well as additional GNSS locations. This mitigation adds points of failure and latency and reduces the overall range of valid measurements included. The performance opportunity for improved seismic monitoring and EEW is to rapidly include additional, potentially spatially diverse and unsaturated, ground motion signals from GNSS sources with minimal delay by accommodating the higher dimensionality of GNSS noise.
Comparing Geodetic Processing Methods
To address the presence of seismic signals in GNSS data, we began with an evaluation of two geodetic processing algorithms . Currently, most operational systems and research approaches ingest one of the various methods of precise point positioning (PPP-AR) to make continuous estimates of antenna positions in a global reference frame. These PPP algorithms accomplish this precision at approximately sub-centimeter level using sophisticated error corrections models from multiple sources to estimate carrier phase ambiguities. GNSS seismology requires only relative topocentric motion; consequently, PPP absolute estimates are flattened to relative east, north and up components from a reference position. Time differenced carrier phase processing (TDCP) is a lightweight processing technique first applied to seismic applications by . TDCP single differences epoch-wise carrier phase measurements remove correlated error sources (e.g. troposphere). After removing the satellite velocity, a broadcast ephemeris is acceptable for this, and a least squares system of equations of all observed satellites resolves a topocentric antenna velocity vector and clock drift estimate.
We compared the relative signal to noise of PPP to TDCP to determine our processing method. For our noise estimates, we assembled a dataset of event-free 1 Hz GNSS observational data tracked by multiple receiver types, using a variety of antennas in diverse RF environments, across a hemispheric scale to account for a wide range of noise sources (Figure 3). For PPP processing, we used the UNAVCO/EarthScope PPP solutions from the Trimble RTX software . For TDCP processing, we used the open-source python package SNIVEL , which uses GPS only, broadcast ephemeris and the narrow-lane L1/L2 carrier phase combination. From the event-free processed time-series, we estimated a stochastic noise for each station-processing method pair without cleaning or filtering the data. We used these thresholds to establish a statistical noise threshold distribution across this network wide dataset to represent the ambient noise distributions.
For our reference signals, we used empirical scaling laws that relate peak dynamics, earthquake magnitude and radius from the hypocenter. These scaling laws [10, 11] are derived from existing earthquake catalogs and useful for rapid magnitude estimation; the PPP-derived peak ground displacement (PGD) scaling law is part of the current ShakeAlert geodetic contribution. We estimated a signal-to-noise (SNR) metric using PGD or peak ground velocity (PGV) derived from respective scaling laws as our signal reference related to respective ambient noise levels. This SNR metric was estimated over a range of radii from a range of earthquake magnitudes. We found TDCP is more likely than PPP to detect the intermediate magnitude earthquakes (Figure 4) and has additional benefits of being a computational light-weight, open source processing method that doesn’t require external corrections for ephemeris or error source models.
Identifying Seismic Events in GNSS Timeseries with Machine Learning
The results of our ambient processing method comparison indicated TDCP offered lightweight geodetic processing with increased sensitivity, yet still demonstrated unacceptable operational false alarm rates from our statistical threshold. The complexity of GNSS noise coupled with the variability in seismic signals encouraged us to look to an alternative detection approach. Machine learning (ML) is now an ubiquitous tool in data science. Earth scientists have leveraged algorithms developed for natural language processing or image classification and applied them to a range of challenging problems  difficult to represent in physics-based models.
We set up a data-driven ML pipeline to train, validate and test a binary classification machine learning model . The foundation of our data-driven experiment is a catalog of 1,706 5Hz TDCP velocity waveforms processed from the UNAVCO/EarthScope geodetic archive concurrent with 80 earthquakes ranging from magnitude of 4.9 to 8.2. An event-free 5Hz TDCP dataset from 30 minute windows prior to the events was included to ensure sufficient noise samples in training and testing for model generalization of these imbalanced datasets. We used 5Hz data to boost signal energy and reduce the likelihood of aliasing, and set a radius of sensitivity for each event as a function of magnitude given our previous sensitivity analysis.
Feature engineering in ML is the process of applying relevant domain knowledge to the ML model for successful classification. We evaluated several feature engineering strategies: the most effective strategy consisted of a combination of time- and lower frequency-domain features (1-30s period) extracted from overlapping 30 second windows. The three topocentric components’ features were labeled through visual inspection and concatenated into a single binary sample and label for each timestamp.
We chose a random forest classifier as our ML algorithm and adopted a nested cross validation technique in our classification training and testing (Figure 5). This validation strategy allowed us to make training/testing splits of our data on the 80 discrete earthquakes, and evaluate our model’s performance on unseen events in training. We optimized the model on a balance of sensitivity scores and false positives using its F-1 score. A traditional accuracy metric on highly imbalanced classification data, such as our earthquake catalog, is typically not descriptive of performance (e.g., for events happening <1% of the time, you can miss 100% of the events and still be >99% accurate).
The random forest classifier achieved a 90% true positive rate of the station-event pairs (Figure 6) across the entire catalog. The stand-alone classifier substantially outperformed the existing threshold and filtering (e.g. short term average over long term average, STA/LTA) detection methods as shown in Figure 7. These performance results from the classifier’s combination of time- and frequency-domain features into its decision criteria could readily improve GNSS contribution to operational seismic monitoring and ground motion catalogs. Additional investigations in deeper learning models will likely enable researchers to ask more sophisticated questions.
Finally, we tested the timing of the classifier when run once per second on the 5Hz samples of test data not used in training. We found the classifier typically had its first detection approximately at or immediately after the anticipated seismic secondary wave arrival (Figure 8). This result explains our model, or alignment of our results with our domain knowledge that explains the model’s performance. The model did not detect the weaker seismic primary wave arrivals, but instead identified the larger, lower frequency ground motions of the seismic secondary and surface waves. This result also offers implications for GNSS and inertial complimentary hazard monitoring, particularly EEW when timing and accuracy are critical.
Ground motion observations are the data currency of earthquake hazard preparation, monitoring and research. Continuous high-rate GNSS reference stations offer an alternative source that expands the dynamic range of inertial-based ground motion measurements in the nearfield of the largest, most devastating earthquakes and spatially complements existing inertial infrastructure. Complex GNSS noise signatures have bounded operational incorporation of GNSS in these hazard systems. However, alternative, lightweight processing (TDCP) paired with machine learning (random forest classifier) offers enhanced confidence in signal from noise discrimination to confidently include these ground motion measurements in operational systems with minimal false alerting and without external corrections services. The global proliferation of higher-rate GNSS reference stations to support a variety of disparate position, navigation and timing applications could all become medium to large earthquake seismometers, alerting reference station users in addition to contributing to the global seismic monitoring systems. Furthermore, embedding TDCP processing coupled with ML at high rates (>=5Hz) at the network edge will enhance the next generation of geodetic sensor networks to stream higher rate velocities for seismic monitoring or archive denser raw observables for addressing future seismic research objectives.
We would like to thank Yuinxang (Leo) Liu, Kathleen Hodgkinson, Brendan Crowell, David Mencin and Glen Mattioli. We acknowledge the open geodetic data available from the National Science Foundation GAGE facility operated by EarthScope and the open-source software used for GNSS velocity processing and their analysis, including GNSS velocity processing and machine learning libraries.
 R. Allen and D. Melgar, “Earthquake Early Warning: Advances, Scientific Challenges, and Societal Needs,” Annual Review of Earth and Planetary Sciences, 2019.
 K. Larson, “GPS seismology,” Journal of Geodesy, 2008.
 R. Grapenthin, M. West and J. Freymueller, “The Utility of GNSS for Earthquake Early Warning in Regions with Sparse Seismic Networks,” Bulletin of the Seismological Society of America, 2017.
 Goldberg, D. E., and Y. Bock (2017), Self-contained local broadband seismogeodetic early warning system: Detection and location, J. Geophys. Res. Solid Earth, 122, 3197–3220, doi:10.1002/2016JB013766.
 D. E. Goldberg, P. Koch, D. Melgar, S. Riquelme and W. L. Yeck, “Beyond the Teleseism: Introducing Regional Seismic and Geodetic Data into Routine USGS Finite‐Fault Modeling,” Seismological Society of America, 2022.
 T. Dittmann, K. Hodgkinson, J. Morton, D. Mencin and G. Mattioli, “Comparing Sensitivities of Geodetic Processing Methods for Rapid Earthquake Magnitude Estimation,” Seismological Research Letters, 2022.
 G. Colosimo, M. Crespi and A. Mazzoni, “Real‐time GPS seismology with a stand‐alone receiver: A preliminary feasibility demonstration,” Journal of Geophysical Research: Solid Earth, 2011.
 R. Leandro, H. Landau, M. Nitsschke and e. al., “RTX positioning: The next generation of cm-accurate real-time GNSS positioning,” Proceedings of the 24th international technical meeting of the satellite division of the Institute of Navigation, 2011.
 B. W. Crowell, “Near-field strong ground motions from GPS-derived velocities for 2020 Intermountain Western United States Earthquakes,” Seismological Research Letters, 2021.
 D. Melgar, B. Crowell, J. Geng, R. Allen and Y. Bock, “Earthquake magnitude calculation without saturation from the scaling of peak ground displacement,” Geophysical Research Letters, 2015.
 R. Fang, J. Zheng, J. Geng, Y. Shu and C. Shi, “Earthquake Magnitude Scaling Using Peak Ground Velocity Derived from High-Rate GNSS Observations,” Seismological Research Letters, 2020.
 K. Bergen, P. Johnson, M. V. de Hoop and G. Beroza, “Machine learning for data-driven discovery in solid Earth geoscience,” Science, 2019.
 T. Dittmann, Y. Liu, J. Morton and D. Mencin, “Supervised Machine Learning of High Rate GNSS Velocities for Earthquake Strong Motion Signals,” Journal of Geophysical Research: Solid Earth, 2022.
Tim Dittmann is a data scientist at the EarthScope consortium and doctoral candidate at the Ann and H.J. Smead Aerospace Engineering Sciences at the University of Colorado, Boulder.
Y. Jade Morton is Helen and Hubert Croft Professor and Director of the Colorado Center for Astrodynamics Research in the Ann and H. J. Smead Aerospace Engineering Sciences Department at the University of Colorado Boulder. She received a Ph.D. in Electrical Engineering (EE) from Penn State. She is a member of the U.S. Space-based PNT Advisory Board, a recipient of the AGU SPARC award, the IEEE PLANS Kershner Award, and the Institute of Navigation’s (ION) Burka, Thurlow, Kepler, and distinguished service Awards. Dr. Morton is a Fellow of ION, RIN and the IEEE.