Technical Article • September/October 2015
GPS ConfidentialEnabling Proximity Detection While Preserving Location PrivacyWith the proliferation of GPSenabled mobile devices, location based services have gained in popularity. In particular, proximitybased services are commonly used to help a user interact with other users nearby. However, these services endanger the privacy of users as the location information is provided to the service provider. This article proposes a novel private proximity detection scheme based on partial GPS measurement information. We develop an efficient algorithm for proximity detection and theoretically analyze the false alarm performance. Empirical results validate this scheme and evaluate its performance.
Share via: Slashdot Technorati Twitter Facebook Ubiquitous locationaware mobile devices, mainly GPSenabled smartphones, have led to a boom in locationbased services (LBS), which have been revolutionizing businesses and lifestyles. Common uses of LBSs include asset tracking, locationbased advertising, emergency roadside service, turnbyturn navigation, and realtime traffic & road information sharing. A major category in LBSs is proximitybased services, which allow users to search for friends or other points of interest around them. Examples of proximitybased mobile social networking include Apple’s “Find My Friends,” Facebook’s “Nearby Friends,” Tencent’s “WeChat,” Momo, and Nearby. In Find My Friends, a user can see the locations of his friends and get notified when his friends are nearby. In WeChat, a user can find nearby users by the following two ways:
To use an LBS, a user usually has to send his exact location to the service provider, sporadically or frequently. The contextual information attached to user locations may, however, also reveal the users’ habits, interests, activities, health status, and political and religious affiliations. The high level of intrusion and privacy threats associated has made many users reluctant to opt into LBSs. So, designing practical and effective privacypreserving proximitydetection schemes would reassure users who have concerns about maintaining their privacy. Past work on location privacy has explored quite a few approaches, such as anonymization, obfuscation, and adding dummies. A common concept underlying most of these approaches is to “degrade information in a controlled way before releasing it,” as summarized in the paper by B. Hoh and M. Gruteser listed in the Additional Resources section near the end of this article. This article proposes a new approach to preserving location privacy in proximitybased services. Our approach makes use of the location information inside a GPS receiver. The key difference from previous work is that rather than obtaining accurate location information and then degrading it, we extract privacypreserving location information directly from an intermediate step in GPS location estimation. Our private proximity detection scheme presented in this article is designed for locationbased “friending” applications in a global social network. A user shares his untagged range measurement, which is derived from the user’s GPS range measurements, with the server. The server can efficiently detect if any two users are within a threshold distance of each other. However, it is computationally intensive for the server to infer each user’s exact location from the untagged range measurement. Our approach can be used independently or together with other approaches, such as obfuscation, to provide a higher level of privacy protection. This article describes how untagged range measurements preserve location privacy and a very efficient matching algorithm for proximity detection. It also evaluates the proximity detection performance through a theoretical analysis and field experiments. The evaluation results demonstrate the efficacy and robustness of our scheme.
Previous Work in the Field A possible approach is privacypreserving test described in the article by A. Narayanan et alia, which is based on the location tag initially studied by the publications by D. Qiu et alia listed in Additional Resources. With proper location tags, location proximity can be reduced to measuring the similarity between two sets of tags. Narayanan et alia suggested deriving tags from surrounding environment including WiFi traffic and access point identifiers, GSM signals, and GPS signals. The untagged range measurement proposed in this article is closely related to the location tag. A major difference is that the location tag still requires certain types of cryptography to work. The ElGamal encryption suggested by A. Narayan et alia requires much more computational resources on the user end and server side than does our method proposed in this article.
Private Proximity Testing Using Untagged GPS Range Measurements r_{i} = [r_{i}^{(1)}, r_{i}^{(2)}, . . . , r_{i}^{(Ki)}]^{T} (2) where K_{i} is the number of satellites visible to this user. The range measurement made to the satellite k, r_{i}^{(k)} does not include the receiver clock bias, for all k = 1, . . . , K_{i}, as the receiver clock bias can be easily calculated and removed beforehand. Furthermore, we require r_{i} to be a sorted vector in ascending order, i.e., r_{i}^{(1)} ≤ r_{i}^{(2)} ≤ . . . ≤ r_{i}^{(Ki)}. Location Privacy Protection. When the range measurements are not designated with PRN numbers, if the adversary has no knowledge of satellite orbits, the K range measurements can be seen as an ordered selection from the L satellites in the whole constellation. Therefore, the search space is Kpermutations of L.
Usually, we have L ≈ 30 and K ≈ 10, and thus the size of search space is L!/(L – K)! ≈ 1.1 x 10^{14}. The untagged range measurements can also confuse an adversary. First, untagged range measurements seen at two or more distant locations may happen to be similar. These events are categorized as distant false alarms in a later section on “Proximity Detection Performance Analysis.” Second, a user can add dummy measurements to his sorted vector so that multiple locations exist at which the untagged range measurements will be a subset of the sorted vector. In this article, we assume no dummy measurements added to sorted vectors. Furthermore, the untagged range measurements are ephemeral. The satellitetouser range is decreasing when the satellite is approaching and is increasing when the satellite is leaving. The range change rate varies with satellite elevation. For satellites at the zenith, the range rate is close to zero. For lowelevation satellites, the range rate can be as high as ±930 meters per second. Therefore, untagged range measurements are valid for Equation (3) (see inset photo, above right, for equations) If we choose t = 10 kilometers, then untagged range measurements are valid for approximately 10 seconds. Proximity Detection. Consider two users at locations x_{1} and x_{2}, as shown in Figure 1. Without loss of generality, assume user 2 is to the north of user 1. Suppose a GPS satellite is visible to both users, and the elevation and azimuth of the GPS satellite seen by user 1 are α and β. When the two users are nearby, the distance between the two users is much shorter than the distance to the satellite. Thus, we have the approximation r_{2} − r_{1} ≈ x_{2} − x_{1}_{2}cos α cos β. (4) Define a threshold distance t > 0. The two users are deemed “nearby” if x_{2} − x_{1}_{2 }≤_{ }t. Therefore, a necessary condition for two users to be nearby is r_{2} − r_{1} ≤ tcos α cos β ≤ t. (5) In our scheme, the server has to do blind matching of range measurements because the users do not designate PRN numbers to range measurements in order to protect their privacy. We formulate this “blind matching” problem in an optimization framework and propose an algorithm to solve it. Blind Matching As an Optimization Problem. Let ⊂ denote the subset relation between two sorted vectors. We write x ⊂ y if each element in x also belongs to y. Let card(x) denote the cardinality, i.e., number of elements, of a vector (or a set) x. The proximity detection problem can be formulated as the following optimization problem:
maximize c, where the infinity norm [u_{1}, . . . , u_{n}]^{T}_{∞} = max {u1, . . . , u_{n}}. The optimization problem maximizes c, the number of matched range measurements. The decision whether the two users are nearby depends on c, card(r_{1}), and card(r_{2}). In this article, we use a very simple criterion: two users are decided to be nearby if the match ratio Equation (7) where the decision threshold ζ, 0 ≤ ζ ≤ 1, is selected to achieve certain detection error performance. Efficient Blind Matching Algorithm. The optimization problem (6) is similar to the longest common subsequence (LCS) problem. Dynamic programming is often used to solve the LCS problem efficiently. Here we borrow this idea to solve the optimization problem. Let r^{(k)} denote the kth element in the sorted vector r, and let r[n] denote the vector of the first n elements, i.e., r[n] = [r^{(1)}, r^{(2)}, . . . , r^{(n)}]^{T}. Let c(r_{1}[k_{1}], r_{2}[k_{2}]) denote the maximum number of matched range measurements between r_{1}[k_{1}] and r_{2}[k_{2}]. We then have the following recursive property: Equation (8)
This algorithm achieves the worstcase time complexity of O(card(r_{1}) + card(r_{2})) and the space complexity of O(1). Once we obtain c using the previously described algorithm, we then use Equation (7) to determine if the two users are in proximity.
Proximity Detection Performance Analysis Equation (9) Equation (10) Equation (11) We focus our theoretical performance analysis on PFA for two reasons. First, PDE is dominated by PFA, as demonstrated by Equation (12)
where in general we have card(S \ X) >> card(X). We should note the existence of two types of false alarm:
In this article, nearby false alarm is not our major concern because “proximity” itself is a fuzzy concept in social networking. For example, if two users within a distance t are always deemed nearby, it is acceptable that two users within a larger distance (e.g., 1.5t) are detected to be nearby with a certain probability. The following analysis is about distant false alarm.
Probabilistic Model of Ranges. Here we use a uniform distribution to approximate the actual distribution of ranges, i.e, r ~ U(r_{min}, r_{max}). When the satellite elevation mask angle is set to 10 degrees, r_{min} ≈ 20,189 kilometers and r_{max} ≈ 24,619 kilometers. Let the spread of range measurements λ = r_{max} – r_{min}, and we have pdf_{r}(x) = 1/λ. A fundamental assumption of this analysis is that ranges to different satellites are independent and identically distributed (i.i.d.). The validity and efficacy of this assumption has been demonstrated in our previous work (L. Heng et alia). In this analysis, we ignore GPS range measurement errors because such errors are much less than the threshold distance. Probability of False Alarm. Suppose user 1 reports an untagged range vector r_{1} = [r_{1}^{(1)}, . . . , r_{1}^{(K1)}]^{T} and user 2 reports an untagged range vector r_{2} = [r_{2}^{(1)}, . . . , r_{2}^{(K2)}]^{T} Both users are randomly chosen on the Earth so that with a very high probability they are far apart. Let c denote the number of matched range measurements. According to our discussion in the section on “Proximity Detection,” false alarm occurs when the match ratio m = c/ min{K_{1},K_{2}} is grater than or equal to the threshold ζ. Let us randomly shuffle r_{2}. Based on our i.i.d. assumption mentioned earlier, r_{2}^{(k)} ∼ U(r_{min}, r_{max}) for all k = 1, . . . , K_{2}. The probability of r_{2}^{(1)} matching one of the elements of r_{1} is given by Equation (13) If r_{2}^{(1)} matches one of the elements of r_{1}, then the probability of r_{2}^{(2)} matching one of the remainder elements of r_{1} has a similar upper bound 2t(K_{1} – 1)/λ. Similarly, the ith match happens with a probability less than or equal to 2t(K_{1} + 1 – i)/λ. Let η = ζmin{K_{1}, K_{2}} be the required number of matched range measurements, where ⎡·⎤ is the ceiling function. Finally, the probability of at least η matched range measurements has the following upper bound: Equation (14) Since P_{FA} ≤ 1, we finally have the following upper bound: Equation (15) The equation shows that increasing λ (equivalent to using a lower mask angle) and/or decreasing threshold distance t will reduce the false alarm rate.
Experiment 1: Real Data from Global GPS Receiver Networks We treated the IGS and UNAVCO stations as nodes to test for proximity using the scheme outlined in the earlier section. These stations are usually very far (at least tens of kilometers) apart. With a proper distance threshold, the receivers at different stations can be seen as distant users, while the receivers at the same stations are nearby users. We applied our algorithm to the IGS data recorded on January 10, 2014. The pseudorange measurements released by 1,171 stations around the world at the start of the day during one time epoch was used to aggregate the statistics for validation. Figure 2 shows the variation of probability of false alarm with the threshold distance. In Figure 3, we see that the missed detection rate is below 0.05 for ζ ≤ 0.8. However, the false alarm rate is higher for lower values of ζ. This tradeoff is succinctly depicted in Figure 4, which plots the probability of detection P_{D} = 1 – P_{MD} versus P_{FA}, also known as the receiver operating characteristic (ROC) curve. Thus, the empirical results clearly illustrate the viability of our scheme for efficient private proximity detection. The results with real data demonstrate the robustness of our scheme. Occasional loss of satellites cause missed detection. With a proper choice of decision threshold ζ, we can still achieve satisfactory detection performance.
Experiment 2: Real Data from Android Phones We developed an Android application to log GPS range data. Upon postprocessing, we can evaluate the utility of our scheme. However, working with the Android API presents a fresh set of challenges. An additional evaluation using GPS receivers is presented to further strengthen the proof of concept. In an ideal scenario, an implementation of a proximitybased service using an Android app would just have the Android app interacting with a friendfinder server as shown in Figure 5. However, several challenges arose while working with the Android location application programming interface (API), which is the only mode of accessing the underlying GPS engine. We provide a description of the challenges and the suitable modifications required below. Pseudorange Measurements Not Available from the API. An app developer can only access geodetic coordinates (latitude, longitude, and height) information of an user. Thus, we had to set up an external server to continually download high rate ephemeris from nearby IGS stations. The high rate ephemeris within the last two hours from the stations NIST and GODS (shown in Figure 6) were downloaded and hosted on a server. The Android app downloaded and updated the ephemeris from the server periodically. Further, Android provides a GPSSatellite class which reports the satellites currently in view along with the elevation and azimuth. Using the last known location, the satellites in view and the downloaded ephemeris, we find the range measurements and form the anonymous range vector for proximity detection.
Unknown Clock Bias and Time Sync Issues. Initially, we manually configured each of the devices used for testing to not have time lag more than 5–10 seconds. However, the range measurements vary rapidly as the satellites are continually moving. In order to ensure time sync between users, the app was forced to report range measurements at fixed time epochs, i.e, integer multiples of 20 seconds.
Fluctuating Satellite Set.
Synchronous versus Asynchronous Implementation. The former implementation is obviously better in terms of privacy because the users give away less information. Also, there is the extra communication overhead of continuously reporting data in the synchronous implementation. However, we resort to a synchronous implementation in this work for simplicity. Further, we do not implement a friendfinder server as we are simply recording data for research purposes. The Android app just logs all the pseudorange data in a text file. Figure 7 shows the final implementation used for this experiment. We processed the files from all users after the recording to evaluate our algorithms. The app was distributed to six graduate students who lived and worked in UrbanaChampaign, Illinois. They primarily worked indoors and were close to each other for most part of the data collection. We present the results from this data collection in Figures 20, 21, and 22. Figure 8 shows the probability of correct decisions, false alarms, and missed detections. Unlike the IGS stations, most users in this experiment were close to each other for the most period. Hence, we see a relatively higher ratio of false alarms as the denominator in equation (9) is considerable smaller. Figures 9 and 10 show an interesting comparison of the number of mismatches between the anonymous range vectors of two users using two different threshold distance parameters t. We see that the results are consistent with what we expect.
Experiment 3: Real Data from GPS Receivers We thus decided to perform another experiment with small L1 singlefrequency GPS receivers known to output pseudorange measurements. Four graduate students from University of Illinois took part in this experiment. Two of them drove in opposite directions from the UrbanaChampaign campus for a few kilometers. Two others walked around on the campus. The paths of these users are presented in Figure 11. Three accompanying figures present the variation of match ratio with distance for t = 750, 1,500, and 5,000 meters, respectively, for a pair of users. From Figure 12, we can see that there is a sharp decrease in match ratio when the distance between the users increases more than the threshold of t = 750 meters. This demonstrates the robustness of the algorithm. For the same pair of users, as can be seen in Figure 13, a higher threshold of t = 1500 meters gives appropriate results in terms of match ratio. In Figure 14, the threshold t = 5,000 meters is always above the distance between the two users. As a result, the match ratio is close to 1.0 most of the time.
Conclusion We further conducted experiments using globally and locally collected data. The empirical results demonstrated the efficacy and robustness of our scheme for performing private proximity detection.
Acknowledgments
Additional Resources ManufacturersThe GPS receivers used to test the privacypreserving proximity detection method described in this article were LEA6T receivers from ublox, Thalwil, Switzerland.Copyright © 2017 Gibbons Media & Research LLC, all rights reserved. 
