GPX is a popular XML format for running or biking tracks with geocoordinates. This is a how-to for cleaning up a GPX file by removing unwanted or privacy-sensitive information.
Many apps that record exercise routes and can export them as GPX files include more data than the plain GPS coordinates. For example, a GPX file from my favorite recording app, Guru Maps, looks like this:
<?xml version="1.0" encoding="utf-8"?>
<gpx version="1.1" creator="Guru Maps/4.5.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/1"
xmlns:gom="https://gurumaps.app/gpx/v2"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd https://gurumaps.app/gpx/v2 https://gurumaps.app/gpx/v2/schema.xsd">
<trk>
<name>Barnimer Dörferweg</name>
<type>TrackStyle_FF7F00C8</type>
<trkseg>
<trkpt lat="52.6254614634" lon="13.4092010169">
<ele>54.238586451</ele>
<time>2020-05-10T05:30:38.997Z</time>
<hdop>4.6875</hdop>
<vdop>3.375</vdop>
<extensions>
<gom:speed>5.5661926282</gom:speed>
<gom:course>329.1938658731</gom:course>
</extensions>
</trkpt>
…
<!-- 1000's of track points -->
This track includes the following properties for each track point:
- Geocoordinates (latitude and longitude)
- Elevation
- Timestamp
- Horizontal and vertical dilution of precision (hdop/vdop)
- Current speed
- Current course/heading
Plus, Guru Maps uses the track’s <type> element to encode the color of the track as displayed in the app in a non-standardized format (TrackStyle_FF7F00C8).
Some apps also include heart rate or other fitness measurements.
All this data is useful for archiving tracks or importing them into another app. But before sharing this track publicly, I’d like to clean the data up first:
- The only really important pieces of information are the coordinates and possibly the elevation.
- Timestamps are private data. I don’t want to share those.
- The other measurements are largely irrelevant.
GPX files can become quite large (1000’s of track points is common), so reducing the amount of data is also good for file sizes and parsing performance.
Requirements
- XMLStarlet
  I use XMLStarlet to do most of the XML processing. On macOS, you can install XMLStarlet via Homebrew:
  brew install xmlstarlet
- xmllint
  One optional processing step uses xmllint, which comes preinstalled on macOS.
- XSLT file for removing unused namespaces
  Finally, download this XSLT file, remove-unused-namespaces.xslt, either from this Gist or from my server. We’re going to use it in one processing step to strip unused namespaces from the GPX file. Original source: Dimitre Novatchev on Stack Overflow.
Running the command
Assuming your source file is named input.gpx and the XSLT file you downloaded above is in the current directory, this is the full command to process the GPX file and save the result to output.gpx:
xmlstarlet ed \
  -d "//_:extensions" \
  -d "/_:gpx/_:metadata/_:time" \
  -d "/_:gpx/_:trk/_:type" \
  -d "//_:trkpt/_:time" \
  -d "//_:trkpt/_:hdop" \
  -d "//_:trkpt/_:vdop" \
  -d "//_:trkpt/_:pdop" \
  -u "/_:gpx/@creator" -v "Shell script" \
  input.gpx \
  | xmlstarlet tr remove-unused-namespaces.xslt - \
  | xmlstarlet ed -u "/_:gpx/@xsi:schemaLocation" -v "http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" \
  | xmllint --c14n11 --pretty 2 - \
  > output.gpx
This sequence performs the following steps:
- Delete all <extensions> elements.
- Delete the timestamp from the file’s <metadata> section if present.
- Delete the <trk><type> element.
- Delete the <time>, <hdop>, <vdop>, and <pdop> elements from all track points.
- Set the file’s creator attribute.
- Now that extension fields are gone, remove all unused XML namespaces from the file header.
- Delete all xsi:schemaLocation entries except the one for the GPX schema.
- Run the file through xmllint for formatting. The --c14n11 option performs XML Canonicalization (C14N). Among many other things, canonicalization replaces numeric character entities in the XML with their normal Unicode characters, which is important for my use case. For example, the text “D&#xF6;rferweg” in the source would become “Dörferweg”. I found that some of the tools I use insert non-ASCII characters as numeric codes and other tools don’t display these correctly.
The processed GPX file looks like this:
<gpx xmlns="http://www.topografix.com/GPX/1/1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
creator="Shell script" version="1.1"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
<trk>
<name>Barnimer Dörferweg</name>
<trkseg>
<trkpt lat="52.6254614634" lon="13.4092010169">
<ele>54.238586451</ele>
</trkpt>
<trkpt lat="52.6255090307" lon="13.4091548326">
<ele>53.9600219977</ele>
</trkpt>
…
The processing steps above are the ones that work for me given the apps I use. Your mileage may vary if your tools add other data to your GPX files. Feel free to edit the command accordingly. XMLStarlet uses XPath syntax to select which elements to operate on. The xmlstarlet sel command is useful for inspecting a source file and trying out the required XPath incantations.
Validation
Finally, it’s a good idea to validate the processed GPX file against the official GPX schema:
xmlstarlet val --quiet --err --xsd \
  http://www.topografix.com/GPX/1/1/gpx.xsd \
  output.gpx
Happy processing!
PS: If you’re ever in Berlin, this is a nice long bike route (55 km) with minimal car traffic. Starts and ends at Hauptbahnhof. Download the (sanitized) GPX file.