mi

Best Practices for GTFS Publishers

Last Updated: April 11, 2017.

This document is for publishers of GTFS data and developers of real-time API services.

See the Help page for assistance using this Web service.

Words of Wisdom

Be Simple, Be Efficient, And Be Inviting...

-- The Webmaster

The purpose of specifications is to allow software to place reliable expectations on data so applications can be disigned for the best possible user experience. Nonsense is compliant usability is what matters. Be consistent with naming and identifiers across datasets and don't second guess design.

What's New

  • April 11, 2017. Separate display of arrival and departure times. The Arrival/Departure heading is tabbed and arrival and departure times are displayed separately if the time block is not equal. Specify arrival times for the new feature.
  • April 11, 2017. The parser is updated not to coerce a maximum of two trip_headsigns per schedule trip block when direction_id is used. The change avoids possibly misleading results when only the first trip_headsign is used. Use trip_headsign to model direction names identical to official publications. If trip_headsign changes along a trip, use stop_headsign to indicate the change. See GTFS Best Practices, Direction Names below.
  • February 7, 2017. See GTFS Best Practices, Other Best Practices #2 (for February 7, 2017).

Common Mistakes And Fixes (Updated April 11, 2017)

Here's a quick start to avoid common mistakes.

  1. Changing Data URL. Keep your latest data at the same URL (using the same file name). For example, use agency_name_gtfs.zip and NOT agency_name_gtfs_Dec2015.zip. Keeping the data URL the same means less confusion and is required for automatic updating.
  2. Calendar.txt Entries Don't Indicate All Times. The service_ids in calendar.txt should include trips for all times for each calendar entry. If you have a Weekday schedule and a Friday schedule, the Friday schedule should include all times for Friday. Otherwise, your timetables will not include all times. Use the specification to best reproduce official publications.
  3. Confusing Direction Names. The trip_headsign is used as the direction name. Too many can be confusing. Use direction names identical to official publications. If a trip headsign changes along a trip, use stop_headsign. It's designed for that purpose.
  4. Non-Unique route_ids Across Service Updates. The route_id is designated as the unique identifier for routes and should not change to denote a change in service. Publish data that supports tracking route schedules by keeping route_ids the same. If you have legacy software, update it.

    See Troubleshooting below for tips if you're locked into changing route_ids.

  5. GTFS Not a Snapshot of Current Service. Always publish the merged data file at your static URL as soon as updated data is available. GTFS is a cacheless specification and it's not possible to discern new schedules from intended updates to previous publications, so consuming applications require a snapshot to work as a rule. Otherwise, GTFS-R may stop working and times may appear missing or contradictory.
  6. Non-Compliant Data Files. Do not publish files with subfolders of datasets or partial agency data. Remember the purpose of the specification is to create data that can be fed directly into software. Your .zip file should contain only specification files that completely describe each agency in agency.txt. Otherwise, your data is non-compliant and it's worth additional time to make it usable.
  7. Separate route_ids For Each Direction. Tempting simplicity but it disrupts user interfaces designed for GTFS. Use the route_id, route_short_name, and route_long_name to identify routes and trip_headsign to identify direction. User interfaces were designed expecting this pattern, disrupting it is counter productive.
  8. Stop Codes Improperly Added Or Excluded. If your system uses public facing stop codes, include stop_code in stops.txt even if they are used as stop_id and don't append them to stop_name. The stop_code flags whether your system uses public stop codes and triggers useful features. If you use them, they're worth including.
  9. Real-Time Vendor Without Public API. If you contract out for real-time services, be sure your agreement includes access to real-time data through a publicly available API, i.e. an Open API. Otherwise, real-time information will be unavailable in familiar software.

GTFS Best Practices (Updated April 11, 2017)

Reproduce Printed Schedules

Use the same calendars, routes, direction names, and times in your data as you use in print. Identical print and data publications promote trust and use of your data. For an unsure public, it's a relief to find identical information.

Arrival and Departure Times

If arrival and departure times differ, specify them both. The public is interested in arrivals and not just departures.

Don't leave departure_time or arrival_time empty or times will be interpolated and results may differ among publications.

Calendars and Timetables

Use the preferred calendaring method to best reproduce official publications. If schedules do not vary by day too much, use calendar.txt with modifications in calendar_dates.txt. If schedules are highly variable by day, use only calendar_dates.txt.

The entries in calendar.txt are public timetables. To avoid inherent confusion, keep start_dates and end_dates continuous and without days overlapping. The public can easily select the correct timetable and view all times.

trips.txt must include all trips for each calendar entry or your timetable are nonsense.

To Support Timetables..

If you have Weekday and Monday service_ids, the Monday service_id should include all times for Monday. Otherwise, your Timetables will not display all times.

If you use calendar_dates.txt only, try to keep days of the week consistent within each service date range. The result is a user friendlier experience with fewer calender entries to choose from.

Direction Names

The trip_headsign is used as the direction name for a group of trips. Use the same names as official publications to create familiar direction names and time blocks.

Too many trip_headsigns can cause confusion because arrival and departure times get listed in separate time blocks. It's difficult to identify the next arrival or departure when times are listed under separate directions.

If trip headsigns vary along a trip, use stop_headsign. That's it's purpose.

We no longer coerce trip blocks into two trip_headsigns when direction_id is used. The change avoids possibly misleading results when only the first trip_headsign is used.

If you use direction_id, limit trip_headsigns to reflect official publications and use stop_headsign to reflect changes along a trip.

Support Tracking Route Schedules

If you change your route_ids, your data does not support tracking route schedules. The little utility left is a small return on your investment and you could benefit greatly by keeping them the same.

Remember, your data describes things in the real world and if they haven't changed neither should their identifiers.

If route_ids change, we identify routes between updates by route_id patterns or the route name on a per dataset basis using settings we assign. For example, 2-BMT-sj2-1 may be recognized as 2-BMT-sj2-. The route short name or complete route name may also be used. The route_id with the latest service period is used to interoperate with other systems.

If you're using legacy software that changes route_ids, update it if you can. Otherwise, your data works as a common pattern and not as a rule (very bad). If you must change route_ids, pay special attention to keep route_short_name and route_long_name the same.

Maintain A Snapshot

Keep your data up to date at the same URL and always a snapshot of current service. Since the GTFS specification is cacheless, maintaining a snapshot is the only rule that makes it possible for consuming applications to always work. This means your static URL should point to the merged data file.

Tips: Publish merged as soon as the data is ready so the public can see times as they become available.

Your data may be important to thousands of people each day and it may not be obvious. If your data location changes or doesn't describe current service, the result is wide-spread frustration.

Other Best Practices

  1. Ensure the data URL includes the last-modified and content-length headers in GET and HEAD requests. One of these, preferrably both, is required for updates to be detected and posted automatically.

    Note: If you want to restict access, use HTTPS with basic authentication. This allows restricted access while supporting automatic updates.

  2. Avoid frequent, last minute updates to a new file by posting the update only when it's done. Up-to-the-second or to-the-minute update detection is not feasible due to a small number of misbehaving servers. This means the absolute latest file is not guaranteed to be processed if it differ by only a few minutes.

    During the update process never allow the last-modified date of the old file to be later than the new file. If this happens, the new file may be ignored.

    Finally, don't edit your file while it's at the static URL. If bad data gets processed, nonsense can become widespread and not automatically undoable.

  3. Keep the GTFS data file valid and completely describing each agency in agency.txt. This means your static URL should point to the merged data file. Include only specification files and not subfolders of datasets or partial agency descriptions. Remember the whole purpose of the GTFS specification is to publish data that can be fed directly into software. Anything else means your data may be unusable and the time you spent compiling it wasted.

    If the GTFS data file is invalid or at any time doesn't describe current service, nonsense may result and your data may be excluded from this Service.

  4. If your system uses public facing stop codes, always include them as stop_code and not stop_id or appended to stop_name. We have no way of knowing if stop_id is stop_code parsimonious or exactly which part of stop_name is really stop_code. The presense of stop_code activates many useful features and is worth including.
  5. Follow good naming practices so route_short_name, route_long_name, and trip_headsign match their well known equivalents. If routes are published by a short name and a long name, always include trip_headsign so the public can tell which direction you're talking about.
  6. If you are coordinating GTFS for a metro area, don't merge unnecessarily if the result is exceedingly large. Processing can be resource expensive and unnecessary if only a few updates are needed. A single page with a list of download links is resource efficient and update flexible.

GTFS-Realtime Best Practices (Updated April 11, 2017)

Best practices follow logically from the specification but may not be obvious. Many are critical to avoiding nonsense.

GTFS And GTFS-R Always In Sync

Remember GTFS and GTFS-R are designed to work together and should always be a snapshot of your service. All data used by GTFS-R must always be contained in the dataset pointed to at the static GTFS URL.

Critical

If your static GTFS URL at any time does not contain data used by GTFS-R, expect real-time to stop working.

Always post new data in a merged dataset at your static URL that uses the same identifiers as your GTFS-R. Please remember that maintaining a snapshot is the only way software can track your schedules and work as a rule (GTFS is a cacheless specification).

If you run GTFS-R and publish a non-merged dataset early, the update will not be processed until the earliest date in calendar.txt or calendar_dates.txt of the new dataset (this is necessary to maintain GTFS and GTFS-R synchronization).

We have updated our system to track changing route_ids, but any such system must rely on heuristics and will therefore not work as a rule. If you run GTFS-R, update your software to generate data that does not change route_ids (and is therefore suitable for general application use).

Please remember to support the General Transit Feed Specication (GTFS).

Model your data as intended by the specficiation to include all trips in each calendar.txt entry with any modifications in calendar_dates.txt. If trips vary day to day, use the calendar dates only method.

Using the specification as intended means less work and a better user experience. See the GTFS documentation for the recommended approach.

Update .pb Files In A Background Process

The .pb files should be updated at an interval by a background service or process that is fully decoupled from GET or POST requests for the .pb file.

Updating .pb files in a separate process eliminates any latency issues. The service scales with high volume since no additional execution time is required per request.

The smaller the update interval the better the user experience, especially for vehicle locations. Use the smallest interval possible given your hardware resources.

For example, updating vehicle positions every 3-5 seconds allows users to watch and move from a sheltered area to the stop just in time.

Populate All Attributes

In general, populate all attributes if data is available. Code paths increase exponentially with each optional attribute and GTFS-R has many. You can promote quality software and the best possible user experience by simply populating all attributes.

Specifically, TripDescriptor should always include trip_id where available. Otherwise, scheduled times and direction may not be discernable and user experience will suffer in the extreme. In VehiclePositions, include stop_id, current_stop_sequence, and current_status so stop predictions and trip inquiries can resolve to vehicles to the extent possible. Also, include bearing for visual directional cues.

Don't Drop Past Events Too Soon

Don't drop past StopTimeEvents too soon. If so, users could see a trip status showing vehicle departure as scheduled 15 minutes ago and the vehicle just leaving the station.

Recognize 'Reply-To' Address

If your GTFS-R includes email auto-replies, be sure it responds to the 'Reply-To' address header if present. Anti-abuse configurations for email servers often reject email sent from a different domain, which means relayed requests can't be 'From' the user and responses must be sent to 'Reply-To'. If your system responds to the Reply-To address if present, mention it so feature support is known.

Publish Helpful Info

Publish other information that might be helpful to consuming applications, such as alert sources and types. It can be difficult to monitor publications to figure out what information is published where so it can be properly configured into a user interface. Also consider publishing information about the update frequency of vehicle locations and trip updates, non-abusive request frequencies, or any special or additional meaning attached to attributes.

Monitor

Successful services always make sense. Fine tune your system. Watch and compare to confirm it never displays nonsense or contradicts itself. Final testing is where work pays of the most.

Don't rely on the public to discover an outage. By that time there may be thousands of frustrated users. Maintain a notification system so you can discover outages and promptly restore service.

Other Real-Time Design Suggestions

/*

    GTFS-R is recommended as the emerging standard,
    but for RESTful services, here's some
    sample API requests using URL naming schemes
    with JSON/XML response
*/

/* stop arrivals/departures */
http://transitagency.com/realtime.html?route=[route_id|route_short_name]&stop=[stop_id|stop_code]&response=json|xml

/* trip updates */
http://transitagency.com/tripupdates.html?trip=[trip_id]&stop=[stop_id|stop_code|stop_sequence]&response=json|xml

/* alerts */
http://transitagency.com/alerts.html?agency=[agency_name|agency_id]&route=[route_id|route_short_name]&stop=[stop_id|stop_code]&response=json|xml

/* news */
http://transitagency.com/news.html?agency=[agency_name|agency_id]&route=[route_id|route_short_name]&stop=[stop_id|stop_code]&response=json|xml

/* vehicle locations */
http://transitagency.com/cgi-bin/wherenow.pl?agency=[agency_id|agency_name]&route=[route_id|route_short_name]&direction=[direction_id|trip_headsign]&stop=[stop_id|stop_code|stop_sequence]&trip=[trip_id]&vehicle=[vehicle_id]&response=json|xml

OUT:
XML or JSON

Aside from GTFS-Realtime, there is generally no uniform request or response format for stop predictions and vehicle locations. SIRI has recently enjoyed implementation by a few major providers but most often developers still need to write separate code to interact with the service. If your system is in development, here's some time saving suggestions.

  1. Use GTFS-Realtime Or RESTful Services. GTFS-Realtime is recommended and appearing for some major transit providers, though it can be difficult with non-mainstream dependencies and low level byte manipulation (though this is getting better). If a standard real-time request response format that is write once for all from the developer's perspective ever comes into being this is likely it. That means more bang for your work.

    XML and JSON are simple, mainstream, and reliable APIs, but not write once for all when it comes to predictions and vehicle locations. SIRI ("Service Interface for Real Time Information") is a template adopted by the European Commission on Standardization that offers at least the possibility of write once for all. However, in practice most implementations require custom code to interact with the service. You can learn more about SIRI from the official home page at http://www.siri.org.uk/.

    If you decide to develop your own API, be sure to include the DTD or Schema in the API documentation. Example responses are not enough. Developers need to know whether attributes are optional or required and whether they appear once or repeated as well as the data type and how errors are handled. Without this information, reponses cannot be parsed reliably. Do not use the HTTP layer to convey meaning within your API or software handling HTTP communications may not interact with your service as expected.

  2. Real-Time - GTFS Compatibility. Routes should map to route_id, stops to stop_id or stop_code and directions to direction_id or trip_headsign. If your real-time identifiers cannot be resolved from your GTFS data, your service will very likely not be implemented.
  3. Model Requests. Prediction requests should support per stop_id or stop_code and narrowing by route and direction. If you support vehicle locations, also by vehicle_id. Vehicle location requests should support narrowing by route, direction, and vehicle id with per stop inquiries handled by coordinates included in prediction responses. Alerts, notifications, advisories, news, etc. should support requests per stop, route, route type, and agency, if appropriate. Always support narrowing where possible for convenience and when data volumn is high.
  4. Model Responses. Minutes until arrival or departure is the most important response attribute for prediction requests. Epoch time in the local time zone as the date time format is the least ambiguous. Prediction responses should include attributes for route_id, trip_headsign, stop_id, stop_code, or vehicle_id and always vehicle coordinates if available. Responses to prediction requests based on vehicle id should return data for all future stops on the trip.

    Vehicle location responses should identify each vehicle by vehicle id, route, direction, and trip_id and include status and next stop information (early/late next stop X mins) and all available positional information, such as lat lons, speed, heading, etc.

  5. Include Human Readable Attribute Values Human readable values allow user request completion without additional queries or storage and can create a significant performance improvement.

    For example, if you specify next_stop_sequence or next_stop_id include next_stop_name. If you specify trip_id or direction_id, include route_short_name and trip_headsign.

  6. Email Auto-Replies: Popular anti-abuse configurations reject email messages not sent from the sender's domain. Auto-reply real-time requests must be sent from the senders domain and answered using the 'Reply To' email address. Be sure your system replies to 'Reply-To' if included in the request. Otherwise, the feature will not work and it will be turned off.
  7. Trip Updates. Trip updates give status information about an ongoing trip and may be particularly useful for long distance rail operators. We have seen few XML/JSON implementations, however. If you wish to support trip updates rather than predictions, GTFS-Realtime is the best choice. If you implement JSON/XML trip updates, requests must recognize the trip_id. Responses should include the same information as in the model vehicle locations response.
  8. RSS Advisories, News, Notifications, Etc. RSS (Really Simple Syndication) is convenient and effective for advisories, notifications, news, and other variable, non-GTFS information of interest to passengers. It enjoys wide-spread support in end user software and development APIs. We support immediate activation of RSS data sources. Use your GTFS identifiers with URL naming schemes to support convenient per route and per stop passenger inquiries. Of course, all requests should recognize identifiers in your GTFS data.

    Be sure your RSS files are indeed RSS! Frequently, software inserts custom tags and outputs non-compliant files. Select strict RSS as the output format. If your file is not valid RSS, do not publish it as such or it may be read incorrectly or not at all.

  9. Twitter. Twitter is often used as a convenient up to the minute alert and advisory service. It offers auto-updating lists immediately viewable on mobile devices and desktops with no quotas for developers. However, lists are chronological and can become confusing with size. Consider using Twitter for up to the minute, late-breaking information and RSS for more item based status information. Together, they offer an inexpensive alternative to an API.
submit 0 pjf928lhr6o7sb1ucheo85mep6 e82bc9c909ce1d4cd3f4d7f45566c55f en-us hlp_

Comments...

We are happy to hear from you!

  • Was this information helpful?
  • Corrections, suggestions, comments, other:
  • ▲ Some input is required.

  • Reply-To Email (Optional):
  • ▲ Invalid Reply-To Email address.

Submit Cancel

Please remember to include your email address if a reply is needed.