Best Practices for GTFS Publishers

Last Updated: February 7, 2017.

This document is for publishers of GTFS data and developers of real-time API services.

See the Help page for assistance using this Web service.

Words of Wisdom

Be Simple, Be Efficient, And Be Inviting...

-- The Webmaster

The purpose of specifications is to allow software to place reliable expectations on data so applications can be designed for the best possible user experience. Nonsense can still be compliant; usability is what matters. Be consistent with naming and identifiers across datasets, and don't second-guess the specification's design.

What's New

See GTFS Best Practices, Other Best Practices #2 (for February 7, 2017).

Common Mistakes And Fixes

Here's a quick start to avoid common mistakes.

  1. Changing Data URL. Keep your latest data at the same URL (using the same file name). For example, use agency_name_gtfs.zip and NOT agency_name_gtfs_Dec2015.zip. Keeping the data URL the same means less confusion and is required for automatic updating.
  2. Calendar.txt Entries Don't Indicate All Times. The service_ids in calendar.txt should include trips for all times for each calendar entry. If you have a Weekday schedule and a Friday schedule, the Friday schedule should include all times for Friday. Otherwise, your timetables will not include all times. Use the specification to best reproduce official publications.
  3. Changing route_ids Across Service Updates. The route_id is designated as the unique identifier for routes and should not change to denote a change in service. Publish data that supports tracking route schedules by keeping route_ids the same. If you have legacy software that changes them, update it.

    See Support Tracking Route Schedules below for tips if you're locked into changing route_ids.

  4. GTFS Not a Snapshot of Current Service. Always publish the merged data file at your static URL as soon as updated data is available. GTFS is a cacheless specification: consuming applications cannot tell new schedules apart from intended updates to previous publications, so they require a snapshot to work as a rule. Otherwise, GTFS-R may stop working and times may appear missing or contradictory.
  5. Non-Compliant Data Files. Do not publish files with subfolders of datasets or partial agency data. Remember that the purpose of the specification is to create data that can be fed directly into software. Your .zip file should contain only specification files that completely describe each agency in agency.txt. Otherwise, your data is non-compliant, and it's worth the additional time to make it usable.
  6. Separate route_ids For Each Direction. This is a tempting simplification, but it disrupts user interfaces designed for GTFS. Use route_id, route_short_name, and route_long_name to identify routes, and trip_headsign to identify direction. User interfaces were designed expecting this pattern; disrupting it is counterproductive.
  7. Stop Codes Improperly Added Or Excluded. If your system uses public facing stop codes, include stop_code in stops.txt even if they are used as stop_id and don't append them to stop_name. The stop_code flags whether your system uses public stop codes and triggers useful features. If you use them, they're worth including.
  8. Real-Time Vendor Without Public API. If you contract out for real-time services, be sure your agreement includes access to real-time data through a publicly available API, i.e. an Open API. Otherwise, real-time information will be unavailable in familiar software.
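For mistake 5, a quick pre-publication check can catch subfolders and stray files. The sketch below uses Python's standard zipfile module; the set of recognized file names is an assumption covering common static GTFS files and should be adjusted to the specification version you publish against.

```python
import io
import zipfile

# Assumed list of recognized GTFS file names; extend it to match the
# specification version you publish against.
KNOWN_GTFS_FILES = {
    "agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt",
    "calendar.txt", "calendar_dates.txt", "fare_attributes.txt",
    "fare_rules.txt", "shapes.txt", "frequencies.txt", "transfers.txt",
    "feed_info.txt",
}

def check_gtfs_zip(data: bytes) -> list[str]:
    """Return a list of problems found in a GTFS .zip (empty list means OK)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if "/" in name:
                # Files must sit at the root of the archive, not in subfolders.
                problems.append(f"file in subfolder: {name}")
            elif name not in KNOWN_GTFS_FILES:
                problems.append(f"non-specification file: {name}")
    return problems
```

Run it on the exact bytes you are about to upload, so the check sees what consumers will download.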

GTFS Best Practices (Updated February 7, 2017)

Reproduce Printed Schedules

Use the same calendars, routes, and times in your data as you use in print. Identical print and data publications promote trust and use of your data. For an unsure public, it's a relief to find identical information.

Don't leave departure_time or arrival_time empty or times will be interpolated and results may differ among publications.
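A minimal sketch for finding rows that consumers would have to interpolate, assuming stop_times.txt has been read into a string (how you load it is up to you). It lists each (trip_id, stop_sequence) with an empty arrival_time or departure_time.

```python
import csv
import io

def find_missing_times(stop_times_csv: str) -> list[tuple[str, str]]:
    """Return (trip_id, stop_sequence) pairs whose arrival_time or
    departure_time is empty, i.e. rows a consumer would have to interpolate."""
    missing = []
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        arrival = (row.get("arrival_time") or "").strip()
        departure = (row.get("departure_time") or "").strip()
        if not arrival or not departure:
            missing.append((row["trip_id"], row["stop_sequence"]))
    return missing
```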

Use the preferred calendaring method to best reproduce your schedules. If schedules do not vary by day too much, use calendar.txt with modifications in calendar_dates.txt. If schedules are highly variable by day, use only calendar_dates.txt.
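To see how the two calendaring methods combine, here is a sketch of resolving which service_ids run on a given date: calendar.txt supplies the weekly pattern within start_date and end_date, and calendar_dates.txt adds (exception_type 1) or removes (exception_type 2) service on specific dates. Rows are assumed to be dicts keyed by GTFS column names, as csv.DictReader would produce.

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def active_service_ids(calendar_rows, calendar_date_rows, day: date) -> set[str]:
    """Service IDs running on `day`, per calendar.txt plus calendar_dates.txt."""
    ymd = day.strftime("%Y%m%d")
    weekday = WEEKDAYS[day.weekday()]
    active = set()
    for row in calendar_rows:
        # Dates are YYYYMMDD strings, so string comparison orders them correctly.
        if row["start_date"] <= ymd <= row["end_date"] and row[weekday] == "1":
            active.add(row["service_id"])
    for row in calendar_date_rows:
        if row["date"] != ymd:
            continue
        if row["exception_type"] == "1":    # service added on this date
            active.add(row["service_id"])
        elif row["exception_type"] == "2":  # service removed on this date
            active.discard(row["service_id"])
    return active
```

With the calendar_dates.txt-only method, pass an empty list for calendar_rows; every running date is then an explicit exception_type 1 row.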

To Support Timetables

If you have Weekday and Monday service_ids, the Monday service_id should include all times for Monday. Otherwise, your Timetables will not display all times.

Note: We do not aggregate service_ids in calendar.txt for a specific date because doing so conflicts with the specification design. If your data follows bad practices, we may have applied a temporary patch. Once you correct your data, please email the Webmaster so we can adjust your settings.

Tip: If you use calendar_dates.txt only, keep days of the week consistent within each service date range. If you use calendar.txt, avoid overlapping start and end dates. The result is user friendly timetables and less confusion.
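One way to review the calendar.txt tip is to list pairs of entries whose date ranges overlap on a shared day of the week. This is a sketch under the assumption that such pairs are worth a second look; overlaps are legal in GTFS, so treat the output as candidates for review rather than errors.

```python
from itertools import combinations

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

def overlapping_entries(calendar_rows):
    """Pairs of calendar.txt service_ids whose date ranges overlap and that
    run on at least one common day of the week."""
    pairs = []
    for a, b in combinations(calendar_rows, 2):
        # YYYYMMDD strings compare chronologically.
        dates_overlap = (a["start_date"] <= b["end_date"]
                         and b["start_date"] <= a["end_date"])
        shared_day = any(a[d] == "1" and b[d] == "1" for d in DAYS)
        if dates_overlap and shared_day:
            pairs.append((a["service_id"], b["service_id"]))
    return pairs
```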

Support Tracking Route Schedules

If you change your route_ids, your data does not support tracking route schedules. What little utility remains is a poor return on your investment, and you could benefit greatly by keeping them the same.

Remember, your data describes things in the real world, and if they haven't changed, neither should their identifiers.

If route_ids change, we identify routes between updates by route_id patterns or the route name on a per dataset basis using settings we assign. For example, 2-BMT-sj2-1 may be recognized as 2-BMT-sj2-. The route short name or complete route name may also be used. The route_id with the latest service period is used to interoperate with other systems.
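As an illustration of the kind of pattern rule described above (not this Service's actual settings), the sketch below assumes a hypothetical per-dataset rule that a trailing run of digits is a version suffix, so 2-BMT-sj2-1 reduces to 2-BMT-sj2-. It also shows why such rules are heuristics: a route_id such as 10 would reduce to an empty key and collide with others.

```python
import re

# Hypothetical per-dataset setting: a trailing run of digits is treated as a
# version suffix that changes between updates.
VERSION_SUFFIX = re.compile(r"\d+$")

def route_key(route_id: str) -> str:
    """Strip the assumed version suffix so IDs from successive updates match."""
    return VERSION_SUFFIX.sub("", route_id)

def match_routes(old_ids, new_ids) -> dict[str, str]:
    """Map each old route_id to the new route_id that shares its key."""
    new_by_key = {route_key(r): r for r in new_ids}
    return {old: new_by_key[route_key(old)]
            for old in old_ids if route_key(old) in new_by_key}
```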

If you're using legacy software that changes route_ids, update it if you can. Otherwise, your data works only through pattern matching and not as a rule (very bad). If you must change route_ids, take special care to keep route_short_name and route_long_name the same.

Maintain A Snapshot

Keep your data up to date at the same URL and always a snapshot of current service. Since the GTFS specification is cacheless, maintaining a snapshot is the only rule that makes it possible for consuming applications to always work. This means your static URL should point to the merged data file.

Tip: Publish the merged dataset as soon as the data is ready so the public can see times as they become available.

Your data may be important to thousands of people each day, and it may not be obvious. If your data location changes or doesn't describe current service, the result is widespread frustration.

Other Best Practices

  1. Ensure the data URL returns the Last-Modified and Content-Length headers in responses to both GET and HEAD requests. At least one of these, preferably both, is required for updates to be detected and posted automatically.

    Note: If you want to restrict access, use HTTPS with basic authentication. This allows restricted access while still supporting automatic updates.

  2. Avoid frequent, last-minute updates to a new file by posting the update only when it's done. Up-to-the-second or to-the-minute update detection is not feasible due to a small but significant number of misbehaving servers. An update that conflicts with an in-progress detection or download can produce nonsense. To ensure updates are processed as expected, post when it's ready.

    During your update process, never allow the Last-Modified date of the old dataset to be later than that of the new dataset, or the new data will not be recognized as an update.

  3. Keep the GTFS data file valid and completely describing each agency in agency.txt. This means your static URL should point to the merged data file. Include only specification files and not subfolders of datasets or partial agency descriptions. Remember the whole purpose of the GTFS specification is to publish data that can be fed directly into software. Anything else means your data may be unusable and the time you spent compiling it wasted.

    If the GTFS data file is invalid or at any time doesn't describe current service, nonsense may result and your data may be excluded from this Service.

  4. If your system uses public facing stop codes, always include them in stop_code rather than relying on stop_id or appending them to stop_name. We have no way of knowing whether stop_id doubles as the stop code, or exactly which part of stop_name is really the stop code. The presence of stop_code activates many useful features and is worth including.
  5. Follow good naming practices so route_short_name, route_long_name, and trip_headsign match their well known equivalents. If routes are published by a short name and a long name, always include trip_headsign so the public can tell which direction you're talking about.
  6. If you are coordinating GTFS for a metro area, don't merge unnecessarily if the result is exceedingly large. Processing can be resource expensive and unnecessary if only a few updates are needed. A single page with a list of download links is resource efficient and update flexible.
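To make items 1 and 2 concrete, here is one plausible way a consumer might decide whether a feed changed between two checks, given the response headers as plain dicts (for example from a HEAD request made with urllib.request.Request(url, method="HEAD")). This is a sketch, not this Service's actual detection logic; the header-key casing is an assumption. Note that an older Last-Modified never counts as an update, which is why the ordering rule in item 2 matters.

```python
from email.utils import parsedate_to_datetime

def is_update(previous: dict, current: dict) -> bool:
    """Decide whether the feed changed, given response headers from two checks.

    Keys are assumed to use the canonical casing shown below.
    """
    prev_lm, cur_lm = previous.get("Last-Modified"), current.get("Last-Modified")
    if prev_lm and cur_lm:
        # Only a strictly newer Last-Modified date signals an update.
        return parsedate_to_datetime(cur_lm) > parsedate_to_datetime(prev_lm)
    prev_len, cur_len = previous.get("Content-Length"), current.get("Content-Length")
    if prev_len and cur_len:
        # Fallback: a size change suggests new content (same size is ambiguous).
        return prev_len != cur_len
    return False  # neither header available: updates cannot be detected
```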

GTFS-Realtime Best Practices (Updated June 21, 2016)

Best practices follow logically from the specification but may not be obvious. Many are critical to avoiding nonsense.

GTFS And GTFS-R Always In Sync

Remember GTFS and GTFS-R are designed to work together and should always be a snapshot of your service. All data used by GTFS-R must always be contained in the dataset pointed to at the static GTFS URL.


If your static GTFS URL at any time does not contain data used by GTFS-R, expect real-time to stop working.

Always post new data in a merged dataset at your static URL that uses the same identifiers as your GTFS-R. Please remember that maintaining a snapshot is the only way software can track your schedules and work as a rule (GTFS is a cacheless specification).
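A simple way to confirm the two feeds are in sync before publishing is a set difference: every identifier the real-time feed references should exist in the static dataset. The sketch assumes you have already collected the identifiers, from trips.txt, routes.txt, and stops.txt on one side and from the decoded GTFS-R feed on the other.

```python
def unresolved_ids(static_ids: dict[str, set[str]],
                   realtime_ids: dict[str, set[str]]) -> dict[str, set[str]]:
    """IDs referenced by the real-time feed but missing from the static GTFS.

    Both arguments map an identifier kind ("trip_id", "route_id", "stop_id")
    to the set of values seen; an empty result means the feeds are in sync.
    """
    missing = {}
    for kind, ids in realtime_ids.items():
        absent = ids - static_ids.get(kind, set())
        if absent:
            missing[kind] = absent
    return missing
```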

If you run GTFS-R and publish a non-merged dataset early, the update will not be processed until the earliest date in calendar.txt or calendar_dates.txt of the new dataset (this is necessary to maintain GTFS and GTFS-R synchronization).
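The date in question can be computed directly. This sketch takes the earliest start_date in calendar.txt and the earliest date in calendar_dates.txt, as the paragraph above describes; rows are assumed to be dicts keyed by column name.

```python
def earliest_service_date(calendar_rows, calendar_date_rows):
    """Earliest YYYYMMDD date among calendar.txt start_date values and
    calendar_dates.txt date values, or None if both are empty.

    YYYYMMDD strings sort chronologically, so min() works on them directly.
    """
    candidates = [row["start_date"] for row in calendar_rows]
    candidates += [row["date"] for row in calendar_date_rows]
    return min(candidates, default=None)
```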

We have updated our system to track changing route_ids, but any such system must rely on heuristics and will therefore not work as a rule. If you run GTFS-R, update your software to generate data that does not change route_ids (and is therefore suitable for general application use).

Please remember to support the General Transit Feed Specification (GTFS).

Model your data as the specification intends: include all trips in each calendar.txt entry, with any modifications in calendar_dates.txt. If trips vary from day to day, use the calendar_dates.txt-only method.

Using the specification as intended means less work and a better user experience. See the GTFS documentation for the recommended approach.

Populate All Attributes

In general, populate all attributes if data is available. Code paths increase exponentially with each optional attribute and GTFS-R has many. You can promote quality software and the best possible user experience by simply populating all attributes.

Specifically, TripDescriptor should always include trip_id where available. Otherwise, scheduled times and direction may not be discernible and the user experience will suffer severely. In VehiclePositions, include stop_id, current_stop_sequence, and current_status so stop predictions and trip inquiries can resolve to vehicles to the extent possible. Also, include bearing for visual directional cues.
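A small self-check can flag vehicle positions that omit these attributes. The sketch assumes each VehiclePosition has already been decoded into a plain dict using the GTFS-realtime field names (trip.trip_id, stop_id, current_stop_sequence, current_status, position.bearing); how you produce that dict is left open.

```python
# Fields recommended above, as (path, field) pairs within a VehiclePosition.
RECOMMENDED = [
    (("trip",), "trip_id"),
    ((), "stop_id"),
    ((), "current_stop_sequence"),
    ((), "current_status"),
    (("position",), "bearing"),
]

def missing_fields(vehicle: dict) -> list[str]:
    """Dotted names of recommended VehiclePosition fields absent from `vehicle`."""
    missing = []
    for path, field in RECOMMENDED:
        node = vehicle
        for key in path:
            node = node.get(key, {})
        if field not in node:
            missing.append(".".join(path + (field,)))
    return missing
```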

Don't Drop Past Events Too Soon

Don't drop past StopTimeEvents too soon. Otherwise, users could see a trip status showing the vehicle as having departed on schedule 15 minutes ago while the vehicle is only now leaving the station.
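One approach is to keep past events for a retention window instead of dropping them the moment they pass. In this sketch the 15-minute window is an illustrative assumption, and stop time updates are assumed to be dicts with optional arrival/departure entries carrying an epoch "time".

```python
import time

# Assumed retention window for past events; tune it to your service.
RETENTION_SECONDS = 15 * 60

def events_to_publish(stop_time_updates, now=None):
    """Keep updates whose departure (or arrival) time is in the future or
    within the retention window; drop only events older than the window."""
    now = time.time() if now is None else now
    kept = []
    for update in stop_time_updates:
        event = update.get("departure") or update.get("arrival") or {}
        event_time = event.get("time")
        if event_time is None or event_time >= now - RETENTION_SECONDS:
            kept.append(update)
    return kept
```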

Update Frequently

Update frequently to avoid appearing obsolete and to show that you value on-time performance.

Update vehicle positions every 3-5 seconds so users can watch the vehicle approach and catch it in time. For example, passengers can move from a sheltered area to an outside stop just in time.

Recognize 'Reply-To' Address

If your GTFS-R includes email auto-replies, be sure it responds to the 'Reply-To' address header if present. Anti-abuse configurations for email servers often reject email sent from a different domain, which means relayed requests can't be 'From' the user and responses must be sent to 'Reply-To'. If your system replies to the Reply-To address when present, say so in your documentation so consumers know the feature is supported.
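The rule amounts to "reply to Reply-To if present, otherwise to From". A minimal sketch using Python's standard email package; it returns only the first address if Reply-To lists several.

```python
from email import message_from_string
from email.utils import parseaddr

def reply_address(raw_message: str) -> str:
    """Address an auto-reply should go to: Reply-To if present, else From."""
    msg = message_from_string(raw_message)
    header = msg.get("Reply-To") or msg.get("From") or ""
    # parseaddr splits "Name <addr>" into (name, addr); keep the address.
    return parseaddr(header)[1]
```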

Publish Helpful Info

Publish other information that might be helpful to consuming applications, such as alert sources and types. It can be difficult to monitor publications to figure out what information is published where so it can be properly configured into a user interface. Also consider publishing information about the update frequency of vehicle locations and trip updates, non-abusive request frequencies, or any special or additional meaning attached to attributes.


Don't rely on the public to discover an outage. By that time there may be thousands of frustrated users. Maintain a notification system so you can discover outages and promptly restore service.
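A basic outage check compares the GTFS-realtime FeedHeader timestamp (POSIX seconds) with the current time. The two-minute threshold below is an assumption, and the scheduling and alerting around it (cron job, paging service, etc.) are left to your own infrastructure.

```python
import time

# Assumed threshold: treat the feed as down if its header timestamp is more
# than two minutes old (vehicle positions are expected every few seconds).
MAX_AGE_SECONDS = 120

def feed_is_stale(header_timestamp, now=None) -> bool:
    """True if the feed's header timestamp (POSIX seconds) is too old."""
    now = time.time() if now is None else now
    return now - header_timestamp > MAX_AGE_SECONDS
```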

Other Real-Time Design Suggestions


    GTFS-R is recommended as the emerging standard,
    but for RESTful services, here are some
    sample API requests using URL naming schemes
    with JSON/XML responses

/* stop arrivals/departures */

/* trip updates */

/* alerts */

/* news */

/* vehicle locations */


Aside from GTFS-Realtime, there is generally no uniform request or response format for stop predictions and vehicle locations. SIRI has recently been implemented by a few major providers, but most often developers still need to write separate code to interact with each service. If your system is in development, here are some time-saving suggestions.

  1. Use GTFS-Realtime Or RESTful Services. GTFS-Realtime is recommended and is being adopted by some major transit providers, though it can be difficult to work with because of non-mainstream dependencies and low-level byte manipulation (this is getting better). If a standard real-time request/response format that is write once for all from the developer's perspective ever comes into being, this is likely it. That means more return on your work.

    XML and JSON are simple, mainstream, and reliable formats, but they are not write once for all when it comes to predictions and vehicle locations. SIRI (Service Interface for Real Time Information) is a standard adopted by the European Committee for Standardization (CEN) that at least offers the possibility of write once for all. In practice, however, most implementations require custom code to interact with the service. You can learn more about SIRI from the official home page at http://www.siri.org.uk/.

    If you decide to develop your own API, be sure to include the DTD or Schema in the API documentation. Example responses are not enough. Developers need to know whether attributes are optional or required, whether they appear once or repeat, their data types, and how errors are handled. Without this information, responses cannot be parsed reliably. Do not use the HTTP layer to convey meaning within your API, or software handling HTTP communications may not interact with your service as expected.

  2. Real-Time - GTFS Compatibility. Routes should map to route_id, stops to stop_id or stop_code and directions to direction_id or trip_headsign. If your real-time identifiers cannot be resolved from your GTFS data, your service will very likely not be implemented.
  3. Model Requests. Prediction requests should support per stop_id or stop_code inquiries and narrowing by route and direction. If you support vehicle locations, also support requests by vehicle_id. Vehicle location requests should support narrowing by route, direction, and vehicle id, with per-stop inquiries handled by coordinates included in prediction responses. Alerts, notifications, advisories, news, etc. should support requests per stop, route, route type, and agency, if appropriate. Always support narrowing where possible, for convenience and when data volume is high.
  4. Model Responses. Minutes until arrival or departure is the most important response attribute for prediction requests. For absolute times, epoch time (seconds since 1970-01-01 00:00 UTC) is the least ambiguous date-time format because it does not depend on the local time zone. Prediction responses should include attributes for route_id, trip_headsign, stop_id, stop_code, or vehicle_id, and always vehicle coordinates if available. Responses to prediction requests based on vehicle id should return data for all future stops on the trip.

    Vehicle location responses should identify each vehicle by vehicle id, route, direction, and trip_id and include status and next stop information (early/late next stop X mins) and all available positional information, such as lat lons, speed, heading, etc.

  5. Include Human-Readable Attribute Values. Human-readable values allow user requests to be completed without additional queries or storage and can significantly improve performance.

    For example, if you specify next_stop_sequence or next_stop_id include next_stop_name. If you specify trip_id or direction_id, include route_short_name and trip_headsign.

  6. Email Auto-Replies. Popular anti-abuse configurations reject email messages not sent from the sender's domain. Auto-reply real-time requests must be sent from the sender's domain and answered at the 'Reply-To' address. Be sure your system replies to 'Reply-To' if it is included in the request. Otherwise, the feature will not work and it will be turned off.
  7. Trip Updates. Trip updates give status information about an ongoing trip and may be particularly useful for long distance rail operators. We have seen few XML/JSON implementations, however. If you wish to support trip updates rather than predictions, GTFS-Realtime is the best choice. If you implement JSON/XML trip updates, requests must recognize the trip_id. Responses should include the same information as in the model vehicle locations response.
  8. RSS Advisories, News, Notifications, Etc. RSS (Really Simple Syndication) is convenient and effective for advisories, notifications, news, and other variable, non-GTFS information of interest to passengers. It enjoys widespread support in end user software and development APIs. We support immediate activation of RSS data sources. Use your GTFS identifiers with URL naming schemes to support convenient per-route and per-stop passenger inquiries. Of course, all requests should recognize identifiers in your GTFS data.

    Be sure your RSS files are indeed RSS! Frequently, software inserts custom tags and outputs non-compliant files. Select strict RSS as the output format. If your file is not valid RSS, do not publish it as such or it may be read incorrectly or not at all.

  9. Twitter. Twitter is often used as a convenient up to the minute alert and advisory service. It offers auto-updating lists immediately viewable on mobile devices and desktops with no quotas for developers. However, lists are chronological and can become confusing with size. Consider using Twitter for up to the minute, late-breaking information and RSS for more item based status information. Together, they offer an inexpensive alternative to an API.
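Suggestions 3 to 5 can be combined in a small response builder. The sketch below narrows predictions by route and direction (trip_headsign), sorts them by time, and adds minutes until arrival computed from epoch times; the input keys and the "generated_at"/"minutes" output names are hypothetical, not a published schema.

```python
import json
import time

def prediction_response(predictions, now=None, route_id=None, headsign=None):
    """Build a JSON prediction response, optionally narrowed by route and
    direction (trip_headsign), with minutes until arrival per prediction.

    Each prediction is a dict with hypothetical keys such as route_id,
    route_short_name, trip_headsign, stop_id, stop_code, stop_name,
    vehicle_id, lat, lon, and arrival_time (epoch seconds).
    """
    now = int(time.time()) if now is None else now
    items = []
    for p in sorted(predictions, key=lambda p: p["arrival_time"]):
        if route_id is not None and p.get("route_id") != route_id:
            continue
        if headsign is not None and p.get("trip_headsign") != headsign:
            continue
        item = dict(p)
        # Whole minutes, rounded down and never negative.
        item["minutes"] = max(0, (p["arrival_time"] - now) // 60)
        items.append(item)
    return json.dumps({"generated_at": now, "predictions": items})
```

Rounding down is a conservative choice for riders deciding whether to leave now; including human-readable fields such as stop_name and route_short_name in each item lets clients display results without extra queries.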