Best Practices for GTFS Publishers

Last Updated: April 21, 2023.

This document is for publishers of GTFS data and developers of real-time API services.

See the Help page for assistance using this Web service.

Words of Wisdom

Be Simple, Be Efficient, And Be Inviting...

-- The Webmaster

The purpose of specifications is to allow software to place reliable expectations on data so applications can be designed for the best possible user experience. Data can be compliant and still be nonsense; usability is what matters. Be consistent with naming and identifiers across datasets and don't second-guess the design.

What's New

  • April 21, 2023. Support for multi-agency GTFS-R alert files.
  • April 21, 2023. Normal update of datasets with unrealistically long date ranges.
  • April 1, 2023. Minor fixes in display of agency information.
  • September 16, 2019. GTFS-R module update for improved performance.
  • April 29, 2019. System map images are optimized for desktop and mobile devices.
  • April 11, 2019. Coercion to user friendly timetables where calendar.txt entries overlap or don't include all trips for the service period. To see the full effect of this change, include current schedules in your update. (Note: coercion is based on continuous service days and service_id.)
  • April 11, 2017. Separate display of arrival and departure times. The Arrival/Departure heading is tabbed and arrival and departure times are displayed separately if the time block is not equal. Specify arrival times for the new feature.
  • April 11, 2017. The parser no longer coerces schedule trip blocks to a maximum of two trip_headsigns when direction_id is used. The change avoids possibly misleading results when only the first trip_headsign is used. Use trip_headsign to model direction names identical to official publications. If trip_headsign changes along a trip, use stop_headsign to indicate the change. See GTFS Best Practices, Direction Names below.
  • February 7, 2017. See GTFS Best Practices, Other Best Practices #2 (for February 7, 2017).

GTFS Tips & Guidance (Updated April 11, 2019)

Here's a quick start to avoid common mistakes.

  1. Changing Data URL. Keep your latest data at the same HTTP(S) URL (using the same file name). For example, use agency_name_gtfs.zip and NOT agency_name_gtfs_Dec2015.zip. Keeping the data URL the same means less confusion and is required for automatic updating. (Avoid FTP because many software libraries are not updated for current encryption.)
  2. Confusing Direction Names. The trip_headsign is used as the direction name. Too many can be confusing. Use direction names identical to official publications. If a trip headsign changes along a trip, use stop_headsign. It's designed for that purpose.
  3. Non-Unique route_ids Across Service Updates. The route_id is designated as the unique identifier for routes and should not change to denote a change in service. Publish data that supports tracking route schedules by keeping route_ids the same. If you have legacy software, update it.

    See Support Tracking Route Schedules (GTFS Best Practices) below for tips if you're locked into changing route_ids.

  4. GTFS Not a Snapshot of Current Service. Always maintaining a snapshot ensures a seamless transition between iterations regardless of how applications treat data caching. If you use GTFS-R, the static URL must always point to the data in use by the real-time system.
  5. Misconfigured Server. When the content-length and last-modified headers don't exist, are incorrect, or fluctuate, updates to the data are not processed in a timely manner and unnecessary downloads can result. When automated processes regenerate data, the resulting file is unlikely to be identical in size after compression. Avoid scheduled processes that regenerate the file even when the data is unchanged.
  6. Non-Compliant Data Files. Do not publish files with subfolders of datasets or partial agency data. Remember the purpose of the specification is to create data that can be fed directly into software. Your .zip file should contain at least specification files that completely describe each agency in agency.txt. Otherwise, your data is non-compliant and it's worth additional time to make it usable.
  7. Separate route_ids For Each Direction. The simplicity is tempting, but it disrupts user interfaces designed for GTFS. Use the route_id, route_short_name, and route_long_name to identify routes and trip_headsign to identify direction. User interfaces were designed expecting this pattern; disrupting it is counterproductive.
  8. Stop Codes Improperly Added Or Excluded. If your system uses public facing stop codes, include stop_code in stops.txt even if the same values are used as stop_id, and don't append them to stop_name. The stop_code flags whether your system uses public stop codes and triggers useful features. If you use them, they're worth including.
  9. Real-Time Vendor Without Public API. If you contract out for real-time services, be sure your agreement includes access to real-time data through a publicly available API, i.e. an Open API. Otherwise, real-time information will be unavailable in familiar software.
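As a rough illustration of tip 6, the Python sketch below checks a feed archive for files inside subfolders and for a missing top-level agency.txt. The function names and the choice of checks are assumptions made for this example; they are not the checks this Service actually runs.

```python
import io
import zipfile


def find_packaging_problems(zip_bytes):
    """Return a list of packaging problems found in a GTFS .zip archive."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = archive.namelist()
    # Specification files belong at the top level of the archive.
    nested = [name for name in names if "/" in name]
    if nested:
        problems.append("files inside subfolders: " + ", ".join(sorted(nested)))
    if "agency.txt" not in names:
        problems.append("agency.txt missing from top level")
    return problems


def build_zip(file_names):
    """Build a small in-memory archive of empty files for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in file_names:
            archive.writestr(name, "")
    return buffer.getvalue()


if __name__ == "__main__":
    print(find_packaging_problems(build_zip(["agency.txt", "stops.txt"])))
    print(find_packaging_problems(build_zip(["feed/agency.txt"])))
```

A check like this only looks at packaging. It says nothing about whether the files completely describe each agency.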

GTFS Best Practices (Updated April 11, 2019)

Reproduce Printed Schedules

Use the same calendars, routes, direction names, and times in your data as you use in print. Identical print and data publications promote trust and use of your data. For an unsure public, it's a relief to find identical information.

Arrival and Departure Times

If arrival and departure times differ, specify them both. The public is interested in arrivals and not just departures.

Don't leave departure_time or arrival_time empty or times will be interpolated and results may differ among publications.
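A publisher could look for empty times before posting an update. The following is a minimal Python sketch that scans stop_times.txt content for rows lacking either time; the function name and sample rows are made up for illustration.

```python
import csv
import io


def rows_missing_times(stop_times_text):
    """Return (trip_id, stop_sequence) for rows lacking either time."""
    missing = []
    for row in csv.DictReader(io.StringIO(stop_times_text)):
        arrival = (row.get("arrival_time") or "").strip()
        departure = (row.get("departure_time") or "").strip()
        if not arrival or not departure:
            missing.append((row["trip_id"], row["stop_sequence"]))
    return missing


# Hypothetical sample: the second stop has no times and would be interpolated.
SAMPLE = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:02:00,A,1
T1,,,B,2
T1,08:20:00,08:20:00,C,3
"""

if __name__ == "__main__":
    print(rows_missing_times(SAMPLE))
```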

Calendars and Timetables

Use the preferred calendaring method to best reproduce official publications. If schedules do not vary by day too much, use calendar.txt with modifications in calendar_dates.txt. If schedules are highly variable by day, use only calendar_dates.txt.

If you use calendar_dates.txt only, try to keep days of the week consistent within each service date range. The result is a friendlier user experience with fewer calendar entries to choose from.
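To show how the two files combine, here is a small Python sketch that decides whether a service runs on a given day. It follows the specification's rule that exception_type 1 adds service and 2 removes it. The sample service_id and dates are invented, and rows are represented as plain dicts rather than parsed files.

```python
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical weekday service with one removed date.
CALENDAR = [{
    "service_id": "WKD",
    "monday": "1", "tuesday": "1", "wednesday": "1", "thursday": "1",
    "friday": "1", "saturday": "0", "sunday": "0",
    "start_date": "20230101", "end_date": "20231231",
}]
CALENDAR_DATES = [
    {"service_id": "WKD", "date": "20230704", "exception_type": "2"},
]


def parse_gtfs_date(text):
    """Convert a GTFS YYYYMMDD string to a date."""
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))


def service_runs_on(day, service_id, calendar_rows, calendar_dates_rows):
    """Return True if service_id operates on the given day."""
    # Exceptions in calendar_dates.txt take precedence over calendar.txt.
    for row in calendar_dates_rows:
        if row["service_id"] == service_id and parse_gtfs_date(row["date"]) == day:
            return row["exception_type"] == "1"  # 1 = added, 2 = removed
    for row in calendar_rows:
        if row["service_id"] != service_id:
            continue
        start = parse_gtfs_date(row["start_date"])
        end = parse_gtfs_date(row["end_date"])
        if start <= day <= end and row[WEEKDAYS[day.weekday()]] == "1":
            return True
    return False


if __name__ == "__main__":
    print(service_runs_on(date(2023, 7, 5), "WKD", CALENDAR, CALENDAR_DATES))
    print(service_runs_on(date(2023, 7, 4), "WKD", CALENDAR, CALENDAR_DATES))
```

With the calendar dates only method, calendar_rows is simply empty and every service day appears as an exception_type 1 entry.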

Direction Names

The trip_headsign is used as the direction name for a group of trips. Use the same names as official publications to create familiar direction names and time blocks.

Too many trip_headsigns can cause confusion because arrival and departure times get listed in separate time blocks. It's difficult to identify the next arrival or departure when times are listed under separate directions.

If trip headsigns vary along a trip, use stop_headsign. That's its purpose.

We no longer coerce trip blocks into two trip_headsigns when direction_id is used. The change avoids possibly misleading results when only the first trip_headsign is used.

If you use direction_id, limit trip_headsigns to reflect official publications and use stop_headsign to reflect changes along a trip.
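One way to spot too many direction names is to count distinct trip_headsigns per route and direction. The Python sketch below does that for trips.txt content. The sample rows and the limit of one headsign per direction are assumptions for illustration, not a rule from the specification.

```python
import csv
import io
from collections import defaultdict

# Hypothetical trips.txt content: direction 0 carries two headsigns.
SAMPLE_TRIPS = """route_id,service_id,trip_id,trip_headsign,direction_id
10,WKD,t1,Downtown,0
10,WKD,t2,Downtown,0
10,WKD,t3,Downtown via Mall,0
10,WKD,t4,Airport,1
"""


def headsigns_by_direction(trips_text):
    """Map (route_id, direction_id) to the set of trip_headsigns used."""
    groups = defaultdict(set)
    for row in csv.DictReader(io.StringIO(trips_text)):
        key = (row["route_id"], row.get("direction_id", ""))
        groups[key].add(row["trip_headsign"])
    return dict(groups)


def crowded_directions(trips_text, limit=1):
    """Return the route directions that use more headsigns than the limit."""
    return {key: sorted(names)
            for key, names in headsigns_by_direction(trips_text).items()
            if len(names) > limit}


if __name__ == "__main__":
    print(crowded_directions(SAMPLE_TRIPS))
```

In this sample, "Downtown via Mall" would be a candidate for stop_headsign if the official publication lists the direction simply as "Downtown".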

Use Timepoints

Make the major stops that appear in print publications an exact timepoint so the publication can be reproduced without a tome of stops. See the reference publication for timepoint guidance.

Support Tracking Route Schedules

If you change your route_ids, your data does not support tracking route schedules. The little utility left is a small return on your investment and you could benefit greatly by keeping them the same.

Remember, your data describes things in the real world and if they haven't changed neither should their identifiers.

If route_ids change, we identify routes between updates by route_id patterns or the route name on a per dataset basis using settings we assign. For example, 2-BMT-sj2-1 may be recognized as 2-BMT-sj2-. The route short name or complete route name may also be used. The route_id with the latest service period is used to interoperate with other systems.

If you're using legacy software that changes route_ids, update it if you can. Otherwise, your data works as a common pattern and not as a rule (very bad). If you must change route_ids, pay special attention to keep route_short_name and route_long_name the same.
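To make the pattern idea concrete, the Python sketch below reduces a versioned route_id to a stable key by dropping a numeric suffix after the final hyphen, which reproduces the 2-BMT-sj2-1 example above. The regular expression is a guess at one possible pattern. The actual settings are assigned per dataset and are not documented here.

```python
import re

# Assumed pattern: digits after the final hyphen are a version counter.
TRAILING_VERSION = re.compile(r"(?<=-)\d+$")


def stable_route_key(route_id):
    """Strip a trailing version number so updates map to the same route."""
    return TRAILING_VERSION.sub("", route_id)


if __name__ == "__main__":
    print(stable_route_key("2-BMT-sj2-1"))
    print(stable_route_key("2-BMT-sj2-14"))
```

Any heuristic of this kind can misfire, for example on a route whose real identifier ends in a hyphen and a number, which is why keeping route_ids unchanged is the dependable approach.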

Other Best Practices

  1. Ensure the data URL includes the last-modified and content-length headers in GET and HEAD requests. One of these, preferably both, is required for updates to be detected and posted automatically.

    Note: If you want to restrict access, use HTTPS with basic authentication. This allows restricted access while supporting automatic updates.

  2. Avoid frequent, last minute updates to a new file by posting the update only when it's done. Up-to-the-second or to-the-minute update detection is not feasible due to a small number of misbehaving servers. This means the absolute latest file is not guaranteed to be processed if it differs by only a few minutes.

    During the update process never allow the last-modified date of the old file to be later than the new file. If this happens, the new file may be ignored.

    Finally, don't edit your file while it's at the static URL. If bad data gets processed, nonsense can become widespread and not automatically undoable.

  3. Keep the GTFS data file valid and completely describing each agency in agency.txt. This means your static URL should point to the merged data file. Include only specification files and not subfolders of datasets or partial agency descriptions. Remember the whole purpose of the GTFS specification is to publish data that can be fed directly into software. Anything else means your data may be unusable and the time you spent compiling it wasted.

    If the GTFS data file is invalid or at any time doesn't describe current service, nonsense may result and your data may be excluded from this Service.

  4. If your system uses public facing stop codes, always include them as stop_code and not only as stop_id or appended to stop_name. We have no way of knowing whether stop_id doubles as stop_code or exactly which part of stop_name is really stop_code. The presence of stop_code activates many useful features and is worth including.
  5. Follow good naming practices so route_short_name, route_long_name, and trip_headsign match their well known equivalents. If routes are published by a short name and a long name, always include trip_headsign so the public can tell which direction you're talking about.
  6. If you are coordinating GTFS for a metro area, don't merge unnecessarily if the result is exceedingly large. Processing can be resource expensive and unnecessary if only a few updates are needed. A single page with a list of download links is resource efficient and update flexible.
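The header guidance in items 1 and 2 can be pictured from the consumer's side. The Python sketch below compares the headers from two HEAD requests to the same data URL and guesses whether a new file was posted. The decision rules are assumptions for illustration and may not match how this Service detects updates. Headers are passed as plain dicts with canonical capitalization.

```python
from email.utils import parsedate_to_datetime


def feed_looks_updated(previous, current):
    """Guess whether the file changed between two HEAD responses.

    Returns True or False, or None when neither header is present and
    no decision can be made without downloading the file.
    """
    if "Last-Modified" not in current and "Content-Length" not in current:
        return None
    if "Last-Modified" in previous and "Last-Modified" in current:
        before = parsedate_to_datetime(previous["Last-Modified"])
        after = parsedate_to_datetime(current["Last-Modified"])
        if after > before:
            return True
        if after < before:
            # An older date than the one already seen looks like a stale file.
            return False
    # Fall back to comparing the reported size.
    return previous.get("Content-Length") != current.get("Content-Length")


if __name__ == "__main__":
    old = {"Last-Modified": "Fri, 21 Apr 2023 10:00:00 GMT", "Content-Length": "1000"}
    new = {"Last-Modified": "Sat, 22 Apr 2023 10:00:00 GMT", "Content-Length": "1000"}
    print(feed_looks_updated(old, new))
```

The sketch also shows why fluctuating headers cause trouble: a file regenerated on a schedule changes both values even when the data is identical.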

GTFS-Realtime Best Practices (Updated April 11, 2019)

Best practices follow logically from the specification but may not be obvious. Many are critical to avoiding nonsense.

GTFS And GTFS-R Always In Sync

Remember GTFS and GTFS-R are designed to work together and should always be a snapshot of your service. All data used by GTFS-R must always be contained in the dataset pointed to at the static GTFS URL.

Critical

If your static GTFS URL at any time does not contain data used by GTFS-R, expect real-time to stop working.

Always post new data in a merged dataset at your static URL that uses the same identifiers as your GTFS-R. Please remember that maintaining a snapshot is the only way software can track your schedules and work as a rule (GTFS is a cacheless specification).
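A simple consistency check can catch a static and real-time mismatch before the public does. The Python sketch below compares trip_ids seen in a real-time feed against those in the static trips.txt. It assumes the identifiers have already been extracted into plain lists, and the function name is hypothetical.

```python
def unknown_realtime_ids(static_trip_ids, realtime_trip_ids):
    """Return real-time trip_ids that the static dataset does not contain."""
    return sorted(set(realtime_trip_ids) - set(static_trip_ids))


if __name__ == "__main__":
    static_ids = ["t1", "t2", "t3"]
    realtime_ids = ["t2", "t9"]
    # Any id listed here means real-time data cannot be resolved.
    print(unknown_realtime_ids(static_ids, realtime_ids))
```

The same comparison applies to route_id and stop_id. An empty result for all three is the goal.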

If you run GTFS-R and publish a non-merged dataset early, the update will not be processed until the earliest date in calendar.txt or calendar_dates.txt of the new dataset (this is necessary to maintain GTFS and GTFS-R synchronization).

We have updated our system to track changing route_ids, but any such system must rely on heuristics and will therefore not work as a rule. If you run GTFS-R, update your software to generate data that does not change route_ids (and is therefore suitable for general application use).

Please remember to support the General Transit Feed Specification (GTFS).

Model your data as intended by the specification to include all trips in each calendar.txt entry with any modifications in calendar_dates.txt. If trips vary day to day, use the calendar dates only method.

Using the specification as intended means less work and a better user experience. See the GTFS documentation for the recommended approach.

Update .pb Files In A Background Process

The .pb files should be updated at an interval by a background service or process that is fully decoupled from GET or POST requests for the .pb file.

Updating .pb files in a separate process eliminates any latency issues. The service scales with high volume since no additional execution time is required per request.

The smaller the update interval the better the user experience, especially for vehicle locations. Use the smallest interval possible given your hardware resources.

For example, updating vehicle positions every 3-5 seconds allows users to watch and move from a sheltered area to the stop just in time.
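The decoupling described above might look like the following Python sketch: a background thread rebuilds the file on a timer and swaps it into place, so requests only ever read a finished file. The function names and the build_payload callback are placeholders, and how the protobuf bytes are produced is outside the scope of this example.

```python
import os
import tempfile
import threading


def publish_atomically(path, payload):
    """Write payload to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with owner-only permissions; adjust them
    # if a different user account serves the file.
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    # Readers see either the old file or the new one, never a partial write.
    os.replace(temp_path, path)


def run_publisher(path, build_payload, interval_seconds, stop_event):
    """Regenerate the file on a timer, independent of incoming requests."""
    while not stop_event.is_set():
        publish_atomically(path, build_payload())
        stop_event.wait(interval_seconds)


if __name__ == "__main__":
    stop = threading.Event()
    target = os.path.join(tempfile.mkdtemp(), "vehicles.pb")
    worker = threading.Thread(
        target=run_publisher,
        args=(target, lambda: b"placeholder bytes", 3, stop),
        daemon=True,
    )
    worker.start()
    stop.set()  # in a real service this is set only at shutdown
    worker.join()
```

The web server then serves the file as static content, so request volume has no effect on how often the data is rebuilt.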

Populate All Attributes

In general, populate all attributes if data is available. Code paths increase exponentially with each optional attribute and GTFS-R has many. You can promote quality software and the best possible user experience by simply populating all attributes.

Specifically, TripDescriptor should always include trip_id where available. Otherwise, scheduled times and direction may not be discernable and user experience will suffer in the extreme. In VehiclePositions, include stop_id, current_stop_sequence, and current_status so stop predictions and trip inquiries can resolve to vehicles to the extent possible. Also, include bearing for visual directional cues.
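A feed producer could run a completeness check on each vehicle before publishing. The Python sketch below works on a flattened dict for illustration. In the actual protobuf message, trip_id sits inside the trip descriptor and bearing inside the position, so a real check would read them from there.

```python
# Attributes recommended in the paragraph above.
RECOMMENDED_VEHICLE_FIELDS = (
    "trip_id", "stop_id", "current_stop_sequence", "current_status", "bearing",
)


def missing_vehicle_fields(vehicle):
    """List recommended attributes that are absent or empty.

    A value of 0 counts as populated, since 0 is a valid bearing.
    """
    return [name for name in RECOMMENDED_VEHICLE_FIELDS
            if vehicle.get(name) in (None, "")]


if __name__ == "__main__":
    vehicle = {"trip_id": "t1", "stop_id": "A", "bearing": 0}
    print(missing_vehicle_fields(vehicle))
```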

Post TripUpdate for Vehicles

If you follow best practices, treat a vehicle assignment as a trip update so predictions can be coupled to the vehicle even when it is "on time". It's also a pleasure to see the system is up and working and the trip is "on time".

Don't Drop Past Events Too Soon

Don't drop past StopTimeEvents too soon. If you do, users could see a trip status showing vehicle departure as scheduled 15 minutes ago and the vehicle just leaving the station.
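One way to avoid dropping events early is to keep past events for a grace period. The Python sketch below filters events by epoch time. The five-minute default is an arbitrary assumption; a suitable window depends on how late your vehicles typically run.

```python
def events_to_publish(stop_time_events, now, grace_seconds=300):
    """Keep future events plus past ones still inside the grace window.

    Each event is a dict with an epoch "time" key (hypothetical layout).
    """
    cutoff = now - grace_seconds
    return [event for event in stop_time_events if event["time"] >= cutoff]


if __name__ == "__main__":
    events = [
        {"stop_id": "A", "time": 1000},
        {"stop_id": "B", "time": 1500},
        {"stop_id": "C", "time": 2000},
    ]
    print(events_to_publish(events, now=1700))
```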

Recognize 'Reply-To' Address

If your GTFS-R includes email auto-replies, be sure it responds to the 'Reply-To' address header if present. Anti-abuse configurations for email servers often reject email sent from a different domain, which means relayed requests can't be 'From' the user and responses must be sent to 'Reply-To'. If your system responds to the Reply-To address if present, mention it so feature support is known.

Publish Helpful Info

Publish other information that might be helpful to consuming applications, such as alert sources and types. It can be difficult to monitor publications to figure out what information is published where so it can be properly configured into a user interface. Also consider publishing information about the update frequency of vehicle locations and trip updates, non-abusive request frequencies, or any special or additional meaning attached to attributes.

Monitor

Successful services always make sense. Fine tune your system. Watch and compare to confirm it never displays nonsense or contradicts itself. Final testing is where work pays off the most.

Don't rely on the public to discover an outage. By that time there may be thousands of frustrated users. Maintain a notification system so you can discover outages and promptly restore service.

GTFS Extensions (April 21, 2023)

To Publish More Info...

To publish more than what's in the specification, maximize application support by doing it the same way others do.

At present, most efforts to go beyond the basic specification focus on:

  • Stop attributes
  • Fare info
  • Route types, and
  • On demand travel

Many GTFS publishing services implement GTFS-Fares, GTFS-Flex, GTFS-ContinuousStops and other extensions for stops, pathways, and facilities. Do not break the specification by publishing separate GTFS files for your on demand service. The specification is updated to describe on demand service--use it to your advantage. You can see the new files and attributes and official documentation for GTFS-Flex.

We've attempted to fully support these extensions but new ones arise. As of now, GTFS-Flex and GTFS-Fares have partial support with complete support in progress. Periodically we evaluate support of all bona fide GTFS extensions. If you've implemented a non-standard solution, you are encouraged to migrate to these new popular formats to describe your services.

The dominant classification for additional route types is the Hierarchical Vehicle Type (HVT) codes from the European TPEG. You can see a list of the codes and their support by Google here. However, new route_types will create backward incompatibilities into the future. Future support at RideSchedules depends on the extent to which the codes are incorporated into publications.

Example: Senior, Youth, Disabled Fares

To include fares for different types of passengers, include a rider_categories.txt file and fare_rider_categories.txt in addition to the fare attributes and fare rules.

The rider_categories.txt file assigns a unique integer to a string public name.

rider_category_id,rider_category_description
2,Senior
6,Disabled
15,K-12

The fare_rider_categories.txt file keys the rider_category_id to a fare_id and a separate price.

fare_id,rider_category_id,price
2491,2,0.35
2491,6,0.35
2491,15,0.70

The above example results in:

$1.00 upon boarding (includes 1 transfer), Senior $0.35, Disabled $0.35, and K-12 $0.70.
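The join between the two files can be sketched in Python as follows. The file contents are copied from the example above. The base fare of $1.00 comes from the fare attributes, which are not shown, so the sketch produces only the rider category portion. The function name is made up for illustration.

```python
import csv
import io

RIDER_CATEGORIES = """rider_category_id,rider_category_description
2,Senior
6,Disabled
15,K-12
"""

FARE_RIDER_CATEGORIES = """fare_id,rider_category_id,price
2491,2,0.35
2491,6,0.35
2491,15,0.70
"""


def category_prices(fare_id):
    """Return display strings such as 'Senior $0.35' for one fare_id."""
    names = {}
    for row in csv.DictReader(io.StringIO(RIDER_CATEGORIES)):
        names[row["rider_category_id"]] = row["rider_category_description"]
    labels = []
    for row in csv.DictReader(io.StringIO(FARE_RIDER_CATEGORIES)):
        if row["fare_id"] == fare_id:
            name = names[row["rider_category_id"]]
            labels.append("{} ${:.2f}".format(name, float(row["price"])))
    return labels


if __name__ == "__main__":
    print(", ".join(category_prices("2491")))
```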

Note: Always specify route_id in fare_rules.txt so the fare can be associated with a route. At present, that means an entry in fare_rules.txt for each route_id.

Other Files and Attributes of Note

Disambiguating the location of a stop can be done with the cardinal_direction and cardinal_position fields in stop_attributes.txt or by using the vendor specific method of direction and position in stops.txt. You can create results such as: "Stop located Farside traffic direction South". The stop_address in stops.txt and stop_city in stop_attributes.txt are also available.

You can specify the name of fare zones in a way familiar to the public by including farezone_attributes.txt with zone_id keyed to a zone_name.

Passengers using GTFS frequently need non-GTFS information, such as system maps, rider guides, fare brochures, etc.

Note: As of this writing, the mobile segment has insufficient resources for system maps created from GTFS data but that's changing slowly.

Follow a few simple rules to keep your resources visible and up to date.

  • Keep URLs to the up to date resources the same. If you publish early, use an "Upcoming" or "Planned" URL (which can also be kept the same) before the effective date.
  • Optimize your resources. Some freeware reduces the size of .png and .jpg images significantly (jpegoptim and optipng are popular). If you publish PDFs, consider the optimize features in your editing software.

Other Real-Time Design Suggestions

/*

    GTFS-R is recommended as the emerging standard,
    but for RESTful services, here are some
    sample API requests using URL naming schemes
    with a JSON/XML response
*/

/* stop arrivals/departures */
http://transitagency.com/realtime.html?route=[route_id|route_short_name]&stop=[stop_id|stop_code]&response=json|xml

/* trip updates */
http://transitagency.com/tripupdates.html?trip=[trip_id]&stop=[stop_id|stop_code|stop_sequence]&response=json|xml

/* alerts */
http://transitagency.com/alerts.html?agency=[agency_name|agency_id]&route=[route_id|route_short_name]&stop=[stop_id|stop_code]&response=json|xml

/* news */
http://transitagency.com/news.html?agency=[agency_name|agency_id]&route=[route_id|route_short_name]&stop=[stop_id|stop_code]&response=json|xml

/* vehicle locations */
http://transitagency.com/cgi-bin/wherenow.pl?agency=[agency_id|agency_name]&route=[route_id|route_short_name]&direction=[direction_id|trip_headsign]&stop=[stop_id|stop_code|stop_sequence]&trip=[trip_id]&vehicle=[vehicle_id]&response=json|xml

OUT:
XML or JSON
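From the client side, a request following the stop arrivals/departures scheme above could be assembled as in this Python sketch. The host and parameter names are taken from the sample; the function name and argument values are placeholders. Encoding the query with the standard library keeps identifiers containing spaces or punctuation intact.

```python
from urllib.parse import urlencode

BASE_URL = "http://transitagency.com/realtime.html"


def stop_predictions_url(route, stop, response_format="json"):
    """Build a stop arrivals/departures request URL."""
    query = urlencode({"route": route, "stop": stop, "response": response_format})
    return BASE_URL + "?" + query


if __name__ == "__main__":
    print(stop_predictions_url("10", "1234"))
    print(stop_predictions_url("Blue Line", "1234", "xml"))
```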

Aside from GTFS-Realtime, there is generally no uniform request or response format for stop predictions and vehicle locations. SIRI has recently enjoyed implementation by a few major providers but most often developers still need to write separate code to interact with the service. If your system is in development, here are some time-saving suggestions.

  1. Use GTFS-Realtime Or RESTful Services. GTFS-Realtime is recommended and appearing for some major transit providers, though it can be difficult with non-mainstream dependencies and low level byte manipulation (though this is getting better). If a standard real-time request response format that is write once for all from the developer's perspective ever comes into being this is likely it. That means more bang for your work.

    XML and JSON are simple, mainstream, and reliable formats, but not write once for all when it comes to predictions and vehicle locations. SIRI ("Service Interface for Real Time Information") is a standard adopted by the European Committee for Standardization (CEN) that offers at least the possibility of write once for all. However, in practice most implementations require custom code to interact with the service. You can learn more about SIRI from the official home page at http://www.siri.org.uk/.

    If you decide to develop your own API, be sure to include the DTD or Schema in the API documentation. Example responses are not enough. Developers need to know whether attributes are optional or required and whether they appear once or repeated as well as the data type and how errors are handled. Without this information, responses cannot be parsed reliably. Do not use the HTTP layer to convey meaning within your API or software handling HTTP communications may not interact with your service as expected.

  2. Real-Time - GTFS Compatibility. Routes should map to route_id, stops to stop_id or stop_code and directions to direction_id or trip_headsign. If your real-time identifiers cannot be resolved from your GTFS data, your service will very likely not be implemented.
  3. Model Requests. Prediction requests should support per stop_id or stop_code and narrowing by route and direction. If you support vehicle locations, also by vehicle_id. Vehicle location requests should support narrowing by route, direction, and vehicle id with per stop inquiries handled by coordinates included in prediction responses. Alerts, notifications, advisories, news, etc. should support requests per stop, route, route type, and agency, if appropriate. Always support narrowing where possible for convenience and when data volume is high.
  4. Model Responses. Minutes until arrival or departure is the most important response attribute for prediction requests. Epoch time in the local time zone as the date time format is the least ambiguous. Prediction responses should include attributes for route_id, trip_headsign, stop_id, stop_code, or vehicle_id and always vehicle coordinates if available. Responses to prediction requests based on vehicle id should return data for all future stops on the trip.

    Vehicle location responses should identify each vehicle by vehicle id, route, direction, and trip_id and include status and next stop information (early/late next stop X mins) and all available positional information, such as lat lons, speed, heading, etc.

  5. Include Human Readable Attribute Values. Human readable values allow user request completion without additional queries or storage and can create a significant performance improvement.

    For example, if you specify next_stop_sequence or next_stop_id include next_stop_name. If you specify trip_id or direction_id, include route_short_name and trip_headsign.

  6. Email Auto-Replies. Popular anti-abuse configurations reject email messages not sent from the sender's domain. Auto-reply real-time requests must be sent from the sender's domain and answered using the 'Reply-To' email address. Be sure your system replies to 'Reply-To' if included in the request. Otherwise, the feature will not work and it will be turned off.
  7. Trip Updates. Trip updates give status information about an ongoing trip and may be particularly useful for long distance rail operators. We have seen few XML/JSON implementations, however. If you wish to support trip updates rather than predictions, GTFS-Realtime is the best choice. If you implement JSON/XML trip updates, requests must recognize the trip_id. Responses should include the same information as in the model vehicle locations response.
  8. RSS Advisories, News, Notifications, Etc. RSS (Really Simple Syndication) is convenient and effective for advisories, notifications, news, and other variable, non-GTFS information of interest to passengers. It enjoys wide-spread support in end user software and development APIs. We support immediate activation of RSS data sources. Use your GTFS identifiers with URL naming schemes to support convenient per route and per stop passenger inquiries. Of course, all requests should recognize identifiers in your GTFS data.

    Be sure your RSS files are indeed RSS! Frequently, software inserts custom tags and outputs non-compliant files. Select strict RSS as the output format. If your file is not valid RSS, do not publish it as such or it may be read incorrectly or not at all.

  9. Twitter. Twitter is often used as a convenient up to the minute alert and advisory service. It offers auto-updating lists immediately viewable on mobile devices and desktops with no quotas for developers. However, lists are chronological and can become confusing with size. Consider using Twitter for up to the minute, late-breaking information and RSS for more item based status information. Together, they offer an inexpensive alternative to an API.
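To tie suggestions 4 and 5 together, here is a Python sketch that assembles one prediction as JSON, with minutes until arrival, an epoch time, GTFS identifiers, and human readable values side by side. The key names arrival_epoch, minutes, and stop_name, and all sample values, are hypothetical. The source names only the GTFS attributes.

```python
import json


def build_prediction(now_epoch, arrival_epoch, **attributes):
    """Combine identifiers and readable values with the arrival time."""
    prediction = dict(attributes)
    prediction["arrival_epoch"] = arrival_epoch
    # Whole minutes until arrival, never negative.
    prediction["minutes"] = max(0, (arrival_epoch - now_epoch) // 60)
    return prediction


if __name__ == "__main__":
    prediction = build_prediction(
        1682070000,
        1682070420,
        route_id="10",
        route_short_name="10",
        trip_headsign="Downtown",
        stop_id="1234",
        stop_code="1234",
        stop_name="Main St & 1st Ave",
        vehicle_id="bus-42",
    )
    print(json.dumps({"predictions": [prediction]}, indent=2))
```

Whatever names you choose, publish them in a schema so developers know which attributes are required and which may repeat.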

Known Issues (Updated April 21, 2023)

All behavior is as designed to the extent known.

Note:

  • Multi-agency GTFS-R Alerts without a limiting agency query parameter are now supported. If you notice your file is not displaying properly, notify the Webmaster.

  • Static GTFS datasets with date ranges covering many decades now auto-update normally. The displayed date range rolls forward over a sensible window without a re-fetch until the data file expiration date.
