Best practices for segmenting your weather queries

When you have a large weather query, it is typically a good idea to segment it into multiple, smaller queries and then execute those separately. This gives you more consistent timing performance inside your application, easier debugging, and fewer timeouts, among other benefits. In this article, we will discuss when it makes sense to segment a large weather query, why it is a good idea, and how to actually do it.

What is query segmentation?

Query segmentation is simply the process of taking a complex or long-running query and breaking it into multiple smaller, quicker queries that can be run independently. A very common example of this in the weather API would be taking a simple query that requests data for 100 locations and breaking it into 10 queries that each request data for 10 locations. You can also do segmentation on the time dimension. If you have a single query that requests data for 20 years, you could break it into 4 queries that each request data for 5 years. In both examples, the change would be as easy as changing a query parameter or two. In the first example, you could just change the “locations” parameter while in the second, you would change the “startDateTime” and “endDateTime” parameters.

When should I segment a weather query?

Any time that you have a large weather query, it is always better to be on the safe side and segment it. The simple reason is that the cost to do so is low and the benefits far outweigh the effort as we’ll explain below. A common question is what is considered a “large” query. The primary way to measure “largeness” is by the number of records that the query will return. Using this measure, you will want to consider segmenting any query that is expected to return more than 10,000 records. For reference, this is a little more than one year of hourly weather reports or more than 25 years of daily reports for a given location.
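As a rough illustration of the record-count rule of thumb above, the following Python sketch estimates how many records a query would return and compares that against the 10,000-record guideline. The function names are hypothetical, and the estimate assumes exactly one record per location per day (or per hour); it is a planning aid, not an exact prediction of what the API will return.

```python
from datetime import date

# Guideline from the text above: consider segmenting beyond this many records.
SEGMENT_THRESHOLD = 10_000


def estimate_records(num_locations: int, start: date, end: date, hourly: bool) -> int:
    """Rough estimate: one record per location per day, or per hour if hourly."""
    days = (end - start).days + 1  # treat the date range as inclusive
    records_per_day = 24 if hourly else 1
    return num_locations * days * records_per_day


def should_segment(estimated_records: int, threshold: int = SEGMENT_THRESHOLD) -> bool:
    """Return True when the estimate exceeds the guideline threshold."""
    return estimated_records > threshold


# One location, one year of hourly data, versus ten locations for the same year.
single = estimate_records(1, date(2023, 1, 1), date(2023, 12, 31), hourly=True)
many = estimate_records(10, date(2023, 1, 1), date(2023, 12, 31), hourly=True)
print(single, should_segment(single))
print(many, should_segment(many))
```

A single location for one year of hourly data stays under the guideline, while ten locations over the same period exceed it by a wide margin.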

If your query is primarily focused on many locations, then you likely want to consider segmenting your query as you approach 100 locations. As you will see below, having many locations in the query URL makes the submission process more complex, and that alone can be a good reason to segment the query.

It is important to note here that not all queries have equal cost. The least costly queries are forecasts. In fact, the weather server can typically service forecast queries entirely from memory cache, and this makes them extremely fast. So if you are primarily segmenting queries to avoid timeouts and uneven query performance, you have additional flexibility when running forecast queries.

The most expensive weather queries are queries such as historical summaries and growing degree days where the weather engine needs to examine hundreds or thousands of records for each result. In nearly all cases you should consider segmenting these types of queries anytime you are querying more than one location. The best practice in this case is to run one query per location.

What are the benefits of query segmentation?

There are several key benefits to segmenting your more complex weather queries. Here we’ll describe some of the key ones and explain how they can benefit you.

One of the most important benefits of running smaller, simpler queries is that they naturally avoid query timeouts. More complex weather queries that hit many locations or long time ranges take longer to run. This is because the weather engine may have to read records for hundreds or thousands of stations to obtain your final results.

It is important to remember that timeouts can happen on several different levels when running a weather API query. One level is within the querying application’s network layer. Network libraries typically have a built-in timeout which can often be adjusted using parameters. The next level is the network local to the querying application. Depending upon the local networking infrastructure, this timeout is typically caused by a device such as a proxy server, VPN, or corporate internet gateway. As a user, usually you have very little control over the configuration of these devices and timeouts.

Another potential timeout level is at the Visual Crossing Weather server itself and our load balancing system. Although you should avoid making assumptions about the exact server configuration, the timeout at this level is currently set to 120 seconds. So this is another level that can reject and fail your query if it runs for too long. Since avoiding a timeout at every level of the network chain is critical to query success, segmenting large queries into small queries that run well under every timeout in the stack is important.
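To make the first timeout level concrete, here is a minimal Python sketch that sets an explicit client-side timeout when fetching one query segment with the standard library. The 60-second value and the function name are assumptions chosen for illustration; the only figure taken from this article is the 120-second server-side limit, which the client budget is meant to stay under.

```python
import urllib.request

# Assumption: a client-side budget comfortably below the 120-second
# server-side limit mentioned above. Tune it for your own network path.
CLIENT_TIMEOUT_SECONDS = 60


def fetch_segment(url, timeout=CLIENT_TIMEOUT_SECONDS, opener=urllib.request.urlopen):
    """Fetch one query segment and return the raw response body as bytes.

    The `opener` argument exists only so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    If the timeout elapses, the opener raises an exception instead of
    leaving the caller waiting on a slow query.
    """
    with opener(url, timeout=timeout) as response:
        return response.read()
```

Because each segment is small, a timeout on one segment can be retried on its own without rerunning the rest of the work.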

Similarly, consistent query performance is useful in many applications. For example, having a combination of slow and fast queries underlying the same user interface can cause that UI to seem unpredictably sluggish and frustrate users. This is another case where segmenting queries is an important solution. Smaller, consistently performing queries can make an application easier to implement and more friendly to use.

Another reason to segment long weather queries is for debugging. When building a complex query, it can be easy to make a mistake. A simpler query is easier to build and less error prone. It is also easier to look at a simple query URL and understand how it works and what it is expected to return. However, a large query can return many thousands of unexpected records in a single run. So if you make a mistake, you can end up wasting thousands of results from your plan’s limit. A simpler query naturally limits the consequences from a bug.

One more reason for segmenting weather queries is ease of query execution. Since weather queries are executed by submitting a URL, there are two fundamental ways to run the query. The most common is via an HTTPS GET query. This type of query packs the entire set of query parameters onto the URL itself and then submits that URL over the network. However, the practical length of URLs is limited to about 2000 characters. (The commonly cited limit is 2048, but buffering adjustments and other corner-cutting across internet technologies over time mean that it is a good idea to allow some margin in your URLs. If you push your URL length to a full 2048 characters, there is a chance that some component in your network path will cause you problems.) The solution for the URL limit is to move from GET URL submissions to using POST. POST queries pack the parameters into the payload body of the query and thus can have a practically unlimited length. The difficulty is that POST queries are generally more difficult to configure, execute, and debug. While you can run a GET query in any web browser, POST queries require custom code. Since shorter and simpler weather queries can be run as GET queries, this is another benefit of segmenting large weather queries.
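One way to apply the URL length guidance above is to check each query URL before submitting it. The Python sketch below builds a GET URL and tests it against a 2000-character margin. The base URL, the "locations" value, and the pipe separator between locations are placeholders for illustration only; check the API documentation for the exact parameter format of the endpoint you are using.

```python
from urllib.parse import urlencode

# Margin below the 2048-character limit discussed above.
SAFE_URL_LENGTH = 2000


def build_get_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fits_in_get(url: str, limit: int = SAFE_URL_LENGTH) -> bool:
    """Return True when the URL is short enough to submit as a GET query."""
    return len(url) <= limit


# Hypothetical endpoint and location names, used only to show the check.
base = "https://example.invalid/weather"
small = build_get_url(base, {"locations": "|".join(f"City {i}" for i in range(10))})
large = build_get_url(base, {"locations": "|".join(f"City {i}" for i in range(300))})
print(len(small), fits_in_get(small))
print(len(large), fits_in_get(large))
```

If a segment's URL fails this check, reducing the number of locations in that segment is usually simpler than switching the query to POST.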

How do I segment my weather query?

Luckily, it is usually very simple to segment a weather query into multiple, smaller queries. While the exact implementation details depend upon the technology stack being used, the tasks are generally the same.

First, you will want to consider segmenting your queries by location. If your query has a large number of locations, breaking it up by location groups is an obvious answer. Simply take your location list, split it into groups of 1 to approximately 20 locations, and run each of those as a separate query. This can be done easily whether you are running inside a custom application or building queries to run using a scripting tool such as cURL. And since each result is entirely self-contained for one or more locations, there is generally no need to join the results back together. Simply insert the records into a database, spreadsheet, or via code into your custom UI.
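The location grouping described above can be sketched in a few lines of Python. The function name and the sample location names are hypothetical; the group size of 20 follows the guidance in this section and can be lowered to 1 for expensive queries that are best run one location at a time.

```python
from typing import Iterator, List, Sequence


def chunk_locations(locations: Sequence[str], group_size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive groups of at most group_size locations."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    for start in range(0, len(locations), group_size):
        yield list(locations[start:start + group_size])


# Placeholder location names; substitute your own list.
locations = [f"Location {n}" for n in range(1, 101)]
groups = list(chunk_locations(locations, group_size=20))
print(len(groups), len(groups[0]))
```

Each group then becomes the location list for one separate query, and the order of the original list is preserved across groups.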

You can also consider segmenting large queries based on date range where limiting locations alone is not sufficient. Depending upon your use case, this may be as simple as changing the start and end dates in your query URL to be non-overlapping segments. If you are using JSON results in code, you are likely already parsing the results into your internal structures. So joining the results will be little additional work.
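For date-based segmentation, the main detail is making sure the segments are contiguous and do not overlap. The Python sketch below splits an inclusive date range into a chosen number of segments of nearly equal length. The function name is hypothetical, and the sketch only produces start and end dates; how those dates are placed into the query URL depends on the endpoint you are calling.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(start: date, end: date, segments: int) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into non-overlapping parts."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    if end < start:
        raise ValueError("end must not be earlier than start")
    total_days = (end - start).days + 1
    segments = min(segments, total_days)  # never create empty segments
    base, extra = divmod(total_days, segments)
    ranges = []
    cursor = start
    for index in range(segments):
        # The first `extra` segments absorb one leftover day each.
        length = base + (1 if index < extra else 0)
        segment_end = cursor + timedelta(days=length - 1)
        ranges.append((cursor, segment_end))
        cursor = segment_end + timedelta(days=1)
    return ranges


# Twenty years of data split into four segments.
for seg_start, seg_end in split_date_range(date(2004, 1, 1), date(2023, 12, 31), 4):
    print(seg_start.isoformat(), seg_end.isoformat())
```

Because each segment starts the day after the previous one ends, the combined results contain every day exactly once.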

However, if you are using CSV results, you may need to do a bit of extra work to make a single, unified output in some cases. For database ETL, for example, you can simply loop over the results and insert each segment into the destination table. However, if you have an application that is expecting a single, combined CSV result, then you will need to concatenate the result segments together. In order to do this, you will need to remove the header from each segment after the first. In code, this is easy; simply ignore the first line of the subsequent segments.
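As a sketch of the in-code approach, the Python function below concatenates CSV segments and keeps only the header row from the first one. The function name is hypothetical. It assumes every segment has the same columns and that no field contains an embedded line break; if your data can contain quoted multi-line fields, a CSV parser would be a safer choice.

```python
from typing import Iterable, List


def merge_csv_segments(segments: Iterable[str]) -> str:
    """Concatenate CSV texts, keeping only the first segment's header row."""
    merged: List[str] = []
    for index, text in enumerate(segments):
        lines = text.splitlines()
        if index > 0:
            lines = lines[1:]  # drop the repeated header line
        merged.extend(line for line in lines if line)  # skip blank lines
    return ("\n".join(merged) + "\n") if merged else ""


# Two small sample segments with identical headers.
first = "datetime,temp\n2023-01-01,10.5\n"
second = "datetime,temp\n2023-01-02,11.0\n"
print(merge_csv_segments([first, second]), end="")
```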

If you are using scripting alone, however, you will need a way to remove the header line from the output files for each file after the first. If you are using the Timeline Weather API, there is a convenient parameter that makes this easy. Simply add the parameter “&options=noheaders” to your query URL, and the resultset will omit the header row.

Below is an example of a Timeline query that returns CSV with no headers.

https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/Tel%20Aviv%2C%20Israel?key=<YOUR_API_KEY>&include=days&contentType=csv&options=noheaders

If you are using the history or forecast API endpoints, you will need to run a command to strip off the first line. On Linux there are several options including “sed”, “awk”, and “tail”. The “tail” version is the easiest to understand. Simply use a command like this.

tail -n +2 weather2.csv >> weather_all.csv

This will remove the header line from the file “weather2.csv” and append the remainder of the contents to a combined file “weather_all.csv”.

If you are on Windows, you can do approximately the same job with tools such as “findstr” and “more”. In this case, the “more” version is the easiest. Your command would look like this.

more +1 weather2.csv >> weather_all.csv

Summary

Now that you have seen the benefits of segmenting your weather queries as well as how to do so, you are ready to optimize your weather API usage with simplified queries. Remember also that even if you are on a plan that charges per result, the number of results will be the same for one large query or many smaller ones. So there is no financial reason not to add the benefits of query segmentation to your weather application.

If you have any additional questions about how to segment your queries or other weather query best practices, please reach out to our weather experts. We’ll be glad to help you optimize your queries and efficiently get the weather data that you need.