Choosing a workload
The opensearch-benchmark-workloads repository contains a list of workloads that you can use to run your benchmarks. Using a workload similar to your cluster’s use cases can save you time and effort when assessing your cluster’s performance.
For example, say you’re a system architect at a rideshare company that collects and stores data about trip times, locations, and other details of each ride. Instead of building a custom workload from your own data, which requires additional time, effort, and cost, you can use the nyc_taxis workload to benchmark your cluster because the data inside the workload is similar to the data that you collect.
Criteria for choosing a workload
Consider the following criteria when deciding which workload would work best for benchmarking your cluster:
- The cluster’s use case.
- The data types that your cluster uses compared to the data structure of the documents contained in the workload. Each workload contains an example document so that you can compare data types, or you can view the index mappings and data types in the `index.json` file.
- The query types most commonly used inside your cluster. The `operations/default.json` file contains information about the query types and workload operations.
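The data type comparison described above can be roughed out in a few lines of Python. The following is a minimal sketch, not part of OpenSearch Benchmark: the helper names `flatten_mapping` and `shared_type_ratio` are hypothetical, and both mappings are invented for illustration rather than copied from any workload's `index.json`. It also assumes that you have already extracted the `properties` section of each mapping into a plain dictionary. A workload's `index.json` file may contain template placeholders, so it might need rendering before it can be parsed as JSON.

```python
def flatten_mapping(properties, prefix=""):
    """Return {dotted_field_name: type} for an OpenSearch-style mapping."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its subfields.
            fields.update(flatten_mapping(spec["properties"], prefix=f"{path}."))
        else:
            # Leaf field: record its declared type.
            fields[path] = spec.get("type", "object")
    return fields


def shared_type_ratio(cluster_fields, workload_fields):
    """Fraction of the cluster's distinct field types that the workload also uses."""
    cluster_types = set(cluster_fields.values())
    workload_types = set(workload_fields.values())
    if not cluster_types:
        return 0.0
    return len(cluster_types & workload_types) / len(cluster_types)


# Hypothetical mappings, invented for this example.
workload_properties = {
    "pickup_datetime": {"type": "date"},
    "pickup_location": {"type": "geo_point"},
    "trip_distance": {"type": "scaled_float"},
    "passenger_count": {"type": "integer"},
    "vendor_id": {"type": "keyword"},
}
cluster_properties = {
    "ride": {
        "properties": {
            "started_at": {"type": "date"},
            "origin": {"type": "geo_point"},
            "distance_km": {"type": "float"},
        }
    },
    "driver_id": {"type": "keyword"},
}

workload_fields = flatten_mapping(workload_properties)
cluster_fields = flatten_mapping(cluster_properties)
ratio = shared_type_ratio(cluster_fields, workload_fields)
missing = sorted(set(cluster_fields.values()) - set(workload_fields.values()))

print(f"shared type ratio: {ratio:.2f}")
print(f"types missing from workload: {missing}")
```

A high ratio only suggests that the workload exercises similar field types. It says nothing about data volume or query patterns, so treat it as one input alongside the other criteria in this list.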
General search clusters
For benchmarking clusters built for general search use cases, start with the [nyc_taxis](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/nyc_taxis) workload. This workload contains data about the rides taken in yellow taxis in New York City in 2015.
Log data
For benchmarking clusters built for indexing and search with log data, use the http_logs workload. This workload contains HTTP server log data from the 1998 World Cup website.