Collapse processor

The collapse response processor discards hits that have the same value for a particular field as a previous document in the result set. This is similar to passing the collapse parameter in a search request, but the response processor is applied to the response after fetching from all shards. The collapse response processor may be used in conjunction with the rescore search request parameter or may be applied after a reranking response processor.

Using the collapse response processor will likely return fewer than size results because hits are discarded from a set that already contains at most size hits. To increase the likelihood of returning size hits, use the oversample request processor together with the truncate_hits response processor.
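
As a sketch of that combination, a pipeline might pair the three processors as follows. The pipeline name oversample_collapse_pipeline, the sample_factor of 3, and the color field are illustrative assumptions rather than values prescribed by this page:

```json
PUT /_search/pipeline/oversample_collapse_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "description": "Illustrative: multiplies the requested size by 3 before the search runs",
        "sample_factor": 3
      }
    }
  ],
  "response_processors": [
    {
      "collapse": {
        "field": "color"
      }
    },
    {
      "truncate_hits": {
        "description": "Illustrative: trims the collapsed hits back to the originally requested size"
      }
    }
  ]
}
```

Here the oversample processor enlarges the requested size before the query runs, the collapse processor removes hits with repeated color values from the larger result set, and truncate_hits trims what remains back to the size originally requested.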

Request fields

The following table lists all request fields.

| Field | Data type | Description |
| --- | --- | --- |
| field | String | The field whose value will be read from each returned search hit. Only the first hit for each given field value will be returned in the search response. Required. |
| context_prefix | String | May be used to read the original_size variable from a specific scope in order to avoid collisions. Optional. |
| tag | String | The processor's identifier. Optional. |
| description | String | A description of the processor. Optional. |
| ignore_failure | Boolean | If true, OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
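
For illustration, a collapse processor that sets the optional fields could look like the following sketch. The pipeline name, tag, and description text are hypothetical:

```json
PUT /_search/pipeline/collapse_by_color
{
  "response_processors": [
    {
      "collapse": {
        "field": "color",
        "tag": "collapse_color",
        "description": "Hypothetical example: keeps only the first hit for each color value",
        "ignore_failure": true
      }
    }
  ]
}
```

With ignore_failure set to true, a failure in this processor would not stop any remaining processors in the pipeline from running.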

Example

The following example demonstrates using a search pipeline with a collapse processor.

Setup

Create ten documents, two for each of five color values, to use for collapsing:

POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }

Create a pipeline that only collapses on the color field:

PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}

Using a search pipeline

In this example, you request the top three documents before collapsing on the color field. Because the first two documents have the same color, the second one is discarded, and the request returns the first and third documents:

POST /my_index/_search?search_pipeline=collapse_pipeline
{
  "size": 3
}

Response
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 1",
          "color" : "blue"
        }
      },
      {
        "_index" : "my_index",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "document 3",
          "color" : "red"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [ ]
  }
}
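
To give the collapse step more hits to work with, the same search could be run with oversampling and truncation. The sketch below assumes the request body's search_pipeline parameter is used to define a temporary pipeline for this one request; the sample_factor of 3 is an arbitrary choice:

```json
POST /my_index/_search
{
  "size": 3,
  "search_pipeline": {
    "request_processors": [
      {
        "oversample": {
          "sample_factor": 3
        }
      }
    ],
    "response_processors": [
      {
        "collapse": {
          "field": "color"
        }
      },
      {
        "truncate_hits": {
          "description": "Illustrative: returns at most the originally requested size"
        }
      }
    ]
  }
}
```

With size set to 3, oversampling by a factor of 3 requests up to nine hits. Any nine of the ten documents above cover all five colors, so three hits with distinct colors should remain after collapsing and truncation.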