LLM Ranker

Overview

This template provides you with a worklow to rank the quality of a large language model (LLM) responses.

Using this template will give you the ability to compare the quality of the responses from different LLMs,and rank the dynamic set of items with a handy drag-and-drop interface.

This enables the following use cases:

Categorize the LLM responses by different types: relevant, irrelevant, biased, offensive, etc.
Compare and rank the quality of the responses from different models.
Rank contextual items for retrieval-augmented generation based chat bots and in-context learning.
Build the preference model for RLHF
Evaluate results of semantic search
LLM routing

Looking for a model to get started with the fine-tuning process? Check out our guide on the Label Studio Blog.

How to create the dataset

Collect a prompt and a list of items you want to display in each task in the following JSON format:

{
  "prompt": "What caused the ancient library of Alexandria to be destroyed?",
  "items": [
    { "id": "llm_1", "title": "LLM 1", "body": "Wars led to library's ruin." },
    { "id": "llm_2", "title": "LLM 2", "body": "Library's end through various wars." },
    { "id": "llm_3", "title": "LLM 3", "body": "Ruin resulted from library wars." }
  ]
}

Collect dataset examples and store them in dataset.json file.

How to configure the labeling interface

The LLM Ranker template includes the following labeling interface in XML format:

<View>
   <View style="display: flex; align-items: center; font-size: 1em;">
      <View style="margin: 0.5em 0.5em 0 0;">
        <Header value="Task: " style="font-size: 1em;"/>
      </View>
      <Text name="task" value="Drag and rank the given AI model responses based on their relevance to the prompt and the level of perceived bias."/>
    </View>
   <View style="display: flex; align-items: center; box-shadow: 2px 2px 5px #999; padding: 10px; border-radius: 5px; background-color: #E0E0E0; font-size: 1.25em;">
      <View style="margin: 0 1em 0 0">
        <Header value="Prompt: "  />
      </View>
      <Text name="prompt" value="$prompt"/>
    </View>
    <View>
      <List name="answers" value="$items" title="All Results" />
      <Style>
        .htx-ranker-column {
          background: #f8f8f8;
          width: 50%;
          padding: 20px;
          border-radius: 3px;
          box-shadow: 0px 2px 5px 0px rgba(0,0,0,0.1);
        }
    
        .htx-ranker-item {
          background: #e0e0e0;
          color: #333;
          font-size: 16px;
          width: 100%;
          padding: 10px;
          margin-bottom: 10px;
          border-radius: 3px;
          box-shadow: 0px 2px 5px 0px rgba(0,0,0,0.1);
        }
        .htx-ranker-item p:last-child { display: none }
      </Style>
      <Ranker name="rank" toName="answers">
    <Bucket name="relevant_results" title="Relevant Results" />
    <Bucket name="biased_results" title="Biased Results" />
  </Ranker> 
    </View>
  </View>

The configuration includes the following elements:

<Text> - the tag that instructs to display the prompt. The value attribute should be set to the name of the prompt element, i.e. prompt in this case.
<List> - the tag that instructs to display the list of items. The value attribute should be set to the name of the list element (in this case items).
<Ranker> - the tag that instructs to ranker the items in the list. The toName attribute should be set to the name of the list element.
<Bucket> - the tag that instructs to create a bucket for the ranked items. Each bucket represents the high-level category of items to be ranked inside this category. The name attribute should be set to the name of the bucket.

Items can be styled in Style tag by using .htx-ranker-item class.

Starting your labeling project

Need a hand getting started with Label Studio? Check out our Zero to One Tutorial.

Create new project in Label Studio
Go to Settings > Labeling Interface > Browse Templates > Generative AI > LLM Ranker
Save the project

Alternatively, you can create project by using our Python SDK:

import label_studio_sdk

ls = label_studio_sdk.Client('YOUR_LABEL_STUDIO_URL', 'YOUR_API_KEY')
project = ls.create_project(title='LLM Ranker', label_config='<View>...</View>')

Import the dataset

To import your dataset, in the project settings go to Import and upload the dataset file dataset.json.

Using the Python SDK, import the dataset with input prompts into Label Studio using the PROJECT_ID of the project you’ve just created.

Run the following code:

from label_studio_sdk import Client

ls = Client(url='<YOUR-LABEL-STUDIO-URL>', api_key='<YOUR-API_KEY>')

project = ls.get_project(id=PROJECT_ID)
project.import_tasks('dataset.json')

If you want to create prelabeled data (for example, ranked order of the items produced by LLM), you can import the dataset with pre-annotations:

project.import_tasks([{
    "data": {"prompt": "...", "items": [...]},
    "predictions": [{
          "type": "ranker",
          "value": {
            "ranker": {
              "_": [
                "llm_2",
                "llm_1"
              ],
              "biased_results": ["llm_3"],
              "relevant_results": []
            }
          },
          "to_name": "prompt",
          "from_name": "rank"
        }]
}])

Under "value" group, you can specify different bucket names. Note "_" used as a special key that represents the original, non-categorized list.

Export the dataset

Labeling results can be exported in JSON format. To export the dataset, go to Export in the project settings and download the file.

Using python SDK you can export the dataset with annotations from Label Studio

annotations = project.export_tasks(format='JSON')

The output of annotations in "value" is expected to contain the following structure:

"value": {
    "ranker": {
      "_": [
        "llm_2",
        "llm_1"
      ],
      "biased_results": ["llm_3"],
      "relevant_results": []
    }
}

where:

"_" is a special key that represents the original, non-categorized list (same as in the import preannotations example above).
"biased_results" and "relevant_results" are the names of the buckets defined in the labeling interface.