Experiment Run Page

nipdep edited this page Mar 17, 2025 · 2 revisions

Introduction

The Experiment Page lets you set up and run experiments on your dataset. The application supports two experiment modes:

  1. Answer Generation – The system generates answers for the user prompt using the provided dataset without any fact verification.
  2. Fact Extraction & Verification – In addition to generating answers, the system performs a structured verification process, including information extraction, SPARQL queries, and post-processing to validate facts against a knowledge graph (KG). To use this mode, you must have a pre-configured pipeline.
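For illustration, the KG check in Fact Extraction & Verification mode reduces to filling a SPARQL template with an extracted fact and sending the query to the configured endpoint. A minimal sketch, assuming a simple triple-based ASK template (the template and helper names here are hypothetical, not part of the application):

```python
# Hypothetical sketch: fill a SPARQL ASK template with an extracted fact.
# An ASK query returns true/false depending on whether the triple exists in the KG.
SPARQL_ASK_TEMPLATE = """ASK WHERE {{
  <{subject}> <{predicate}> <{object}> .
}}"""

def build_ask_query(subject: str, predicate: str, obj: str) -> str:
    """Return an ASK query that checks whether the triple exists in the KG."""
    return SPARQL_ASK_TEMPLATE.format(subject=subject, predicate=predicate, object=obj)

query = build_ask_query(
    "http://dbpedia.org/resource/Berlin",
    "http://dbpedia.org/ontology/country",
    "http://dbpedia.org/resource/Germany",
)
```

In the real pipeline, the `sparql_query_template` and `query_function` entries of the uploaded pipeline file play these roles, and the resulting query is sent to the configured `sparql_endpoint`.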

Operational Flow

  1. Choose the setup method:

    • Select whether to build the experiment using a Prompt-based setup or a Pipeline file.
    • Prompt-based setup only generates answers and does not perform fact verification.
    • Pipeline file setup generates answers and extracts facts using predefined functions from the uploaded pipeline file.
  2. Upload your dataset – Provide the dataset file containing the input data for the experiment.

  3. Select LLMs – Choose the Large Language Models (LLMs) to run the experiment against.

  4. Set the number of retries (optional) – If you want to generate multiple versions of an answer for a single query, specify the number of retries.

  5. Submit the experiment – The system follows these steps upon submission:

    • Validates all experiment configurations to ensure inputs are correctly provided.
    • Builds the execution pipeline based on the selected setup method.
    • Executes the pipeline across multiple LLMs over the dataset.
  6. Receive confirmation and view results:

    • A confirmation popup will indicate the experiment run status.
    • The system will display results in a structured table with an option to download the full experiment report.
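The submission steps above amount to a loop over the selected LLMs, dataset rows, and retry attempts. A minimal sketch under assumed names (the function signature, field names, and stub LLM below are illustrative, not the application's actual API):

```python
from typing import Callable

def run_experiment(dataset: list[dict], llms: dict[str, Callable[[str], str]],
                   user_prompt: str, retries: int = 1) -> list[dict]:
    """Execute the prompt across every LLM, dataset row, and retry attempt."""
    if not dataset or not llms:
        raise ValueError("dataset and at least one LLM are required")  # config validation
    results = []
    for name, llm in llms.items():
        for query_id, row in enumerate(dataset):
            for attempt in range(1, retries + 1):
                prompt = user_prompt.format(**row)  # insert dataset values into the prompt
                results.append({
                    "llm": name,
                    "query_id": query_id,
                    "try": attempt,
                    "generated_answer": llm(prompt),
                })
    return results

# Toy run: 1 LLM x 1 row x 2 retries -> 2 result entries
results = run_experiment(
    dataset=[{"question": "Capital of Germany?"}],
    llms={"demo-llm": lambda prompt: "Berlin"},
    user_prompt="{question}",
    retries=2,
)
```

Each entry mirrors a row of the results table described under Experiment Outputs below.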

Experiment Inputs

| Field | Description | Required | Input Type |
|-------|-------------|----------|------------|
| Setup Method | Select "Answer Generation" for response generation only, or "Fact Extraction" to include fact verification. | Yes | Selection |
| System Prompt | Defines the system prompt template for Answer Generation mode. Use `{{column_name}}` to dynamically insert dataset values into queries. | Yes | String |
| User Prompt | Specifies the task or query for generating answers. | Yes | String |
| Pipeline Configuration File | A pre-configured pipeline file (required only for Fact Extraction mode). | Yes (Fact Extraction only) | JSON |
| Dataset File | The dataset containing queries and relevant input information. | Yes | CSV |
| Number of Retries | Number of times the same query should be tested with an LLM. | No | Integer |
| LLMs | The Large Language Models to run the experiment against. | Yes | Category |
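The `{{column_name}}` placeholders in the System Prompt field are filled from the matching dataset columns. One plausible way to implement that substitution is a small regex pass, sketched below (the helper name and column names are examples, not the application's internals):

```python
import re

def render_prompt(template: str, row: dict) -> str:
    """Replace each {{column_name}} placeholder with the row's value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

prompt = render_prompt(
    "Answer questions about {{city}} using only facts from {{source}}.",
    {"city": "Berlin", "source": "DBpedia"},
)
# prompt == "Answer questions about Berlin using only facts from DBpedia."
```

A `KeyError` from the lambda would signal a placeholder with no matching dataset column, which is the kind of problem the configuration validation step is meant to catch before the run starts.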

Experiment Outputs

The system generates structured experiment results in the following format:

{
    "experiment_info": {
        "datetime": "Timestamp of the experiment run",
        "mode": "Experiment mode; 'Answer Generation' or 'Fact Extraction'",
        "retries": "Number of retries per query",
        "LLMs": "LLMs used in the experiment"
    },
    "pipeline": {
        "dataset_structure": ["List of expected column names"],
        "system_prompt_builder": "Function used to incorporate dataset values into the system prompt",
        "user_prompt": "Defined user prompt",
        "information_extraction": "Function responsible for extracting relevant facts",
        "sparql_endpoint": "SPARQL knowledge graph endpoint",
        "sparql_query_template": "Template for SPARQL queries",
        "query_function": "Function used to execute SPARQL queries",
        "post_processor": "Function handling post-processing of SPARQL responses"
    },
    "dataset": ["Dataset contents"],
    "results": [
        {
            "llm": "LLM used to generate the answer",
            "query_id": "Index of the query in the dataset",
            "try": "Retry attempt number",
            "system_prompt": "Final system prompt used after dataset values were inserted",
            "generated_answer": "Response generated by the LLM",
            "extracted_info": "Comma-separated string of extracted factual information",
            "<post-processed column names>": "Values produced after post-processing (varies based on configuration)"
        }
    ]
}
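A downloaded report with this shape is straightforward to post-process. For example, grouping generated answers by query to compare LLMs and retries (the report contents below are toy data, structured per the schema above):

```python
from collections import defaultdict

# Toy report following the output schema above (illustrative values only)
report = {
    "experiment_info": {"mode": "Answer Generation", "retries": 2, "LLMs": ["llm-a", "llm-b"]},
    "results": [
        {"llm": "llm-a", "query_id": 0, "try": 1, "generated_answer": "42"},
        {"llm": "llm-a", "query_id": 0, "try": 2, "generated_answer": "42"},
        {"llm": "llm-b", "query_id": 0, "try": 1, "generated_answer": "41"},
    ],
}

# Collect every generated answer per query, across LLMs and retries
answers_by_query = defaultdict(list)
for entry in report["results"]:
    answers_by_query[entry["query_id"]].append(entry["generated_answer"])
```

In Fact Extraction mode the same loop could also pick up `extracted_info` and the post-processed columns to score answers against the knowledge graph.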