Resources and Annotated Bibliography

Bennett, Kristin P edited this page Sep 7, 2024 · 2 revisions

CTBench Benchmark

  1. Neehal, N., Wang, B., Debopadhaya, S., Dan, S., Murugesan, K., Anand, V. and Bennett, K.P., 2024. CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design. arXiv preprint arXiv:2406.17888. The paper describing CTBench. Paper link: https://arxiv.org/abs/2406.17888

  2. GitHub repository for the CTBench benchmark. This is the repository given to people who want to use the benchmark: https://github.com/nafis-neehal/CTBench_LLM

Other papers

  1. Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q. and Artzi, Y., 2019. BERTScore: Evaluating Text Generation with BERT. arXiv preprint arXiv:1904.09675. BERTScore is used for evaluation. Paper link: https://arxiv.org/abs/1904.09675