Resources and Annotated Bibliography


Bennett, Kristin P edited this page Sep 7, 2024 · 2 revisions

CTBench Benchmark

Neehal, N., Wang, B., Debopadhaya, S., Dan, S., Murugesan, K., Anand, V. and Bennett, K.P., 2024. CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design. arXiv preprint arXiv:2406.17888. This paper describes the CTBench benchmark.
GitHub repository containing the CTBench benchmark. This is the repository shared with people who want to use the benchmark: https://github.com/nafis-neehal/CTBench_LLM

Other papers

Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q. and Artzi, Y., 2019. BERTScore: Evaluating Text Generation with BERT. arXiv preprint arXiv:1904.09675. BERTScore is used for evaluation.