Resources and Annotated Bibliography
Bennett, Kristin P edited this page Sep 7, 2024
CTBench Benchmark
- Neehal, N., Wang, B., Debopadhaya, S., Dan, S., Murugesan, K., Anand, V. and Bennett, K.P., 2024. CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design. arXiv preprint arXiv:2406.17888. The paper that describes CTBench. Paper Link
- GitHub repository for the CTBench benchmark. This is the repository to share with people who want to use the benchmark: https://github.com/nafis-neehal/CTBench_LLM
Other papers
- Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q. and Artzi, Y., 2019. BERTScore: Evaluating Text Generation with BERT. arXiv preprint arXiv:1904.09675. BERTScore is used for evaluation. Paper Link
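For readers unfamiliar with BERTScore, the sketch below illustrates its core idea as described by Zhang et al.: each token is greedily matched to its most similar token in the other text by cosine similarity of contextual embeddings. Recall averages the best matches over reference tokens, precision averages them over candidate tokens, and F1 is their harmonic mean. This is a minimal illustration under stated assumptions, not the authors' implementation. The token embeddings are assumed to be precomputed, nonzero vectors (in practice they come from a BERT-style model). The function name `bertscore_greedy` is hypothetical. The paper's optional idf weighting and baseline rescaling are omitted. For real evaluation, the authors' released package should be used instead.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a (nonzero) vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bertscore_greedy(reference: List[Vector], candidate: List[Vector]) -> Tuple[float, float, float]:
    """Hypothetical helper: greedy-matching precision, recall and F1 over token embeddings.

    Assumes precomputed, nonzero token embeddings. No idf weighting or
    baseline rescaling is applied (both are optional extras in the paper).
    """
    ref = [_normalize(v) for v in reference]
    cand = [_normalize(v) for v in candidate]

    # sim[i][j] = cosine similarity between reference token i and candidate token j
    sim = [[_dot(r, c) for c in cand] for r in ref]

    # Recall: each reference token takes its best-matching candidate token
    recall = sum(max(row) for row in sim) / len(ref)

    # Precision: each candidate token takes its best-matching reference token
    precision = sum(
        max(sim[i][j] for i in range(len(ref))) for j in range(len(cand))
    ) / len(cand)

    # F1 is the harmonic mean; guard against division by zero
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a two-token reference `[[1.0, 0.0], [0.0, 1.0]]` scored against a one-token candidate `[[1.0, 0.0]]` that reproduces only the first token gives precision 1.0, recall 0.5, and F1 of about 0.667. The missing reference token lowers recall, while the candidate's single token is fully matched.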