1st Multilingual Model Workshop - Guardrails and Evaluation of the Jais Arabic-English LLMs
Cerebras Systems

Published on Feb 12, 2024

Preslav discusses the guardrails developed for the Jais bilingual Arabic-English language models. These were built through (i) meticulous data cleansing, (ii) instruction-tuning for safety, (iii) safety prompt engineering, and (iv) keyword lists and classifiers that run at inference time. For (ii), the team developed a risk taxonomy with examples covering 5 risk areas, 12 harm types, and 61 specific harm categories.
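An inference-time keyword guardrail like the one in (iv) might be sketched as follows. This is illustrative only, assuming a simple substring match before and after generation; the function names, keyword list, and refusal message here are hypothetical, not Jais's actual implementation, and the real system also uses trained classifiers.

```python
# Hypothetical sketch of an inference-time keyword guardrail.
# The real Jais keyword lists are curated per harm category and are not public.
from typing import Callable, Optional

# Illustrative entries only.
BLOCKED_KEYWORDS = ["how to build a bomb", "credit card generator"]

REFUSAL = "I'm sorry, but I can't help with that request."

def keyword_guardrail(text: str) -> Optional[str]:
    """Return a refusal message if the text matches a blocked keyword,
    otherwise None (meaning the text may pass through)."""
    lowered = text.lower()
    for kw in BLOCKED_KEYWORDS:
        if kw in lowered:
            return REFUSAL
    return None

def generate_with_guardrails(prompt: str, model_fn: Callable[[str], str]) -> str:
    # Screen the prompt before it ever reaches the model.
    refusal = keyword_guardrail(prompt)
    if refusal is not None:
        return refusal
    response = model_fn(prompt)
    # Screen the model's output as well before returning it.
    if keyword_guardrail(response) is not None:
        return REFUSAL
    return response
```

In practice such lists are maintained separately for each harm category in the taxonomy, and a classifier handles cases that simple keyword matching misses.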

Preslav further discusses evaluation. In addition to perplexity, the team used downstream evaluation, for both Arabic and English, covering world knowledge, commonsense reasoning, and misinformation & bias. To evaluate the model in Arabic, the team used pre-existing datasets such as EXAMS (Arabic matriculation questions), curated their own dataset covering Arabic literature, and translated English evaluation datasets into Arabic (manually for MMLU; automatically, using an in-house system, for the other English datasets). The team also performed generation evaluation using GPT-4 as a judge. Finally, whenever feasible, the team performed human evaluation, which was the most important input and informed many key decisions about building the model.
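The perplexity metric mentioned above can be sketched in a few lines: it is the exponential of the mean negative log-likelihood the model assigns to each token of a held-out text. The function below is a generic illustration, not Jais's evaluation code.

```python
# Minimal sketch of perplexity: exp(mean negative log-likelihood per token).
# Lower perplexity means the model fits the held-out text better.
import math

def perplexity(token_logprobs: list[float]) -> float:
    """token_logprobs: natural-log probabilities the model assigned
    to each token of a held-out text."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Example: a model that assigns probability 0.25 to every token
# has a perplexity of about 4 (as if choosing uniformly among 4 tokens).
logs = [math.log(0.25)] * 10
```

This is the intrinsic metric; the downstream benchmarks (EXAMS, MMLU, etc.) measure task accuracy instead.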
