
    Repositories list

    • Repo for "Language Models Largely Exhibit Human-like Constituent Ordering Preferences"
      Python
      Updated Apr 26, 2025
    • Python
      Updated Apr 25, 2025
    • Python
      Updated Apr 24, 2025
    • safearena

      Public
      SafeArena is a benchmark for assessing the harmful capabilities of web agents
      Python
      Updated Apr 23, 2025
    • Python
      Updated Apr 16, 2025
    • AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
      Python
      Updated Apr 15, 2025
    • MIT License
      Updated Apr 11, 2025
    • Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
      Jupyter Notebook
      MIT License
      Updated Apr 8, 2025
    • AfroBench

      Public
      Large-Scale Benchmark of Large Language Models on African Languages
      Python
      Updated Apr 7, 2025
    • Code for "Exploiting Instruction-Following Retrievers for Malicious Information Retrieval"
      Python
      MIT License
      Updated Apr 1, 2025
    • project-page-template

      Public template
      Template for creating project webpages based on jekyll/minimal-mistakes
      Updated Mar 13, 2025
    • Python
      Updated Mar 11, 2025
    • CHASE

      Public
      Synthetic Data Generation for Evaluation
      Python
      MIT License
      Updated Feb 21, 2025
    • Injongo

      Public
      A multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains.
      Jupyter Notebook
      GNU General Public License v3.0
      Updated Feb 12, 2025
    • weblinx

      Public
      WebLINX is a benchmark for building web navigation agents with conversational capabilities
      Python
      Apache License 2.0
      Updated Feb 11, 2025
    • Evaluation dataset for our NAACL 2025 paper on "Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs"
      Apache License 2.0
      Updated Feb 4, 2025
    • llm2vec

      Public
      Code for "LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders"
      Python
      MIT License
      Updated Jan 24, 2025
    • AURORA

      Public
      Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
      Python
      MIT License
      Updated Jan 14, 2025
    • ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.
      Python
      Updated Dec 16, 2024
    • webllama

      Public
      Llama-3 agents that can browse the web by following instructions and talking to you
      Python
      MIT License
      Updated Dec 10, 2024
    • VinePPO

      Public
      Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
      Python
      MIT License
      Updated Nov 11, 2024
    • The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents
      Python
      Updated Nov 6, 2024
    • NAACL 2024: Evaluating In-Context Learning of Libraries for Code Generation
      Python
      Updated Oct 23, 2024
    • Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
      Python
      Apache License 2.0
      Updated Aug 12, 2024
    • Code and data for the paper "Scope Ambiguities in Large Language Models".
      Python
      MIT License
      Updated Jun 25, 2024
    • Code for "Universal Adversarial Triggers Are Not Universal."
      Python
      MIT License
      Updated May 2, 2024
    • Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
      Python
      MIT License
      Updated Apr 30, 2024
    • HTML
      Updated Apr 9, 2024
    • MAGNIFICo

      Public
      EMNLP 2023: MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
      Python
      Updated Mar 17, 2024
    • Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
      Python
      Updated Mar 15, 2024