Source code for "Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence"
This repository contains the source code for the paper. See Citation and BibTeX reference at the bottom of this README.
git clone https://github.com/l-bg/llm_training_brain_asym
cd llm_training_brain_asym
See requirements.txt for the full list of packages used in this work. This file pins the exact versions that were used, but the code is expected to work with other versions as well.
It is recommended to create a virtual environment to install the Python modules, for example:
With Anaconda
conda create --name llm_brain python=3.10
conda activate llm_brain
pip install -r requirements.txt
Or with Pyenv
pyenv virtualenv 3.10.0 llm_brain
pyenv activate llm_brain
pip install -r requirements.txt
Or with uv
uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
In order to reproduce the main results of the paper (in English), you need to copy the following items into the project directory:

- `lpp_en_text.zip`, which contains the full text of Le Petit Prince in English, necessary to extract the activations from the LLMs,
- the average subject (the whole folder named `lpp_en_average_subject`),
- the brain mask `mask_lpp_en.nii.gz`,
- `isc_10trials_en.gz`, an estimate of the inter-subject correlation (ISC), used to assess the reliability of each voxel,
- `lppEN_word_information.csv`, from `annotation/EN/` in the Le Petit Prince OpenNeuro repository, which contains the acoustic onset of each word in the audiobook.
You can directly download the first four items from https://github.com/l-bg/llms_brain_lateralization:
git clone https://github.com/l-bg/llms_brain_lateralization/
cd llms_brain_lateralization/
cp -r lpp_en_average_subject/ mask_lpp_en.nii.gz isc_10trials_en.gz lpp_en_text.zip ..
cd ..
The procedure is the same for French (use all the files with fr instead of en).
Note: If you want to recompute everything from individual data (or work with individual data), download them from the Le Petit Prince OpenNeuro repository, and use the processing pipeline at https://github.com/l-bg/llms_brain_lateralization.
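Before launching the pipeline, it can be useful to verify that all the required items listed above are in place. A minimal sketch (the `missing_items` helper is illustrative, not part of the repository):

```python
from pathlib import Path

# Items needed to reproduce the English results (see the list above).
REQUIRED = [
    "lpp_en_text.zip",
    "lpp_en_average_subject",
    "mask_lpp_en.nii.gz",
    "isc_10trials_en.gz",
    "lppEN_word_information.csv",
]

def missing_items(root="."):
    """Return the required items not yet present under `root`."""
    root = Path(root)
    return [name for name in REQUIRED if not (root / name).exists()]
```

For French, the same check applies with the `fr` counterparts of each file.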
Set the home folder using the LTBA_DIR shell variable:
export LTBA_DIR=$PWD
Replace $PWD with the path of the project directory if it is not the current working directory.
Alternatively, you can also set the home_folder variable in llm_brain_asym.py to point to this directory.
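The two options above amount to the same lookup: prefer the LTBA_DIR environment variable, and fall back to a hard-coded default. A minimal sketch of this resolution logic (the `resolve_home_folder` helper is illustrative; the actual variable in `llm_brain_asym.py` is `home_folder`):

```python
import os

def resolve_home_folder(default="."):
    """Return the project home folder: the LTBA_DIR environment
    variable if set, otherwise the given default path."""
    return os.environ.get("LTBA_DIR", default)
```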
Note: each following block can be run independently.
- Extract activations from the LLM and fit the average subject. The following code is for English; for French, simply replace `en` with `fr` after the `--lang` option. These two Python files are adapted from the code available at https://github.com/l-bg/llms_brain_lateralization.

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
python extract_llm_activations.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --lang en
python fit_average_subject.py --model allenai_OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B --lang en
done
Or, if your computer has enough memory to run several of these jobs in parallel (using GNU parallel; the number of concurrent jobs is controlled by the argument to -j):

for step in 150 600 1000 3000 7000 19000 51000 133000 352000 928646; do echo $step ; done > steps.txt
for token in 1 3 5 13 30 80 214 558 1477 3896; do echo $token ; done > tokens.txt
parallel -j 2 --link python extract_llm_activations.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --lang en :::: steps.txt :::: tokens.txt
parallel -j 2 --link python fit_average_subject.py --model allenai_OLMo-2-1124-7B_stage1-step{1}-tokens{2}B --lang en :::: steps.txt :::: tokens.txt
- Evaluate performance on the minimal-pairs benchmarks: BLiMP, Zorro, Arithmetic, and Dyck. First download the data for BLiMP and Zorro into the project directory:

git clone https://github.com/alexwarstadt/blimp
git clone https://github.com/phueb/Zorro

Then run (note that you may need to adjust the batch sizes to your hardware):

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
python evaluate_llm_blimp.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 64 --output_folder 'blimp_results' --device cuda
python evaluate_llm_zorro.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 64 --output_folder 'zorro_results' --device cuda
python evaluate_llm_arithmetic.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 128 --output_folder 'arithmetic_results' --seed 12345 --device cuda
python evaluate_llm_dyck.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 32 --output_folder 'dyck_results' --seed 12345 --device cuda
done
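These minimal-pairs benchmarks all reduce to the same decision rule: the model is counted as correct on a pair when it assigns a higher probability to the acceptable sentence than to the unacceptable one. A minimal sketch of this accuracy computation, independent of any model (the function name and inputs are illustrative, not the repository's API):

```python
def minimal_pairs_accuracy(pairs):
    """pairs: iterable of (logprob_good, logprob_bad) tuples, one per
    minimal pair. Return the fraction of pairs where the acceptable
    sentence receives the higher log-probability."""
    pairs = list(pairs)
    correct = sum(lp_good > lp_bad for lp_good, lp_bad in pairs)
    return correct / len(pairs)
```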
Again, this can be parallelized (adjust the argument to -j to your hardware):

for step in 150 600 1000 3000 7000 19000 51000 133000 352000 928646; do echo $step ; done > steps.txt
for token in 1 3 5 13 30 80 214 558 1477 3896; do echo $token ; done > tokens.txt
parallel -j 2 --link python evaluate_llm_blimp.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 64 --output_folder 'blimp_results' --device cuda :::: steps.txt :::: tokens.txt
parallel -j 2 --link python evaluate_llm_zorro.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 64 --output_folder 'zorro_results' --device cuda :::: steps.txt :::: tokens.txt
parallel -j 2 --link python evaluate_llm_arithmetic.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 128 --output_folder 'arithmetic_results' --seed 12345 --device cuda :::: steps.txt :::: tokens.txt
parallel -j 2 --link python evaluate_llm_dyck.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 32 --output_folder 'dyck_results' --seed 12345 --device cuda :::: steps.txt :::: tokens.txt
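The Dyck benchmark probes sensitivity to nested, matched brackets. As a reference for what counts as a well-formed Dyck string (a standard stack-based check, illustrative and not the repository's implementation):

```python
def is_well_formed_dyck(s, pairs={"(": ")", "[": "]", "{": "}"}):
    """Return True if s is a balanced string over the given bracket pairs."""
    stack = []
    for c in s:
        if c in pairs:
            stack.append(pairs[c])      # opener: expect its closer later
        elif not stack or c != stack.pop():
            return False                # unmatched or wrongly ordered closer
    return not stack                    # every opener must have been closed
```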
- For the linguistic acceptability of generated texts: first generate the texts using the `generate_texts.py` script for each checkpoint (generation is performed on the CPU here so as to ensure reproducibility), then evaluate all these texts using the `evaluate_linguistic_acceptability.py` script.

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
python generate_texts.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --output_folder generated_texts --seed 12345 --device cpu
done
python evaluate_linguistic_acceptability.py
- Evaluate performance on HellaSwag, using `lm_eval` from EleutherAI:

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks hellaswag \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_hellaswag.json
done
- Same for ARC:

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks ai2_arc \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:8 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_ai2_arc.json
done
- Same for French grammar (the `french_bench_grammar` task):

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks french_bench_grammar \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:8 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_french_bench_grammar.json
done
- And for French HellaSwag (`french_bench_hellaswag`):

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks french_bench_hellaswag \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size 4 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_french_bench_hellaswag.json
done
In order to reproduce all the figures in the paper, run `analyze_results_olmo2.ipynb` in Jupyter.
Replicate the results on EleutherAI/pythia-2.8b and EleutherAI/pythia-6.9b by following the same procedure. For instance, in order to extract the activations from the pythia-2.8b LLM and fit the average subject:
for step in 16 32 128 512 1000 3000 7000 19000 52000 143000
do
python extract_llm_activations.py --model EleutherAI/pythia-2.8b --revision step${step} --lang en
python fit_average_subject.py --model EleutherAI_pythia-2.8b_step${step} --lang en
done

The final state of the project, after downloading all the data and running all the analyses, is organized as follows:
├── analyze_results_olmo2.ipynb
├── analyze_results_pythia.ipynb
├── evaluate_linguistic_acceptability.py
├── evaluate_llm_arithmetic.py
├── evaluate_llm_blimp.py
├── evaluate_llm_dyck.py
├── evaluate_llm_zorro.py
├── extract_llm_activations.py
├── fit_average_subject.py
├── generate_texts.py
├── isc_10trials_en.gz
├── isc_10trials_fr.gz
├── LICENSE
├── llm_brain_asym.py
├── lpp_en_text.zip
├── lppEN_word_information.csv
├── lpp_fr_text.zip
├── lppFR_word_information.csv
├── mask_lpp_en.nii.gz
├── mask_lpp_fr.nii.gz
├── phase_transition.png
├── README.md
├── requirements.txt
├── arithmetic_results
│ ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_arithmetic.csv
│ ├── ...
├── blimp
├── blimp_results
│ ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_blimp.csv
│ └── ...
├── dyck_results
│ ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_dyck.csv
│ └── ...
├── fig
│ ├── allenai_OLMo-2-1124-7B_acc.pdf
│ └── ...
├── generated_texts
│ ├── allenai_OLMo-2-1124-7B_gen_cola.csv
│ ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_gen_0_0.txt
│ └── ...
├── llms_activations
│ ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_en.lz4
│ ├── ...
│ └── onsets_offsets_en.lz4
├── llms_brain_correlations
│ ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_layer-0_corr_en.gz
│ └── ...
├── lpp_en_average_subject
│ ├── average_subject_run-0.gz
│ └── ...
├── lpp_fr_average_subject
│ ├── average_subject_run-0.gz
│ └── ...
├── Zorro
└── zorro_results
├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_zorro.csv
└── ...
Bonnasse-Gahot, L., & Pallier, C. (2026). Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence. arXiv preprint arXiv:2602.12811.
@article{lbg2026left,
title={Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence},
author={Bonnasse-Gahot, Laurent and Pallier, Christophe},
journal={arXiv preprint arXiv:2602.12811},
year={2026}
}