Source code for "Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence"
This repository contains the source code for the paper. See Citation and BibTeX reference at the bottom of this README.
git clone https://github.com/l-bg/llm_training_brain_asym
cd llm_training_brain_asym
See requirements.txt for the full list of packages used in this work. This file pins the exact versions that were used, but the code is expected to work with other versions as well.
It is recommended to create a virtual environment to install the Python modules, for example:
With Anaconda
conda create --name llm_brain python=3.10
conda activate llm_brain
pip install -r requirements.txt
Or with Pyenv
pyenv virtualenv 3.10.0 llm_brain
pyenv activate llm_brain
pip install -r requirements.txt
Or with uv
uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt
In order to reproduce the main results of the paper (in English), you need to copy the following items into the project directory:

- `lpp_en_text.zip`, which contains the full text of Le Petit Prince in English, necessary to extract the activations from the LLMs,
- the average subject (the whole folder named `lpp_en_average_subject`),
- the brain mask `mask_lpp_en.nii.gz`,
- `isc_10trials_en.gz`, an estimate of the inter-subject correlation (ISC), used to assess the reliability of each voxel,
- `lppEN_word_information.csv`, from `annotation/EN/` in the Le Petit Prince OpenNeuro repository, which contains the acoustic onset of each word in the audiobook.
You can directly download the first four items from https://github.com/l-bg/llms_brain_lateralization:
git clone https://github.com/l-bg/llms_brain_lateralization/
cd llms_brain_lateralization/
cp -r lpp_en_average_subject/ mask_lpp_en.nii.gz isc_10trials_en.gz lpp_en_text.zip ..
cd ..
The procedure is the same for French (use all the files with fr instead of en).
Note: If you want to recompute everything from individual data (or work with individual data), download them from the Le Petit Prince OpenNeuro repository, and use the processing pipeline at https://github.com/l-bg/llms_brain_lateralization.
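Before launching the pipeline, it can be useful to verify that all the required items listed above are in place. A minimal sketch (the `missing_items` helper is illustrative, not part of the repository):

```python
from pathlib import Path

# Items needed to reproduce the English results (see the list above).
REQUIRED = [
    "lpp_en_text.zip",
    "lpp_en_average_subject",
    "mask_lpp_en.nii.gz",
    "isc_10trials_en.gz",
    "lppEN_word_information.csv",
]

def missing_items(root="."):
    """Return the required items not yet present under `root`."""
    root = Path(root)
    return [name for name in REQUIRED if not (root / name).exists()]
```

For French, the same check applies with the `fr` counterparts of each file.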
Set the home folder using the LTBA_DIR shell variable:
export LTBA_DIR=$PWD
Replace $PWD with the path of the project directory if it is not the current working directory.
Alternatively, you can also set the home_folder variable in llm_brain_asym.py to point to this directory.
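The two options above amount to the same lookup: prefer the LTBA_DIR environment variable, and fall back to a hard-coded default. A minimal sketch of this resolution logic (the `resolve_home_folder` helper is illustrative; the actual variable in `llm_brain_asym.py` is `home_folder`):

```python
import os

def resolve_home_folder(default="."):
    """Return the project home folder: the LTBA_DIR environment
    variable if set, otherwise the given default path."""
    return os.environ.get("LTBA_DIR", default)
```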
Note: each following block can be run independently.
- Extract activations from the LLM and fit the average subject. The following code is for English; for French, simply replace `en` with `fr` after the `--lang` option. These two Python files are adapted from the code available at https://github.com/l-bg/llms_brain_lateralization.

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
python extract_llm_activations.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --lang en
python fit_average_subject.py --model allenai_OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B --lang en
done
Or, if your computer has enough memory to run several of these jobs in parallel (using GNU parallel; the number of concurrent jobs is controlled by the argument to -j):

for step in 150 600 1000 3000 7000 19000 51000 133000 352000 928646; do echo $step ; done > steps.txt
for token in 1 3 5 13 30 80 214 558 1477 3896; do echo $token ; done > tokens.txt
parallel -j 2 --link python extract_llm_activations.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --lang en :::: steps.txt :::: tokens.txt
parallel -j 2 --link python fit_average_subject.py --model allenai_OLMo-2-1124-7B_stage1-step{1}-tokens{2}B --lang en :::: steps.txt :::: tokens.txt
- Evaluate performance on the minimal-pairs benchmarks: BLiMP, Zorro, Arithmetic, and Dyck. First download the data for BLiMP and Zorro into the project directory:

git clone https://github.com/alexwarstadt/blimp
git clone https://github.com/phueb/Zorro

Then run (note that you may need to adjust the batch sizes to your hardware):

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
python evaluate_llm_blimp.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 64 --output_folder 'blimp_results' --device cuda
python evaluate_llm_zorro.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 64 --output_folder 'zorro_results' --device cuda
python evaluate_llm_arithmetic.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 128 --output_folder 'arithmetic_results' --seed 12345 --device cuda
python evaluate_llm_dyck.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 32 --output_folder 'dyck_results' --seed 12345 --device cuda
done
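These minimal-pairs benchmarks all reduce to the same decision rule: the model is counted as correct on a pair when it assigns a higher probability to the acceptable sentence than to the unacceptable one. A minimal sketch of this accuracy computation, independent of any model (the function name and inputs are illustrative, not the repository's API):

```python
def minimal_pairs_accuracy(pairs):
    """pairs: iterable of (logprob_good, logprob_bad) tuples, one per
    minimal pair. Return the fraction of pairs where the acceptable
    sentence receives the higher log-probability."""
    pairs = list(pairs)
    correct = sum(lp_good > lp_bad for lp_good, lp_bad in pairs)
    return correct / len(pairs)
```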
Again, this can be parallelized (adjust the argument to -j to your hardware):

for step in 150 600 1000 3000 7000 19000 51000 133000 352000 928646; do echo $step ; done > steps.txt
for token in 1 3 5 13 30 80 214 558 1477 3896; do echo $token ; done > tokens.txt
parallel -j 2 --link python evaluate_llm_blimp.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 64 --output_folder 'blimp_results' --device cuda :::: steps.txt :::: tokens.txt
parallel -j 2 --link python evaluate_llm_zorro.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 64 --output_folder 'zorro_results' --device cuda :::: steps.txt :::: tokens.txt
parallel -j 2 --link python evaluate_llm_arithmetic.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 128 --output_folder 'arithmetic_results' --seed 12345 --device cuda :::: steps.txt :::: tokens.txt
parallel -j 2 --link python evaluate_llm_dyck.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 32 --output_folder 'dyck_results' --seed 12345 --device cuda :::: steps.txt :::: tokens.txt
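The Dyck benchmark probes sensitivity to nested, matched brackets. As a reference for what counts as a well-formed Dyck string (a standard stack-based check, illustrative and not the repository's implementation):

```python
def is_well_formed_dyck(s, pairs={"(": ")", "[": "]", "{": "}"}):
    """Return True if s is a balanced string over the given bracket pairs."""
    stack = []
    for c in s:
        if c in pairs:
            stack.append(pairs[c])      # opener: expect its closer later
        elif not stack or c != stack.pop():
            return False                # unmatched or wrongly ordered closer
    return not stack                    # every opener must have been closed
```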
- For the linguistic acceptability of generated texts: first generate the texts using the `generate_texts.py` script for each checkpoint (generation is performed on the CPU here so as to ensure reproducibility), then evaluate all these texts using the `evaluate_linguistic_acceptability.py` script.

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
python generate_texts.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --output_folder generated_texts --seed 12345 --device cpu
done
python evaluate_linguistic_acceptability.py
- Evaluate performance on HellaSwag, using `lm_eval` from EleutherAI:

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks hellaswag \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:4 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_hellaswag.json
done
- Same for ARC:

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks ai2_arc \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:8 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_ai2_arc.json
done
- Same for French grammar (the `french_bench_grammar` task):

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks french_bench_grammar \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size auto:8 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_french_bench_grammar.json
done
- And for French HellaSwag (`french_bench_hellaswag`):

step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
for i in "${!step[@]}"
do
lm_eval --model hf \
    --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
    --tasks french_bench_hellaswag \
    --num_fewshot 5 \
    --device cuda:0 \
    --batch_size 4 \
    --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_french_bench_hellaswag.json
done
In order to reproduce all the figures in the paper, run `analyze_results_olmo2.ipynb` in Jupyter.
Replicate the results on EleutherAI/pythia-2.8b and EleutherAI/pythia-6.9b by following the same procedure. For instance, in order to extract the activations from the pythia-2.8b LLM and fit the average subject:
for step in 16 32 128 512 1000 3000 7000 19000 52000 143000
do
python extract_llm_activations.py --model EleutherAI/pythia-2.8b --revision step${step} --lang en
python fit_average_subject.py --model EleutherAI_pythia-2.8b_step${step} --lang en
done

The final state of the project, after downloading all the data and running all the analyses, is organized as follows:
├── analyze_results_olmo2.ipynb
├── analyze_results_pythia.ipynb
├── evaluate_linguistic_acceptability.py
├── evaluate_llm_arithmetic.py
├── evaluate_llm_blimp.py
├── evaluate_llm_dyck.py
├── evaluate_llm_zorro.py
├── extract_llm_activations.py
├── fit_average_subject.py
├── generate_texts.py
├── isc_10trials_en.gz
├── isc_10trials_fr.gz
├── LICENSE
├── llm_brain_asym.py
├── lpp_en_text.zip
├── lppEN_word_information.csv
├── lpp_fr_text.zip
├── lppFR_word_information.csv
├── mask_lpp_en.nii.gz
├── mask_lpp_fr.nii.gz
├── phase_transition.png
├── README.md
├── requirements.txt
├── arithmetic_results
│ ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_arithmetic.csv
│ ├── ...
├── blimp
├── blimp_results
│ ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_blimp.csv
│ └── ...
├── dyck_results
│ ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_dyck.csv
│ └── ...
├── fig
│ ├── allenai_OLMo-2-1124-7B_acc.pdf
│ └── ...
├── generated_texts
│ ├── allenai_OLMo-2-1124-7B_gen_cola.csv
│ ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_gen_0_0.txt
│ └── ...
├── llms_activations
│ ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_en.lz4
│ ├── ...
│ └── onsets_offsets_en.lz4
├── llms_brain_correlations
│ ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_layer-0_corr_en.gz
│ └── ...
├── lpp_en_average_subject
│ ├── average_subject_run-0.gz
│ └── ...
├── lpp_fr_average_subject
│ ├── average_subject_run-0.gz
│ └── ...
├── Zorro
└── zorro_results
├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_zorro.csv
└── ...
Bonnasse-Gahot, L., & Pallier, C. (2026). Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence. arXiv preprint arXiv:2602.12811.
@article{lbg2026left,
title={Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence},
author={Bonnasse-Gahot, Laurent and Pallier, Christophe},
journal={arXiv preprint arXiv:2602.12811},
year={2026}
}