Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence

This repository contains the source code for the paper. See Citation and BibTeX reference at the bottom of this README.

⚠️ If you use code from this repository, please cite it using the reference provided by the "Cite this repository" link in the right sidebar.

Set up

git clone https://github.com/l-bg/llm_training_brain_asym
cd llm_training_brain_asym

Install Python modules

See requirements.txt for the full list of packages used in this work. This file pins the exact versions that were used, but the code is expected to work with other versions as well.

It is recommended to create a virtual environment in which to install the Python modules, for example:

With Anaconda

conda create --name llm_brain python=3.10
conda activate llm_brain
pip install -r requirements.txt

Or with Pyenv

pyenv virtualenv 3.10.0 llm_brain
pyenv activate llm_brain
pip install -r requirements.txt

Or with uv

uv venv --python 3.10
source .venv/bin/activate
uv pip install -r requirements.txt

Getting fMRI data (average subjects), masks, full text and annotations

In order to reproduce the main results of the paper (in English), you need to copy the following items into the project directory:

  • lpp_en_text.zip, which contains the full English text of Le Petit Prince and is needed to extract the activations from the LLMs,
  • the average subject (the whole folder named lpp_en_average_subject),
  • the brain mask mask_lpp_en.nii.gz,
  • an estimate of the inter-subject correlation (isc), isc_10trials_en.gz, which is used to evaluate the reliability of each voxel,
  • lppEN_word_information.csv, from annotation/EN/ in the Le Petit Prince OpenNeuro repository, which contains the acoustic onset of each word in the audiobook.

You can directly download the first four items from https://github.com/l-bg/llms_brain_lateralization:

git clone https://github.com/l-bg/llms_brain_lateralization/
cd llms_brain_lateralization/
cp -r lpp_en_average_subject/ mask_lpp_en.nii.gz isc_10trials_en.gz lpp_en_text.zip ..
cd ..

The procedure is the same for French (use the corresponding files with fr instead of en).
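
For instance, a minimal sketch of the copy step for the French data, assuming the same source repository and the French file names listed in the tree structure below:

cd llms_brain_lateralization/
# copy the French counterparts of the first four items listed above
cp -r lpp_fr_average_subject/ mask_lpp_fr.nii.gz isc_10trials_fr.gz lpp_fr_text.zip ..
cd ..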

Note: If you want to recompute everything from individual data (or work with individual data), download them from the Le Petit Prince OpenNeuro repository, and use the processing pipeline at https://github.com/l-bg/llms_brain_lateralization.

Main processing pipeline: OLMo-2-1124-7B model

Set the home folder using the LTBA_DIR shell variable:

export LTBA_DIR=$PWD

Replace $PWD with the path to the project directory if it is not the current working directory. Alternatively, set the home_folder variable in llm_brain_asym.py to point to this directory.
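
Optionally, a quick sanity check (a minimal sketch, assuming the English data listed above have already been copied into the project directory):

# verify that LTBA_DIR is set and that the main English inputs are in place
echo "LTBA_DIR=$LTBA_DIR"
ls "$LTBA_DIR"/lpp_en_text.zip "$LTBA_DIR"/mask_lpp_en.nii.gz "$LTBA_DIR"/isc_10trials_en.gz "$LTBA_DIR"/lppEN_word_information.csv
ls -d "$LTBA_DIR"/lpp_en_average_subject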

Note: each following block can be run independently.

  • Extract activations from the LLM and fit the average subject. The following code is for English; for French, simply replace en with fr after the --lang option. These two Python scripts are adapted from the code available at https://github.com/l-bg/llms_brain_lateralization.

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        python extract_llm_activations.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --lang en
        python fit_average_subject.py --model allenai_OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B --lang en
    done

    Or, if your computer has enough memory to run several of these jobs in parallel (using GNU parallel; the number of concurrent jobs is controlled by the argument to -j):

    for step in 150 600 1000 3000 7000 19000 51000 133000 352000 928646; do echo $step  ; done > steps.txt
    for token in  1   3    5   13   30    80   214    558   1477   3896; do echo $token ; done > tokens.txt
    
    parallel -j 2 --link python extract_llm_activations.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --lang en :::: steps.txt :::: tokens.txt
    parallel -j 2 --link python fit_average_subject.py --model allenai_OLMo-2-1124-7B_stage1-step{1}-tokens{2}B --lang en :::: steps.txt :::: tokens.txt
  • Evaluate performance on the minimal-pair benchmarks: BLiMP, Zorro, Arithmetic, and Dyck. First download the data for BLiMP and Zorro into the project directory:

    git clone https://github.com/alexwarstadt/blimp
    git clone https://github.com/phueb/Zorro
    

    Then run the evaluations (note that you need to adjust the batch sizes to your hardware):

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        python evaluate_llm_blimp.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 64 --output_folder 'blimp_results' --device cuda
        python evaluate_llm_zorro.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 64 --output_folder 'zorro_results' --device cuda
        python evaluate_llm_arithmetic.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 128 --output_folder 'arithmetic_results' --seed 12345 --device cuda
        python evaluate_llm_dyck.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --batch_size 32 --output_folder 'dyck_results' --seed 12345 --device cuda
    done

    Again, this can be parallelized (adjusting the argument to -j to your hardware):

    for step in 150 600 1000 3000 7000 19000 51000 133000 352000 928646; do echo $step  ; done > steps.txt
    for token in  1   3    5   13   30    80   214    558   1477   3896; do echo $token ; done > tokens.txt
    
    parallel -j 2 --link python  evaluate_llm_blimp.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 64 --output_folder 'blimp_results' --device cuda :::: steps.txt :::: tokens.txt
    parallel -j 2 --link python evaluate_llm_zorro.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 64 --output_folder 'zorro_results' --device cuda :::: steps.txt :::: tokens.txt
    parallel -j 2 --link  python evaluate_llm_arithmetic.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 128 --output_folder 'arithmetic_results' --seed 12345 --device cuda :::: steps.txt :::: tokens.txt
    parallel -j 2 --link python evaluate_llm_dyck.py --model allenai/OLMo-2-1124-7B --revision stage1-step{1}-tokens{2}B --batch_size 32 --output_folder 'dyck_results' --seed 12345 --device cuda :::: steps.txt :::: tokens.txt
  • Evaluate the linguistic acceptability of generated texts. First generate the texts for each checkpoint using generate_texts.py (generation is performed on the CPU here to ensure reproducibility), then evaluate all the generated texts using evaluate_linguistic_acceptability.py.

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        python generate_texts.py --model allenai/OLMo-2-1124-7B --revision stage1-step${step[$i]}-tokens${tokens[$i]}B --output_folder generated_texts --seed 12345 --device cpu
    done
    
    python evaluate_linguistic_acceptability.py
  • Evaluate performance on HellaSwag using lm_eval from EleutherAI.

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        lm_eval --model hf \
        --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
        --tasks hellaswag \
        --num_fewshot 5 \
        --device cuda:0 \
        --batch_size auto:4 \
        --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_hellaswag.json
    done
  • Same for ARC:

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        lm_eval --model hf \
        --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
        --tasks ai2_arc \
        --num_fewshot 5 \
        --device cuda:0 \
        --batch_size auto:8 \
        --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_ai2_arc.json
    done
  • Same for the French grammar benchmark (french_bench_grammar):

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        lm_eval --model hf \
        --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
        --tasks french_bench_grammar \
        --num_fewshot 5 \
        --device cuda:0 \
        --batch_size auto:8 \
        --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_french_bench_grammar.json
    done
  • and for French HellaSwag:

    step=("150" "600" "1000" "3000" "7000" "19000" "51000" "133000" "352000" "928646")
    tokens=("1" "3" "5" "13" "30" "80" "214" "558" "1477" "3896")
    for i in "${!step[@]}"; do
        lm_eval --model hf \
        --model_args pretrained=allenai/OLMo-2-1124-7B,revision=stage1-step${step[$i]}-tokens${tokens[$i]}B,dtype="float16" \
        --tasks french_bench_hellaswag \
        --num_fewshot 5 \
        --device cuda:0 \
        --batch_size 4 \
        --output_path lm_eval_results/OLMo-2-1124-7B_stage1-step${step[$i]}-tokens${tokens[$i]}B_french_bench_hellaswag.json
    done

Analyze and visualize the results

To reproduce all the figures in the paper, run analyze_results_olmo2.ipynb in Jupyter.
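
For example, from the project directory (a minimal sketch, assuming Jupyter is installed in the active environment):

# open the analysis notebook
jupyter notebook analyze_results_olmo2.ipynb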

Reproduce the results using models from the Pythia family

Replicate the results on EleutherAI/pythia-2.8b and EleutherAI/pythia-6.9b by following the same procedure. For instance, in order to extract the activations from the pythia-2.8b LLM and fit the average subject:

for step in 16 32 128 512 1000 3000 7000 19000 52000 143000
do
    python extract_llm_activations.py --model EleutherAI/pythia-2.8b --revision step${step} --lang en
    python fit_average_subject.py --model EleutherAI_pythia-2.8b_step${step} --lang en
done
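
The remaining steps (benchmark evaluations, text generation, lm_eval runs) follow the same pattern as for OLMo-2, using the Pythia step numbers above as revisions. As a sketch, the extraction can also be parallelized with GNU parallel as before (steps_pythia.txt is a hypothetical helper file):

# list the Pythia checkpoints to process, one revision per line
for step in 16 32 128 512 1000 3000 7000 19000 52000 143000; do echo $step; done > steps_pythia.txt

parallel -j 2 python extract_llm_activations.py --model EleutherAI/pythia-2.8b --revision step{} --lang en :::: steps_pythia.txt
parallel -j 2 python fit_average_subject.py --model EleutherAI_pythia-2.8b_step{} --lang en :::: steps_pythia.txt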

Tree structure of the project

The final state of the project, after downloading all the data and running all the analyses, is organized as follows:

├── analyze_results_olmo2.ipynb
├── analyze_results_pythia.ipynb
├── evaluate_linguistic_acceptability.py
├── evaluate_llm_arithmetic.py
├── evaluate_llm_blimp.py
├── evaluate_llm_dyck.py
├── evaluate_llm_zorro.py
├── extract_llm_activations.py
├── fit_average_subject.py
├── generate_texts.py
├── isc_10trials_en.gz
├── isc_10trials_fr.gz
├── LICENSE
├── llm_brain_asym.py
├── lpp_en_text.zip
├── lppEN_word_information.csv
├── lpp_fr_text.zip
├── lppFR_word_information.csv
├── mask_lpp_en.nii.gz
├── mask_lpp_fr.nii.gz
├── phase_transition.png
├── README.md
├── requirements.txt
├── arithmetic_results
│   ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_arithmetic.csv
│   └── ...
├── blimp
├── blimp_results
│   ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_blimp.csv
│   └── ...
├── dyck_results
│   ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_dyck.csv
│   └── ...
├── fig
│   ├── allenai_OLMo-2-1124-7B_acc.pdf
│   └── ...
├── generated_texts
│   ├── allenai_OLMo-2-1124-7B_gen_cola.csv
│   ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_gen_0_0.txt
│   └── ...
├── llms_activations
│   ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_en.lz4
│   ├── ...
│   └── onsets_offsets_en.lz4
├── llms_brain_correlations
│   ├── allenai_OLMo-2-1124-7B_stage1-step150-tokens1B_layer-0_corr_en.gz
│   └── ...
├── lpp_en_average_subject
│   ├── average_subject_run-0.gz
│   └── ...
├── lpp_fr_average_subject
│   ├── average_subject_run-0.gz
│   └── ...
├── Zorro
└── zorro_results
    ├── allenai_OLMo-2-1124-7B_stage1-step1000-tokens5B_zorro.csv
    └── ...

Citation and BibTeX reference

Bonnasse-Gahot, L., & Pallier, C. (2026). Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence. arXiv preprint arXiv:2602.12811.

@article{lbg2026left,
  title={Left-right asymmetry in predicting brain activity from LLMs' representations emerges with their formal linguistic competence},
  author={Bonnasse-Gahot, Laurent and Pallier, Christophe},
  journal={arXiv preprint arXiv:2602.12811},
  year={2026}
}
