Skip to content

Commit 0f0dd49

Browse files
authored
Merge pull request #2462 from NNPDF/md5_issue
2 parents bb30db4 + a16ec31 commit 0f0dd49

9 files changed

Lines changed: 178 additions & 34 deletions

File tree

.github/workflows/all_tests_nnpdf.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,7 @@ jobs:
110110
shell: bash -l {0}
111111
run: |
112112
cd n3fit/runcards/examples
113+
vp-setupfit Basic_runcard.yml
113114
n3fit Basic_runcard.yml 4
114115
cat Basic_runcard/nnfit/*/Basic_runcard.json
115116
- name: Test we can still run postfit
@@ -139,6 +140,7 @@ jobs:
139140
shell: bash -l {0}
140141
run: |
141142
cd n3fit/runcards/examples
143+
vp-setupfit Basic_runcard.yml
142144
n3fit Basic_runcard.yml 42
143145
cat Basic_runcard/nnfit/*/Basic_runcard.json
144146

doc/sphinx/source/tutorials/closuretest.rst

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -159,11 +159,11 @@ Running a closure test with ``n3fit``
159159

160160
Running a closure test with ``n3fit`` will require a valid ``n3fit`` runcard, with
161161
the closure test settings modified as shown
162-
:ref:`above <prep_ct_runcard>`. The difference
163-
between running a closure fit in ``n3fit`` and a standard fit is that the user is
164-
required to run ``vp-setupfit`` on the runcard before running ``n3fit``. This is
165-
because the filtering of the data is required to generate the pseudodata central
166-
values. The workflow is as follows:
162+
:ref:`above <prep_ct_runcard>`. As with any standard fit, the user is required
163+
to run ``vp-setupfit`` on the runcard before running ``n3fit`` (see
164+
:ref:`run-n3fit-fit`); for a closure test this step is additionally responsible
165+
for filtering the data and generating the pseudodata central values. The
166+
workflow is as follows:
167167

168168
.. code:: bash
169169

doc/sphinx/source/tutorials/run-fit.rst

Lines changed: 32 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -75,10 +75,13 @@ Running the fitting code
7575
After successfully installing the ``n3fit`` package and preparing a runcard
7676
following the points presented above you can proceed with a fit.
7777

78-
1. Prepare the fit using ``vp-setupfit``. This command will generate a
79-
folder with the same name as the runcard (minus the file extension) in the
80-
current directory, which will contain a copy of the original YAML runcard.
81-
The required resources will be downloaded, which includes:
78+
1. Prepare the fit using ``vp-setupfit``. **This step is mandatory**: it
79+
produces resources that ``n3fit`` requires in order to run.
80+
81+
The command creates a folder with the same name as the runcard (minus the
82+
file extension) in the current directory, which will contain a copy of the
83+
original YAML runcard. The required resources will be downloaded, which
84+
includes:
8285

8386
- The t0 PDF set (an LHAPDF object).
8487
- The FastKernel tables in the form of a ``theory_xxx.tgz`` file
@@ -87,6 +90,10 @@ following the points presented above you can proceed with a fit.
8790
If the runcard requires to precompute some heavy objects shared among replicas,
8891
such as the theory covariance matrix, it will be done during this step.
8992

93+
A file named ``md5`` is also written inside the output folder. It contains a
94+
hash of the runcard which ``n3fit`` uses to detect changes made to the
95+
runcard between ``vp-setupfit`` and ``n3fit`` (see step 2).
96+
9097
::
9198

9299
vp-setupfit <runcard>.yml
@@ -97,6 +104,27 @@ following the points presented above you can proceed with a fit.
97104
replica fit you should launch more than 100 replicas (e.g. 120) since not all of them will necessarily converge.
98105
While by default the code runs each replica separately, it is possible to run many replicas in parallel, see :ref:`parallel-label`.
99106

107+
Before training begins, ``n3fit`` recomputes the md5 hash of the runcard
108+
and compares it against the value written by ``vp-setupfit`` in step 1.
109+
If the two do not match — i.e. the runcard was modified after
110+
``vp-setupfit`` was run — ``n3fit`` aborts with an error. The fix is to
111+
re-run ``vp-setupfit`` so that the md5 (and any heavy resources affected
112+
by the change, such as the covariance matrix) are regenerated from the
113+
current runcard.
114+
115+
.. warning::
116+
117+
The md5 check can be bypassed by passing ``--skip-md5-check`` to
118+
``n3fit``::
119+
120+
n3fit <runcard>.yml <replica> --skip-md5-check
121+
122+
Use this flag at your own risk: it disables the only safeguard
123+
against running ``n3fit`` on a runcard that has diverged from the
124+
one ``vp-setupfit`` was given. Resources cached during ``vp-setupfit``
125+
(covariance matrices, t0 PDF, theory tables) may no longer be
126+
consistent with the current runcard.
127+
100128
::
101129

102130
for i in {1..120} ; do

extra_tests/regression_checks.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88

99
import pytest
1010

11+
from n3fit.tests.helpers import run_n3fit
1112
from n3fit.tests.test_fit import EXE, check_fit_results
1213
from validphys.utils import yaml_safe
1314

@@ -44,7 +45,7 @@ def test_regression_fit(tmp_path, runcard, replica, regenerate):
4445
if (wname := runcard_info.get("load")) is not None:
4546
shutil.copy(REGRESSION_FOLDER / wname, tmp_path)
4647

47-
sp.run(f"{EXE} {runcard_name} {replica}".split(), cwd=tmp_path, check=True)
48+
run_n3fit(runcard_name, f"{replica}", cwd=tmp_path, check=True)
4849
old_json_file = REGRESSION_FOLDER / f"{runcard}_{replica}.json"
4950

5051
check_fit_results(

n3fit/src/n3fit/scripts/n3fit_exec.py

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
import pandas as pd
1414
from ruamel.yaml import error
1515

16+
from n3fit.scripts.vp_setupfit import MD5_FILENAME, SetupFitError, _compute_fit_md5
1617
from reportengine import colors
1718
from reportengine.namespaces import NSList
1819
from validphys.api import API
@@ -71,6 +72,11 @@ def init_output(self):
7172

7273
# check if results folder exists
7374
self.output_path = pathlib.Path(self.output_path).absolute()
75+
76+
# Verify vp-setupfit hash is consistent with the current fit configuration
77+
# before we touch anything in the output folder.
78+
self._verify_setupfit_md5()
79+
7480
if not (self.output_path / "nnfit").is_dir():
7581
if not re.fullmatch(r"[\w.\-]+", self.output_path.name):
7682
raise N3FitError("Invalid output folder name. Must be alphanumeric.")
@@ -103,6 +109,29 @@ def init_output(self):
103109
self.input_folder = self.replica_path / INPUT_FOLDER
104110
self.input_folder.mkdir(exist_ok=True)
105111

112+
def _verify_setupfit_md5(self):
113+
if self.skip_md5_check:
114+
log.warning("Skipping md5 check against vp-setupfit (--skip-md5-check is set).")
115+
return
116+
117+
md5_path = self.output_path / MD5_FILENAME
118+
if not md5_path.exists():
119+
raise N3FitError(
120+
f"{MD5_FILENAME} file not found at {md5_path}. "
121+
"Run vp-setupfit on this runcard before n3fit."
122+
)
123+
stored = md5_path.read_text().strip()
124+
try:
125+
current = _compute_fit_md5(self.config_yml, self.output_path)
126+
except SetupFitError as e:
127+
raise N3FitError(f"Failed to compute current fit md5: {e}") from e
128+
if stored != current:
129+
raise N3FitError(
130+
f"md5 mismatch in {self.output_path}: stored {stored} != current {current}. "
131+
"The runcard changed since vp-setupfit was run. Re-run vp-setupfit (or pass "
132+
"--skip-md5-check to override this check if this is the desired behaviour)."
133+
)
134+
106135
@classmethod
107136
def ns_dump_description(cls):
108137
return {
@@ -312,6 +341,12 @@ def check_positive(value):
312341
help="End of the range of replicas to compute",
313342
type=check_positive,
314343
)
344+
parser.add_argument(
345+
"--skip-md5-check",
346+
help="Skip the integrity check against the md5 written by vp-setupfit.",
347+
action="store_true",
348+
default=False,
349+
)
315350
return parser
316351

317352
def get_commandline_arguments(self, cmdline=None):
@@ -341,6 +376,7 @@ def run(self):
341376
self.environment.db_host = self.args["db_host"]
342377
self.environment.db_port = self.args["db_port"]
343378
self.environment.db_name = self.args["db_name"]
379+
self.environment.skip_md5_check = self.args["skip_md5_check"]
344380
super().run()
345381
except N3FitError as e:
346382
log.error(f"Error in n3fit:\n{e}")

n3fit/src/n3fit/scripts/vp_setupfit.py

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -68,6 +68,17 @@
6868
INPUT_FOLDER = "input"
6969

7070

71+
def _compute_fit_md5(config_yaml: pathlib.Path, output_path: pathlib.Path) -> str:
72+
"""Compute the md5 hash of the fit when vp-setupfit is run. This utility function
73+
is also used in n3fit to check that the fit configuration has not changed
74+
since the setup-fit was run."""
75+
hash_md5 = hashlib.md5()
76+
with open(config_yaml, 'rb') as f:
77+
hash_md5.update(f.read())
78+
digest = hash_md5.hexdigest()
79+
return digest
80+
81+
7182
class SetupFitError(Exception):
7283
"""Exception raised when setup-fit cannot succeed and knows why"""
7384

@@ -113,13 +124,12 @@ def init_output(self):
113124
self.input_folder.mkdir(exist_ok=True)
114125

115126
def save_md5(self):
116-
"""Generate md5 key from file"""
127+
"""Generate md5 key from the runcard and the stored fitting covmat table."""
117128
output_filename = self.output_path / MD5_FILENAME
118-
with open(self.config_yml, 'rb') as f:
119-
hash_md5 = hashlib.md5(f.read()).hexdigest()
129+
digest = _compute_fit_md5(self.config_yml, self.output_path)
120130
with open(output_filename, 'w') as g:
121-
g.write(hash_md5)
122-
log.info(f"md5 {hash_md5} stored in {output_filename}")
131+
g.write(digest)
132+
log.info(f"md5 {digest} stored in {output_filename}")
123133

124134
@classmethod
125135
def ns_dump_description(cls):

n3fit/src/n3fit/tests/helpers.py

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,44 @@
1+
"""
2+
Test helpers for the n3fit regression suite.
3+
4+
n3fit verifies an md5 written by vp-setupfit (see
5+
``N3FitEnvironment._verify_setupfit_md5``), so any test that invokes n3fit
6+
on a runcard must run vp-setupfit first. ``run_n3fit`` wraps that pair so
7+
individual tests don't have to.
8+
"""
9+
10+
import subprocess as sp
11+
12+
13+
def run_setupfit(runcard, *, cwd, output=None):
14+
"""Run ``vp-setupfit <runcard> [-o <output>]`` before ``n3fit``.
15+
16+
Exposed separately so tests that launch n3fit via ``sp.Popen`` (e.g. the
17+
parallel-hyperopt tests) can run the setupfit step once before forking
18+
workers.
19+
"""
20+
cmd = ["vp-setupfit", str(runcard)]
21+
if output is not None:
22+
cmd += ["-o", str(output)]
23+
return sp.run(cmd, cwd=cwd, check=True)
24+
25+
26+
def run_n3fit(runcard, args="", *, cwd, setupfit=True, **run_kwargs):
27+
"""Run ``vp-setupfit`` and then ``n3fit`` on the same runcard.
28+
29+
If ``-o <output>`` appears in ``args`` it's forwarded to vp-setupfit so
30+
the md5 lands in the directory n3fit will read it from.
31+
32+
Set ``setupfit=False`` when this call shares both the runcard *and* the
33+
output directory (default or ``-o``) with an earlier call that already
34+
ran vp-setupfit — the md5 written by that call is still valid.
35+
"""
36+
args_list = args.split() if isinstance(args, str) else list(args)
37+
38+
if setupfit:
39+
output = None
40+
if "-o" in args_list:
41+
output = args_list[args_list.index("-o") + 1]
42+
run_setupfit(runcard, cwd=cwd, output=output)
43+
44+
return sp.run(["n3fit", str(runcard), *args_list], cwd=cwd, **run_kwargs)

n3fit/src/n3fit/tests/test_fit.py

Lines changed: 25 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424

2525
import n3fit
2626
from n3fit.io.writer import SuperEncoder
27+
from n3fit.tests.helpers import run_n3fit, run_setupfit
2728
from validphys.n3fit_data import replica_mcseed, replica_nnseed, replica_trvlseed
2829
from validphys.utils import yaml_safe
2930

@@ -160,7 +161,7 @@ def _auxiliary_performfit(tmp_path, runcard=QUICKNAME, replica=1, timing=True, r
160161
shutil.copy(quickpath, tmp_path)
161162
shutil.copy(weightpath, tmp_path / f"{weight_name}.weights.h5")
162163
# run the fit
163-
sp.run(f"{EXE} {quickcard} {replica}".split(), cwd=tmp_path, check=True)
164+
run_n3fit(quickcard, str(replica), cwd=tmp_path, check=True)
164165

165166
# And compare
166167
check_fit_results(tmp_path, runcard, replica, old_json_file, timing=timing, rel_error=rel_error)
@@ -185,7 +186,7 @@ def test_performfit_with_old_theory(tmp_path):
185186
quickcard = "quickcard_old.yml"
186187
quickpath = REGRESSION_FOLDER / quickcard
187188
shutil.copy(quickpath, tmp_path)
188-
sp.run(f"{EXE} {quickcard} 5".split(), cwd=tmp_path, check=True)
189+
run_n3fit(quickcard, "5", cwd=tmp_path, check=True)
189190

190191

191192
@pytest.mark.skip(reason="Still not implemented in parallel mode")
@@ -198,7 +199,7 @@ def test_hyperopt(tmp_path):
198199
# We just want to ensure that the hyperopt can run, but we need to kill it ourselves
199200
# 60 seconds should be enough
200201
with pytest.raises(sp.TimeoutExpired):
201-
sp.run(f"{EXE} {quickcard} {REPLICA} --hyperopt 1000".split(), cwd=tmp_path, timeout=60)
202+
run_n3fit(quickcard, f"{REPLICA} --hyperopt 1000", cwd=tmp_path, timeout=60)
202203

203204

204205
def test_novalidation(tmp_path, timing=30):
@@ -209,7 +210,7 @@ def test_novalidation(tmp_path, timing=30):
209210
quickpath = REGRESSION_FOLDER / quickcard
210211
shutil.copy(quickpath, tmp_path)
211212
with pytest.raises(sp.TimeoutExpired):
212-
sp.run(f"{EXE} {quickcard} {REPLICA}".split(), cwd=tmp_path, timeout=timing)
213+
run_n3fit(quickcard, REPLICA, cwd=tmp_path, timeout=timing)
213214

214215

215216
def test_weirdbasis(tmp_path, timing=30):
@@ -225,7 +226,7 @@ def test_weirdbasis(tmp_path, timing=30):
225226
shutil.copy(quickpath, tmp_path)
226227
# with pytest.raises(sp.TimeoutExpired):
227228
with pytest.raises(sp.CalledProcessError):
228-
sp.run(f"{EXE} {quickcard} {REPLICA}".split(), cwd=tmp_path, timeout=timing, check=True)
229+
run_n3fit(quickcard, REPLICA, cwd=tmp_path, timeout=timing, check=True)
229230

230231

231232
@pytest.mark.linux
@@ -239,7 +240,7 @@ def test_multireplica_runs(tmp_path, runcard):
239240
path = tmp_path / name
240241
path.mkdir()
241242
shutil.copy(quickpath, path)
242-
sp.run(f"{EXE} {quickcard} {replicas}".split(), cwd=path, check=True)
243+
run_n3fit(quickcard, replicas, cwd=path, check=True)
243244

244245
for name_1, option_1 in options.items():
245246
for name_2, option_2 in options.items():
@@ -299,8 +300,8 @@ def test_parallel_against_sequential(tmp_path, rep_from=6, rep_to=8):
299300

300301
# Now run both
301302
for r in range(rep_from, rep_to + 1):
302-
sp.run(f"{EXE} {card_sequenti} {r}".split(), cwd=tmp_path, check=True)
303-
sp.run(f"{EXE} {card_parallel} {rep_from} -r {rep_to}".split(), cwd=tmp_path, check=True)
303+
run_n3fit(card_sequenti, str(r), cwd=tmp_path, setupfit=(r == rep_from), check=True)
304+
run_n3fit(card_parallel, f"{rep_from} -r {rep_to}", cwd=tmp_path, check=True)
304305

305306
# Loop over all pseudodata files for both fits and load them up
306307
folder_seq = card_sequenti.with_suffix("") / "nnfit"
@@ -335,3 +336,19 @@ def compare_weights(option_1, option_2, file_1, file_2):
335336
weight_name = file_1[key].name
336337
err_msg = f"Difference between runs `n3fit {option_1}` and `n3fit {option_2}` in weights {weight_name}"
337338
assert_allclose(file_1[key][:], file_2[key][:], rtol=1e-5, atol=1e-5, err_msg=err_msg)
339+
340+
341+
def test_md5_mismatch_is_detected(tmp_path):
342+
"""vp-setupfit, then tamper with the runcard -> n3fit must refuse to start."""
343+
quickcard = f"{QUICKNAME}.yml"
344+
weight_name = "weights_pol" if "_pol" in quickcard else "weights"
345+
weightpath = REGRESSION_FOLDER / f"{weight_name}_1.weights.h5"
346+
shutil.copy(REGRESSION_FOLDER / quickcard, tmp_path)
347+
shutil.copy(weightpath, tmp_path / f"{weight_name}.weights.h5")
348+
run_setupfit(quickcard, cwd=tmp_path)
349+
350+
runcard_path = tmp_path / quickcard
351+
runcard_path.write_text(runcard_path.read_text() + "\n# tampered\n")
352+
353+
with pytest.raises(sp.CalledProcessError):
354+
run_n3fit(quickcard, REPLICA, cwd=tmp_path, setupfit=False, check=True)

0 commit comments

Comments
 (0)