In Using NAM, Part 1, we explored the different types of models and presented the main software and hardware loaders currently available. In this second part, we will present the different NAM model architectures, after a brief introduction to the vocabulary associated with the NAM system. We will then provide a practical overview of the resources required to use NAM models “live.” A downloadable model pack is also included with this article so that you can carry out your own tests with different model architectures, if you wish.

WaveNet, epochs and ESR

NAM is based on a deep learning approach applied to audio: a NAM model produces an output signal from an input signal after being trained to do so. To achieve this, NAM relies mainly on the WaveNet architecture. WaveNet was originally developed and refined by DeepMind (a subsidiary of Google) around 2016, and its first field of application was audio generation, more specifically voice-synthesis applications such as text-to-speech (https://en.wikipedia.org/wiki/WaveNet). From a high-level point of view, WaveNet as implemented in NAM makes it possible to predict an output value from an input value through the use of a model. This model is generated by learning from a source data set (the “input” file) and a target data set, i.e. the result obtained by sending the source signal through the amp or pedal (which constitutes the “output” file). Learning is carried out by NAM’s “trainer” program, which builds the model over multiple iterations (a.k.a. “epochs”).
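
To make the idea above more concrete, here is a deliberately simplified, toy training loop in PyTorch. It is not NAM’s actual trainer and does not use the real WaveNet layers: a tanh clipper stands in for the amp or pedal, and a tiny convolutional network stands in for the model. It only illustrates the principle of learning a mapping from an “input” signal to an “output” signal over successive epochs.

  # Toy sketch only -- NOT the actual NAM trainer.
  import torch
  import torch.nn as nn

  torch.manual_seed(0)
  x = torch.randn(1, 1, 48000)           # one second of fake "input" audio at 48 kHz
  y = torch.tanh(3.0 * x)                # fake "output": what the (toy) amp did to it

  model = nn.Sequential(                 # stand-in for the real (dilated) WaveNet
      nn.Conv1d(1, 8, kernel_size=3, padding=1),
      nn.Tanh(),
      nn.Conv1d(8, 1, kernel_size=3, padding=1),
  )
  opt = torch.optim.Adam(model.parameters(), lr=1e-3)

  for epoch in range(100):               # real captures often use 800-1000 epochs
      opt.zero_grad()
      y_hat = model(x)                   # the model's prediction of the output
      loss = ((y_hat - y) ** 2).mean()   # error between prediction and target
      loss.backward()
      opt.step()
      if epoch % 20 == 0:
          print(f"epoch {epoch:3d}  loss {loss.item():.5f}")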

The resulting model makes its predictions (i.e., the reproduction of the sound of the captured gear) based on a system of layers and on parameters governing their dilations.
In this deep learning process, training proceeds through successive iterations (epochs) during which the ESR (Error-to-Signal Ratio) is monitored. The closer the ESR is to 0, the more faithful the model is to the original. Achieving a low ESR generally requires a fairly large number of iterations: NAM model creators frequently use up to 800 or 1000 epochs. Beyond a certain number of iterations, a plateau is generally reached at which it is no longer worthwhile to continue training, as the gains become marginal or nil compared to the time and GPU cost.
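
For reference, here is a minimal sketch of how an ESR is commonly computed for two aligned signals of equal length (this is the usual error-to-signal formulation used in amp modeling; the actual trainer may apply additional filtering): the energy of the error divided by the energy of the target, so 0 means a perfect match.

  import numpy as np

  def esr(target: np.ndarray, prediction: np.ndarray) -> float:
      """Error-to-Signal Ratio between two aligned mono signals of equal length."""
      err = prediction - target
      return float(np.sum(err ** 2) / np.sum(target ** 2))

  # Toy example: a prediction that deviates slightly from the target.
  t = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
  p = t + 0.01 * np.random.default_rng(0).standard_normal(t.size)
  print(esr(t, p))   # a small value, close to 0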

Regarding ESR, here are some benchmarks inspired by the MOD Audio page describing best practices for AIDA-X captures (https://mod.audio/modeling-amps-and-pedals-for-the-aida-x-plugin-best-practices/), which I’ve slightly modified to apply to NAM:

  • ESR ≤ 0.01: excellent accuracy, the model is very accurate
  • 0.01 < ESR ≤ 0.05: very good accuracy, the model is accurate
  • 0.05 < ESR ≤ 0.15: good quality/usability compromise, the model is still fairly accurate
  • 0.15 < ESR ≤ 0.35: noticeable differences, can be used but with caution
  • ESR > 0.5: very approximate modeling, probably unsatisfactory
  • ESR > 0.9: very far off, probably unusable; levels, adjustments and alignments should be checked

We’ll get back to this capture and learning phase in another article in this series.


For your information, NAM models are JSON-formatted files: they contain the model itself (its configuration and weights) along with a metadata section that provides the loader with additional information about the capture.
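
As an illustration, here is a minimal sketch (in Python) of how you can peek inside a .nam file: since it is plain JSON, you can load it and inspect the top-level keys and the metadata block. The file name is just a placeholder, and the exact keys present depend on the model; they are described in the documentation linked below.

  import json

  # Placeholder path: point this at any .nam file you have downloaded.
  with open("some_model.nam") as f:
      model = json.load(f)

  print(sorted(model.keys()))                   # e.g. architecture, config, metadata, weights...
  for key, value in model.get("metadata", {}).items():
      print(f"{key}: {value}")                  # capture information used by loaders, if present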

To learn more about the keys used in NAM files, you can explore the contents of .nam files yourself and consult the short documentation here: https://neural-amp-modeler.readthedocs.io/en/latest/model-file.html#

NAM Architectures

To address the issue of the CPU resources required to run NAM models, Steve Atkinson has developed 4 sets of parameters, allowing 4 types of models to be generated. In NAM vocabulary, these model types are referred to as “architectures”. The basic NAM model is the “standard” architecture: it offers a very good level of fidelity, but it can be problematic in terms of CPU consumption for users with very low-powered or old machines, and especially for dedicated hardware platforms (multi-effects units or machines like the Raspberry Pi), which frequently rely on processors much less powerful than those of PCs/Macs, or even iPhones, most of the time for reasons of chip cost and associated integration cost. S. Atkinson has therefore introduced lighter architecture variants: the LITE, FEATHER and NANO models. These variants affect the size of the model on disk and in memory, as well as the CPU consumption required to run them.

The complexity (and fidelity) of NAM models is defined through the different layers and dilation levels introduced in the previous section. Ultimately, this complexity can be summarized by the number of parameters managed by the NAM model:

Architecture | Parameters | Size on disk (approx.)
STANDARD | 13800 | 280-300 KB
LITE | 6600 | 141 KB
FEATHER | 3000 | 65 KB
NANO | 841 | 20 KB

The NAM user community has also introduced new model architectures, primarily the xStandard and Complex models. The latter seeks to increase the level of fidelity and, unlike the lightweight variants presented in the previous paragraph, it requires significantly more CPU power to run:

Architecture | Parameters | Size on disk (approx.)
COMPLEX | 90000 | 1.9-2.3 MB
XSTANDARD | 12400 | 270 KB
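
If you want to check these figures against your own files, here is a small sketch that lists, for every .nam file in a folder, its size on disk and the length of its “weights” list, which roughly corresponds to the parameter count. The folder path is a placeholder, and the sketch assumes the JSON layout described in the documentation linked in the previous section.

  import json
  from pathlib import Path

  for path in sorted(Path("nam_models").glob("*.nam")):     # placeholder folder
      data = json.loads(path.read_text())
      n_params = len(data.get("weights", []))               # flat list of the model's weights
      size_kb = path.stat().st_size / 1024
      print(f"{path.name}: {n_params} weights, {size_kb:.0f} KB on disk")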

CPU power differences are presented in a later section of this article; keep in mind that the Complex models require sufficiently fast machines.

If you’re new to NAM and/or want to explore the rendering and behavior of different model architectures on your own hardware, I invite you to download a starter pack from this link: https://overdriven.fr/overdriven/index.php/nam-models/#MarkT-15_pack_1_8211_extended

This pack contains 7 basic models (1 Clean, 1 Crunch, 3 high-gain, and 2 boosted models with an OD) inspired by an MT 15* amp. The pack can also be downloaded from Tone3000 (you’ll find the link on the same page), but this version includes, in addition to the Standard, xStandard, and Complex models found on Tone3000, the LITE, FEATHER, and NANO versions, all built from the same re-amped files. As a bonus, this pack includes Genome presets for quick testing. The presets point to the _S (standard) models by default, but you can switch them as you wish.

*See legal notice at the bottom of the article.

Below is a table of the ESRs obtained for the different models in this pack:

Model name | ESR | Loudness | Tone type
ODN-MarkT-15-CLEAN-02-CH1-VOL4_C.nam | 0.00114 | -17.7 | clean
ODN-MarkT-15-CLEAN-02-CH1-VOL4_F.nam | 0.01298 | -17.8 | clean
ODN-MarkT-15-CLEAN-02-CH1-VOL4_L.nam | 0.01058 | -17.8 | clean
ODN-MarkT-15-CLEAN-02-CH1-VOL4_N.nam | 0.01847 | -17.6 | clean
ODN-MarkT-15-CLEAN-02-CH1-VOL4_S.nam | 0.00416 | -17.8 | clean
ODN-MarkT-15-CLEAN-02-CH1-VOL4_XS.nam | 0.00477 | -17.6 | clean
ODN-MarkT-15-CRUNCH-01-CH2_C.nam | 0.00149 | -17.7 | crunch
ODN-MarkT-15-CRUNCH-01-CH2_F.nam | 0.00998 | -18.0 | crunch
ODN-MarkT-15-CRUNCH-01-CH2_L.nam | 0.00705 | -17.8 | crunch
ODN-MarkT-15-CRUNCH-01-CH2_N.nam | 0.02170 | -18.0 | crunch
ODN-MarkT-15-CRUNCH-01-CH2_S.nam | 0.00455 | -17.8 | crunch
ODN-MarkT-15-CRUNCH-01-CH2_XS.nam | 0.00434 | -17.9 | crunch
ODN-MarkT-15-HIGHGAIN-01-CH2-G3_C.nam | 0.00197 | -18.2 | hi_gain
ODN-MarkT-15-HIGHGAIN-01-CH2-G3_F.nam | 0.01500 | -18.5 | hi_gain
ODN-MarkT-15-HIGHGAIN-01-CH2-G3_L.nam | 0.00969 | -18.4 | hi_gain
ODN-MarkT-15-HIGHGAIN-01-CH2-G3_N.nam | 0.03836 | -18.6 | hi_gain
ODN-MarkT-15-HIGHGAIN-01-CH2-G3_S.nam | 0.00656 | -18.4 | hi_gain
ODN-MarkT-15-HIGHGAIN-01-CH2-G3_XS.nam | 0.00535 | -18.4 | hi_gain
ODN-MarkT-15-HIGHGAIN-02-CH2-G4_C.nam | 0.00235 | -18.3 | hi_gain
ODN-MarkT-15-HIGHGAIN-02-CH2-G4_F.nam | 0.01980 | -18.7 | hi_gain
ODN-MarkT-15-HIGHGAIN-02-CH2-G4_L.nam | 0.01292 | -18.6 | hi_gain
ODN-MarkT-15-HIGHGAIN-02-CH2-G4_N.nam | 0.04332 | -18.9 | hi_gain
ODN-MarkT-15-HIGHGAIN-02-CH2-G4_S.nam | 0.00822 | -18.6 | hi_gain
ODN-MarkT-15-HIGHGAIN-02-CH2-G4_XS.nam | 0.00732 | -18.3 | hi_gain
ODN-MarkT-15-HIGHGAIN-03-CH2-G5_C.nam | 0.00430 | -18.5 | hi_gain
ODN-MarkT-15-HIGHGAIN-03-CH2-G5_F.nam | 0.02364 | -18.7 | hi_gain
ODN-MarkT-15-HIGHGAIN-03-CH2-G5_L.nam | 0.01586 | -18.6 | hi_gain
ODN-MarkT-15-HIGHGAIN-03-CH2-G5_N.nam | 0.05254 | -19.0 | hi_gain
ODN-MarkT-15-HIGHGAIN-03-CH2-G5_S.nam | 0.00985 | -18.6 | hi_gain
ODN-MarkT-15-HIGHGAIN-03-CH2-G5_XS.nam | 0.00925 | -18.6 | hi_gain
ODN-MarkT-15-OD-01-CH1-VOL4_C.nam | 0.00078 | -17.6 | overdrive
ODN-MarkT-15-OD-01-CH1-VOL4_F.nam | 0.00521 | -17.6 | overdrive
ODN-MarkT-15-OD-01-CH1-VOL4_L.nam | 0.00400 | -17.7 | overdrive
ODN-MarkT-15-OD-01-CH1-VOL4_N.nam | 0.01296 | -17.3 | overdrive
ODN-MarkT-15-OD-01-CH1-VOL4_S.nam | 0.00209 | -17.6 | overdrive
ODN-MarkT-15-OD-01-CH1-VOL4_XS.nam | 0.00263 | -17.7 | overdrive
ODN-MarkT-15-OD-02-CH1-VOL4_C.nam | 0.00087 | -17.5 | overdrive
ODN-MarkT-15-OD-02-CH1-VOL4_F.nam | 0.00720 | -17.4 | overdrive
ODN-MarkT-15-OD-02-CH1-VOL4_L.nam | 0.00381 | -17.6 | overdrive
ODN-MarkT-15-OD-02-CH1-VOL4_N.nam | 0.01860 | -17.2 | overdrive
ODN-MarkT-15-OD-02-CH1-VOL4_S.nam | 0.00262 | -17.6 | overdrive
ODN-MarkT-15-OD-02-CH1-VOL4_XS.nam | 0.00328 | -17.6 | overdrive

The suffixes used in the table correspond to:

  • _S: Standard
  • _XS: XStandard
  • _C: Complex
  • _L: Lite
  • _F: Feather
  • _N: Nano


Two observations can be made when reading this table:

  • The most complex architectures allow for the lowest ESRs (the most accurate models).
  • High-gain models (and, even more so, fuzz models, which are not shown in this example) have higher ESRs than clean or crunch models: these types of sounds are indeed more complex to model.

If you test the sample pack, you should notice some fairly obvious differences as you move, for example, from the Standard model down to the Lite, Feather, and then Nano versions. That said, the Lite and Feather models remain quite good in terms of rendering and can already produce good results.

One last point: if you want to use several NAM models simultaneously (for example, one model for an overdrive pedal and another for an amp), you could consider using Lite or Feather models for the pedal part. Pedal sounds are less complex to model than amp sounds, so Lite or Feather models may well be enough to produce good results.

CPU resources

The information presented in this section is neither exhaustive nor guaranteed; it is intended to provide observations and reference points regarding the use of the different model architectures, in a specific configuration, on different categories of machines, particularly from the perspective of using the models “live,” i.e. being able to practice, rehearse, and potentially track recordings in real time, either through standalone NAM applications or via a plugin in a DAW. It is strongly recommended that you test the viability of a configuration yourself, for your specific context and needs, before purchasing hardware.

The table below presents CPU consumption observations for the standalone Genome application, configured with a simple chain using a single NAM model (see the test configuration below). These observations were made manually by reading the information reported by the Task Manager on Windows and by the Activity Monitor on OSX for the tests on Apple hardware. The measurements are “stabilized” values, observed over a few tens of seconds after loading and using a given NAM model; note that CPU consumption may fluctuate around the values presented.

The consumption percentages also mean different things on OSX and on Windows. On OSX, the maximum CPU consumption is the number of cores * 100: on an 8-core machine the maximum is therefore 800%, and a reading of 110%, for example, represents the use of slightly more than one core but only 110/800 = 13.75% of the whole machine. Conversely, the values given for Windows are percentages of the machine’s total CPU capacity: 20% on a 4-core machine really is 20% of the total CPU capacity, and it is also close to the maximum consumption of a single core (100/4 = 25%).
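
To make the two conventions easier to compare, here is a trivial helper illustrating the arithmetic above (the function names and values are just examples):

  def macos_to_machine_percent(activity_monitor_percent: float, cores: int) -> float:
      """Activity Monitor reports a percentage of one core; divide by the core count."""
      return activity_monitor_percent / cores

  def windows_to_core_percent(task_manager_percent: float, cores: int) -> float:
      """Task Manager reports a percentage of the whole CPU; multiply by the core count."""
      return task_manager_percent * cores

  print(macos_to_machine_percent(110, 8))   # 13.75 -> about 14% of an 8-core machine
  print(windows_to_core_percent(20, 4))     # 80.0  -> about 80% of a single core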

Also be aware that less powerful machines are more easily and quickly disrupted as soon as the system performs other tasks (running updates, antivirus/antimalware scans, indexing processes, etc.), which can disturb audio applications such as the NAM or Genome standalones (interruptions, crackles, anomalies, etc.). Similarly, if the machine is close to its limit in this test (i.e. close to using a full core), adding other effects can quickly become problematic: in short, it is better to keep some headroom.

Also note the following points, which are important for understanding the presented measurements:

  • Genome version used on Mac and Windows: 1.10
  • A single NAM model and an IR loader block loaded with a mix of two IRs (100 ms)
  • Mono configuration (stereo mode consumes slightly more power)
  • Sound card configuration at 48 kHz / 128 samples: in the tests I was able to perform, using higher frequencies results in significantly higher CPU consumption (downsampling/upsampling?)
  • Animations disabled in Genome
  • Use of two sound cards with ASIO drivers under Windows: SSL2+ MKII and Scarlett 2i2, Core Audio under OSX.
  • Genome oversampling disabled (OFF)
  • And for reference, the table shows each CPU’s CPU Mark single-thread score, as reported by https://www.cpubenchmark.net
  • 4C, 6C, 8C… : number of cores
OS/CPU | CPU Mark (single thread) | Complex | xStandard | Standard | Lite | Feather | Nano
OSX/M4 MAX 16C | 4562 | OK (33%) | OK (21%) | OK (20%) | OK (19%) | OK (17%) | OK (17%)
OSX/M1 PRO 8C | 3761 | OK (46%) | OK (30%) | OK (29%) | OK (27%) | OK (25%) | OK (23%)
Win11/Ryzen 6C | 2871 | OK (11-14%) | OK (3-7%) | OK (4-5%) | OK (3-4%) | OK (2-3%) | OK (2%)
Win11/i5-9600K 6C | 2727 | OK (15%) | OK (5%) | OK (5%) | OK (4-5%) | OK (4-5%) | OK (3-4%)
OSX/i5-8259U 4C | 2190 | KO** | OK (53%) | OK (49%) | OK (44%) | OK (41%) | OK (37%)
Win11/N95 4C | 1927 | KO* | OK (15%) | OK (13%) | OK (12%) | OK (10%) | OK (9%)

“*”: at the limit, unstable, very sensitive to fluctuations: unusable
“**”: 100% for the host, 85% reported by Genome, at the limit and unstable

Additional CPU information:

  • Ryzen 6C: Ryzen 5 5625U 6C
  • i5-9600K: i5-9600K @ 3.70 GHz 6C
  • i5-8259U: i5-8259U @ 2.30 GHz 4C
  • N95: Alder Lake N95 @ 1.70 GHz 4C

Caution/disclaimer: when working at other sampling frequencies (88K, 176K, etc.), with larger buffer sizes, or within a DAW (recording, mixing), the power required can vary significantly from what is shown in the table. For example, at 176K/512 samples, using the Standard model becomes problematic on the oldest Mac in the list (the 8259U), as does the xStandard on the N95 machine.

On recent, powerful machines, there is no problem running the most complex models (33% for the Complex model on the M4 Max, which corresponds to roughly a 2% load for the whole machine, etc.), and the Standard models can be run on a wide range of machines, at least under the conditions presented in this test.

So, no xStandard or Complex models if you don’t have a powerful enough machine? Not necessarily: in the context of recording and creating music tracks, you can always use your NAM plugin offline in your DAW and bounce/freeze your tracks, a classic practice when handling somewhat complex projects (in number of tracks and/or plugins) or when using resource-hungry plugins. The workflow is a little less fluid, but it is entirely workable, and it puts the most faithful models within your reach even if you don’t own a powerful monster.

Conclusion

After an introduction to the fundamental concepts behind NAM models and their creation, we presented the different available architectures and their performance in terms of fidelity (ESR). The second part of the article presented the results of observations intended to provide reference points on the hardware and power required to run the different types of models in a live, minimalistic environment. The next article will focus on a key aspect of leveraging NAM models: gain management.

Legal notice

Any and all third party companies and products listed or otherwise mentioned on this site may be trademarks of their respective owners and they are in no way affiliated or associated with Overdriven.fr or the owner of Overdriven.fr. Product names are referenced solely for the purpose of identifying the hardware used in the recording chain for impulse response capture or for guitar or amplifier and pedals sound demonstrations. Use of these names does not imply any cooperation or endorsement. Any use of amplifier, cab or pedal brand name is strictly for comparison and descriptive purpose.