Comments

this isn't a bad little implementation you've packaged up here. it's probably the fastest AND least resource-hungry LLM i've ever run, though it does forget rules, insert nonsensical details, and repeat itself. it was quite easy to edit the starting prompts you provided into anything at all i wanted.

the only thing really missing is a "redo output" command.

Thank you so much for the feedback!

Unfortunately, Ollama doesn't have a "redo" function, and we apologize for that.
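
That said, you can approximate a redo yourself. Here is a rough sketch (assuming Ollama is running locally on its default port, 11434; the model name and prompt below are placeholders, so substitute whatever you are actually using): re-send your last prompt through Ollama's HTTP API, and since generation is sampled you will usually get a different output each time. Note that /api/generate is stateless, so this regenerates a single prompt rather than a whole conversation.

```
curl http://localhost:11434/api/generate -d '{
  "model": "lstep/neuraldaredevil-8b-abliterated:q8_0",
  "prompt": "<your last prompt here>",
  "stream": false
}'
```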

In any case, the model we ship (quantized to 4 bits) is also available unquantized (16GB) or quantized to 8 bits (8.5GB). In general, quantization degrades output quality, so if you have enough RAM and a sufficiently powerful CPU, you could try one of the less-quantized versions to see whether the issues you ran into become less frequent while the speed stays acceptable.

To download and use the 16GB model, modify the "pull" and "run" files by setting:

```
ollama pull tarruda/neuraldaredevil-8b-abliterated:fp16
ollama run tarruda/neuraldaredevil-8b-abliterated:fp16
```

To download and use the 8.5GB model, modify the "pull" and "run" files by setting:

```
ollama pull lstep/neuraldaredevil-8b-abliterated:q8_0
ollama run lstep/neuraldaredevil-8b-abliterated:q8_0
```
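
Either way, once the pull completes, you can check which versions are installed and how much disk space each one takes with Ollama's built-in list command:

```
ollama list
```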

You can find information on these versions here:  

https://ollama.com/tarruda/neuraldaredevil-8b-abliterated  

https://ollama.com/lstep/neuraldaredevil-8b-abliterated

hey thanks, i was actually wondering if you had the larger versions available. i'll test out the 8-bit version fairly soon. if that runs nearly as smoothly, but with better output, i may even have to test the big boy.

edit: how can i modify the batch file to launch the 8-bit rather than the 4-bit model?
second edit: i figured it out, i didn't notice at first that the "prefix" in the model path changed between versions.

Good job! :) Also note the ":q8_0" or ":fp16" tag at the end of the model name; that changes between versions too, not just the prefix.
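
For anyone else following along: an Ollama model reference has the form user/model:tag, and the two versions above differ in both the user prefix and the tag (the comments just restate the sizes quoted earlier):

```
tarruda/neuraldaredevil-8b-abliterated:fp16   # unquantized, ~16GB
lstep/neuraldaredevil-8b-abliterated:q8_0     # 8-bit, ~8.5GB
```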