Background
To run llamafile as a service, we first create a system account that has access to the model folder and can run the command on every restart. We call this user llamafile.
Below I show how a fine-tuned Phi-4 model can be served.
sudo useradd -r -s /usr/sbin/nologin -U -m -d /data/LLM/phi4_finetuning llamafile
Let's move the llamafile binary to a better place, make it executable, and change its ownership:
sudo mv /data/LLM/phi4_finetuning/llamafile-0.9.3 /usr/local/bin/llamafile
sudo chmod +x /usr/local/bin/llamafile
sudo chown llamafile:llamafile /usr/local/bin/llamafile
We will also create log files for the service:
sudo touch /var/log/llamafile.log /var/log/llamafile.err
sudo chown llamafile:llamafile /var/log/llamafile.*
Next, wrap our command in a shell script:
sudo nano /usr/local/bin/llamafile-wrapper.sh
Paste the command into the shell script:
#!/bin/bash
exec /usr/local/bin/llamafile -m /data/LLM/phi4_finetuning/unsloth.Q4_K_M.gguf -ngl 9999 --gpu nvidia --server --v2 -l 0.0.0.0:8080 --temp 0
Make it executable; this ensures that the user we created previously can invoke the script on restart.
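A note on the exec in the wrapper: it replaces the wrapper shell with the llamafile process itself, so systemd ends up supervising the server directly rather than an intermediate bash. A throwaway demo of that behavior (the /tmp/exec-demo.sh path is purely illustrative and unrelated to the service):

```shell
# 'exec' replaces the current shell instead of forking a child,
# so the PID stays the same across the call.
cat > /tmp/exec-demo.sh <<'EOF'
#!/bin/bash
echo "before exec: $$"
exec sh -c 'echo "after exec: $$"'
EOF
chmod +x /tmp/exec-demo.sh
/tmp/exec-demo.sh   # both lines print the same PID
```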
sudo chmod +x /usr/local/bin/llamafile-wrapper.sh
Service file contents
Next we create a service file whose ExecStart points at the shell script. As we want the service to come back automatically, we set Restart=always so systemd restarts it whenever it exits; enabling the service further below takes care of starting it on every boot.
[Unit]
Description=Llamafile v2 Server
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/llamafile-wrapper.sh
Restart=always
RestartSec=10
User=llamafile
WorkingDirectory=/data/LLM/phi4_finetuning
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
StandardOutput=append:/var/log/llamafile.log
StandardError=append:/var/log/llamafile.err
[Install]
WantedBy=multi-user.target
Now save this content as /etc/systemd/system/llamafile.service.
For example, create the file and paste in the above content with:
sudo nano /etc/systemd/system/llamafile.service
Paste the content from your clipboard, then press Ctrl+X, then Y, then Enter.
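One optional addition before starting things up: the unit file appends stdout and stderr to plain files, so those logs grow without bound. A logrotate rule keeps them in check. This is only a sketch; the path /etc/logrotate.d/llamafile and the weekly, four-rotation policy are assumptions to adapt:

```
# /etc/logrotate.d/llamafile (hypothetical; adjust the policy to taste)
/var/log/llamafile.log /var/log/llamafile.err {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
```

copytruncate matters here: systemd holds the log files open, and copying then truncating lets it keep writing to the same file descriptor with no service restart after rotation.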
The service now exists, but we still have to enable and start it. Run the following commands:
sudo systemctl daemon-reload
sudo systemctl enable llamafile
sudo systemctl start llamafile
To check the status of the service:
sudo systemctl status llamafile
Stop the service with:
sudo systemctl stop llamafile
Disable the service with:
sudo systemctl disable llamafile
Future updates
To change how the model is served, you only need to update the command in the shell script /usr/local/bin/llamafile-wrapper.sh, using nano or any text editor.
sudo nano /usr/local/bin/llamafile-wrapper.sh
Once the new command is in place, restart the service. A daemon-reload is only needed when the unit file itself changes; edits to the wrapper script just require a restart:
sudo systemctl restart llamafile
That is it! This post showed an example for llamafile, but this technique is useful for creating any kind of service file.
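Finally, a quick way to confirm that the running service actually answers is to send it a request. This is a sketch: it assumes your llamafile build exposes the OpenAI-compatible /v1/chat/completions route, and that the server listens on port 8080 as configured in the wrapper script:

```shell
# Sanity check from the host itself; adjust host/port if you changed
# the -l flag in the wrapper script.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello"}]}'
```

If the model is still loading, the request may block for a while before the first response arrives.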