In the field of AI, the process of configuring and installing environments for model inference is often a headache. If you have such a problem, then llamafile will be a blessing for you. This article was created by deepin community user "传顺页" to give you a first-hand understanding of how to play around with llamafile!


What exactly is llamafile?

llamafile is an executable Large Language Model (LLM) that can be run on your own computer, and contains the weights for a given open LLM, as well as everything you need to run the model. Surprisingly, you don't need to do any installation or configuration.


 How does llamafile accomplish all this?

This is all made possible by the power provided by llama.cpp in combination with Cosmopolitan Libc:

  • Runs across CPU microarchitectures: llamafiles can run on a wide range of CPU microarchitectures, supporting newer Intel systems using modern CPU features while being compatible with older computers.
  • Runs across CPU architectures: llamafiles run on multiple CPU architectures such as AMD64 and ARM64, and are compatible with Win32 and most UNIX shells.
  • Cross-Operating System: llamafiles run on six operating systems: MacOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD.
  • Weight embedding: LLM weights can be embedded into llamafiles, allowing uncompressed weights to be mapped directly into memory, similar to a self-extracting archive.


 Which operating systems and CPUs does llamafile support?

llamafile supports Linux 2.6.18+, Darwin (MacOS) 23.1.0+, Windows 8+, FreeBSD 13+, NetBSD 9.2+, OpenBSD 7+ and other operating systems. In terms of CPU, it supports AMD64 and ARM64 microprocessors.


How well does llamafile support GPUs?

llamafile supports Apple Metal, NVIDIA, and AMD GPUs. on macOS ARM64, GPU support can be obtained by compiling a small module. owners of NVIDIA and AMD cards need to pass specific parameters to enable GPUs.


You can also create your own llamafiles!

With the tools included in this project, you can create your own llamafiles, using whatever compatible model weights you want. You can then distribute these llamafiles to others to use them easily, no matter what type of computer they are using.

The llamafile is undoubtedly a powerful tool that makes model inference much easier, without the need to install or configure a complex environment. If you are still struggling to configure the environment for model inference, why not try llamafile? Specifically, you can refer to this tutorial:  《利用llamafile构造傻瓜式,支持多平台启动的大模型》.


How to use it on deepin?

Open a terminal and run sh . /xxxx.llamafile


How to use it on Windows?

Change the extension to .exe and double click to open it.


How do I run it on a GPU instead of a CPU?

You need to install NVIDIA/AMD driver, for NVIDIA, you should also install CUDA, and then add -ngl 99999 to the runtime, this means that the network will be moved to the GPU in 99999 layers (in reality, it's just a few hundred or a few thousand), if you don't have enough memory, you can lower the value appropriately, for example, 99, 999, and so on.

To summarize, for deepin: sh . /xxxx.llamafile -ngl 99999

For Windows: . /xxx.lamafile.exe -ngl 9999


Get Address disk link:      (Extract code: msjn)

123 cloud disk link:    (Extract code: 8MCi)

(Note: the .exe suffix can be ignored because llamafile is cross-platform and can run on Linux, Windows, Mac and BSD at the same time).



How do I access other clients?

Recommend this to have your own cross-platform ChatGPT/Gemini app in one click:

ChatGPTNextWeb/ChatGPT-Next-Web: A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS)".

After installation and setup, select OpenAI as the model service provider and change the interface address to


Access to UOS-AI (not recommended for regular users to try)

  • Start llamafile first.
    Then install NGINX:  sudo apt install nginx
    Use OpenSSL self-signed SSL certificates, valid for 10 years.

openssl req -newkey rsa:2048 -x509 -nodes -keyout localhost.key -new -out localhost.crt -subj /CN=Hostname -reqexts SAN -extensions SAN -config <(cat /usr/lib/ssl/openssl.cnf \
<(printf '[SAN]\nsubjectAltName=DNS:hostname,IP:')) -sha256 -days 3650

  • In the NGINX configuration directory, create a certificate directory and move the certificate you just made over there.

sudo mkdir /etc/nginx/cert/
sudo cp localhost.* /etc/nginx/cert/

  • Edit the NGINX default site configuration file, sudo vim /etc/nginx/sites-enabled/default Configure the following:

# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# In most cases, administrators will remove this file from sites-enabled/ and
# leave it as reference inside of sites-available where it will continue to be
# updated by the nginx packaging team.
# This file will automatically load configuration files provided by other
# applications, such as Drupal or WordPress. These applications will be made
# available underneath a path with that package name, such as /drupal8.
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.

# Default server configuration
server {
listen 80 default_server;
listen [::]:80 default_server;

# SSL configuration
listen 443 ssl default_server;
listen [::]:443 ssl default_server;
ssl_certificate cert/localhost.crt;
ssl_certificate_key cert/localhost.key;
ssl_session_timeout 5m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
# Add index.php to the list if you are using PHP
index index.html index.htm index.nginx-debian.html;

# server_name _;

location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;


  • Restart NGINX.

sudo systemctl restart nginx

Modify the local Host to map the local IP: to the just configured, the modification method: sudo vim /etc/hosts Add a line in any line.
Then follow this content: to let UOS AI enter the English environment, so that you can add ChatGPT settings.

  • Add configuration:



  • Add successfully (you may have to try a few more times if your computer is poorly configured):


  • Getting Started:



Note: The current choice of GPT3.5/GPT4 does not seem to support streaming in UOS AI, that is, there is no way to gobble up one word at a time, resulting in having to wait for a longer period of time for all the words to be predicted before returning, a poorer experience, while other APIs such as Xunfei StarFire support streaming, so it is recommended that you still use the web chat window that comes with llamafile.

Original posting address:

Content source: deepin community

Reprinted with attribution

Leave a Reply