{"id":33379,"date":"2024-03-18T14:52:55","date_gmt":"2024-03-18T06:52:55","guid":{"rendered":"https:\/\/www.deepin.org\/?p=33379"},"modified":"2024-03-18T15:04:06","modified_gmt":"2024-03-18T07:04:06","slug":"llamafile-a-must-have-tool","status":"publish","type":"post","link":"https:\/\/www.deepin.org\/en\/llamafile-a-must-have-tool\/","title":{"rendered":"llamafile: A Must-Have Tool in the Age of AI"},"content":{"rendered":"<img loading=\"lazy\" src=\"https:\/\/storage.deepin.org\/thread\/202403180628239023_en.png\" alt=\"\" width=\"900\" height=\"383\" \/><\/p>\n<p>In the field of AI, the process of configuring and installing environments for model inference is often a headache. If you have such a problem, then llamafile will be a blessing for you. This article was created by deepin community user \"\u4f20\u987a\u9875\" to give you a first-hand understanding of how to play around with llamafile!<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>What exactly is llamafile?<\/strong><\/h1>\n<p>llamafile is an executable Large Language Model (LLM) that can be run on your own computer, and contains the weights for a given open LLM, as well as everything you need to run the model. 
Surprisingly, you don't need to do any installation or configuration.<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>How does llamafile accomplish all this?<\/strong><\/h1>\n<p>This is all made possible by the power provided by llama.cpp in combination with Cosmopolitan Libc:<\/p>\n<ul>\n<li><strong>Runs across CPU microarchitectures:<\/strong> llamafiles can run on a wide range of CPU microarchitectures, supporting newer Intel systems using modern CPU features while remaining compatible with older computers.<\/li>\n<li><strong>Runs across CPU architectures:<\/strong> llamafiles run on multiple CPU architectures such as AMD64 and ARM64, and are compatible with Win32 and most UNIX shells.<\/li>\n<li><strong>Cross-operating-system:<\/strong> llamafiles run on six operating systems: macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD.<\/li>\n<li><strong>Weight embedding:<\/strong> LLM weights can be embedded into llamafiles, allowing uncompressed weights to be mapped directly into memory, similar to a self-extracting archive.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>Which operating systems and CPUs does llamafile support?<\/strong><\/h1>\n<p>llamafile supports Linux 2.6.18+, Darwin (macOS) 23.1.0+, Windows 8+, FreeBSD 13+, NetBSD 9.2+, OpenBSD 7+, and other operating systems. In terms of CPUs, it supports AMD64 and ARM64 microprocessors.<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>How well does llamafile support GPUs?<\/strong><\/h1>\n<p>llamafile supports Apple Metal, NVIDIA, and AMD GPUs. On macOS ARM64, GPU support can be enabled by compiling a small module. Owners of NVIDIA and AMD cards need to pass specific flags at launch to enable the GPU.<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>You can also create your own llamafiles!<\/strong><\/h1>\n<p>With the tools included in this project, you can create your own llamafiles, using whatever compatible model weights you want. 
You can then distribute these llamafiles to others, who can use them easily no matter what type of computer they are using.<\/p>\n<p>llamafile is undoubtedly a powerful tool that makes model inference much easier, with no need to install or configure a complex environment. If you are still struggling to set up an environment for model inference, why not give llamafile a try? For details, you can refer to this tutorial: <a href=\"https:\/\/zhuanlan.zhihu.com\/p\/686886176\">https:\/\/zhuanlan.zhihu.com\/p\/686886176<\/a> \"Using llamafile to build a foolproof large model that launches on multiple platforms\".<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>How to use it on deepin?<\/strong><\/h1>\n<p>Open a terminal and run <span style=\"color: #0000ff;\"><em><strong>sh .\/xxxx.llamafile<\/strong><\/em><\/span><\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>How to use it on Windows?<\/strong><\/h1>\n<p>Change the extension to .exe and double-click to open it.<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>How do I run it on a GPU instead of a CPU?<\/strong><\/h1>\n<p>You need to install the NVIDIA\/AMD driver (for NVIDIA, you should also install CUDA), and then add -ngl 99999 at launch. This means up to 99999 layers of the network will be offloaded to the GPU; in reality a model only has a few hundred or a few thousand layers, so the excess is ignored. If you don't have enough video memory, lower the value appropriately, for example to 999 or 99.<\/p>\n<p>To summarize, for deepin: <em><span style=\"color: #0000ff;\"><strong>sh .\/xxxx.llamafile -ngl 99999<\/strong><\/span><\/em><\/p>\n<p>For Windows: <em><span style=\"color: #0000ff;\"><strong>.
\/xxx.llamafile.exe -ngl 99999<\/strong><\/span><\/em><\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>Download Links<\/strong><\/h1>\n<p>Baidu Netdisk link: https:\/\/pan.baidu.com\/s\/14cv7McPa1XpXBNKy914HyQ?pwd=msjn <strong>(Extract code: msjn)<\/strong><\/p>\n<p>123 cloud disk link: https:\/\/www.123pan.com\/s\/oEqDVv-IP4o.html <strong>(Extract code: 8MCi)<\/strong><\/p>\n<p>(Note: the .exe suffix can be ignored, because llamafile is cross-platform: the same file runs on Linux, Windows, Mac and BSD.)<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>Screenshots<\/strong><\/h1>\n<p><img class=\"aligncenter\" src=\"https:\/\/storage.deepin.org\/thread\/202403140917442540_image.png\" \/><\/p>\n<h1 style=\"text-align: center;\"><strong>How do I access other clients?<\/strong><\/h1>\n<p>To get your own cross-platform ChatGPT\/Gemini app in one click, we recommend:<\/p>\n<p><a href=\"https:\/\/github.com\/ChatGPTNextWeb\/ChatGPT-Next-Web\">https:\/\/github.com\/ChatGPTNextWeb\/ChatGPT-Next-Web<\/a><\/p>\n<p>ChatGPTNextWeb\/ChatGPT-Next-Web: A cross-platform ChatGPT\/Gemini UI (Web \/ PWA \/ Linux \/ Win \/ MacOS).<\/p>\n<p>After installation and setup, select OpenAI as the model service provider and change the interface address to <span style=\"color: #0000ff;\"><em><strong>http:\/\/127.0.0.1:8080<\/strong><\/em><\/span>.<\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"text-align: center;\"><strong>Access from UOS AI (not recommended for regular users)<\/strong><\/h1>\n<ul>\n<li>Start llamafile first.<br \/>\nThen install NGINX: <em><span style=\"color: #0000ff;\"><strong>sudo apt install nginx<\/strong><\/span><\/em><br \/>\nUse OpenSSL to create a self-signed SSL certificate, valid for 10 years:<\/li>\n<\/ul>\n<p>openssl req -newkey rsa:2048 -x509 -nodes -keyout localhost.key -new -out localhost.crt -subj \/CN=Hostname -reqexts SAN -extensions SAN -config &lt;(cat \/usr\/lib\/ssl\/openssl.cnf 
\\<br \/>\n&lt;(printf '[SAN]\\nsubjectAltName=DNS:hostname,IP:127.0.0.1')) -sha256 -days 3650<\/p>\n<ul>\n<li><strong>In the NGINX configuration directory, create a certificate directory and copy the certificate you just created into it.<\/strong><\/li>\n<\/ul>\n<p>sudo mkdir \/etc\/nginx\/cert\/<br \/>\nsudo cp localhost.* \/etc\/nginx\/cert\/<\/p>\n<ul>\n<li><strong>Edit the NGINX default site configuration file (<span style=\"color: #3366ff;\">sudo vim \/etc\/nginx\/sites-enabled\/default<\/span>) and configure the following:<\/strong><\/li>\n<\/ul>\n<p>##<br \/>\n# You should look at the following URL's in order to grasp a solid understanding<br \/>\n# of Nginx configuration files in order to fully unleash the power of Nginx.<br \/>\n# https:\/\/www.nginx.com\/resources\/wiki\/start\/<br \/>\n# https:\/\/www.nginx.com\/resources\/wiki\/start\/topics\/tutorials\/config_pitfalls\/<br \/>\n# https:\/\/wiki.debian.org\/Nginx\/DirectoryStructure<br \/>\n#<br \/>\n# In most cases, administrators will remove this file from sites-enabled\/ and<br \/>\n# leave it as reference inside of sites-available where it will continue to be<br \/>\n# updated by the nginx packaging team.<br \/>\n#<br \/>\n# This file will automatically load configuration files provided by other<br \/>\n# applications, such as Drupal or WordPress. 
These applications will be made<br \/>\n# available underneath a path with that package name, such as \/drupal8.<br \/>\n#<br \/>\n# Please see \/usr\/share\/doc\/nginx-doc\/examples\/ for more detailed examples.<br \/>\n##<\/p>\n<p># Default server configuration<br \/>\n#<br \/>\nserver {<br \/>\nlisten 80 default_server;<br \/>\nlisten [::]:80 default_server;<\/p>\n<p># SSL configuration<br \/>\nlisten 443 ssl default_server;<br \/>\nlisten [::]:443 ssl default_server;<br \/>\nssl_certificate cert\/localhost.crt;<br \/>\nssl_certificate_key cert\/localhost.key;<br \/>\nssl_session_timeout 5m;<br \/>\nssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;<br \/>\nssl_protocols TLSv1 TLSv1.1 TLSv1.2;<br \/>\nssl_prefer_server_ciphers on;<br \/>\n# Add index.php to the list if you are using PHP<br \/>\nindex index.html index.htm index.nginx-debian.html;<\/p>\n<p># server_name _;<br \/>\nserver_name api.openai.com;<\/p>\n<p>location \/ {<br \/>\nproxy_pass http:\/\/127.0.0.1:8080;<br \/>\nproxy_set_header Host $host;<br \/>\nproxy_set_header X-Real-IP $remote_addr;<br \/>\nproxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;<br \/>\n}<\/p>\n<p>}<\/p>\n<ul>\n<li><strong>Restart NGINX.<\/strong><\/li>\n<\/ul>\n<p>sudo systemctl restart nginx<\/p>\n<p><strong>Modify the local hosts file to map 127.0.0.1 to the api.openai.com domain you just configured: run sudo vim \/etc\/hosts and add the line 127.0.0.1 api.openai.com.<\/strong><br \/>\n<strong>Then follow this post: <a href=\"https:\/\/bbs.deepin.org\/zh\/post\/267049\">https:\/\/bbs.deepin.org\/zh\/post\/267049<\/a> to switch UOS AI to the English environment, so that you can add ChatGPT settings.<\/strong><\/p>\n<ul>\n<li><strong>Add configuration:<\/strong><\/li>\n<\/ul>\n<p><img class=\"aligncenter\" src=\"https:\/\/storage.deepin.org\/thread\/202403150222291137_image.png\" alt=\"image.png\" \/><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li><strong>Added 
successfully (you may need to retry a few times if your computer is low-spec):<\/strong><\/li>\n<\/ul>\n<p><img class=\"aligncenter\" src=\"https:\/\/storage.deepin.org\/thread\/202403150227193033_image.png\" alt=\"image.png\" \/><\/p>\n<ul>\n<li><strong>Getting started:<\/strong><\/li>\n<\/ul>\n<p><img class=\"aligncenter\" src=\"https:\/\/storage.deepin.org\/thread\/202403150229278643_image.png\" alt=\"image.png\" \/><\/p>\n<p>&nbsp;<\/p>\n<p><strong>Note:<\/strong> With GPT3.5\/GPT4 selected, UOS AI does not seem to support streaming output, i.e. words do not appear one at a time; you have to wait for the entire response to be generated before anything is returned, which is a poorer experience. Other APIs, such as iFlytek Spark (Xunfei StarFire), do support streaming, so it is recommended that you stick with the web chat window that comes with llamafile.<\/p>\n<p>Original post: <a href=\"https:\/\/bbs.deepin.org\/zh\/post\/269443\">https:\/\/bbs.deepin.org\/zh\/post\/269443<\/a><\/p>\n<p style=\"text-align: right;\">Content source: deepin community<\/p>\n<p style=\"text-align: right;\">Reprinted with attribution<\/p>","protected":false},"excerpt":{"rendered":"<p>In the field of AI, the process of configuring and installing environments for model inference is often a headache. If you have such a problem, then llamafile will be a blessing for you. This article was created by deepin community user \"\u4f20\u987a\u9875\" to give you a first-hand understanding of how to play around with llamafile! &nbsp; What exactly is llamafile? llamafile is an executable Large Language Model (LLM) that can be run on your own computer, and contains the weights for a given open LLM, as well as everything you need to run the model. 
Surprisingly, you don't need to ...<a href=https:\/\/www.deepin.org\/en\/llamafile-a-must-have-tool\/>Read more<\/a><\/p>\n","protected":false},"author":11164,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[155,93],"tags":[],"_links":{"self":[{"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/posts\/33379"}],"collection":[{"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/users\/11164"}],"replies":[{"embeddable":true,"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/comments?post=33379"}],"version-history":[{"count":6,"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/posts\/33379\/revisions"}],"predecessor-version":[{"id":33385,"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/posts\/33379\/revisions\/33385"}],"wp:attachment":[{"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/media?parent=33379"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/categories?post=33379"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.deepin.org\/en\/wp-json\/wp\/v2\/tags?post=33379"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}