NodeBB

I've just created c/Ollama!

Selfhosted
29 Posts 12 Posters 29 Views
This topic has been deleted. Only users with topic management privileges can see it.
  • B [email protected]

    Totally depends on your hardware, and what you tend to ask it. What are you running? What do you use it for? Do you prefer speed over accuracy?

    [email protected]
    wrote last edited by
    #20

    I have an M2 MacBook Pro (Apple silicon) and would kind of like to replace Google's Gemini as my go-to LLM. I think I'd like to run something like Mistral, probably. Currently I do have Ollama and some version of Mistral running, but I almost never use it, as it's on my laptop, not my phone.

    I'm not big on LLMs, but if I can find one that I can run locally and that helps me get off Google Search and Gemini, that would be awesome. Currently I use a combo of Firefox, Qwant, Google Search, and Gemini for my daily needs. I'm not big on the direction Firefox is headed, I've heard there are arguments against Qwant, and using Gemini feels like the wrong answer for my beliefs and opinions.

    I'm looking for something better without sinking too much time into something I may only sort of like. Tall order, I know, but I figured I'd give you as much info as I can.

    • W [email protected]

      [email protected]
      wrote last edited by [email protected]
      #21

      Honestly, Perplexity, the online service, is pretty good.

      As for running locally, one question first: how much RAM does your Mac have? That's basically the deciding factor for which model you can and should run.
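      As a rough back-of-the-envelope check (my own sketch, not an official sizing rule), you can estimate how much memory a quantized model needs from its parameter count and bits per weight:

      ```python
      def approx_model_ram_gb(params_billion: float, bits_per_weight: float,
                              overhead: float = 1.2) -> float:
          """Rough RAM estimate for a quantized model.

          Weights take params * bits / 8 bytes; `overhead` is a loose fudge
          factor for the KV cache and runtime buffers (an assumption, not a spec).
          """
          raw_gb = params_billion * bits_per_weight / 8  # billions of params -> GB of weights
          return raw_gb * overhead
      ```

      By this estimate, a 4-bit 14B model wants roughly 8-9 GB, so it fits on a 16 GB Mac but would be tight on 8 GB.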

      • W [email protected]

        [email protected]
        wrote last edited by [email protected]
        #22

        Actually, to go ahead and answer, the "fastest" path would be LM Studio (which supports MLX quants natively and is not time intensive to install), and a DWQ quantization (which is a newer, higher quality variant of MLX models).

        Hopefully one of these models, depending on how much RAM you have:

        https://huggingface.co/mlx-community/Qwen3-14B-4bit-DWQ-053125

        https://huggingface.co/mlx-community/Magistral-Small-2506-4bit-DWQ

        https://huggingface.co/mlx-community/Qwen3-30B-A3B-4bit-DWQ-0508

        https://huggingface.co/mlx-community/GLM-4-32B-0414-4bit-DWQ

        With a bit more time invested, you could set up Open WebUI as an alternative interface (it has its own built-in web search, like Gemini): https://openwebui.com/

        And then use LM Studio (or some other MLX backend, or even free online API models) as the 'engine'.

        Alternatively, especially if you have a small RAM pool, Gemma 12B QAT Q4_0 is quite good, and you can run it with LM Studio or anything else that supports GGUF. I'm not sure about 12B-ish thinking models off the top of my head; I'd have to look around.
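        As a sketch of that 'engine' idea: LM Studio and the other backends mentioned here expose an OpenAI-compatible HTTP API, so any small script or front end can drive them. A minimal Python example (LM Studio's default port 1234 and the exact payload shape are assumptions; adjust to your setup and loaded model):

        ```python
        import json
        import urllib.request

        # LM Studio's local server defaults to port 1234 (an assumption --
        # check its server/developer tab); it speaks an OpenAI-style API.
        BASE_URL = "http://localhost:1234/v1/chat/completions"

        def build_payload(prompt: str) -> dict:
            """Build an OpenAI-style chat request for the local server."""
            return {
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.7,
            }

        def ask(prompt: str) -> str:
            """Send the prompt to the local server and return the reply text."""
            req = urllib.request.Request(
                BASE_URL,
                data=json.dumps(build_payload(prompt)).encode(),
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)["choices"][0]["message"]["content"]
        ```

        Open WebUI plugs into the same kind of endpoint, so swapping the front end doesn't force you to change the backend.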

        • B [email protected]

          [email protected]
          wrote last edited by
          #23

          This is all new to me, so I'll have to do a bit of homework on this. Thanks for the detailed and linked reply!

          • W [email protected]

            [email protected]
            wrote last edited by [email protected]
            #24

            I was a bit mistaken; these are the models you should consider:

            https://huggingface.co/mlx-community/Qwen3-4B-4bit-DWQ

            https://huggingface.co/AnteriorAI/gemma-3-4b-it-qat-q4_0-gguf

            https://huggingface.co/unsloth/Jan-nano-GGUF (specifically the UD-Q4 or UD-Q5 file)

            They are state-of-the-art at this size, as far as I know.

            • B [email protected]

              [email protected]
              wrote last edited by
              #25

              Awesome, I'll give these a spin and see how it goes. Much appreciated!

              • B [email protected]

                [email protected]
                wrote last edited by
                #26

                My Home Assistant is running on Unraid, but I have an old NVIDIA Quadro P5000. I really want to run a vision model so that it can describe who is at my doorbell.

                • S [email protected]

                  [email protected]
                  wrote last edited by [email protected]
                  #27

                  Oh, actually, that's a great card for LLM serving!

                  Build the llama.cpp server from source; it has better support for Pascal cards than anything else:

                  https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md

                  Gemma 3 is a hair too big (like 17-18 GB), so I'd start with InternVL3 14B Q5_K_XL: https://huggingface.co/unsloth/InternVL3-14B-Instruct-GGUF

                  Or Mistral Small 3.2 24B IQ4_XS for more 'text' intelligence than vision: https://huggingface.co/unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF

                  I'm a bit 'behind' on the vision model scene, so I can look around more if those don't feel sufficient, or walk you through setting up the llama.cpp server. Basically, it provides an endpoint you can hit with the same API as ChatGPT.
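                  For the doorbell use case, the flow would be: grab a snapshot from the camera, base64-encode it, and post it to that endpoint as an OpenAI-style vision message. A hedged Python sketch (the port, image path, and content-part shape are assumptions; the server must be started with a vision model and its matching --mmproj projector per the multimodal docs):

                  ```python
                  import base64
                  import json
                  import urllib.request

                  # Assumes llama-server is running on its default port 8080 with a
                  # vision model and matching --mmproj projector loaded.
                  ENDPOINT = "http://localhost:8080/v1/chat/completions"

                  def build_vision_payload(image_bytes: bytes, question: str) -> dict:
                      """Pack an image and a question into an OpenAI-style vision request."""
                      b64 = base64.b64encode(image_bytes).decode()
                      return {
                          "messages": [{
                              "role": "user",
                              "content": [
                                  {"type": "text", "text": question},
                                  {"type": "image_url",
                                   "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                              ],
                          }],
                      }

                  def describe(image_path: str) -> str:
                      """Ask the local model to describe who is in the snapshot."""
                      with open(image_path, "rb") as f:
                          payload = build_vision_payload(f.read(), "Who is at the door?")
                      req = urllib.request.Request(
                          ENDPOINT,
                          data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"},
                      )
                      with urllib.request.urlopen(req) as resp:
                          return json.load(resp)["choices"][0]["message"]["content"]
                  ```

                  Home Assistant could call something like `describe()` from an automation when the doorbell fires, then read the text back via a notification.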

                  • B [email protected]

                    TBH you should fold this into localllama? Or open source AI?

                    I have very mixed (mostly bad) feelings on ollama. In a nutshell, they're kinda Twitter attention grabbers that give zero credit/contribution to the underlying framework (llama.cpp). And that's just the tip of the iceberg, they've made lots of controversial moves, and it seems like they're headed for commercial enshittification.

                    They're... slimy.

                    They like to pretend they're the only way to run local LLMs and blot out any other discussion, which is why I feel kinda bad about a dedicated ollama community.

                    It's also a highly suboptimal way for most people to run LLMs, especially if you're willing to tweak.

                    I would always recommend Kobold.cpp, TabbyAPI, ik_llama.cpp, Aphrodite, LM Studio, the llama.cpp server, SGLang, the AMD Lemonade server, any number of backends over it. Literally anything but Ollama.

                    ...TL;DR: I don't like the idea of focusing on Ollama at the expense of other backends. Running LLMs locally should be the community, not Ollama specifically.

                    [email protected]
                    wrote last edited by
                    #28

                    Thanks for the Lemonade hint. For Ryzen AI: https://github.com/lemonade-sdk/lemonade (Linux is CPU-only for now)

                    • S [email protected]

                      [email protected]
                      wrote last edited by
                      #29

                      You can still use the iGPU, which might be faster in some cases.
