• QuadratureSurfer · ↑15 ↓0 · 8 months ago

        I’ve got it running with a 3090 and 32GB of RAM.

        There are some models that let you run with hybrid system RAM and VRAM (it will just be slower than running it exclusively with VRAM).
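Mechanically, runners that support this (llama.cpp and similar) split the model by layers: as many layers as fit go to VRAM, and the rest stay in system RAM. A back-of-envelope sketch of that budgeting, where the layer size and overhead numbers are illustrative assumptions, not measurements:

```python
# Rough sketch: how many transformer layers fit in VRAM for hybrid offload.
# layer_gb and overhead_gb are made-up illustrative numbers, not measured values.

def layers_on_gpu(vram_gb: float, n_layers: int, layer_gb: float,
                  overhead_gb: float = 1.5) -> int:
    """Offload as many layers as fit after reserving headroom for KV cache etc."""
    budget = vram_gb - overhead_gb
    if budget <= 0:
        return 0  # everything stays in system RAM
    return min(n_layers, int(budget // layer_gb))

# Hypothetical 70B-class model, 4-bit quantized: ~80 layers, ~0.5 GB per layer.
gpu_layers = layers_on_gpu(vram_gb=24.0, n_layers=80, layer_gb=0.5)
print(gpu_layers)  # layers offloaded to a 24 GB card; the remainder runs from RAM
```

Everything that doesn’t fit is computed on the CPU, which is why the hybrid setup works but runs slower than an all-VRAM configuration.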

        • Deceptichum · ↑16 ↓0 · 8 months ago

          Yeah but damn does it get slow.

          I always find it interesting how text is so much slower than image generation. I can do a 1024x1024 in probably 20s, but I get like 1 word a second with text.
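A big part of the gap is structural: an autoregressive LLM runs one full forward pass per output token, so cost grows with response length, while a diffusion image model runs a fixed number of denoising steps no matter what comes out. A toy comparison with illustrative timings (the per-token and per-step costs are assumptions):

```python
# Why text generation feels slower: per-token cost vs. fixed step count.
# The timing constants below are illustrative, not benchmarks.

def llm_time(n_tokens: int, sec_per_token: float) -> float:
    return n_tokens * sec_per_token   # one forward pass per generated token

def diffusion_time(n_steps: int, sec_per_step: float) -> float:
    return n_steps * sec_per_step     # fixed step count, independent of output

print(llm_time(200, 1.0))        # a ~200-token reply at 1 token/s -> 200 s
print(diffusion_time(20, 1.0))   # 20 denoising steps at 1 s/step -> 20 s
```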

            • ferret · ↑5 ↓0 · 8 months ago

            Languages are complex and, more importantly, much less forgiving of errors.

      • DarkThoughts · ↑2 ↓1 · 8 months ago

        Hopefully we’ll see more specific hardware for this. Like extension cards with pretty much just tensor cores and their own RAM.

        • topinambour_rex · ↑1 ↓0 · 8 months ago

          Graphics cards without video outputs have existed for a while.

        • Deceptichum · ↑1 ↓0 · 8 months ago

          I’d love to see some consumer-level AI stuff; sadly it all seems to be designed for server farms, and by the time it ages out into consumer prices it’s so obsolete there’s no point in getting it.

          • DarkThoughts · ↑1 ↓0 · 8 months ago

            It’s not quite consumer level, I’d say, but Coral.ai has some small Google Edge TPUs.

          • raldone01 · ↑1 ↓0 · 8 months ago

            Do they want consumer ai cards to exist though?

            Think about the data!

            • Deceptichum · ↑1 ↓0 · 8 months ago

              Card makers? They only want money; if there’s enough consumer-level demand, they’ll make them.

    • mesamune · ↑10 ↓0 · 8 months ago

      Nice! That’s a cool project, I’ll have to give it a try. I love the idea of self-hosting local LLMs. I’ve been playing around with https://lmstudio.ai/ and it downloads directly from Hugging Face.

      • mtw · ↑2 ↓0 · 8 months ago

        There’s also ollama, which seems similar. Not sure if LM Studio is open source, but ollama is.

    • DarkThoughts · ↑1 ↓0 · 8 months ago

      I tried llamafile for text gen too, but I couldn’t get ROCm to work properly with it to run it on my GPU without building it myself, which I’m really not into. And CPU text gen is waaaaaay too slow for anything: a Mixtral response took ~250 seconds for ~1k context tokens, and Mistral was about 52 seconds or something around that number.

      https://github.com/Mozilla-Ocho/llamafile Mixtral is definitely beefy; Mistral is quite a bit faster, and there are a few even smaller prebuilt ones. But the smaller you go, the less complex the responses will be. I think llamafile is a good step in the right direction, but it’s still not a good out-of-the-box experience yet. At least I got farther with it than with oobabooga, which is the recommendation for SillyTavern, and which would just crash whenever it generated anything, without even giving me an error.
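For a rough sense of scale, those timings translate into tokens per second like this (lumping prompt processing and generation together, so it’s only a ballpark comparison between the two models on the same CPU):

```python
# Convert the reported CPU timings into rough tokens-per-second figures.
# Prompt processing and generation are lumped together here, so treat
# these as ballpark numbers, not proper benchmarks.

def tokens_per_second(n_tokens: int, seconds: float) -> float:
    return n_tokens / seconds

print(round(tokens_per_second(1000, 250), 1))  # Mixtral: ~4.0 tok/s
print(round(tokens_per_second(1000, 52), 1))   # Mistral: ~19.2 tok/s
```

Either way, both are far below the dozens of tokens per second a GPU-resident model typically manages, which is why CPU-only generation feels unusable for chat.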

      • Flumpkin · ↑1 ↓1 · 8 months ago

        How fast are they with a good GPU?

        • DarkThoughts · ↑1 ↓1 · 8 months ago

          Have you missed the first part, where I explained that I couldn’t get it to run on my GPU? I’d only have a 6650 XT anyway, but even that would be significantly faster than my CPU. How much faster I can’t say exactly without trying it, but I suspect that with longer chats, and consequently larger context sizes, it would still be too slow to be really usable. Unless you’re okay with waiting ages for a response.

          • Flumpkin · ↑1 ↓0 · 8 months ago

            Sorry, I’m just curious in general how fast these local LLMs are. Maybe someone else can give some rough info.

  • anticurrent · ↑28 ↓0 · 8 months ago

    Can we have smaller, more domain-specific models that don’t require more than casual hardware? Like a small model for coding, one for medicine, one for history, and so on.

    • fruitycoder · ↑15 ↓1 · 8 months ago

      Check out Hugging Face! Honestly, fine-tuned models for specific domains seem very popular (if for nothing else because training smaller models is just easier!).

      • DarkThoughts · ↑2 ↓1 · 8 months ago

        Unfortunately, the roleplaying-chatbot type models are typically fairly sizeable / demanding. I’m curious how this will develop with more specific AI hardware though, like extension cards with primarily tensor cores plus their own RAM, so that you don’t have to use your GPU for that. If we can drag down the price of such hardware, then locally run models could become much more viable and mainstream.
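A quick way to ballpark how much memory such a card would need: the weights alone take roughly parameter count × bits per weight ÷ 8 bytes, and real usage is higher once the KV cache and activations are added. A sketch of that floor, with the model sizes as examples:

```python
# Ballpark memory needed just for the weights of a quantized model.
# Real usage is higher (KV cache, activations), so this is a floor, not a target.

def weight_gb(n_params_billion: float, bits_per_weight: int) -> float:
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

print(weight_gb(7, 4))    # 7B model at 4-bit  -> 3.5 GB
print(weight_gb(70, 4))   # 70B model at 4-bit -> 35.0 GB
```

That gap between a 7B and a 70B model is why the heavier roleplay-oriented models stay out of reach on typical consumer cards.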

        • Pantherina · ↑4 ↓9 · 8 months ago

          Dude, sorry to say, but roleplay is not as important as medicine or coding XD

          • DarkThoughts · ↑6 ↓4 · 8 months ago

            For me they are. I have no use for medicine or coding bots.

            • long_chicken_boat · ↑8 ↓1 · 8 months ago

              But you do rely on the very software you use daily, and on developments in medicine.

              I play D&D from time to time, but saying that roleplaying is more important than medicine is just nuts.

              • Pantherina · ↑4 ↓1 · 8 months ago

                Not wanting to be mean, I just find the thought of people talking to robots a bit strange, and I use them as tools only. Not sure what “roleplay” means here; if it’s some “fantasy D&D generator”, you could still say that may be better done by humans, to keep that grey matter running.

              • DarkThoughts · ↑3 ↓1 · 8 months ago

                Not so much for the latter, but I’m pretty specifically talking about my personal use case here. lol “Roleplaying” in this scenario isn’t really referring to actual tabletop-style RPGs, btw. It’s the LLM roleplaying specific characters or personas that you then chat with in specific (or not-so-specific) scenarios. That same tech is also being experimented with for video game NPCs. But who knows, a specifically trained model could potentially make a half-decent dungeon master too.

                • Kilnier · ↑3 ↓3 · 8 months ago

                  There’s also a huge amount of training, medical and otherwise, that’s done through role-playing. I could definitely see medical students getting use out of learning telemedicine with LLMs ultimately adapted from TTRPG character-generator schemas.

    • melroy · ↑3 ↓0 · 8 months ago

      I cannot function with T-Mobile internet, that is for sure. I’m moving to another ISP

  • Coreidan · ↑6 ↓2 · 8 months ago

    That’s gonna be a no from me dawg