• Admiral Patrick · ↑21 ↓4 · 1 month ago

    Google (and search engines in general) is at least providing a service by indexing and making discoverable the websites they crawl. OpenAI is just hoovering up the data and providing nothing in return. Socializing the cost, privatizing the profits.

    • masterspace · ↑7 ↓23 · 1 month ago (edited 1 month ago)

      Uh, that’s objectively false.

      OpenAI also provides ChatGPT as a “free” service, and Google has made billions off of that “free” service they oh so altruistically provide you.

        • teft · ↑27 ↓1 · 1 month ago

        Google points to your content so others can find it.

        OpenAI scrapes your content to use to make more content.

          • masterspace · ↑4 ↓27 · 1 month ago

          That’s not a meaningful distinction. I spent all day using a Copilot search engine because the answers I wanted were scattered across a bunch of different documentation sites.

          It was using the AI models to interpret my commands (not generation at all), and it only published content to me specifically.

            • teft · ↑14 ↓0 · 1 month ago

            I’m talking about the training phase of LLMs. That is the portion that is doing the scraping and generation of copyrighted data.

            You using an already trained LLM to do some searches is not the same thing.

            • ℍ𝕂-𝟞𝟝 · ↑13 ↓0 · 1 month ago

            Technically it is meaningful: fair use is specifically for things that don’t replace the original in function.

              • masterspace · ↑2 ↓5 · 1 month ago

              Depends on what the function was. If the function was to drive ad revenue to your site, then sure. If the function was to get information into the public, then it’s not replacing the function so much as altering and updating it.

                • ℍ𝕂-𝟞𝟝 · ↑4 ↓0 · 1 month ago

                If that “altering and updating” means people don’t need to read the original anymore, then it’s not fair use.

                TBH I’m for reining in copyright substantially, and would be on the shitty text generator company side of this, but only if it sets a precedent and erodes copyright as a whole instead of just creating a carveout if you have a lot of money for lawyers.

                  • masterspace · ↑1 ↓1 · 1 month ago (edited 1 month ago)

                  I generally agree, but I really think people in this thread are being overly dismissive about how useful LLMs are, just because they’re associated with techbros who are often associated with relatively useless stuff like crypto.

                  I mean most people still can’t run an LLM on their local machine, which vastly limits what developers can use them for. No video game or open source software can really include them in any core features because most people can’t run them. Give it 3 years when every machine has a dedicated neural chip and devs can start using local LLMs that don’t require a cloud connection and Azure credits and you’ll start seeing actually interesting and inventive uses of them.

                  There’s still problems with attributing sources of information but I honestly feel like if all LLMs that were trained on copyrighted data had to be published open source so that anyone could use them it would get us enough of the way there that their benefits would outweigh their costs.

            • BakerBagel · ↑11 ↓1 · 1 month ago

            It’s absolutely a meaningful distinction. Search engines push people to your website, where you can capitalize on your audience however you see fit. LLMs take your content, throw it through the mixer, and sell it back to people. It’s the difference between a movie reviewer explaining a movie and a dude in an alley selling a pirated copy of the movie.

              • masterspace · ↑1 ↓2 · 1 month ago (edited 1 month ago)

              A) An LLM does not inherently sell you anything. Some companies charge you to run and use their LLMs (OpenAI), and some companies publish their LLMs open source for anyone to use (Meta, Microsoft). With neural chips starting to pop up in PCs and phones, pretty soon anyone will be able to run an open source LLM locally on their machine, completely for free.

              B) LLMs still rarely regurgitate the exact same original source. This would be more like someone in the back alley putting on their own performance of the movie and morphing it and adjusting it in real time based on your prompts and comments, which is a lot closer to parody and fair use than blatant piracy.