• Telorand@reddthat.com
    2 days ago

    It’s that interoperability of unique instances that makes the Fediverse resistant to scraping. The posts are all public, but crawling them all and categorizing everything is probably like untangling a cotton ball.

    • unalivejoy@lemm.ee
      2 days ago

      Or you can host your own instance and let the servers send you all their data (though other instances can still defederate from yours).

    • General_Effort@lemmy.world
      link
      fedilink
      English
      arrow-up
      3
      ·
      1 day ago

      Don’t really see the problem. If you pick up the content while web crawling, you will end up with a lot of duplicates, but that’s normal. If you wanted to scrape the Fediverse in particular, you’d know the structure of the data.
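
To illustrate the duplicates point, here is a minimal sketch of how a crawler might collapse copies of the same federated post. It assumes each crawled record carries a canonical identifier for the original object (ActivityPub objects have a globally unique `id`); the field names `ap_id` and `fetched_from`, the helper `dedupe_by_canonical_id`, and the example URLs are all hypothetical, not taken from any real crawler.

```python
from typing import Iterable


def dedupe_by_canonical_id(records: Iterable[dict]) -> list[dict]:
    """Keep the first record seen for each canonical object id."""
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        # "ap_id" is an assumed field name holding the original object's URI.
        canonical = record.get("ap_id")
        if canonical is None or canonical in seen:
            # Skip records without an id and copies we have already kept.
            continue
        seen.add(canonical)
        unique.append(record)
    return unique


# The same post fetched from its home instance and from a federated copy.
crawled = [
    {"fetched_from": "https://instance-a.example/post/101",
     "ap_id": "https://instance-a.example/post/101", "title": "Hello"},
    {"fetched_from": "https://instance-b.example/post/5520",
     "ap_id": "https://instance-a.example/post/101", "title": "Hello"},
    {"fetched_from": "https://instance-b.example/post/5521",
     "ap_id": "https://instance-b.example/post/5521", "title": "Other"},
]

unique = dedupe_by_canonical_id(crawled)
print(len(unique))
```

A generic web crawler that does not know about the canonical id would have to fall back on content hashing or similar heuristics, which is the "normal" duplicate handling mentioned above.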