14 commits

Author SHA1 Message Date
Daenney 9b50151f17
[feature] Beef up our AI opt-outs (#3165)
* [chore] Synchronise our robots.txt with upstream

* [feature] Add headers to escape AI crawlers

This adds 2 headers that a number of AI crawlers respect to signal that
content should not be included in their datasets.
2024-08-02 18:22:39 +02:00
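
The commit message above does not name the two headers. As a rough sketch only, assuming they are `X-Robots-Tag` values such as `noai` and `noimageai` (an assumption, not something this log confirms), a small WSGI wrapper in Python could attach them to every response. The names `add_ai_opt_out` and `hello_app` are hypothetical and used purely for illustration.

```python
# Assumed header values; the commit log does not say which headers were added.
AI_OPT_OUT_HEADERS = [
    ("X-Robots-Tag", "noimageai"),
    ("X-Robots-Tag", "noai"),
]


def add_ai_opt_out(app):
    """Wrap a WSGI app so every response carries the opt-out headers."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Append rather than replace, so headers set by the app survive.
            return start_response(
                status, list(headers) + AI_OPT_OUT_HEADERS, exc_info
            )

        return app(environ, patched_start_response)

    return wrapped


def hello_app(environ, start_response):
    """Hypothetical stand-in application, used only for illustration."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]
```

Wrapping an application as `add_ai_opt_out(hello_app)` leaves its body and status untouched and only extends the header list. As with robots.txt, these headers are a request that crawlers may or may not honour.
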
Daenney ad93e57d08
[chore] Update robots.txt (#3092)
Recategorises a pair of scrapers according to their use.
2024-07-10 15:10:34 +02:00
Daenney 4604224c4d
[chore] Update our robots.txt (#3033)
This syncs our copy with the current state of the ai.robots.txt
repository. Upstream has tightened their scope to be AI-only, whereas
before it included a bunch of SEO and "web intelligence" marketing
stuff. I've kept those but moved them into their own section.
2024-06-23 15:34:21 +02:00
Daenney dcab555a6b
[chore] Update robots.txt (#2856)
This updates the robots.txt based on the list of the ai.robots.txt
repository. We can look at automating that at some point.

It's worth pointing out that some robots, namely the ones by Bytedance,
are known to ignore robots.txt entirely.
2024-04-22 11:01:37 +02:00
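
To illustrate why the note about crawlers ignoring robots.txt matters: the file only takes effect when a crawler chooses to consult it. The sketch below uses Python's standard `urllib.robotparser` to show what a well-behaved crawler would conclude. The rules and the `ExampleAIBot` agent name are hypothetical, since the real file contents are not shown in this log.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt; the real file and its bot list are not shown in this log.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /api/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what a robots.txt-respecting crawler would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)
```

A compliant crawler would call something like `is_allowed("ExampleAIBot", "https://example.org/@someone")` before fetching. A crawler that skips this check is not stopped by the file at all, so blocking it needs enforcement on the server side.
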
tobi 9fb8a78f91
[feature] New user sign-up via web page (#2796)
* [feature] User sign-up form and admin notifs

* add chosen + filtered languages to migration

* remove stray comment

* chosen languages schmosen schmanguages

* proper error on local account missing
2024-04-11 11:45:53 +02:00
Daenney 6528592dd2
[feature] Block Amazonbot (#2692)
Blocks the Amazon crawler bot.

Closes: #2686
2024-02-27 13:25:08 +00:00
tobi 0ff52b71f2
[chore] Refactor HTML templates and CSS (#2480)
* [chore] Refactor HTML templates and CSS

* eslint

* ignore "Local"

* rss tests

* fiddle with OG just a tiny bit

* dick around with polls a bit more so SR stops saying "clickable"

* remove break

* oh lord

* don't lazy load avatar

* fix ogmeta tests

* clean up some cruft

* catch remaining calls to c.HTML

* fix error rendering + stack overflow in tag

* allow templating attributes

* fix indent

* set aria-hidden on status complementary content, since it's already present in the label anyway

* tidy up templating calls a little

* try to make styling a bit more consistent + readable

* fix up some remaining CSS issues

* fix up reports
2023-12-27 11:23:52 +01:00
Daenney 0cce2c0838
[feature] Block a bunch of "AI" crawlers (#2239)
* [feature] Block Google Bard/AI crawlers

* [feature] Block the other OpenAI crawler

* [feature] Block Common Crawl crawler

This is used in research, but also gleefully advertises itself as the
training source used in all LLMs and GPT-3.

Fixes: #2240

* [feature] Block Omgilikebot

Used by some shady big web data engine company.

* [feature] Block Meta's language model crawler

* [feature] Block well-known.dev crawler
2023-09-30 20:44:57 +01:00
tobi 4b05dcde43
[chore] Update robots.txt, give chatgpt the middle finger (#2085)
2023-08-08 13:16:34 +02:00
Daenney 5e2bf0bdca
[chore] Improve copyright header handling (#1608)
* [chore] Remove years from all license headers

Years or year ranges aren't required in license headers. Many projects
have removed them in recent years and it avoids a bit of yearly toil.

In many cases our copyright claim was also a bit dodgy since we added
the 2021-2023 header to files created after 2021 but you can't claim
copyright into the past that way.

* [chore] Add license header check

This ensures a license header is always added to any new file. This
avoids maintainers/reviewers needing to remember to check for and ask
for it in case a contribution doesn't include it.

* [chore] Add missing license headers

* [chore] Further updates to license header

* Use the more common // indented comment format
* Remove the hack we had for the linter now that we use the // format
* Add SPDX license identifier
2023-03-12 16:00:57 +01:00
f0x52 17eecfb6d9
[feature] Public list of suspended domains (#1362)
* basic rendered domain blocklist (unauthenticated!)

* style basic domain block list

* better formatting for domain blocklist

* add opt-in config option for showing suspended domains

* format/linter

* re-use InstancePeersGet for web-accessible domain blocklist

* reword explanation, border styling

* always attach blocklist handler, update error message

* domain blocklist error message grammar
2023-01-25 18:06:41 +01:00
tobi 0dbe6c514f
[chore] Update/add license headers for 2023 (#1304)
2023-01-05 12:43:00 +01:00
tobi 941893a774
[chore] The Big Middleware and API Refactor (tm) (#1250)
* interim commit: start refactoring middlewares into package under router

* another interim commit, this is becoming a big job

* another fucking massive interim commit

* refactor bookmarks to new style

* ambassador, wiz zeze commits you are spoiling uz

* she compiles, we're getting there

* we're just normal men; we're just innocent men

* apiutil

* whoopsie

* i'm glad noone reads commit msgs haha :blob_sweat:

* use that weirdo go-bytesize library for maxMultipartMemory

* fix media module paths
2023-01-02 12:10:50 +00:00
tobi dd83ad053c
[feature] Add meta robots tag; allow robots to index profile card if user is Discoverable (#842)
* rework robots.txt response

* don't let robots snippet from statuses/threads

* allow robots to index if user is Discoverable

* add license text
2022-09-29 12:03:17 +02:00