Wikipedia:Bots/Requests for approval/CutlassBot: Difference between revisions
m →Discussion: add url |
re |
||
| Line 223: | Line 223: | ||
*:::: <span style="font-variant:small-caps; whitespace:nowrap;">[[User:Headbomb|Headbomb]] {[[User talk:Headbomb|t]] · [[Special:Contributions/Headbomb|c]] · [[WP:PHYS|p]] · [[WP:WBOOKS|b]]}</span> 02:16, 20 June 2026 (UTC) |
||
*:::::@[[User:Headbomb|Headbomb]], Regarding CS1 links, I understand that the module is hiding them like at [[Jeremy Lin#cite ref-17]] (note there is no "Archived" link in the citation to render the link to [...]today/20120630151832/http://www.denverpost.com/commented/ci_16724722). I think this was the leading cause of the reduction of 600k external links. Am I missing something? [[User:Dw31415|<span style="background-color: snow; font-family: 'Linux Libertine', 'Georgia', 'Times', 'Source Serif Pro', serif">Dw31415</span>]] ([[User talk:Dw31415#top|talk]]) 02:43, 20 June 2026 (UTC) |
||
{{outdent|5}} |
|||
My bad, I think I was looking at a hardcoded instance of an example somewhere, rather than a live version, and that gave me an inaccurate picture of the current situation. Since CS1 and webarchive templates both hide links, it seems reasonable to approve a task that would also hide the remaining bare links. I'd have to review reactions to template updates first though, if there are any. |
|||
I'd also have to take a look at the template proposed to hide these links. The cases in [[User:Dw31415/ArchiveEdits1]] seem needlessly complicated. Compare say the current proposed |
|||
*<code><nowiki>{{Deprecated archive|sourceurl=http://archives.dailynews.lk/2003/10/18/fea05.html|title=Establishing Pāli Text Society for Buddhist literature|archivehostpath=archive. today/20131217001046/http://archives.dailynews.lk/2003/10/18/fea05.html}}</nowiki></code> |
|||
with say {{tl|DAL}} for deprecated archive link (or some short variant) |
|||
*<code><nowiki>{{DAL|https:// archive. today/20131217001046/http://archives.dailynews.lk/2003/10/18/fea05.html|Establishing Pāli Text Society for Buddhist literature}}</nowiki></code> |
|||
with functionality that recognises the base url vs host path and archive date automatically. |
|||
 <span style="font-variant:small-caps; whitespace:nowrap;">[[User:Headbomb|Headbomb]] {[[User talk:Headbomb|t]] · [[Special:Contributions/Headbomb|c]] · [[WP:PHYS|p]] · [[WP:WBOOKS|b]]}</span> 03:13, 20 June 2026 (UTC) |
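A minimal sketch of the automatic recognition described above, assuming the two timestamp shapes that appear in this discussion (14 digits, or <code>YYYY.MM.DD-HHMMSS</code>). The name <code>parse_archive_link</code>, the <code>ArchiveLink</code> type and the three-host pattern are hypothetical and not taken from any existing template or from the bot's code; short-form archive links without an embedded source URL are simply not matched.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class ArchiveLink(NamedTuple):
    host: str              # archive hostname, e.g. "archive.today"
    archived_at: datetime  # snapshot time parsed from the path
    source_url: str        # original URL embedded after the timestamp


# Hypothetical pattern: optional scheme, an archive host (assumed, partial
# list), a timestamp in one of two shapes, then the embedded source URL.
_ARCHIVE_RE = re.compile(
    r"^(?:https?://)?(?P<host>archive\.(?:today|ph|is))/"
    r"(?P<stamp>\d{14}|\d{4}\.\d{2}\.\d{2}-\d{6})/"
    r"(?P<source>https?://\S+)$"
)


def parse_archive_link(url: str) -> Optional[ArchiveLink]:
    """Split an archive link into host, snapshot time and source URL."""
    match = _ARCHIVE_RE.match(url.strip())
    if match is None:
        return None  # not a recognised long-form archive link
    # Drop the dots and hyphen so both timestamp shapes parse the same way.
    digits = re.sub(r"\D", "", match["stamp"])
    archived_at = datetime.strptime(digits, "%Y%m%d%H%M%S")
    return ArchiveLink(match["host"], archived_at, match["source"])
```

With something along these lines, a single positional URL would be enough for a template or bot to derive the source URL and the archive date, which is the simplification suggested above.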
|||
{{reftalk}} |
||
Revision as of 03:13, 20 June 2026
Operator: Dw31415 (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 11:56, Sunday, June 14, 2026 (UTC)
Function overview: Replace Archive Today links with the original source link when possible and not already hidden by CS1 templates.
Automatic, Supervised, or Manual: Automatic
Programming language(s): Python, Pywikibot
Source code available: https://gitlab.wikimedia.org/dw31415/cutlass-bot
Links to relevant discussions (where appropriate): Wikipedia talk:archive.today guidance#Wrap standalone, blacklisted link
Edit period(s): Continuous
Estimated number of pages affected: 88,385
Namespace(s): Mainspace/Articles
Exclusion compliant (Yes/No): Yes
Function details:
- Use quarry to identify external deprecated archive links in the 0 namespace
- Filter to paths containing a link http(s)://{hostname}. Extract the source url (note: no plans to test if the link is live, see discussion)
- Find the context of the link in the page
- Filter to links in [] (not in a template)
- Replace the link (see discussion; replacement details are still under discussion).
Note: See dry run at User:Dw31415/ArchiveEdits1
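The filtering and replacement steps above could look roughly like the following. This is a sketch under stated assumptions, not the bot's actual code: the mirror hostname list is assumed and incomplete, the names <code>BRACKETED</code>, <code>template_depth_at</code> and <code>rewrite</code> are hypothetical, and the template parameters are the ones shown in the <code>{{Deprecated archive}}</code> example in this discussion. The "not in a template" test is a crude brace count that ignores nowiki sections and HTML comments, and the Quarry query and Pywikibot save steps are left out.

```python
import re

# Assumed, incomplete list of archive.today mirror hostnames.
ARCHIVE_HOSTS = r"archive\.(?:today|ph|is|fo|li|md|vn)"

# A bracketed external link of the form
#   [https://<archive host>/<timestamp>/<source url> optional label]
BRACKETED = re.compile(
    r"\[https?://(?P<hostpath>" + ARCHIVE_HOSTS
    + r"/[^/\s\]]+/(?P<source>https?://[^\s\]]+))"
    r"(?:\s+(?P<label>[^\]]*))?\]"
)


def template_depth_at(text: str, pos: int) -> int:
    """Count unclosed '{{' before pos (crude: ignores nowiki and comments)."""
    return text.count("{{", 0, pos) - text.count("}}", 0, pos)


def rewrite(text: str) -> str:
    """Wrap bare bracketed archive links in {{Deprecated archive}}."""
    def repl(match: "re.Match[str]") -> str:
        if template_depth_at(text, match.start()) > 0:
            return match.group(0)  # inside a template: leave untouched
        title = (match["label"] or "").strip()
        return (
            "{{Deprecated archive"
            f"|sourceurl={match['source']}"
            f"|title={title}"
            f"|archivehostpath={match['hostpath']}"
            "}}"
        )

    return BRACKETED.sub(repl, text)
```

A source URL containing a pipe character would break the template call as written, so a real implementation would need to escape it; links without a label produce an empty <code>title</code> here rather than a guessed one.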
Discussion
Background and Proposal: In February, the WP:NOMOREARCHIVETODAY RfC reached a consensus to “remove” all links to archive today. Since then good efforts have been made to replace the links or hide them when contained in templates. However, more than 100,000 links remain visible.
This week, Wikipedia editors documented instances in which archive.today links redirected readers to the Tehran Times rather than the expected archived content[1]. The behavior was captured on video and discussed at Wikipedia talk:Archive.today guidance. This new behavior reduces the utility of the remaining links and demonstrates that readers cannot reliably predict where these links will lead.
I intend this bot to implement the existing community consensus by replacing archive.today links with their original source URLs when those URLs can be identified. I do not intend the bot to evaluate the continued availability of the original source material. Rather, it will restore the target selected by the original editor while removing links to a service that the community has already determined should no longer be presented to readers.
I currently operate DwAlphaBot, but propose this task should be conducted by a new bot to improve traceability and distinguish these edits from DwAlphaBot’s other approved tasks. The code is not yet complete, and discussion is ongoing regarding implementation details, including whether any hidden metadata should be preserved[2].
I am seeking early review and guidance from BAG and interested editors and request approval for an initial trial of up to 20 edits involving only deterministic replacements where the original URL is reviewed by me. Dw31415 (talk) 13:50, 14 June 2026 (UTC)
- I oppose any bot replacing these links without checking for their availability at the original URL and without checking their availability at Wayback Machine/Ghostarchive/Megalodon. sapphaline (talk) 14:53, 14 June 2026 (UTC)
- Thank you for considering it. Do you oppose CS1 hiding the archive links? That’s by far the greater number (~600k) links. I’m trying to gain support for a similar approach. Are there any mitigations that would win your support? Dw31415 (talk) 15:17, 14 June 2026 (UTC)
- "Do you oppose CS1 hiding the archive links?" - no, because this creates a backlog of links that need replacement. Your approach essentially means offloading one big backlog (visible archive.today links) to a different big backlog (dead links), which is even bigger and has even fewer people interested in cleaning it up. sapphaline (talk) 16:21, 14 June 2026 (UTC)
- "Are there any mitigations that would win your support?" - checking the availability of the original URL and assigning appropriate |url-status= to the citation is a bare minimum; ideally the bot should also check the mentioned archives and add an archived copy in case it's not a redirect (3xx) or an error page (4xx/5xx). If this is implemented, then the bot should also add a hidden category for every affected page so that editors can check after the bot and replace inappropriate archives added by it. sapphaline (talk) 16:43, 14 June 2026 (UTC)
- Nice idea for the category. I’ve added one to the template (link at other discussion). I need to check how to make it hidden. Dw31415 (talk) 17:20, 14 June 2026 (UTC)
- It would be easy to check the way back api and mark if one exists. Harder to do more than that. Dw31415 (talk) 17:21, 14 June 2026 (UTC)
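For illustration, a check against the Wayback Machine availability endpoint (<code>https://archive.org/wayback/available</code>) might be sketched as follows. The function names are hypothetical and this is not taken from the bot's repository; it only reports whether a snapshot with HTTP status 200 is listed near the given timestamp, and says nothing about whether that snapshot captured the right content.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

WAYBACK_API = "https://archive.org/wayback/available"


def build_query(source_url: str, timestamp: str = "") -> str:
    """Build the availability request URL; timestamp is YYYYMMDDhhmmss."""
    params = {"url": source_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded API response, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None


def find_wayback_copy(source_url: str, timestamp: str = "",
                      timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and return a snapshot URL or None."""
    with urlopen(build_query(source_url, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Passing the archive.today snapshot time as <code>timestamp</code> asks for the Wayback capture closest to the one being replaced; a real run would also need rate limiting and error handling.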
- (edit conflict) I oppose unless the content is available at the original URI (simply not being a 404 is not enough - check for usurpations, soft 404s, domain reselling pages, etc). In other situations it should be replaced with a working, non-deprecated archive or marked as a dead link with a comment noting that an archive.today link was removed, with a link to why. If the checking cannot be done reliably by a bot then it is not a task suitable for a bot. Thryduulf (talk) 16:50, 14 June 2026 (UTC)
- Please say more about “marked as a dead link with a comment”. The Mauer PDF in the linked discussion is a good example. The source url returns some minimal text (not a 404). I haven’t checked way back for it yet. Dw31415 (talk) 16:59, 14 June 2026 (UTC)
- By marking as a deadlink with a note, I mean cases where, if the archive.today copy didn't exist, it would be tagged using {{dead link}} (or similar), but leaving a hidden comment that the AT copy does exist if anyone wants to view it (doing so may enable them to find the information elsewhere for example). Thryduulf (talk) 22:02, 14 June 2026 (UTC)
- It looks like the hidden comment part is there for all links - the template currently renders in wikicode like this according to its documentation (one space added to defeat edit filter), which includes the information about the archive.today link. Tazerdadog (talk) 22:49, 14 June 2026 (UTC)
- Thanks! I’ll fix tonight. Dw31415 (talk) 23:23, 14 June 2026 (UTC)
- @Thryduulf, Have you experienced the redirect to Tehran Times? If not, I'd ask you to try it. I find it unsettling. You might try the "tally-ho" archive at Frank_Frazetta. I tried to reproduce, but my home ISP actually blocks archive today. I get a connection refused error. Please try it and let us know if the Tehran Times redirect still reproduces. Dw31415 (talk) 12:45, 15 June 2026 (UTC)
- I have not personally experienced that behaviour, but I don't understand the relevance to my objection. My objection is to removing an AT link without one of (a) a replacement archive of the content, (b) a working link to the content, or (c) marking the link as dead with a note that the AT archive exists (but is not suitable for reasons explained at a linked page). {{Deprecated archive}} matches (c) only if no other archive or live copy of the content exists. Thryduulf (talk) 13:00, 15 June 2026 (UTC)
- @Thryduulf, thanks for calling me back to your objection. My concern is that the conditions you outline are difficult for a bot to evaluate reliably at scale.
- However, I do not think it is appropriate to hold this bot task to a different standard than the one already applied through the CS1 implementation. The community has already accepted an approach in which links are suppressed from reader view based solely on the presence of an archive.today URL. That implementation did not depend on establishing that the original URL was live, that another archive existed, or that the archive.today snapshot was not uniquely valuable.
- If the community’s position is that those evaluations are required before a hidden archive.today link may be removed from view, then it follows that the same evaluations should have been required before those links were hidden from readers in the first place. I do not believe BAG should create a higher threshold for these links than the threshold that was used to hide them with CS1.
- Am I missing something about how this proposal compares to the CS1 implementation or the consensus from the RfC? Dw31415 (talk) 19:59, 15 June 2026 (UTC)
- If the conditions cannot be reliably evaluated by a bot then this is not a task that is appropriate for a bot to perform. Thryduulf (talk) 21:06, 15 June 2026 (UTC)
- It would seem that you find the CS1 implementation to be objectionable as well, is this correct? fifteen thousand two hundred twenty four (talk) 21:12, 15 June 2026 (UTC)
- If that is also being applied without meeting the above necessary conditions then yes, but my understanding is that that is not changing the wikitext and so does not harm the encyclopaedia in the same way a bot will Thryduulf (talk) 07:32, 16 June 2026 (UTC)
- It was applied 22 February 2026 without meeting your desired criteria, but it did meet the RFC consensus that the links be removed "as soon as practicable" (though if we were to split hairs, hiding isn't removal). fifteen thousand two hundred twenty four (talk) 08:57, 16 June 2026 (UTC)
- Just because we have previously been reckless with the encyclopaedia is not an acceptable justification for doing something more reckless (at best) again. The RFC did not give editors carte blanche to harm the encyclopaedia in order to achieve a goal motivated by a moral panic rather than rational thought. Thryduulf (talk) 09:19, 16 June 2026 (UTC)
- I have no desire to relitigate the RFC, which it's appearing more and more like that's what this is. The consensus there found that directing readers to an archive that hijacks connections to perform attacks and modifies its contents to target certain persons was harmful, and that links to said archive should be removed asap. Any claim that this rational finding was motivated by moral panic is one I can't take seriously. I'll be focusing my attention elsewhere now. fifteen thousand two hundred twenty four (talk) 09:42, 16 June 2026 (UTC)
- CS1 does not change the wiki text. It removes the archive from displaying at all at render time. I’ll try to get a before and after. Dw31415 (talk) 14:32, 16 June 2026 (UTC)
{{Deprecated archive
|sourceurl=https://example.com/source-page
|title=Source page
|archivehostpath=archive .ph/YYYYMMDD/https://example.com/source-page
}}
- Should I ping respondents to Wikipedia talk:Archive.today guidance#Wrap standalone, blacklisted link or should we keep support/oppose there? Dw31415 (talk) 15:12, 14 June 2026 (UTC)
- We already had a full consensus discussion to do this - the RFC that had consensus to remove all archive.today links was exceptionally well attended. It closed with a consensus to go much further than this bot would, and remove every archive.today link, regardless of any hole that would be left. Since that discussion, 2 big things have happened, both of which indicate we should go ahead with this bot expeditiously. The first is the changes to the CS1 template, which hid the majority of the archive.today links in the same sense that this bot would. This proceeded with minimal controversy relative to the size of the change. The second change is the random linking to the Tehran Times when the referrer is Wikipedia. That degrades the utility of the archive, and makes it an unreliable link for our readers in the sense that it doesn't go where it promises it does. Requiring this bot to jump through excessive hoops to check for repairing the dead link is counterproductive when we need to action these removals in a timely manner. I can get on board with implementing whatever Dw31415 can implement quickly. If any of these checks on repairing the link are technically difficult or time consuming, we need to proceed without them, and invite the objectors to come in behind the bot and do them in a second pass. Tazerdadog (talk) 18:18, 14 June 2026 (UTC)
- Thanks. Just to underscore the flexibility of Template:Deprecated archive: it’s designed so the community could decide to reverse the decision and display the deprecated links again just by updating the template (no need to touch the pages again). Dw31415 (talk) 18:51, 14 June 2026 (UTC)
- I spent about 90 minutes working out the plan for the work queue (https://gitlab.wikimedia.org/dw31415/cutlass-bot/-/blob/main/queue-implementation-plan.md?ref_type=heads). I hope to get some guidance soon from BAG on next steps. I'll be away for Monday & Tuesday. I'll be able to respond by phone but not able to work on the bot. Dw31415 (talk) 03:57, 15 June 2026 (UTC)
- This is my understanding of the RFC as well, WP:NOMOREATODAY closed asking that "as soon as practicable" we "remove all links to it", which is a rather strong result when considering that none of the initially proposed options solely concerned removal (Option A was removal/hiding). I see no qualifiers there that links should be removed, but only after the original site is determined to be live, just that they should be removed as soon as it's feasible. With a bot it's feasible, and the proposed approach using {{deprecated archive}} is highly reasonable, essentially mirroring the CS1 hiding that is already widely deployed without issue. When it comes to actioning the existing consensus I see no reason to oppose the proposal here. fifteen thousand two hundred twenty four (talk) 20:12, 15 June 2026 (UTC)
- @Tazerdadog: I went through WP:NOMOREARCHIVETODAY and see nothing that supports mass removal by bot in the manner you suggest. There is strong consensus to deprecate. There is also strong consensus to get rid of these links as soon as practically feasible, in the sense that url removal must be minimally disruptive and preferably replaced by an alternative.
- If what is desired is the hiding of these links from readers, {{cite xxx}} templates can be updated to hide archive.today links and put them in a maintenance category. Same for {{webarchive}}. Once that's done, we can look at bots wrapping raw urls in a similar fashion. Headbomb {t · c · p · b} 16:46, 16 June 2026 (UTC)
- Pinging Voorts into this conversation - as the closer he's better positioned than I am to comment on the closure. Are there a significant number of archive.today links still visible to readers that are wrapped in a cite x or webarchive template rather than as bare links? If that's the case then I absolutely agree we should fix those and then circle back to this discussion afterwards. I thought the CS1 change had taken care of them, but I could easily be wrong. Tazerdadog (talk) 16:58, 16 June 2026 (UTC)
- Serves me right for trying to use the visual reply tool for anything - @Voorts: Tazerdadog (talk) 17:00, 16 June 2026 (UTC)
- What's the question? voorts (talk/contributions) 17:43, 16 June 2026 (UTC)
- Trying to phrase carefully so I don't put words in someone's mouth:
- Does your closure at the Archive Today RFC imply community consensus in favor of using a bot to address the archive.today links assuming that such a bot is the only known way to address the links in a timely manner?
- Are there any checks that the bot should perform while wrapping the link, such as checking for a live link, checking for an alternative archive, or marking links as dead that should be performed while the bot runs to remain consistent with the community's consensus? Are there any that it must perform, even if it slows the development of the bot and the eventual addressing of the links?
- Is the fact that we're implementing a half measure by hiding the archive link from readers while retaining it in plaintext a fatal issue with complying with the close, given that we have been unable to get a solution to fully remove the links moving? Tazerdadog (talk) 18:05, 16 June 2026 (UTC)
- The close deprecated archive.today and said all links should be removed, not just hidden. There was no consensus in the discussion to merely hide the links. The close did not speak to whether we should use a bot, but I don't see why that would be objectionable. voorts (talk/contributions) 18:20, 16 June 2026 (UTC)
- IIRC, the question of hiding vs. removing was addressed in the RfC. voorts (talk/contributions) 18:27, 16 June 2026 (UTC)
- Thank you for the quick answer Voorts. @Headbomb: - is this sufficient to demonstrate community consensus for the bot task, or should we start additional discussions to establish it? @Dw31415: - can the bot be modified so that the link is removed instead of simply placed in the Wikitext, ideally in a way that is reversible or that allows other editors/bots to follow behind yours and check whether a different archive matches the archive today citation that was removed? (this could be as simple as a table of citations and archive today links off in wikispace somewhere) Tazerdadog (talk) 18:41, 16 June 2026 (UTC)
- @Voorts: Yes, and while everyone agrees that removal is the ultimate goal, that doesn't mean it is the first step. Headbomb {t · c · p · b} 18:41, 16 June 2026 (UTC)
- The consensus was to deprecate and remove all the links as soon as possible voorts (talk/contributions) 18:43, 16 June 2026 (UTC)
- "As soon as practicable" is the wording of the close. That means not mass removed by bot, unless the community decides that it doesn't want to wait and does not want intermediate steps done (like a bot run to find alternative archives). Headbomb {t · c · p · b} 18:45, 16 June 2026 (UTC)
- That is some serious wikilawyering. voorts (talk/contributions) 18:46, 16 June 2026 (UTC)
- Bots require clear mandates. That RFC is not a clear mandate. "We ought to start running as soon as we're ready" does not mean "We ought to start running NOW", especially when we aren't ready, and that people haven't even decided what 'ready' looks like. Headbomb {t · c · p · b} 18:52, 16 June 2026 (UTC)
- The RfC said we should remove the links as soon as possible. A bot would allow us to do that. What is your objection to a bot doing it? voorts (talk/contributions) 18:59, 16 June 2026 (UTC)
- "As soon as practicable" not "As soon as possible". Headbomb {t · c · p · b} 19:15, 16 June 2026 (UTC)
- That was about adding to the spam blacklist. I had originally said links should be removed "forthwith", but changed that after editors pointed out implementation would take some time. Opposing a bot to implement the RfC result is contrary to the consensus. voorts (talk/contributions) 16:33, 19 June 2026 (UTC)
- @Thryduulf, I'm not sure that I'm understanding your hopes here. Which of these two sentences is closest to your view?
- It is important that readers be able to click on links to this website, even though it is known to have behaved maliciously, to have redirected readers to a different website (which doesn't help anyone check whether the source supports the article content), and now appears to be threatening to delete all the pages that are linked in Wikipedia.
- Readers don't actually need to see these links, but we shouldn't remove the URL information from the wikitext, because that information might be helpful to editors when they manually review the links or are just editing the article in general (and, yes, it'd still be there in the page history, but realistically, nobody's going to find that).
- WhatamIdoing (talk) 17:00, 19 June 2026 (UTC)
- My view is closer to your 2, but it's not spot on:
- If the AT link verifies the content then that link should remain unless and until it is replaced by either a working link that verifies the content or an alternative archive verifies the content.
- If the AT link does not verify the content (including if the archive has been deleted) then it should be removed and replaced with, in preference order:
- A working live link that verifies the content
- An alternative archive
- An explicit indication that the source is dead
- If it is unknown whether the AT link verifies the content then it should remain until that is established. If a bot is unable to work within these parameters then it must not be approved.
- IMO everything else is an affront to WP:V, which I shouldn't have to remind editors is a non-negotiable core policy. Thryduulf (talk) 17:07, 19 June 2026 (UTC)
- I'm trying to understand what it means for the link to "remain". If I wrap the AT link inside a <!-- hidden HTML comment -->, does it still "remain"? I think so: The link is still right there in the wikitext. But do you agree with me? WhatamIdoing (talk) 17:27, 19 June 2026 (UTC)
- It sort of remains. It would be best if the link were to remain visible until its status was resolved, but being present but hidden is better than being removed. Thryduulf (talk) 18:08, 19 June 2026 (UTC)
- Can you live with being present in the wikitext and hidden from the reader? Would it make a big difference to you if it were instead present in the wikitext, visible to logged-in editors, and hidden from unsuspecting readers? WhatamIdoing (talk) 19:25, 19 June 2026 (UTC)
- I thought I answered your first question in my last comment? It's not ideal but I can live with it. I don't understand how your second question differs from the first? Thryduulf (talk) 19:37, 19 June 2026 (UTC)
- The keep/remove choice is a bit more of a spectrum than you might think, due to the magic of CSS. It goes something like this:
- Remove from wikitext; nobody sees the ATODAY link. (In all options, the title/author/date/original URL/rest of the citation would stay, of course.)
- Keep in wikitext, but <--hidden-->, so it can only be seen if you look in the wikitext.
- Keep in wikitext, but use CSS magic to <--hide it from readers--> while still keeping it visible to logged-in editors.
- Keep in wikitext; everybody sees it (=what we have today).
- You've said that #2 is okay and that #4 is okay. I wonder if you think #3 would be a material improvement compared to #2. (Upside: easier for editors to see, still protects most unsuspecting readers; downside: on a long/complex page, the CSS might be a little slow to process, so the link might sometimes be visible for a second or two and then disappear). WhatamIdoing (talk) 20:26, 19 June 2026 (UTC)
- Ah ok, I understand now. Basically the preference order is 4 (until assessed, i.e. not permanently), then 3 then 2. Thryduulf (talk) 22:32, 19 June 2026 (UTC)
- Okay. If we change [http//www.example.com Title] to {{new template|link=http//www.example.com |label=Title}}, we should be able to set the template to work as #2, #3, or #4 at different times. It would also make it easier for people to find the ones that need to be reviewed, because maintenance templates can have maintenance categories attached. But we'd need the bot to install the template before any of that could be done. The point of this request is to install the template. Can you agree to have the template installed? WhatamIdoing (talk) 23:52, 19 June 2026 (UTC)
- The question of whether it should be hidden or removed was posed in the RfC; my close found a consensus to remove rather than hide. voorts (talk/contributions) 20:32, 19 June 2026 (UTC)
- Yes, but it's possible to do the "hide" step now by bot, while we continue the "remove" process. Also, the "hide" step can be done with a template that has an associated maintenance category, which makes it easier for any "removers" to find them. I don't think that "hide" should be seen as the opposite of "remove"; it is instead a step that can help us reach the "remove" goal. WhatamIdoing (talk) 22:04, 19 June 2026 (UTC)
Dry run added: User:Dw31415/ArchiveEdits1 Dw31415 (talk) 12:21, 15 June 2026 (UTC)
- Note the first example there sets the title of the archived page to be "Archived" which is obviously incorrect. If the bot is going to add errors of this nature to the encyclopaedia then that is another reason to oppose. Thryduulf (talk) 07:36, 16 June 2026 (UTC)
- @Thryduulf, I agree the “archived” case should be examined more carefully. There are two other options I considered:
- Changing the link to an interstitial webpage that gives a warning, a click-through, and information about whether a link exists at the Wayback Machine.
- Changing the link to a WP page that explains the situation and how to find the original archive link
- Do you find either of these less objectionable? Dw31415 (talk) 14:38, 16 June 2026 (UTC)
- p.s. here is a mock up of an interstitial https://dw31415wp-glitch.github.io/archive-checker-bot/?url=https://archive-today/2025.01.01-120000/https://example.org/article Dw31415 (talk) 16:16, 16 June 2026 (UTC)
- (edit conflict) If a webpage is linked, directly or indirectly, from an article in any form other than a bare url then any metadata about that page (including its title) should be recorded (and displayed if the link is displayed) correctly. This applies regardless of whether the link is to a live webpage or to an archive, and if the latter what archive that is.
- Really every link processed by this bot should be left in one of four states:
- A link to a live copy of the content that supports the associated article text (with or without an archive, AT or otherwise)
- A link to an acceptable archive of the content that supports the associated article text
- An explicitly marked dead link with some sort of note that an archive exists at AT with some explanation why it isn't being linked (this can be inline, via a linked page or some combination)
- An explicitly marked permanently dead link (there is no benefit to even mentioning a broken AT archive)
- Thryduulf (talk) 16:18, 16 June 2026 (UTC)
- I would probably oppose those solutions - if we have the link, it should directly go to where we said it was going to go. If we're not willing to honor the link destination, which we are not for archive today, we should not have a clickable link. In any case, I'd like to see a high quality consensus discussion authorizing these before we seriously consider implementing them. Tazerdadog (talk) 18:26, 16 June 2026 (UTC)
- Any thoughts about how to have that discussion? I’m leaning to modifying the proposal so the template starts with the status quo behavior. This would allow the community to act more easily through the template (just like CS1). That might allow the bot folks to stay out of the consensus building business. Dw31415 (talk) 20:21, 16 June 2026 (UTC)
- If I was proposing a way forward from here, I'd first make sure that this comment from @Headbomb: didn't get lost in the shuffle:
If what is desired is the hiding of these links from readers, {{cite xxx}} templates can be updated to hide archive.today links and put them in a maintenance category. Same from {{webarchive}}. Once that's done, we can look at bots wrapping raw urls in a similar fashion.
— User:Headbomb
- If we have any low hanging fruit contained in these templates, we should at least disable reader facing links while we have the followup conversation via a single edit to the templates.
- Following that, I'd defer to the BAG. I think we've made the best case we currently have for an existing consensus with the closer of a very well attended RFC coming in to this BRFA discussion. If that's good enough, great. If it's not, I'd ask what we do need, then hold that discussion. I could see a case for needing a more recent consensus, for needing a consensus to do something specifically with a bot instead of just as soon as (possible/practicable), or a clarification on what the bot needs to do versus what can/should/technically must be left undone, or a clarification on the final desired state (removal with no easy way to undo it, removal with an undo button/database in the template, hiding it in the wikitext so a volunteer could repair but no layman would find it, hashing the citation so that if you don't know the trick we tell to citation repairers you can't find it, etc.) Tazerdadog (talk) 00:21, 17 June 2026 (UTC)
- Looking a little deeper, it looks like Chaotic Enby pushed changes to the webarchive template shortly after this started, and the cite web talkpage redirects to CS1, so that might be already done? Tazerdadog (talk) 00:29, 17 June 2026 (UTC)
- Yes, those changes to CS1 and webarchive have already hidden 600k-ish links. This bot proposal targets the remaining 100k-ish. Dw31415 (talk) 01:33, 17 June 2026 (UTC)
- @Fifteen thousand two hundred twenty four, @Headbomb, @Sapphaline, @Thryduulf, @Voorts: Just FYI, these websites were added to the global spam blacklist yesterday. This affects all WMF wikis, not just the English Wikipedia. WhatamIdoing (talk) 02:17, 17 June 2026 (UTC)
- The websites have been added to our local whitelist, and the current edit filter is keeping everything out with a more customized error message and fewer side effects than a spam blacklist. For most practical purposes this should take us back to the status quo of the last 3 months. Tazerdadog (talk) 03:06, 17 June 2026 (UTC)
- @Headbomb, thank you for reviewing and for your work on BAG. To your comment:
If what is desired is the hiding of these links from readers...templates can be updated to hide...
- CS1 and webarchive already changed[3] to hide the archive today links from readers without any significant objection from the community. That change hid the majority (~600k) of archive today links. I understand your position that the Wikipedia:NOMOREATODAY RfC does not provide sufficient consensus for a bot to remove links. Please note that this proposal doesn't actually remove the archive today links. It wraps standalone archive today links in a new template. This bot would empower the community to more effectively implement the RfC through modification of the template. Is there any role for a bot based on the decision of the RfC and the consensus in the CS1 and webarchive moves? Thanks! Dw31415 (talk) 04:15, 17 June 2026 (UTC)
- Consensus already exists to do this. Do not allow this to get derailed. I consider it unethical to delay.—S Marshall T/C 16:03, 19 June 2026 (UTC)
- Consensus exists to remove the links "when practical", a practical solution requires not going at it like a bull in a china shop to the detriment of WP:V. Thryduulf (talk) 16:48, 19 June 2026 (UTC)
- That's absolutely incorrect. Consensus was to get rid of all archive.today links. voorts (talk/contributions) 17:47, 19 June 2026 (UTC)
- I don't think my close could have been any clearer that the community clearly doesn't want archive.today links around anymore. voorts (talk/contributions) 17:49, 19 June 2026 (UTC)
- I'm becoming increasingly concerned that your close was not actually neutral. You have certainly become a big advocate of a hardline interpretation of the result, which is not the hallmark of someone who has dispassionately evaluated the consensus, which was far more measured than "remove everything without any consideration for anything regardless of what anybody says". Thryduulf (talk) 18:05, 19 June 2026 (UTC)
- I'm not advocating for anything other than adherence to consensus, which was perhaps the clearest I've ever seen over the course of closing many discussions on Wikipedia. The fact that you disagree with it does not change the fact that an overwhelming consensus of editors wanted us to stop using archive.today and get rid of links to it. I frankly think trying to undermine consensus by making baseless procedural objections like this is disruptive. voorts (talk/contributions) 18:23, 19 June 2026 (UTC)
- I'm not trying to be disruptive, rather I'm trying to minimise the disruption to the encyclopaedia caused by an overzealous interpretation of an RFC where the consensus (when you read the actual reasoned arguments not just the hyperbole) was absolutely not in favour of disrupting the encyclopaedia to make a point. Thryduulf (talk) 18:27, 19 June 2026 (UTC)
- If you thought my interpretation of the RfC was incorrect or "overzealous", you should have come to my talk page when I closed it and then opened a close review if I didn't agree with your objections, instead of trying to undermine it months later. My close says what it says, and it says there was consensus to "remove" (not hide; which was expressly an option in the RfC) links to archive.today. voorts (talk/contributions) 18:31, 19 June 2026 (UTC)
- At the time I thought your close was on the poor side but not by enough to challenge it, especially when emotions were running so hot. However it is your actions since the close that have gradually made me more and more uncomfortable, particularly the increasingly strident advocating for removal without regard for consequences which is at odds with the remove when practical wording from your own close. Thryduulf (talk) 18:36, 19 June 2026 (UTC)
- "Consensus was to get rid of all archive.today links" Yes, eventually. Not immediately, but rather "when practicable". Disruption is highly likely to occur and any bot that wants to plow through approval to remove all links as quickly as possible will be denied as ill thought out and running against consensus of the RFC. Headbomb {t · c · p · b} 18:41, 19 June 2026 (UTC)
- My actions since the close have been to discuss the close at this one thread when I was asked to weigh in. As the closer, it is not my place to consider the "consequences" of "removal". The community weighed those consequences, and at the time of my close, I found a consensus, based on "actual reasoned arguments not just the hyperbole", to "immediately deprecate archive.today", add it to the spam blacklist "as soon as practicable", and to "remove all links" to archive.today. voorts (talk/contributions) 18:41, 19 June 2026 (UTC)
- Frankly, I think you are not being dispassionate here. You are so against the removal of archive.today links that you are misreading community consensus to stop it from occurring. voorts (talk/contributions) 18:43, 19 June 2026 (UTC)
- I'm not against the removal of archive.today links. I'm against their removal without regard for the consequences of doing so, as I have explained in great detail multiple times because some editors (apparently including you) are unable or unwilling to understand that wholesale removal without regard to the consequences to the encyclopaedia was not something the RFC supported - not that an RFC, however well attended, could find a consensus to contravene WP:V.
- If the advocates of removal spent a fraction of the amount of energy on actually assessing the links so that there are no adverse consequences from removal as they do complaining that they aren't being allowed to harm the encyclopaedia as fast as they want to then there wouldn't be a need for discussions like this one. Thryduulf (talk) 18:54, 19 June 2026 (UTC)
- The WP:V objection that you're making was addressed and rejected in the RfC itself. If you want to relitigate my close, open a close review. If not, what are we even doing here? voorts (talk/contributions) 18:56, 19 June 2026 (UTC)
- The WP:V objection was not addressed in the RFC. What we are doing here is trying to ensure that any bot acts only in accordance with the bot policy, which in relevant part requires consensus for the actions to be taken and consensus for them to be taken by a bot. I'm not seeing either of those at this time. Thryduulf (talk) 19:01, 19 June 2026 (UTC)
- "Verif*" appears 141 times on the RfC page and that argument was directly addressed in my close:
Those in favor of maintaining the status quo rested their arguments primarily on the utility of archive.today for verifiability. However, an analysis of existing links has shown that most of its uses can be replaced.
voorts (talk/contributions) 19:05, 19 June 2026 (UTC)
- Thryduulf is 100% correct here. You can either accept this and carry on, or I can close this bot request and deny it as premature, without prejudice against future bot requests once a clear consensus emerges. Headbomb {t · c · p · b} 19:06, 19 June 2026 (UTC)
- @Headbomb, while you're here, can you get your script to flag these? We need visibility, especially for medical articles. I've been surprised how often these links are associated with non-MEDRS sources (e.g., an "educational" page on a random dentist's website); in other cases (e.g., archives of the ICD codes), it's just unnecessary. WhatamIdoing (talk) 19:48, 19 June 2026 (UTC)
- This: "If the advocates of removal spent a fraction of the amount of energy on actually assessing the links so that there are no adverse consequences from removal as they do complaining" is wrong by at least three orders of magnitude. I've done some of that work. I have, in fact, spent more time on that work than I have spent in this discussion. But so far, I have solved ATODAY problems in fewer than 100 articles, and just for WPMED articles, I have more than 1,000 to go. WhatamIdoing (talk) 19:39, 19 June 2026 (UTC)
- Thryduulf is correct even as he's making verifiably untrue claims about the RfC discussion? voorts (talk/contributions) 19:06, 19 June 2026 (UTC)
- "However, an analysis of existing links has shown that most of its uses can be replaced." is very much not the same thing as "remove all the links as soon as possible, without regard for whether they have been and/or can be, and while doing so make it harder for this to be done later". Thryduulf (talk) 19:10, 19 June 2026 (UTC)
- The first sentence of the close says to get rid of them. Those two sentences that I just quoted to you are addressing the arguments that were made in the RfC, arguments that you just asserted weren't actually made. I'm having a hard time AGF at this point if you're just going to cherry pick parts of the close and ignore parts of the discussion that are inconvenient to you. I've said what I have to say here at this point. voorts (talk/contributions) 19:13, 19 June 2026 (UTC)
- When you're reduced to claiming that shades of grey do not exist then it's very clear that no rational discussion is possible. Thryduulf (talk) 19:15, 19 June 2026 (UTC)
- I think you two should seriously consider disengaging from each other and from this discussion. I hope you will.—S Marshall T/C 20:44, 19 June 2026 (UTC)
- @Headbomb, a couple of questions:
- What practicalities are stopping us from removing (or hiding) the links?
- Is it reasonable to ask for a BAG second opinion?
- Dw31415 (talk) 01:12, 20 June 2026 (UTC)
- Regarding practicality, I understand that something being practical means that something is possible and we have the resources to do it. Changing the template code in Template:Cite web made it very practical to hide the links. Using a bot makes it more practical to hide the standalone links. In your interpretation, is finding a replacement source one of the practicalities? Dw31415 (talk) 01:19, 20 June 2026 (UTC)
- Regarding a second opinion, normally I'd say by all means let's have more discussion and consensus building. In this case, we had one of the best attended RfCs in recent memory. We have the closer here plainly saying that the community decided on removal as soon as possible (and that trying to read in additional requirements to "practical" is engaging in wiki-lawyering). I'm reluctant to open an additional RfC that will essentially be asking if "remove" means "remove". The many respondents who share the consensus view would understandably be upset at needing to duplicate the arguments they made in the February RfC. As a better process, may I suggest that you approve the requested 20 edit trial run and we use that as a stimulus for more discussion. As an additional option, maybe the editors here saying that the closer didn't mean what the closer is still saying should request a close review. Dw31415 (talk) 01:39, 20 June 2026 (UTC)
- Let me just add a word of appreciation for the spirited discussion. I trust that it comes from a place of dedication to improving the encyclopedia. My personal motivation matches the sentiments from WMF:
For readers to remain relaxed and trusting while using Wikipedia, they should be able to reasonably expect that links on Wikipedia to potentially dangerous websites are rare, and that those that do exist are dealt with quickly once spotted[4]
- As for time, I've spent way too much time trying to find compromise solutions. That time would be better spent helping to develop tools to replace the references with better alternatives. Dw31415 (talk) 01:49, 20 June 2026 (UTC)
- Eric at WMF responded to a message I sent asking if they had capabilities or plans that would duplicate this effort. Here is his reply:
We [at the WMF] don't have any plans or capabilities that this [bot] effort would conflict with or duplicate. Right now, we're focused on supporting volunteer-led actions. I haven't reviewed the details of your bot, and would defer to the community to do that. But if your initiative gets some steam, we'd love to stay tied in and see if there are ways we can help it from our end[5]
Dw31415 (talk) 02:05, 20 June 2026 (UTC)
- @Dw31415:
- 1. What practicalities are stopping us from removing (or hiding) the links?
- Right now the main blocker is that it is unclear what the community wants. Yes, it wants the removal of those links, but it does not want wanton reckless action. Since the RFC never bothered to ask what "practicable" means, we're left guessing.
- Right now, CS1 templates are not hiding links but are marking them as a deprecated archive service and putting them in a maintenance category. {{webarchive}} templates have recently been updated to hide links and put them in a maintenance category. At the very minimum, I can see support for taking bare links and marking them with a template, since that is what is currently being done elsewhere, but I don't know what the behaviour of that template should be concerning hiding or not. This is something for the community to decide, but it is not a decision we can make here. So as far as I'm concerned, the community needs to figure out which behaviour is currently desired: that of {{cite xxx}}, which keeps links and marks them as deprecated, or that of {{webarchive}}, which hides links and marks them as deprecated. Once that is clear, templates can be updated and this one deployed.
- 2. Is it reasonable to ask for a BAG second opinion?
- Sure. Everyone's always free to ask for a second opinion. I've asked @Earwig: for his two cents earlier, but he may be busy, especially given the length of the discussion here.
- Headbomb {t · c · p · b} 02:16, 20 June 2026 (UTC)
- @Headbomb, Regarding CS1 links, I understand that the module is hiding them like at Jeremy Lin#cite ref-17 (note there is no "Archived" link in the citation to render the link to [...]today/20120630151832/http://www.denverpost.com/commented/ci_16724722). I think this was the leading cause of the reduction of 600k external links. Am I missing something? Dw31415 (talk) 02:43, 20 June 2026 (UTC)
My bad, I think I was looking at a hardcoded instance of an example somewhere, rather than a live version, and that gave me an inaccurate picture of the current situation. Since CS1 and webarchive templates both hide links, it seems reasonable to approve a task that would also hide the remaining bare links. I'd have to review reactions to template updates first though, if there are any.
I'd also have to take a look at the template proposed to hide these links. The cases in User:Dw31415/ArchiveEdits1 seem needlessly complicated. Compare say the current proposed
{{Deprecated archive|sourceurl=http://archives.dailynews.lk/2003/10/18/fea05.html|title=Establishing Pāli Text Society for Buddhist literature|archivehostpath=archive. today/20131217001046/http://archives.dailynews.lk/2003/10/18/fea05.html}}
with say {{DAL}} for deprecated archive link (or some short variant)
{{DAL|https:// archive. today/20131217001046/http://archives.dailynews.lk/2003/10/18/fea05.html|Establishing Pāli Text Society for Buddhist literature}}
with functionality that recognises the base url vs host path and archive date automatically. Headbomb {t · c · p · b} 03:13, 20 June 2026 (UTC)
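As a rough illustration of the automatic recognition suggested above, here is a minimal Python sketch that splits an archive link into its host, archive date and original URL. It is only a sketch under stated assumptions: the function name `split_archive_link`, the `ArchiveLink` type and the two timestamp layouts it accepts (14 digits, or the dotted `YYYY.MM.DD-HHMMSS` form seen in the mock-up link earlier in this discussion) are hypothetical choices, not part of any existing template or module. An on-wiki version would more likely be written as a Lua module.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class ArchiveLink(NamedTuple):
    """Hypothetical container for the pieces of an archive link."""
    host: str                      # archive host, e.g. "archive.today"
    timestamp: Optional[datetime]  # archive date, None if not a valid date
    source_url: str                # the original (archived) URL


# Assumed layout: [scheme://]host/<timestamp>/<original URL>
_PATTERN = re.compile(
    r"^(?:https?://)?"                               # optional scheme
    r"(?P<host>[^/]+)/"                              # archive host
    r"(?P<stamp>\d{14}|\d{4}\.\d{2}\.\d{2}-\d{6})/"  # archive timestamp
    r"(?P<source>https?://.+)$"                      # original URL
)


def split_archive_link(url: str) -> Optional[ArchiveLink]:
    """Return the parts of an archive link, or None if it does not match."""
    match = _PATTERN.match(url.strip())
    if match is None:
        return None
    stamp = match.group("stamp")
    # Pick the parse format that corresponds to the matched timestamp layout.
    fmt = "%Y%m%d%H%M%S" if stamp.isdigit() else "%Y.%m.%d-%H%M%S"
    try:
        timestamp = datetime.strptime(stamp, fmt)
    except ValueError:
        # Right shape but not a real date (e.g. month 13).
        timestamp = None
    return ArchiveLink(match.group("host"), timestamp, match.group("source"))
```

With something along these lines, a short template call would only need the full archive URL and a title; the source URL and archive date could be derived rather than passed as separate parameters.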
References
- ^ Wikipedia talk:Archive.today guidance#c-Iam-py-test-20260612142000-Dw31415-20260610112000
- ^ Wikipedia talk:Archive.today guidance#Wrap standalone, blacklisted link
- ^ https://en.wikipedia.org/w/index.php?title=Module:Citation/CS1/sandbox&diff=prev&oldid=1339539206
- ^ https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment/Archive.is_RFC_5#c-EMill-WMF-20260210011600-WMF_note
- ^ https://en.wikipedia.org/wiki/User_talk:EMill-WMF#c-EMill-WMF-20260620014300-Dw31415-20260617173200