Wikipedia:Bots/Noticeboard

This is an old revision of this page, as edited by Cyberpower678 (talk | contribs) at 23:40, 29 January 2018 (Need someone with a mass rollback script now.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

    Bots noticeboard

    Here we coordinate and discuss Wikipedia issues related to bots and other programs interacting with the MediaWiki software. Bot operators are the main users of this noticeboard, but even if you are not one, your comments will be welcome. Just make sure you are aware of our bot policy and know where to post your issue.

    Do not post here if you came to



    Bot causing multi colon escape lint error

    There are now 8,218 lint errors of type Multi colon escape, and all but 7 of these are caused by WP 1.0 bot. This bug was reported at Wikipedia talk:Version 1.0 Editorial Team/Index#Bot adding double colons 9 October 2017. Perhaps some bot experts who don't typically wander in those parts can apply their skills to the problem. Please continue the discussion there, not here. —Anomalocaris (talk) 06:14, 28 November 2017 (UTC)

    Didn't Nihlus already deal with all of these? Primefac (talk) 13:08, 28 November 2017 (UTC)
    NihlusBOT 5 is a monthly task to fix the problems with the 1.0 bot until such time as the 1.0 bot is fixed. --Izno (talk) 14:16, 28 November 2017 (UTC)
    Correct. I've been traveling lately, so I wasn't able to run it. I am running it now and will let you know when it is done. Nihlus 14:33, 28 November 2017 (UTC)
    Basically, I don't have the time, but I would need to properly fix the log myself today. It's simpler to just revert and fix it properly when I can. —PaleoNeonate 15:51, 28 November 2017 (UTC)
    @PaleoNeonate: Why are you having this discussion in two separate places? I addressed the issue on my talk page. Nihlus 15:55, 28 November 2017 (UTC)

    I thought that this may be a more appropriate place considering that it's about 1.0 bot and Nihlusbot, so will resume it here. Your answer did not address the problem. Do you understand that:

    • Before October 5, 2017, category links were fine, but then later were broken, resulting in the same kind of bogus double-colon links as for drafts (these were not mainspace links, but Category: space links)
    • It's possible that draft links were always broken, resulting in the same kind of broken double-colon links
    • Nihlusbot causes both broken category and draft space links to become mainspace links (not Draft: or Category: ones as it should)
    • As a result, the "fix" does not improve the situation, the links are still broken (mainspace red links instead of category and draft links).
    • If these changes are kept with the intent of fixing them later, it becomes more difficult to detect which links were not supposed to point to main space. In any case, to fix it properly, a fancier script is needed which checks the class of the page...

    Thanks, —PaleoNeonate 23:31, 28 November 2017 (UTC)

    Do I understand? Yes, yes, this time it did due to a small extra bit in the code, disagree as stated already, this is something I am working on. Thanks! Nihlus 00:27, 29 November 2017 (UTC)
    So, there are issues with almost every single namespace outside of articlespace, so WP 1.0 bot is making a lot of errors and should probably be prevented from continuing. However, until that time, I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed. Nihlus 01:52, 29 November 2017 (UTC)
    A few hours ago, there were just 6 Multi colon escape lint errors. Now we have 125, all but 4 caused by WP 1.0 bot. This may be known to those working on the problem. —Anomalocaris (talk) 06:02, 29 November 2017 (UTC)
    @Nihlus: thanks for improving the situation. I see that Category links have been fixed (at least the ones I noticed). Unfortunately, links to drafts still point to mainspace. —PaleoNeonate 19:54, 29 November 2017 (UTC)
    @PaleoNeonate: As stated above: I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed. Nihlus 19:55, 29 November 2017 (UTC)
    Yes, I have read it, but unfortunately I contest the value of such hackish edits in 1.0 logs. Perhaps at least don't just convert those to non-working mainspace links when the class is unavailable, and instead mark them so they are known not to be in mainspace (those double-colon items never were in mainspace)? A marker, or even a non-linked title, would be a good choice to keep the distinction... —PaleoNeonate 20:48, 29 November 2017 (UTC)
    Again, I repeat: I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed. That means those are the only fixes I am making with the bot going forward as I have no intention of supervising each edit made to discern whether something is a draft/project page or not. Nihlus 20:56, 29 November 2017 (UTC)
    "I am limiting the corrections I am making to those that are explicitly assessed as Category/Template/Book/Draft/File-class. If they are classed incorrectly, then they will not get fixed." We appear to be talking past each other. That is not what technically happened. This diff (which you reverted) was made because links to mainspace were introduced for pages not in mainspace. If your script doesn't touch such links in the future when it cannot determine their class, that's an improvement. You say that you don't correct them, but so far they were still "fixed" (converted to erroneous mainspace links). The "loss of information" in my first complaint was that those bogus links were previously unambiguously recognizable as non-mainspace (they are now confusing, broken mainspace links when the class is not in the text). —PaleoNeonate 05:27, 1 December 2017 (UTC)
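Illustrative note: the approach argued for in this thread (restore the namespace only when the assessed class names it, and otherwise keep the title visibly marked as non-mainspace instead of turning it into a mainspace link) could be sketched roughly as below. This is not the code of WP 1.0 bot or NihlusBOT. The `[[::Title]]` log format, the class-to-namespace table, and the function name `fix_double_colon_links` are all assumptions made for the example.

```python
import re

# Assumed mapping from an assessment class to the namespace prefix it implies.
CLASS_TO_NAMESPACE = {
    "Category": "Category",
    "Template": "Template",
    "Book": "Book",
    "Draft": "Draft",
    "File": "File",
}

# Assumed shape of the broken links: [[::Title]] with the namespace prefix lost.
DOUBLE_COLON_LINK = re.compile(r"\[\[::([^\]|]+)\]\]")


def fix_double_colon_links(line, assessed_class=None):
    """Repair double-colon links in one log line (hypothetical helper).

    If the assessed class implies a namespace, rebuild the link with that
    prefix. Otherwise emit an unlinked, marked title so the entry is never
    mistaken for a mainspace page.
    """
    namespace = CLASS_TO_NAMESPACE.get(assessed_class)

    def replace(match):
        title = match.group(1)
        if namespace is None:
            # Class unknown: keep the distinction instead of guessing.
            return f"{title} (non-mainspace, namespace unknown)"
        return f"[[:{namespace}:{title}]]"

    return DOUBLE_COLON_LINK.sub(replace, line)
```

With these assumptions, a Category-class entry gets its prefix back, while an entry with no usable class stays unlinked and marked, which preserves the information that it was never a mainspace page.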

    Commons Deletion Notification Bot

    I am developing a bot which notifies Wikipedia articles when images associated with them in Wikimedia Commons are

    1. nominated for deletion
    2. deleted
    3. nominated not to be deleted

    How can I detect that an image is deleted from Commons after nomination? Are there any APIs available for that? Harideepan (talk) 13:07, 2 January 2018 (UTC)

    @Harideepan: You should probably go talk to the Community Tech team, as they just had this topic placed in their top ten wishes for 2017. See meta:Community Tech/Commons deletion notification bot. --Izno (talk) 14:13, 2 January 2018 (UTC)
    Thank you for your response. I am new here. Harideepan (talk) 14:34, 2 January 2018 (UTC)
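Illustrative note on the API question above: one possible approach, assuming the standard MediaWiki Action API on Commons, is to query the deletion log for the file title with `list=logevents` and `letype=delete`, then inspect the most recent entry. The helper names below are hypothetical, the sketch only builds the request URL and parses a response body (it performs no network call), and it is not how any existing bot is known to work.

```python
import json
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_deletion_log_url(file_title, limit=5):
    """Build a logevents query for deletion-log entries of one file."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "delete",   # the deletion log also records restores
        "letitle": file_title,
        "leprop": "type|timestamp|user|comment",
        "lelimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def is_deleted_per_log(response_text):
    """Return True if the newest delete/restore log entry is a deletion.

    Assumes the default ordering of logevents (newest entry first).
    """
    data = json.loads(response_text)
    for event in data.get("query", {}).get("logevents", []):
        action = event.get("action")
        if action in ("delete", "restore"):
            return action == "delete"
    return False  # no deletion or restore recorded for this title
```

A caller would fetch the URL with any HTTP client (subject to the usual API etiquette such as a descriptive User-Agent) and pass the response body to `is_deleted_per_log`.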

    CommonsDelinker and Filedelinkerbot

    Filedelinkerbot was created to supplement CommonsDelinker, which was performing inadequately, with a lot of unaddressed bugs (including, off the top of my head, breaking templates and galleries) and limited maintenance. Is there any continued need for CommonsDelinker that cannot be met by Filedelinkerbot? There are some issues which I'd like to raise, such as the removal of images from discussion archives (which should really be left as red links), and having a single location to discuss such issues would really be preferable. --Paul_012 (talk) 03:46, 27 January 2018 (UTC)

    Slow-burn bot wars

    Moved from WP:ANI#Slow-burn bot wars Primefac (talk) 15:44, 27 January 2018 (UTC)

    Does anyone know why two bots edit war over which links to use for archived web refs? By way of example, the edit history of Diamonds Are Forever (novel) shows InternetArchiveBot and GreenC bot duking it out since September 2017. I've seen it on a couple of other articles too, but I can't be that bothered to dig them out. Although no real harm is done, it's mildly annoying when they keep cluttering up my watchlist. Cheers - SchroCat (talk) 14:17, 27 January 2018 (UTC)

    That would have to be resolved by the bot owners, probably at Wikipedia:Bots/Noticeboard. NinjaRobotPirate (talk) 14:35, 27 January 2018 (UTC)

    I added a {{cbignore}} (respected by both bots) until we figure it out. Notify us on our talk page or WP:BO is easiest. -- GreenC 15:40, 27 January 2018 (UTC)

    This appears to be an issue with GreenC bot. IABot is repairing the archive link and the URL fragment, and GreenC bot is removing it for some reason.—CYBERPOWER (Chat) 16:55, 27 January 2018 (UTC)
    GreenC bot gets the URL from the WebCite API as data authority - this is what WebCite says the archive is saved under. -- GreenC 17:35, 27 January 2018 (UTC)
    GreenC bot could use the |url= as data authority, but most of the time it is the other way around where the data in |url= is truncated and the data from WebCite is more complete. Example, example. So I went with WebCite as being more authoritative since that is how it's saved on their system. -- GreenC 17:47, 27 January 2018 (UTC)
    That's not the problem though. It's removing the fragment from the URL. It shouldn't be doing that.—CYBERPOWER (Chat) 18:15, 27 January 2018 (UTC)
    It's not removing the fragment. It's synchronizing the URL with how it was saved on WebCite. If the fragment is not there, it's because it was never there when captured at WebCite, or WebCite removed it during the capture. The data authority is WebCite. This turns out to be a good method as seen in the examples because often the URL in |url= field is missing information. -- GreenC 20:20, 27 January 2018 (UTC)
    I'm sorry, but that makes no sense. Why would WebCite, or any archiving service, save the fragment into the captured URL? The fragment is merely a pointer for the browser to go to a specific page anchor. IABot doesn't capture the fragments when reading URLs, but carries them through to archive URLs when adding them.—CYBERPOWER (Chat) 20:27, 27 January 2018 (UTC)
    Why is IABot carrying the fragment through into the archive URL? It's not used by the archive (except archive.is in certain cases where the '#' is a '%23'). -- GreenC 21:26, 27 January 2018 (UTC)
    Do you understand what the fragment is for? It's nothing a server ever needs to worry about, so it's just stripped on their end. It is a browser pointer. If the original URL had a fragment, attaching the same fragment to the archive URL makes sense so the browser goes straight to the relevant section of the page as it did in the original URL.—CYBERPOWER (Chat) 21:39, 27 January 2018 (UTC)
    Yeah, I know what a fragment does (though I was temporarily confused and forgot that they work at other services). But fragments don't work with WebCite URLs. We tack the "?url=.." on for RFC long-URL reasons but it is dropped when doing a replay (example). So there is no inherent reason to retain fragments at WebCite. However, I can see the logic of keeping them for some future purpose we can't guess at, and it's already been done, by and large. So I will see about modifying GreenC bot to retain the fragment for WebCite (it already does for other services).
    There is the other problem as noted: IABot -> GreenCbot - any idea what might have caused it? -- GreenC 22:17, 27 January 2018 (UTC)
    Well even if it is dropped, which it should do, it still doesn't change the fact the page anchors exist. I'll give you an example of what I mean.—CYBERPOWER (Chat) 22:23, 27 January 2018 (UTC)
    The fragment is not the part after the ?, that is the query string. The fragment is the part after the #. --Redrose64 🌹 (talk) 22:24, 27 January 2018 (UTC)
    Yes, I understand, but it's different with WebCite URLs: fragments don't work there, for the reasons noted above. Try it: https://www.webcitation.org/5utpzxf0T?url=http://www.ymm.co.jp/p/detail.php?code=GTP01085336#song . Also, on a different matter, what about this edit sequence? IABot -> GreenCbot -- GreenC 23:36, 27 January 2018 (UTC)
    GreenC bot is now carrying through the fragment in line with IABot per above. -- GreenC 00:47, 28 January 2018 (UTC)
    Oh I see what you mean. The anchors don't actually work there, despite the fragment. In any event, IABot doesn't selectively remove them from WebCite URLs, as the fragment handling process happens during the archive adding process during the final stages of page analysis, when new strings are being generated to replace the old ones. I personally don't see the need to bloat the code to "fix" that, but then there's the question, what's causing the edit war?—CYBERPOWER (Chat) 00:52, 28 January 2018 (UTC)
    GreenC bot is fixed so it won't strip the fragment; there shouldn't be any more edit wars over it, but there are probably other edit wars over other things we don't know about. Not sure how to find edit wars. -- GreenC 04:31, 28 January 2018 (UTC)
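Illustrative note: the fragment handling discussed above (the fragment is the part after `#`, it is a browser-side pointer, and it is copied from the original URL onto the archive URL) can be sketched with Python's standard `urllib.parse`. The function name `carry_fragment` is hypothetical and this is not the actual code of IABot or GreenC bot.

```python
from urllib.parse import urlsplit


def carry_fragment(original_url, archive_url):
    """Append the original URL's fragment to the archive URL.

    The archive URL is returned unchanged when the original has no fragment
    or when the archive URL already carries one.
    """
    fragment = urlsplit(original_url).fragment
    if not fragment or urlsplit(archive_url).fragment:
        return archive_url
    return archive_url + "#" + fragment


original = "http://www.ymm.co.jp/p/detail.php?code=GTP01085336#song"
archive = ("https://www.webcitation.org/5utpzxf0T"
           "?url=http://www.ymm.co.jp/p/detail.php?code=GTP01085336")
print(carry_fragment(original, archive))
```

Applied to the WebCite example from this thread, the result is the archive URL ending in `#song`. Whether the anchor then has any effect depends on the archive's replay page, as noted above for WebCite.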
    "Not sure how to find edit wars." Perhaps your bots could look at the previous edit to a page, and if it was made by its counterpart, log the edit somewhere for later analysis. It won't catch everything, and it might turn up false positives, but it's something. —DoRD (talk) 14:42, 28 January 2018 (UTC)
    GreenC bot targets pages previously edited by IABot, so there is always overlap. -- GreenC 15:01, 28 January 2018 (UTC)
    Maybe a pattern of the two previous edits being GreenC and IAbot? Galobtter (pingó mió) 15:09, 28 January 2018 (UTC)
    And/or the edit byte sizes being the same. But it would take a program to trawl through tens of thousands of articles and hundreds of thousands of diffs, which wouldn't be trivial to create. Still, a general bot-war detector would be useful to have for the community. -- GreenC 15:18, 28 January 2018 (UTC)
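Illustrative note: the heuristics suggested above (two bots alternating on a page, with page byte sizes repeating) could be combined roughly as follows. The input format (a list of `(user, page_size_in_bytes)` pairs, oldest first), the function name, and the threshold are all assumptions for this sketch. Fetching the histories is out of scope here, and false positives remain possible.

```python
def looks_like_bot_war(revisions, bot_a, bot_b, min_flips=3):
    """Flag a page history where two bots keep undoing each other.

    revisions: list of (user, page_size_in_bytes) tuples, oldest first.
    A "flip" is counted when, within an unbroken run of strictly alternating
    edits by the two bots, the page size equals the size two edits earlier.
    A single bot_a -> bot_b sequence never counts, since one bot may
    routinely follow the other.
    """
    bots = (bot_a, bot_b)
    run = []   # current run of alternating bot edits
    best = 0   # highest flip count seen in any run
    for user, size in revisions:
        if user not in bots:
            run = []                  # an edit by anyone else breaks the run
        elif run and run[-1][0] == user:
            run = [(user, size)]      # same bot twice in a row: restart
        else:
            run.append((user, size))
        flips = sum(1 for i in range(2, len(run)) if run[i][1] == run[i - 2][1])
        best = max(best, flips)
    return best >= min_flips
```

Requiring several size-matching flips, not merely adjacent edits, is meant to address the objection raised above that one bot deliberately targets pages the other has just edited.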

    Need someone with a mass rollback script now.

    Would someone who has a mass rollback script handy please revert InternetArchiveBot's edits going all the way back to the timestamp in this diff? Kind of urgent. IABot destroyed roughly a thousand articles, due to some communication failure with Wikipedia.—CYBERPOWER (Chat) 22:29, 29 January 2018 (UTC)

    Done Nihlus 23:07, 29 January 2018 (UTC)
    @Cyberpower678: When you say "destroyed", this means...? --Redrose64 🌹 (talk) 23:38, 29 January 2018 (UTC)
    It deleted chunks of articles or stuffed chunks of it into the references section, by making massive references out of them.—CYBERPOWER (Chat) 23:40, 29 January 2018 (UTC)