Did you know that some of Wikipedia’s content is generated by smart bots? CHIP looked inside the Internet’s most popular encyclopedia and figured out how it works.
The way the online encyclopedia Wikipedia operates is remarkable: its sections and articles are compiled not by a panel of renowned scholars, but by volunteers who not only write content but also edit and check each other’s articles. The effort and time spent simply identifying and correcting unwanted entries and edits is enormous. What could be more logical than automating routine, constantly repeated operations?
This is where bots come to the rescue, making life easier for authors and heading off acts of vandalism. Currently, 350 such self-learning scripts are officially at work on the English-language Wikipedia, and they now account for about 10% of all edits on the platform.
The ClueBot NG program, for example, relentlessly decides who may write what in the encyclopedia. It springs into action up to 700 times per hour, removing deliberate vandalism from texts and warning trolls. ClueBot NG is driven by a neural network whose self-learning algorithms work much like a spam filter: the network continually adjusts its filtering rules based on what Wikipedia’s authors themselves flag as vandalism.
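The spam-filter analogy can be illustrated with a toy classifier. The sketch below is purely hypothetical and greatly simplified; ClueBot NG’s real model is a neural network trained on far richer edit features. Here, a word-frequency score is learned from editor feedback, and every new label refines the filter:

```python
from collections import Counter

class VandalismFilter:
    """Toy spam-filter-style vandalism scorer (illustration only,
    NOT ClueBot NG's actual model). Learns word statistics from
    editor feedback and keeps learning as new labels arrive."""

    def __init__(self):
        self.bad = Counter()   # word counts from edits flagged as vandalism
        self.good = Counter()  # word counts from edits accepted as legitimate

    def learn(self, edit_text, is_vandalism):
        """Incorporate one piece of editor feedback."""
        words = edit_text.lower().split()
        (self.bad if is_vandalism else self.good).update(words)

    def score(self, edit_text):
        """Average Laplace-smoothed 'badness' of the edit's words.
        Higher score = more vandalism-like."""
        words = edit_text.lower().split()
        total = sum((self.bad[w] + 1) / (self.bad[w] + self.good[w] + 2)
                    for w in words)
        return total / max(len(words), 1)

f = VandalismFilter()
f.learn("qwerty is stupid lol", True)               # flagged by an editor
f.learn("the battle took place in 1815", False)     # accepted edit
print(f.score("stupid lol"))          # scores high: vocabulary seen in vandalism
print(f.score("the battle of 1815"))  # scores low: vocabulary seen in good edits
```

Each call to `learn` plays the role of the feedback loop described above: editors label edits, and the filter’s rules shift accordingly.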
Cleanup bots vs. explicit photos
Vandalism here covers all kinds of destructive intervention, ranging from the deliberate deletion of passages and insults, through targeted actions such as attempts to influence a stock price or conduct political PR, to purely destructive behavior.
An article may suddenly sprout photos of naked body parts accompanied by obscene comments. Such fairly obvious cases can be recognized and handled by bots. What bots cannot determine is whether a photograph actually matches the surrounding text; that judgment is still left to humans.
The help of bots is invaluable: according to a study by Stuart Geiger and Aaron Halfaker of the universities of Berkeley and Minnesota, if ClueBot NG alone were to fail, the remaining editors would have to do roughly twice the work of removing provocative content.
Four types of vandals on Wikipedia
Wikipedia is constantly subjected to acts of vandalism of various kinds. The website ethicsofalgorithms.org offers a list that helps distinguish rude or inept users from trolls and saboteurs.
Wits – trolls who make meaningless changes for the sake of a joke and amuse themselves heartily.
Ignoramuses – editors who, out of incompetence or stubbornness, repeatedly flout rules and conventions.
Saboteurs – people who deliberately set out to disrupt work on the Wikipedia project.
Manipulators – PR people who spread false information, whether out of personal interest or on behalf of third parties.
Another reason for the growing importance of bots, alongside the sheer amount of information that needs to be processed, is the shrinking number of active Wikipedia authors: in 2007 the English edition had about 50,000 people actively writing articles; ten years later, only 30,000 remained.
However, the assistants that should, in theory, make authors’ work easier can also drive them away. The problem is the speed of the scripts. As Halfaker and Geiger found, effective vandal-fighting has a negative effect on attracting new authors. Although the number of contributors creating or editing an article for the first time stayed the same, bots came to delete such contributions ever faster, sometimes within seconds. For many once-enthusiastic novice Wikipedians, the door slammed shut, and they never returned to the project.
Not a threat to authors, but helpers
For all the efficiency required, the project must not lose its appeal for new authors. Novice Wikipedians can get valuable advice from bots on how to put their abilities to the best use. ClueBot NG, the scourge of trolls, can, for example, find articles that need improvement and flag their weak spots.
Meanwhile, the triumphant march of bots across Wikipedia has given rise to another phenomenon – edit wars. Uncompromising bot behavior becomes a problem when one bot makes a change to a text and another undoes it. Taha Yasseri, a researcher at the Oxford Internet Institute, studied this behavior as part of his work. The bots are too limited to break out of the cycle on their own, he says: “Bot wars last much longer than human wars.”
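The cycle Yasseri describes can be sketched in a few lines. The two “bots” below are invented for illustration (they are not real Wikipedia bots): one enforces British spelling, the other American, and since neither recognizes the loop, the page oscillates until an external limit stops it:

```python
# Toy simulation of a bot edit war: two rule-based bots with
# conflicting conventions keep reverting each other's changes.

def run_edit_war(page, bots, max_rounds=10):
    """Apply each bot's rule in turn; record every change to the page.
    Stops only when no bot changes anything, or max_rounds is reached."""
    history = [page]
    for _ in range(max_rounds):
        changed = False
        for rule in bots:
            new_page = rule(page)
            if new_page != page:
                page = new_page
                history.append(page)
                changed = True
        if not changed:
            break  # a human would have stopped long ago; these bots never do
    return history

bot_a = lambda text: text.replace("color", "colour")   # insists on British spelling
bot_b = lambda text: text.replace("colour", "color")   # insists on American spelling

history = run_edit_war("The flag's color is red.", [bot_a, bot_b])
print(len(history))       # the page was rewritten on every single round
print(history[-1] == history[0])  # ... and ends up exactly where it started
```

Each round produces two edits that cancel out, so the war runs until the round limit: exactly the pointless, long-lived cycle Yasseri observed.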
When a bot meets a bot
Much depends on where bots are used. Yasseri cites self-driving cars as an example. Such cars “will drive in different cultures and conditions, on the highways in Germany as well as on the roads in Italy,” yet “the rules of the road differ everywhere; each country has its own laws and driving culture.” Returning to Wikipedia, the difference lies in the country-specific versions of articles and in the editing culture that prevails in each language community. “That’s why bots behave differently,” Yasseri concludes.
What’s more, in small language editions, bots can do the heavy lifting of creating whole batches of article stubs. They draw on the Wikidata database, which contains a huge amount of structured information. From this material, bots can generate simple texts that serve as the skeleton of a Wikipedia article.
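The principle can be shown with a minimal sketch. The field names and sentence template below are invented for illustration; real stub-generating bots work from genuine Wikidata properties and far richer templates:

```python
# Hypothetical sketch: turning Wikidata-style structured facts
# into the skeleton of an encyclopedia article stub.

STUB_TEMPLATE = (
    "{name} is a {kind} in {country}. "
    "It has a population of {population}."
)

def make_stub(facts):
    """Fill the sentence template with structured facts about one entity."""
    return STUB_TEMPLATE.format(**facts)

# Invented example record (not real Wikidata output):
facts = {
    "name": "Example Town",
    "kind": "municipality",
    "country": "the Philippines",
    "population": "12,345",
}
print(make_stub(facts))
# Example Town is a municipality in the Philippines. It has a population of 12,345.
```

Run over thousands of database records, even a template this crude yields thousands of uniform stubs, which is essentially how entire small-language editions get filled.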
In the future, entire versions of Wikipedia could be created by bots alone. While bots contribute around 10% of the articles in the most widespread languages, in small language editions the figure is already much higher: the edition in Cebuano, a language spoken in the Philippines, for example, is composed 100% by bots. Yet formally, the Cebuano Wikipedia, with more than five million articles, is the second largest in the world.
Wikidata’s significance appears to be even greater outside the project. Voice assistants such as Alexa and Google Home increasingly turn to Wikidata when searching for answers across various fields of knowledge, and libraries and museums connect their catalogs to it. Hence the conclusion of Wikipedia researcher Andrew Lih: “The role of Wikidata outside the Wikimedia platform is perhaps even more significant than inside it.”
Photo: manufacturing companies, ShutterStock/Fotodom.ru