{"id":17384,"date":"2020-02-13T06:13:36","date_gmt":"2020-02-13T06:13:36","guid":{"rendered":"https:\/\/www.revoscience.com\/en\/?p=17384"},"modified":"2020-06-09T12:11:17","modified_gmt":"2020-06-09T12:11:17","slug":"automated-system-can-rewrite-outdated-sentences-in-wikipedia-articles","status":"publish","type":"post","link":"https:\/\/www.revoscience.com\/en\/automated-system-can-rewrite-outdated-sentences-in-wikipedia-articles\/","title":{"rendered":"Automated system can rewrite outdated sentences in Wikipedia articles"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"639\" height=\"426\" src=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg\" alt=\"\" class=\"wp-image-17385\" title=\"\" srcset=\"https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg 639w, https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0-300x200.jpg 300w\" sizes=\"auto, (max-width: 639px) 100vw, 639px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Text-generating tool pinpoints and replaces specific information in sentences while retaining humanlike grammar and style.<\/strong> <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A system created by MIT researchers could be used to automatically update factual inconsistencies in Wikipedia articles, reducing time and effort spent by human editors who now do the task manually.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Wikipedia comprises millions of articles that are in constant need of edits to reflect new information. That can involve article expansions, major rewrites, or more routine modifications such as updating numbers, dates, names, and locations. 
Currently, humans across the globe volunteer their time to make these edits.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In a paper being presented at the AAAI Conference on Artificial Intelligence, the researchers describe a text-generating system that pinpoints and replaces specific information in relevant Wikipedia sentences, while keeping the language similar to how humans write and edit.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The idea is that humans would type an unstructured sentence with updated information into an interface, without needing to worry about style or grammar. The system would then search Wikipedia, locate the appropriate page and outdated sentence, and rewrite it in a humanlike fashion. In the future, the researchers say, there\u2019s potential to build a fully automated system that identifies and uses the latest information from around the web to produce rewritten sentences in corresponding Wikipedia articles that reflect updated information.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cThere are so many updates constantly needed to Wikipedia articles. It would be beneficial to automatically modify exact portions of the articles, with little to no human intervention,\u201d says Darsh Shah, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and one of the lead authors. \u201cInstead of hundreds of people working on modifying each Wikipedia article, then you\u2019ll only need a few, because the model is helping or doing it automatically. That offers dramatic improvements in efficiency.\u201d<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Many other bots exist that make automatic Wikipedia edits. Typically, those work on mitigating vandalism or dropping some narrowly defined information into predefined templates, Shah says. 
The researchers\u2019 model, he says, solves a harder artificial intelligence problem: Given a new piece of unstructured information, the model automatically modifies the corresponding sentence in a humanlike fashion. \u201cThe other [bot] tasks are more rule-based, while this is a task requiring reasoning over contradictory parts in two sentences and generating a coherent piece of text,\u201d he says.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The system can be used for other text-generating applications as well, says co-lead author and CSAIL graduate student Tal Schuster. In their paper, the researchers also used it to automatically synthesize sentences for a popular fact-checking dataset, which helped reduce bias without manually collecting additional data. \u201cThis way, the performance improves for automatic fact-verification models that train on the dataset for, say, fake news detection,\u201d Schuster says.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Shah and Schuster worked on the paper with their academic advisor, Regina Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science and a professor in CSAIL.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Neutrality masking and fusing<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Behind the system is a fair bit of text-generating ingenuity in identifying contradictory information between, and then fusing together, two separate sentences. It takes as input an \u201coutdated\u201d sentence from a Wikipedia article, plus a separate \u201cclaim\u201d sentence that contains the updated and conflicting information. The system must automatically delete and keep specific words in the outdated sentence, based on information in the claim, to update facts but maintain style and grammar. 
That\u2019s an easy task for humans, but a novel one in machine learning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For example, say there\u2019s a required update to this sentence (in bold): \u201cFund A considers\u00a0<strong>28 of their 42<\/strong>\u00a0minority stakeholdings in operationally active companies to be of particular significance to the group.\u201d The claim sentence with updated information may read: \u201cFund A considers\u00a0<strong>23 of 43<\/strong>\u00a0minority stakeholdings significant.\u201d The system would locate the relevant Wikipedia text for \u201cFund A,\u201d based on the claim. It then automatically strips out the outdated numbers (28 and 42) and replaces them with the new numbers (23 and 43), while keeping the rest of the sentence unchanged and grammatically correct. (In their work, the researchers ran the system on a dataset of specific Wikipedia sentences, not on all Wikipedia pages.)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The system was trained on a popular dataset that contains pairs of sentences, in which one sentence is a claim and the other is a relevant Wikipedia sentence. Each pair is labeled in one of three ways: \u201cagree,\u201d meaning the sentences contain matching factual information; \u201cdisagree,\u201d meaning they contain contradictory information; or \u201cneutral,\u201d where there\u2019s not enough information for either label. The system must make all disagreeing pairs agree, by modifying the outdated sentence to match the claim. That requires using two separate models to produce the desired output.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first model is a fact-checking classifier \u2014 pretrained to label each sentence pair as \u201cagree,\u201d \u201cdisagree,\u201d or \u201cneutral\u201d \u2014 that focuses on disagreeing pairs. Running in conjunction with the classifier is a custom \u201cneutrality masker\u201d module that identifies which words in the outdated sentence contradict the claim. 
The module removes the minimal number of words required to \u201cmaximize neutrality\u201d \u2014 meaning the pair can be labeled as neutral. That\u2019s the starting point: While the sentences don\u2019t agree, they no longer contain obviously contradictory information. The module creates a binary \u201cmask\u201d over the outdated sentence, where a 0 gets placed over words that most likely require deleting, while a 1 goes on top of keepers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After masking, a novel two-encoder-decoder framework is used to generate the final output sentence. This model learns compressed representations of the claim and the outdated sentence. Working in conjunction, the two encoder-decoders fuse the dissimilar words from the claim by sliding them into the spots left vacant by the deleted words (the ones covered with 0s) in the outdated sentence.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In one test, the model scored higher than all traditional methods on \u201cSARI,\u201d a metric that measures how well machines delete, add, and keep words compared to the way humans modify sentences. The researchers used a dataset with manually edited Wikipedia sentences, which the model hadn\u2019t seen before. Compared to several traditional text-generating methods, the new model was more accurate in making factual updates and its output more closely resembled human writing. In another test, crowdsourced human raters scored the model (on a scale of 1 to 5) based on how well its output sentences contained factual updates and matched human grammar. 
The model achieved average scores of 4 in factual updates and 3.85 in matching grammar.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Removing bias<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The study also showed that the system can be used to augment datasets to reduce bias when training detectors of \u201cfake news,\u201d a form of propaganda containing disinformation created to mislead readers in order to generate website views or steer public opinion. Some of these detectors train on datasets of agree-disagree sentence pairs to \u201clearn\u201d to verify a claim by matching it to given evidence.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In these pairs, the claim will either match certain information with a supporting \u201cevidence\u201d sentence from Wikipedia (agree) or it will be modified by humans to include information contradictory to the evidence sentence (disagree). The models are trained to flag claims with refuting evidence as \u201cfalse,\u201d which can be used to help identify fake news.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Unfortunately, such datasets currently come with unintended biases, Shah says: \u201cDuring training, models use some language of the human-written claims as \u2018give-away\u2019 phrases to mark them as false, without relying much on the corresponding evidence sentence. This reduces the model\u2019s accuracy when evaluating real-world examples, as it does not perform fact-checking.\u201d<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The researchers used the same deletion and fusion techniques from their Wikipedia project to balance the disagree-agree pairs in the dataset and help mitigate the bias. For some \u201cdisagree\u201d pairs, they used the modified sentence\u2019s false information to regenerate a fake supporting \u201cevidence\u201d sentence. 
Some of the give-away phrases then exist in both the \u201cagree\u201d and \u201cdisagree\u201d sentences, which forces models to analyze more features. Using their augmented dataset, the researchers reduced the error rate of a popular fake-news detector by 13 percent.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u201cIf you have a bias in your dataset, and you\u2019re fooling your model into just looking at one sentence in a disagree pair to make predictions, your model will not survive the real world,\u201d Shah says. \u201cWe make models look at both sentences in all agree-disagree pairs.\u201d<\/p>\n  <br \/>","protected":false},"excerpt":{"rendered":"<p>Text-generating tool pinpoints and replaces specific information in sentences while retaining humanlike grammar and style. A system created by MIT researchers could be used to automatically update factual inconsistencies in Wikipedia articles, reducing time and effort spent by human editors who now do the task manually. Wikipedia comprises millions of articles that are in constant 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":17385,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"class_list":["post-17384","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research"],"featured_image_urls":{"full":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0-200x200.jpg",200,200,true],"medium":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0-300x200.jpg",300,200,true],"medium_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"1536x1536":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"2048x2048":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"ultp_layout_landscape_large":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"ultp_layout_landscape":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"ultp_layout_portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",600,400,false],"ultp_layout_square":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",600,400,false],"newspaper-x-single-post":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"newspaper-x-recent-post-big":["ht
tps:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0-550x360.jpg",550,360,true],"newspaper-x-recent-post-list-image":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0-95x65.jpg",95,65,true],"web-stories-poster-portrait":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",639,426,false],"web-stories-publisher-logo":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",96,64,false],"web-stories-thumbnail":["https:\/\/www.revoscience.com\/en\/wp-content\/uploads\/2020\/02\/MIT-Wikipedia-Editing-01_0.jpg",150,100,false]},"author_info":{"info":["RevoScience"]},"category_info":"<a href=\"https:\/\/www.revoscience.com\/en\/category\/news\/research\/\" rel=\"category tag\">Research<\/a>","tag_info":"Research","comment_count":"0","_links":{"self":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/17384","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/comments?post=17384"}],"version-history":[{"count":0,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/posts\/17384\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media\/17385"}],"wp:attachment":[{"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/media?parent=17384"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/categories?post=17384"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.revoscience.com\/en\/wp-json\/wp\/v2\/tags?post=173
84"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}