Or Levi, Data Science Team Lead at eBay, shares further insights after his talk at WeAreDevelopers Congress Vienna 2019. Or Levi aims to help users find the items they are looking for and get inspired by eBay's unique selection, whether it's on the search results page or the homepage feed, using personalized and relevant recommendations.

Or's strongest passion is using AI for social impact, which led him to found AdVerif.ai with the purpose of fighting the spread of misinformation online. It was recently named among CB Insights' 2019 International Game Changers – startups with the potential to transform society and economies for the better.

How can you ensure that good AI remains impartial and free of the political biases of its creators?

Or Levi: People often tend to think that because AI is automated, it is objective and free from biases, but in reality a machine learning model simply picks up the biases in the training data fed into it. It is the responsibility of the model developer to address these issues in the training data.

With AdVerif.ai, we are not making the judgement of what is fake news and what is not, but rather working in collaboration with fact-checking organizations that are members of the International Fact-Checking Network (IFCN) and comply with the IFCN code of principles of Nonpartisanship, Fairness and Transparency.

How successful are deep fakes at generating text that requires deep knowledge of the covered topics, for example scientific news?

Or Levi: State-of-the-art text generators such as OpenAI's GPT-2 or Grover from the Allen Institute for AI are already capable of generating coherent texts that would be hard to distinguish from human-written text. These models do not necessarily acquire a deep understanding of a specific domain, such as scientific news, but they are good at learning probabilistic sequences of words. This lack of understanding is reflected in their inability to pass the Turing test: by interacting with these models, asking questions and observing the answers, you will most likely be able to tell they are machine-generated. However, these models are very good at learning which word is most likely to come next in a sequence, given a specific context, which allows them to generate coherent texts for a variety of domains.
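The idea of learning which word is most likely to come next, given a context, can be illustrated with a toy bigram model. This is a drastic simplification of transformer models like GPT-2 (which condition on the entire preceding context, not just one word); the corpus and function names below are purely illustrative:

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count, for each word, how often each next word follows it,
    then normalize the counts into probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current_word, next_word in zip(tokens, tokens[1:]):
            counts[current_word][next_word] += 1
    model = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        model[word] = {w: c / total for w, c in followers.items()}
    return model

def most_likely_next(model, word):
    """Return the highest-probability continuation for a word."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return max(followers, key=followers.get)

corpus = [
    "fake news spreads fast",
    "fake news spreads online",
    "real news spreads slowly",
]
model = train_bigram_model(corpus)
print(most_likely_next(model, "news"))  # "spreads" follows "news" in every training sentence
```

Generating text then amounts to repeatedly sampling a next word from these learned probabilities; large neural models do the same thing with far richer context, which is why their output reads coherently even without real domain understanding.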

Or Levi, Data Science Team Lead at eBay, shares further insights after his talk at WeAreDevelopers Congress Vienna 2019. (Photo Credit © Tamás Künsztler)

If the key differences between fake and real become known to the bad AI, wouldn't it just adapt?

Or Levi: A major challenge is the adversarial nature of the fake-detection problem. It is a cat-and-mouse game where the malicious actors quickly adapt and evolve. For instance, last year researchers proposed identifying deepfakes based on unnatural blinking patterns. A few months later, the next generation of deepfakes already included a fix that made this signal obsolete.

This is a challenge we constantly face at AdVerif.ai with our research program: trying to strike a balance between sharing our findings and open-sourcing code, while not disclosing information that could help bad actors adapt.

Why is it better that the training set used to distinguish fake news from satire was NOT sorted by source? Wouldn't you want that for training data?

Or Levi: In our research paper “Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues” we addressed the challenge of classifying two genres that many people often confuse: fake news and satire. The efforts by social media platforms to reduce the exposure of users to misinformation have resulted, on several occasions, in flagging satire content, which is a protected form of speech. Consequently, social media sites changed their policy to exempt satire, but then purveyors of fake news began to masquerade as satire sites to avoid being demoted. This is why we took on the challenge of distinguishing between fake news and satire, not based on the source, but based on the content itself.

When examining the feature importance of our classification model, we saw that there are linguistic signals in the content that can tell fake news and satire apart. For example, among the significant features we observed causal connectives, which are known to be important in text comprehension, and two indices related to the text's easability and readability, both suggesting that satire articles are more sophisticated, or less easy to read, than fake news articles.

The main stage at WeAreDevelopers Congress Vienna 2019. (Photo Credit © Tamás Künsztler)

At WeAreDevelopers congresses, attendees have the opportunity to submit questions directly to the speaker. These questions were asked by the audience and answered by Or Levi after WeAreDevelopers Congress Vienna 2019.

Interested in learning more from other experts? Join us at one of our upcoming events!
