
Man v Machine: Can Computers Cook, Write and Paint Better Than Us?

Artificial intelligence can now win games, recognise your face and appeal against your parking tickets. But can it do the things humans find tricky?


Supplied by Guardian.co.uk, this article titled “Man v Machine: Can Computers Cook, Write and Paint Better Than Us?” was written by Leo Benedictus, for the Guardian on Saturday 4 June 2016 08.00 UTC

One video, for me, changed everything. It is footage from the old Atari game Breakout, the one where you slide a paddle from side to side along the bottom of the screen, trying to destroy bricks by bouncing a ball into them. You may have read about the game’s player: an algorithm developed by DeepMind, the British artificial intelligence company whose AlphaGo program also beat Lee Sedol, one of the greatest ever Go players, earlier this year.

Perhaps you would expect computers to be good at computer games? Once they know what to do, they certainly do it faster and more consistently than any human. DeepMind’s Breakout player knew nothing, however. It was not programmed with instructions on how the game works; it was not even told how to use the controls. All it had was the image on the screen and the command to try to get as many points as possible.

Watch the video. At first, the paddle lets the ball drop into oblivion, knowing no better. Eventually, just by tinkering about, it knocks the ball back, destroys a brick and earns some points; it registers this and does it more often. After two hours of practice, or about 300 games, it has become seriously good, better than you or I will ever be. Then, after about 600 games, things get uncanny. The algorithm starts aiming at the same spot, again and again, to dig a tunnel through the bricks to the space behind. Once there, as any Breakout player knows, the ball bounces around for a while, racking up points for free. It is a good strategy, which the computer came up with on its own.

“When our researchers saw this, it actually shocked them,” DeepMind’s chief executive, Demis Hassabis, told an audience at a technology conference in Paris. You can watch his demo, too, and hear the laughter and applause when the machine figures out its tunnelling strategy. The computer has become intelligent, a bit like us.

“Artificial intelligence” is just about the oldest and most hyped of all computing buzz-phrases. The idea was first seriously mooted by Alan Turing in Computing Machinery and Intelligence, the 1950 paper in which he proposed what became known as the Turing test: if a machine could convince you through conversation that it was human, it was doing as much as any human could to prove that it was really thinking. But the term AI did not come into general use until 1955, when the American mathematician John McCarthy proposed a conference of specialists. That conference was held the following year, and the field has run in roughly two-decade cycles of mania and despair ever since. (Researchers even have a term for the spells out of fashion – “AI winters”. The 1970s and 1990s were particularly harsh.)

Today there is a new mania, and it looks unlike the others: it fits in your pocket. A phone can beat a world chess champion, recognise songs and pictures for your children, and translate your voice into other languages. And the Nao robot pictured here with Yotam Ottolenghi can walk on two legs, talk, find a ball and even dance. (That is robotics, though, not AI: it cannot design a menu.)

Hearing about progress in AI, you don’t need an expert to tell you to be excited, or terrified. You just begin to get the sense: intelligence is coming. Clearly Google has that feeling, too, since it bought DeepMind for a rumoured $650m. In 2013, Facebook launched its own project, with plans to develop face and natural-language recognition for the site. Developers have already begun work on intelligent chatbots, which Facebook users will be able to summon through its Messenger service.

So far, computers have not been “intelligent” at all, or only narrowly so. They have been good at the simple tasks that impress us, such as maths, but bad at the things we take for granted, which turn out to be seriously difficult. Walking is something that modern robots learn like babies do, and still struggle with; basic pottering tasks remain a distant dream. “One example is the ease with which you or I could make a cup of tea in somebody else’s kitchen,” says Professor Alan Winfield, a robotics researcher at the University of the West of England. “There isn’t a robot on Earth that could do this.”

To understand why humans are so hard to imitate, consider how you might get a computer to recognise people from photographs. Without AI, you first need to know how to do it yourself, in order to program the computer. You have to think about all the possible patterns you could gather, the colours and shapes of faces, and how they change in different lights and at different angles – and you need to know what matters and what is just mud on the lens. With AI, you don’t have to explain: you just give the computer a pile of real data and let it learn. Exactly how you design the learning software remains a fiendishly hard problem, some leading computer scientists say, but it is clear that the winners have got there by devising data-processing structures loosely based on those in the brain. (This is called “deep learning”.) As for the pile of real data – well, that is what Google, Facebook, Amazon, Uber and all the rest happen to have lying around.

At this stage, we still don’t know what uses AI will be best put to. Josh Newlan, a California coder working in Shanghai, got bored of listening to endless conference calls, so he built some software to listen for him. Now, whenever Newlan’s name is mentioned, his computer instantly sends him a transcript of the last half-minute, waits 15 seconds, then plays a recording of him saying, “Sorry, I didn’t realise my mic was on mute.” Last year, Joshua Browder, a British teenager, built a free artificial lawyer that appeals against parking tickets; he plans to build another to guide refugees through foreign legal systems. The possibilities are, well… perhaps an algorithm could count the possibilities.

So will machine minds one day overtake our own? The researchers I speak to are careful, and take pains to emphasise the things their machines cannot do. But I decided to put AI to the test: can it plan a meal as well as Ottolenghi? Can it paint my portrait? Is technology still artificially intelligent – or is it starting to be intelligent, for real?

The cooking test

Well, I will say it isn’t horrible. Humans have served me worse. Although in truth the name that IBM’s Chef Watson gives this dish (“Chicken Liver Savoury Sauce”) is about as appetising as it deserves.

To be fair to Chef Watson, and to Guardian Weekend’s own chef-columnist Yotam Ottolenghi, I had set them quite a task. I asked for a dish based on four ingredients that seemed to belong nowhere near each other: chicken livers, Greek yoghurt, wasabi and tequila. They could add whatever else they liked, but those four had to be in the finished dish, which I would cook and eat. Chef Watson didn’t hesitate, instantly giving me two pasta sauces. Ottolenghi was more circumspect. “When I got the challenge I thought, ‘This is not going to work,’” he tells me.

I thought the same. Or at least I thought I would end up eating two dishes that managed to be OK despite their ingredients, rather than because of them. In fact – and you’ll think me a creep, but so what – Ottolenghi’s recipe was a revelation: liver and onion and a tequila reduction, served with an apple, radish, beetroot and chicory slaw, with a wasabi and yoghurt dressing. The dish may make little sense on paper, but I devoured a plateful feeling that every element belonged. (And vinaigrette thickened with yoghurt and wasabi instead of mustard: seriously, give it a try.) Ottolenghi tells me the recipe is just a whisker short of publishable.

The thing is, that dish took him and his team three days to perfect. They were able to taste and discuss flavours, textures, colours, temperatures, in a way that Watson can’t – although there have been “discussions” about adding a feedback mechanism in future, Chef Watson’s lead engineer, Florian Pinel, tells me. “A recipe is such a complex thing,” Ottolenghi says. “It’s difficult for me even to understand how a computer would approach it.”

Yotam Ottolenghi and Chef Watson’s dishes. Photograph: Jay Brooks for the Guardian

Watson was first built by IBM to win the television gameshow Jeopardy! in 2011. In some ways it was a misleading challenge, because for a computer the tough part of a quiz is understanding the questions, not knowing the answers; for humans, it’s the other way around. But Watson won, and its technology began to be applied elsewhere, including as a chef, generating new recipes based on 10,000 real examples taken from Bon Appétit magazine.

First the software had to “ingest” these recipes, as the Watson team put it. A lot of computation went into understanding what the ingredients were, how they were prepared, how long they were cooked for, in order to be able to explain how to use them in new dishes. (The process can still go awry. Even now Chef Watson recommends an ingredient called “Mollusk”, which it helpfully explains is “the sixth full-length album by Ween”.)

A bigger problem was trying to give the machine a sense of taste. “It’s easy enough for a computer to create a novel combination,” Pinel says, “but how can it evaluate one?” Watson was taught to consider each ingredient as a combination of specific flavour compounds – of which there are thousands – and then to combine ingredients that had compounds in common. (This principle, food pairing, is well established among humans.) Finally, the software generates step-by-step instructions that make sense to a human cook. The emphasis is on surprises rather than practical meal planning. “Chef Watson is really there to inspire you,” Pinel explains. Each recipe comes with the reminder to “use your own creativity and judgment”.
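The food-pairing principle Pinel describes can be sketched in a few lines of Python. This is only an illustration, not IBM’s code: the ingredient-to-compound mapping below is invented, and a real system would draw on a database of thousands of compounds.

```python
# Toy ingredient-to-flavour-compound data (invented for illustration).
FLAVOUR_COMPOUNDS = {
    "chicken liver": {"furan", "pyrazine", "hexanal"},
    "wasabi": {"isothiocyanate", "pyrazine", "hexanal"},
    "greek yoghurt": {"lactone", "diacetyl"},
    "chicory": {"lactone", "sesquiterpene"},
}

def pairing_score(a: str, b: str) -> int:
    """Number of flavour compounds two ingredients share."""
    return len(FLAVOUR_COMPOUNDS[a] & FLAVOUR_COMPOUNDS[b])

def best_partner(ingredient: str) -> str:
    """The ingredient that shares the most compounds with this one."""
    others = [i for i in FLAVOUR_COMPOUNDS if i != ingredient]
    return max(others, key=lambda other: pairing_score(ingredient, other))

print(best_partner("chicken liver"))  # -> wasabi (shares pyrazine and hexanal)
```

Evaluating a combination then reduces to counting shared compounds, which is exactly the kind of judgment a computer can make without ever tasting anything.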

And I need to. The first step is to “toast flat-leaf parsley”, which just isn’t a good idea. I am making, effectively, a slow-cooked spiced pork and beef ragu, including all my four ingredients, yet Watson oddly also includes cucumber and keeps telling me to “season with allspice”, which I refuse to do on principle. In the end, I have a rich sauce with a flavour rather close to the farmyard, but not uneatable. I can’t taste the wasabi or the tequila, which I’m glad about.

Yotam Ottolenghi with Nao robot loaned courtesy of Heber primary school, London. Photograph: Jay Brooks. Styling: Lee Flude

Watson is clever and the task is tough, but I am ready to say that this is no more than a bit of fun for food nerds, until Ottolenghi stops me. “I think the idea of slow-cooking the livers with a bit of meat is great,” he says. “It intensifies the flavour. Everything will come together. If I had to start afresh with this recipe, obviously the yoghurt doesn’t fit – but I would leave the orange skin there, a few of the spices. I don’t think it’s a very bad recipe. It could work.”

Verdict Watson hides the weirdness of the ingredients, but Ottolenghi makes them sing.

The writing test

Put little Wordsmith next to the fearsome machines of IBM and Google, and it looks as computationally advanced as a pocket calculator. Yet while Watson fumbles through its apprenticeship, Wordsmith is already at work. If you read stock market reports from the Associated Press, or Yahoo’s sports journalism, there is a good chance you’ll think they were written by a person.

Wordsmith is an artificial writer. Developed by a company in North Carolina called Automated Insights, it plucks the most interesting nuggets from a dataset and uses them to structure an article (or email, or product listing). When it comes across really big news, it uses more emotive language. It varies diction and syntax to make its work more readable. Even a clumsy robot chef can have its uses, but writing for human readers must be smooth. Hooked up to a voice-recognition device such as Amazon’s Echo, Wordsmith can even respond to a spoken human question – about the performance of one’s investments, say – with a thoughtfully spoken answer, announcing what’s interesting first, and leaving out what isn’t interesting at all. If you didn’t know the trick, you’d think Hal 9000 had arrived.

The trick is this: Wordsmith does the part of writing that people don’t realise is easy. Locky Stewart from Automated Insights gives me a tutorial. You write into Wordsmith a sentence such as, “New ABC figures show that the New York Inquirer’s circulation rose 3% in April.” Then you play around. The 3% has come from your data, so you select the word “rose” and write a rule, known as a “branch”, which will change the word “rose” to the phrase “shot up” if the percentage is more than 5%. Then you branch “rose” to become “fell” if the percentage is negative. If the percentage is -5% or lower, “rose” becomes “plummeted”.

Then you feed it synonyms. So “plummeted” can also be “fell sharply by”. “The Inquirer’s circulation” can be “circulation at the Inquirer”. “Shot up” can be “soared” and so on. Then you add more sentences, perhaps about online traffic, or about which days’ print copies sold best, or about comparisons year-on-year. Then you get clever. You tell Wordsmith to put the sentences with the most newsworthy information first, defined perhaps as those that feature the greatest percentage changes. Maybe you add a branch to say that a result is “the best/worst performance among the quality titles”. Hell, you can even teach it some old Fleet Street tricks, so that if circulation plummets the piece begins “Editor Charles Kane is facing fierce criticism as”, but if circulation has “shot up” this becomes “Charles Kane has silenced critics with news that”. Insert “more” or “again” or “continues” if you get the same thing two months in a row.
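The “branch” rules Stewart describes amount to a lookup from data to phrasing. A minimal sketch in Python, using the thresholds from his example – the function names and synonym handling are my own guesses at the shape of the thing, not Automated Insights’ actual system:

```python
import random

def movement_verb(pct: float) -> str:
    """Branch rules from Stewart's example: choose a verb for a circulation change."""
    if pct > 5:
        return "shot up"
    if pct >= 0:
        return "rose"
    if pct > -5:
        return "fell"
    return "plummeted"  # -5% or lower

# Synonyms vary the diction so repeated articles don't read identically.
SYNONYMS = {"shot up": ["soared"], "plummeted": ["fell sharply by"]}

def sentence(title: str, pct: float) -> str:
    verb = movement_verb(pct)
    verb = random.choice([verb] + SYNONYMS.get(verb, []))
    return f"New ABC figures show that {title}'s circulation {verb} {abs(pct):g}% in April."

print(sentence("the New York Inquirer", 3))
# -> New ABC figures show that the New York Inquirer's circulation rose 3% in April.
```

Feed the same template a spreadsheet of every newspaper’s figures and it will emit one tailored sentence per title, which is precisely the scale trick described below.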

“The artificial intelligence is actually the human intelligence that is building the network of logic,” Stewart says, “the same network you would use when writing a story. It could have been developed 10 or 15 years ago, in code, but to make it work at this scale has only been possible lately.” Clearly it takes longer to prepare an article on Wordsmith than to write one conventionally, but once you’ve done so, the computer can publish a fresh newspaper circulation story every month, on every newspaper, within seconds of receiving the information. It can publish millions of stories in minutes – or publish only some of them, if the data doesn’t reach a given threshold of newsworthiness. Thus it becomes an automated editor, too, with adjustable tastes in thoroughness, frequency and hysteria.

For Wordsmith’s task, I suggest football: it’s a field that produces a lot of data and has a readership that wants personalised articles. Guardian football writer Jacob Steinberg volunteers to take on the computer, and I provide a table of facts from the recent Premier League: last season’s league position and this season’s position at Christmas and at the end, goals scored and conceded, top scorer’s name and total, value of summer transfers and a quote from the manager.

Working solely from this data, computer and human must each write a review of the season for a given club. Steinberg chooses Leicester City on the basis that its numbers should contain a story that anyone would see. Wordsmith doesn’t need to choose. It will do all 20.

And in fact both computer and human quickly produce quite similar work:

Leicester City footballer Jamie Vardy

Both Steinberg and Wordsmith deliver dramatic first sentences. Perhaps keen to sound authentic, Automated Insights use some clever tricks to put feeling into the latter’s article, astutely guessing that Leicester were “hoping to finish in the top 10 after a 14th place finish last season”. I look through Wordsmith’s other articles and Southampton, having finished seventh last season, have “eyes on a European spot”, while Manchester City “began the season dreaming of a league title after finishing second”.

Conversely, Steinberg digs more meaningfully into the numbers, showing that Jamie Vardy not only scored 24 goals, but that this was a higher percentage of his team’s goals than was managed by all but two other players. Knowing how Wordsmith works, of course, one could easily set it up to do the same. In fact, looking through it, Steinberg’s entire article could have been created by a skilled Wordsmith programmer – with the exception of one line. “It’s a magical season,” he quotes the Leicester manager as saying, before adding, “justifiably so, given that a summer expenditure of £26.7m on transfers made them the eighth lowest spenders”. That “justifiably so” shows a writer who actually understands what he is writing.

Verdict Steinberg is a much better writer, unless you want 20 data-heavy articles in 10 minutes.

The painting test

A laptop wants me to smile. “It’s in a good mood,” Simon Colton says. He knows because he’s the scientist who programmed it. We are in the Science Museum in London, where the Painting Fool, as it is called, is giving a public demonstration. It’s important that I don’t show my teeth, Colton says, because something about the light makes them look green to the Painting Fool.

From my toothless smile the laptop creates a “conception” of what it would like to paint, based on its mood. The mood comes from a “sentiment analysis” of recent Guardian articles, as it happens (on average, reading the Guardian is a downer, apparently, apart from the stuff about gardening). Yesterday the Fool was in such a bad mood that it sent someone away unpainted; today it is feeling “positive”.
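The mood mechanism can be caricatured in a few lines: score recent headlines and aggregate. The word lists and thresholds below are invented purely for illustration; the Painting Fool’s real sentiment analysis is far more sophisticated than a word count.

```python
# Invented word lists standing in for a real sentiment lexicon.
POSITIVE = {"bloom", "win", "joy", "garden"}
NEGATIVE = {"crisis", "war", "loss", "cuts"}

def headline_score(headline: str) -> int:
    """Positive words minus negative words in one headline."""
    words = headline.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mood(headlines: list[str]) -> str:
    """Aggregate sentiment across recent headlines into a painting mood."""
    total = sum(headline_score(h) for h in headlines)
    if total > 0:
        return "positive"
    if total < 0:
        return "bad mood"  # on a bad enough day, refuse to paint at all
    return "neutral"

print(mood(["Gardens in bloom", "Budget cuts deepen crisis"]))  # -> bad mood
```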

Next the Fool attempts to paint with a simulated brush and a simulated hand (actually, an image of Colton’s hand) on the screen behind me. It learned to reflect its mood from the work of Dan Ventura, another computer scientist, at Brigham Young University in Utah, who trained a neural network to recognise the emotional attributes of images by sitting thousands of people in front of tens of thousands of paintings and asking them to tag each one with whatever adjectives came to mind. The Fool now knows that bright colours reflect a good mood, and “pencils with tight hatching” create a picture that is “cold”. When it is done, it prints out a page with a typed self-critique. “Overall, this is quite a bright portrait,” it says. “That’s OK, but my style has lowered the level of bright here. So I’m a bit annoyed about that.”

Here along with us, intrigued but too busy at her easel to watch, is Sarah Jane Moon, an artist who exhibits with the Royal Society of Portrait Painters. She doesn’t want to see my teeth, either. “We paint from life,” she says, “and you can’t hold a smile for sitting upon sitting. That’s why all the traditional portraits show quite relaxed features.”

The Painting Fool is a special machine, and even slightly famous, but I can’t deny that Moon is almost all of why I’m excited to be here. The feeling of being painted by a real person, having them look at you and think about you, is exciting and flattering. Sentiment analysis and training data, on the other hand, don’t add up to anything whose view of me I care about, and the finished portraits do not change my mind. Moon’s is a lovely, real thing, which feels straight away like one person seen by another. The Fool’s three efforts have qualities I like, but mostly they look like photographs that have gone through some kind of software filter. Colton insists the Fool is here “to learn to be better” but I look and think: so what?

Painting of Leo Benedictus by Sarah Jane Moon
Leo Benedictus as seen by Sarah Jane Moon…
Painting of Leo Benedictus by the Painting Fool computer
…and as imagined by the Painting Fool laptop. Photograph: Murray Ballard

Then I think some more. For one thing, it turns out that art is more mechanical than I’d realised. “I try to look at Leo as an abstract set of shapes, forms, colours, tones,” Moon tells Colton, “to get away from the fact that that’s a nose. Because when you start to do that, you get caught up in what you think looks like a nose.”

“What the software does is break it down into colour regions,” Colton says.

"はい, 正確に,” Moon agrees. “I think that’s what the best painters do. It’s transcribing.” Afterwards she tells me she felt a kind of “kinship” with the software as they worked side by side.

More importantly, I realise that what matters isn’t how the machine paints; it’s how I see. Moon I understand, I think. She’s a person and I know how that feels, so I care about her picture. But what does it feel like to be the Painting Fool? Is that what its portraits are trying to tell me?

Verdict Moon’s painting is far richer; the Fool is still learning and has centuries of practice to go.

The translation test

Google Translate was the first piece of proper science fiction to come true, and it’s already a decade old. In many ways it typifies where AI has got to. Useful, certainly; impressive, without question; but still clunky as hell, despite big improvements.

If you haven’t used it, it works like this: enter text or web links in any of 103 supported languages and you get a rough translation seconds later in any of the others. The app on your phone will transcribe what you say and then speak it back, translated (32 languages supported); it can replace the text of a foreign language sign or menu wherever you point the camera. No explanation is needed of how cool that is (and it’s free).

Globally, half a billion people use Google Translate each month, mostly people who don’t speak English (which is 80% of the world) but who want to understand the internet (which is 50% English). “Most of our growth, and actually most of our traffic, comes from developing or emerging markets such as Brazil, Indonesia, India and Thailand,” says Barak Turovsky, head of product management and user experience at Google Translate. It’s surprisingly popular for dating, too, he adds. “Things like ‘I love you’ and ‘You have beautiful eyes’, that’s very prevalent.”

The software has always used a form of statistical machine learning: scouring the internet for already translated text – UN declarations, EU documents – and mapping the likelihood of certain words and phrases corresponding to one another. The more data it gathers, the better it gets, but the improvement levelled off a couple of years ago. Soon, Turovsky says, they will deploy new deep learning algorithms, which will produce much more fluent translations.
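The phrase-mapping idea can be illustrated with a toy example: each source phrase carries candidate translations with probabilities estimated from parallel text, and the system picks the likeliest. The probabilities below are invented, and a real system scores millions of phrase pairs in context rather than one word at a time.

```python
# A toy phrase table: candidate translations with invented probabilities.
PHRASE_TABLE = {
    "cracked": [("fêlé", 0.4), ("cinglé", 0.6)],    # "broken" vs "crazy"
    "he was": [("il était", 0.9), ("c'était", 0.1)],
}

def translate_phrase(phrase: str) -> str:
    """Pick the statistically most likely candidate translation."""
    candidates = PHRASE_TABLE.get(phrase)
    if not candidates:
        return phrase  # out of vocabulary: pass the phrase through unchanged
    best, _probability = max(candidates, key=lambda c: c[1])
    return best

print(translate_phrase("cracked"))  # -> cinglé
```

The weakness is visible even in the toy: the table has no idea whether “cracked” means broken or mad in this sentence, only which reading is more common overall.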

Even so, there are limits, and some seem fundamental when you talk to a human translator and realise how subtle their work is. Ros Schwartz and Anne de Freyman volunteer for this task. Both are professional French/English translators, and I need two because, in order to judge how good the translation is without being fluent in both languages, we need to translate twice – once out of English into French, once back again. Google Translate keeps no memory of the original, so it can do the same.

I choose a short passage of distinctive but not especially wild or ambiguous prose from the beginning of Herzog by Saul Bellow. Translators normally require context, so I tell Schwartz and De Freyman that it comes from a famous mid-century American novel.

Within a few days, Schwartz and De Freyman return a very smooth facsimile of the original text. Here and there some nuances have not survived, but the passage remains a pleasure to read, and the main meanings come across exactly.

Google Translate takes only a few seconds, and the result is both impressive and inadequate, weirdly good in places, in others weirdly bad – turning “he” into “it” and concocting the idea that Herzog is in love. Miraculously, it keeps “cracked” as a description of the hero. French has no word that combines the sense of “broken” and “mad” that cracked conveys in English, so De Freyman makes it “cinglé”, which comes back from Schwartz as “crazy”.

“Google Translate would look at statistical probability and say, what does ‘cracked’ mean?” Turovsky explains. “And statistically, it will try to decide whether it means ‘cracked’ or ‘crazy’ or whatever. That, for a machine, is a non-trivial task.” Nor is it simple for a human, even though we find it easy. You’d have to ask whether Bellow could have meant that Herzog was “cracked” as in physically fractured. Then you’d have to assume not, because human bodies don’t generally do that. So you’d wonder what he did mean and assume instead, if you were not already familiar with the usage, that he must mean “crazy”, because you understand the rest of what you’ve read. But to do all this, wouldn’t Google Translate have to be pretty much conscious, I ask? Turovsky laughs. “I don’t think I’m qualified to answer that question.”

Verdict Some bullseyes and howlers from Google Translate, while Schwartz and De Freyman are fluent and exact.

guardian.co.uk © Guardian News & Media Limited 2010
