Can't wait for Martin's next book? An AI has written a continuation of A Song of Ice and Fire!
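To give long-suffering fans something to enjoy while they wait, software engineer Zack Thoutt had a recurrent neural network learn from the first five books of A Song of Ice and Fire, the novels behind the TV series Game of Thrones, and then write five new chapters. Some of the AI's plot points match earlier fan theories.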
Here's how the AI did it:
After feeding a type of AI known as a recurrent neural network the roughly 5,000 pages of Martin's five previous books, software engineer Zack Thoutt has used the algorithm to predict what will happen next.
According to the AI's predictions, some long-held fan theories do play out: in the five chapters generated by the algorithm so far, Jaime ends up killing Cersei, Jon rides a dragon, and Varys poisons Daenerys.
If you're interested, you can read all of the chapters on GitHub. Here's the link:
https://github.com/zackthoutt/got-book-6/tree/master/generated-book-v1
Each chapter starts with a character's name, just like Martin's actual books.
But besides backing up what many of us already suspect will happen, the AI also introduces some fairly unexpected plot turns that we're pretty sure won't be mirrored in either the TV show or Martin's books, so we wouldn't get too excited just yet.
For example, in the algorithm's first chapter, written from Tyrion's perspective, Sansa turns out to be a Baratheon.
It also introduces a strange, pirate-like new character called Greenbeard.
"It's obviously not perfect," Thoutt told Sam Hill over at Motherboard. "It isn't building a long-term story and the grammar isn't perfect. But the network is able to learn the basics of the English language and structure of George R.R. Martin's style on its own."
Neural networks are a type of machine learning algorithm inspired by the human brain's ability not just to memorize and follow instructions, but to actually learn from past experience.
A recurrent neural network (RNN) is a specific subclass that works best on long sequences of data, such as the lengthy text of five previous books. It reads its input one step at a time and carries a hidden state from each step to the next, so earlier input can influence how later input is processed.
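To make the idea of a hidden state concrete, here is a minimal sketch of one forward pass of a tiny character-level RNN in plain Python. It is not Thoutt's model: the weights are random and untrained, there is no output layer or training loop, and the names `make_rnn` and `rnn_forward` are hypothetical. It only shows how each character updates a hidden state that is then passed on to the next step.

```python
import math
import random

def make_rnn(vocab_size, hidden_size, seed=0):
    """Create small random (untrained) weights for a hypothetical Elman-style RNN."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_xh": mat(hidden_size, vocab_size),   # input character -> hidden state
        "W_hh": mat(hidden_size, hidden_size),  # previous hidden state -> hidden state
        "b_h": [0.0] * hidden_size,             # bias
    }

def rnn_forward(params, text, vocab):
    """Read `text` one character at a time and return the hidden state after each step."""
    index = {ch: i for i, ch in enumerate(vocab)}
    hidden_size = len(params["b_h"])
    h = [0.0] * hidden_size  # the "memory" starts out empty
    states = []
    for ch in text:
        x = index[ch]  # one-hot input: only column x of W_xh contributes
        # New state mixes the current character with the previous state h.
        h = [
            math.tanh(
                params["W_xh"][j][x]
                + sum(params["W_hh"][j][k] * h[k] for k in range(hidden_size))
                + params["b_h"][j]
            )
            for j in range(hidden_size)
        ]
        states.append(h)
    return states

vocab = sorted(set("Winter is coming."))
params = make_rnn(len(vocab), hidden_size=8)
states = rnn_forward(params, "Winter is coming.", vocab)
print(len(states), len(states[-1]))  # 17 8
```

Because the same hidden state is threaded through every step, changing an early character generally changes every later state as well; training (not shown here) is what teaches a real network which of those influences are worth keeping.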
In theory, Thoutt's algorithm should be able to create a true sequel to Martin's existing work, based on what has already happened in the novels.
But in practice, the writing is clumsy and, most of the time, nonsensical. It also references characters who have already died.
Still, some of the lines sound fairly prophetic:
"Arya saw Jon holding spears. Your grace," he said to an urgent maid, afraid. "The crow's eye would join you."
"A perfect model would take everything that has happened in the books into account and not write about characters being alive when they died two books ago," Thoutt told Motherboard.
"The reality, though, is that the model isn't good enough to do that. If the model were that good authors might be in trouble ... but it makes a lot of mistakes because the technology to train a perfect text generator that can remember complex plots over millions of words doesn't exist yet."
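The memory problem Thoutt describes can be illustrated with a deliberately simpler generator. The sketch below swaps in an order-k character Markov chain (not a neural network, and not Thoutt's method): each new character is chosen using only the previous k characters, so anything earlier, such as a character's death, cannot influence what comes next. The names `build_model` and `generate` and the toy corpus are hypothetical.

```python
import random
from collections import defaultdict

def build_model(text, k):
    """Map every k-character context in `text` to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - k):
        model[text[i:i + k]].append(text[i + k])
    return model

def generate(model, seed, length, k, rng):
    """Extend `seed`; each new character depends only on the last k characters."""
    out = seed
    for _ in range(length):
        followers = model.get(out[-k:])  # .get does not add missing keys
        if not followers:  # context never seen in the training text: stop early
            break
        out += rng.choice(followers)
    return out

corpus = "Jaime killed Cersei. Jon rode a dragon. Jon rode north. "
model = build_model(corpus, k=3)
print(generate(model, "Jon", 40, k=3, rng=random.Random(1)))
```

Every four-character window of the output already appears somewhere in the corpus, so the text looks locally plausible while having no notion of the story as a whole. An RNN can in principle carry information much further than k characters, but, as Thoutt notes, not reliably across millions of words.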
One of the main limitations is that the books simply don't contain enough data for the algorithm to learn from.
Although anyone who's read them will testify that they're pretty damn long, they actually represent quite a small data set for a neural network to learn from.
At the same time, they contain a great many unique words, such as nouns and adjectives, that are rarely reused, which makes it very hard for the neural network to learn patterns.
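One rough way to see the problem is to count how many words in a text occur only once: the network sees such a word in a single context and has almost nothing to generalize from. The sketch below counts total words, distinct words and single-occurrence words; the function name and sample sentence are hypothetical, and it does not separate nouns from adjectives, which would need a part-of-speech tagger.

```python
import re
from collections import Counter

def vocabulary_stats(text):
    """Return (total words, distinct words, words that occur exactly once)."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokenizer: letters and apostrophes
    counts = Counter(words)
    singletons = sum(1 for n in counts.values() if n == 1)
    return len(words), len(counts), singletons

sample = "Ser Jaime rode to Casterly Rock while Ser Brienne rode to Riverrun."
print(vocabulary_stats(sample))  # (12, 9, 6)
```

Here six of the nine distinct words appear only once, so most of the vocabulary offers a single training example.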
Thoutt told Hill that a better source would be a book 100 times longer, but with the vocabulary level of a children's book.
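Thoutt's suggestion can be read as a simple ratio: how many times, on average, the network gets to see each distinct word. The sketch below compares a short passage with a varied vocabulary against a longer, repetitive children's-book-style text; the function name and both texts are made up for illustration.

```python
import re

def examples_per_word(text):
    """Average occurrences per distinct word (total words / distinct words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(words) / len(set(words)) if words else 0.0

epic = "Ser Jaime rode to Casterly Rock while Ser Brienne rode to Riverrun."
kids = "The cat sat. The dog sat. The cat ran. The dog ran. " * 10

print(round(examples_per_word(epic), 2))  # 1.33
print(examples_per_word(kids))            # 24.0
```

More text with a smaller vocabulary means many more examples per word, which is the kind of data Thoutt describes as easier for the network to learn patterns from.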