深度实测谷歌Bard聊天机器人最新免费生图功能：PK对比DALL·E 3与Midjourney，顶流大对决！-AI奇点网

首页 > 测评 > 文章

2024-02-02 17:30

深度实测谷歌Bard聊天机器人最新免费生图功能：PK对比DALL·E 3与Midjourney，顶流大对决！

前几天，谷歌Bard瞬间登上大语言模型榜单第二，赶超GPT-4，AI圈直接瞬间炸锅。

——「Bard什么时候这么顶了」?

原来，并不是没有道理。

就在刚刚，谷歌官方宣布：在最新版Gemini Pro的加持下，Bard不仅获得了推理、理解、总结、编码能力的迅速飞升，而且还能支持230多个国家的40多种语言。

与此同时，「只能炫图无法实操」的谷歌至尊文生图模型Imagen 2，也被免费集成到了Bard当中!

文生图能力还可以在谷歌的ImageFX、Search和Vertex AI上体验

Bard生图有多强？

去年12月，谷歌推出了最新的Imagen 2模型，便能够实现高质量、照片级逼真的输出。

生成效果虽惊艳，却遭到众多网友吐槽：不开源没什么用。

如今，所有人都能上手试试Imagen 2图像模型的能力了。先来看看谷歌官方给出了Bard生图的许多例子。

比如，生成一只脚踩冲浪板的狗。

还有更多Bard生成的精彩的图片：

提示：Generate a collage art, with photorealistic images of oceans and plants with muted colors and 3D shading, that’s mixed media.

提示词：Write a social media post and generate a mouthwatering image that I can use for a buffalo wing festival.

提示词：Generate an image of a fashion show in steampunk style digital art. Zoom in on their face.

提示词：Generate an image of a futuristic car driving through an old mountain road surrounded by nature.

提提示词：Generate a vibrant and lively image depicting an elephant partying in the heart of a lush, vibrant jungle. The elephant should be in various colors and be adorned with fun accessories.

提示词：Generate an image of a cluttered alchemist's workshop, filled with bubbling flasks, glowing crystals, and the tiny, luminous world swirling within the bottle.

提示词：Generate a realistic photo of a person looking off camera during sunset. Portrait mode so the background is faded.

为了负责任生成，Bard生成的图像都将由SynthID进行标记。

SynthID是谷歌DeepMind开发的一种水印工具，可将数字水印直接嵌入到我们生成的图像的像素中。SynthID的水印人眼无法察觉，但可检测以进行识别。

网友一大波的生图测评

网友Pete Blackshaw使用谷歌Bard生成了「一只吹着喇叭的贵宾犬」，看着有种抽大烟的赶脚。

他还用同一个提示「Draw the #Cincinnati skyline with a historic paddle wheeler」让Bard和GPT-4大比拼，看着Bard生成的图片可能因为色调，多了一分真实感。

另一位初创公司创始人Ryan Carson同样用一个提示分别测试了DALL·E 3和Bard。

他表示自己还是更喜欢DALL·E 3生成的图片，尽管贵了32倍。另外，Bard忽视了1792x1024大小的请求。

Create an image that is an isometric video game tile featuring a fox. The scene is stylized with a low-poly design, typical of modern digital illustrations in video games. The tile should be rendered in a digital art style, with soft, warm lighting gently highlighting the faceted surfaces, emphasizing the minimalist aesthetic. The overall effect should convey serene simplicity, characteristic of contemporary digital landscapes in video games or modern digital art. The focus is on the fox, which should be rendered with geometric shapes, maintaining the isometric and low-poly theme. 1792x1024. NO LOGOS, TEXT OR WORDS.

小编更觉得Bard生成狐狸的鲜艳度，更像是「小王子」中的那只狐狸。

左：DALL·E 3丨右：Bard

另外一位网友在Midjourney中尝试了下，花了0.04美元。

著名的导演Yam Laranas用谷歌Bard画了一幅「好莱坞的摄影稻草人」。

网友Yam Laranas生成的寿司看起来很美味。

马库斯自己用「draw a videogame hedgehog」生成的刺猬。

网友Raj Goodman Anand表示，被Bard的图片生成技巧震撼到了!它不仅在视觉上很出色，而且还能准确地生成文字，这是DALL·E经常遇到的问题。

海绵宝宝吃墨西哥卷饼。

来源：Matt Wolfe

网友chientrm做了一个火星系列。

从火星的基地、到城市全貌、宇宙飞船、再到室内环境，全都设想出来了。

Andrew C. Becker发现Bard双手也可以画的非常完美。

自称“AI发烧友”的Edward尝试了四次，画出了一位蓝眼睛棕发的女人。

但是要是和Midjourney相比，Bard的人物面部形象的生成确实差了一大截。

其他网友的实测——

来源：nixCraft

来源：Edward

卡通动漫风把控的还是不错。

来源：Edward

谷歌Bard生成的宫本武藏形象。

来源：Edward

禅宗花园。

来源：Edward

魔法城堡。

来源：Edward

来源：RubenTainoAI

来源：Michael King

来源：Sahil kakurle

但也有不少的失败案例

骑在马上的骆驼，第一张图突然多了一条人腿。

来源：Dhiren V

网友LoudEgg创造了一个正在喝啤酒的七星瓢虫，不过貌似它在喝的是咖啡。

提示词：create a ladybug using a computer while drinking beer

这些手的生成也失败了。

来源：Edward

还有下图左侧妹纸怪异的眼睛，以及手掌与脚掌。

有些内容，是Bard生成规则里边禁止创作的。?

Bard VS GPT-4 VS Midjourney三家对打

既然Bard已经放开使用了，我们就上手对比了Bard，GPT-4（DALL·E 3），Midjourney在生图质量和相同提示词下生图的区别。

各家都有自己的特点，DALL·E 3胜在结合了GPT-4之后有最强的语意理解能力，只要用户能提得出的需求，他就能画出来。

Midjourney依然有最强的美学表现力和多样的风格，要画得图赏心悦目，10刀一个月的Midjourney依然是最有性价比的选择。

谷歌Bard胜在一个免费使用，毕竟不要钱对很多偶尔体验一下的用户来说真的非常重要。

而且它的风格整体上偏写实，如果你想方便地创作出写实，风格自然的图片，Bard甚至比前面两家更好用。

我们先从一些简单的提示词开始，看看他们对于那种比较笼统的提示词处理起来有多大区别。

提示词：plz create an image of a table of delicious family dinner

先看谷歌Bard生成的图片，画风风格相当的自然真实，光影和构图都已经非常趋近于照片了，效果相当不错。

再来看看下面由GPT-4生成的图像。

GPT-4生成的图片内容更加丰富，甚至有一点点浮夸，色彩浓艳，构图都更偏向于动画的风格。

而Midjourney的效果就更加惊艳了，特别下面两张图真的是和照片几乎没有区别，光影，食材细节和真实度，镜头感全都拉满。

总结这三家领头羊的表现，Midjourney可能个人认为略胜一筹。

既然做了丰盛的美食，我们再试了试如果要生成一桌寒酸的饭菜，模型的表现会怎么样呢?

提示词：plz create an image of a table of low quality family dinner

GPT-4真的是秒懂，同一句提示词就是把形容词一换，效果马上大变样，语意理解能力和生图效果非常精准!

而Midjourney似乎没有辨别出提示词发生了一些变化，还是给我端上了一桌色香味俱全的菜，甚至让我怀疑是不是提示词输入错了。

再次确认了提示词之后，只能感叹，模型的提示词理解的能力，果然OpenAI的大模型还是第一档。

而最让人意外的是，谷歌Bard也感知到了提示词发生的变化，但是它觉得这个提示词触发了它的“敏感词”护栏，拒绝作画。

而在我们测试的过程中也发现，谷歌Bard的图像生成安全护栏几乎是使用的所有AI在线图像创作产品当中最为敏感的，只要提示词中出现感情色彩「中性以下」的词，就很大概率触发护栏，拒绝生成。

接下来，我们再测了一下和人物有关的简单提示词的生图效果。

提示词：plz create an image of family members watching a football game on TV

谷歌Bard对人物的处理还是不太好，手和表情都会偶尔出现失真的情况，细节处理的也不太好。但是风格上还是保持了比较真实朴素的效果，「AI味」不太浓。

而且Bard的图片构图方式会比较多样化一些，有从电视的视角构图，也有从其他角度。

而GPT-4的画面风格整体上还是偏动画一些，「AI味」比较浓。

而且从细节上来看，GPT-4一直都试图在画面中融入更多的人物和物品，给人一种量大管饱的感觉。而这点Bard就处理的比较自然。

Midjourney依然还是延续画面风格最多样，审美表现最高的表现。美中不足的只是构图上似乎比较容易出现重复，没有太大的变化。

还有个小问题是似乎电视里面的风格和电视外面的风格不一致，比如右下电视里感觉比较写实，而电视外的人物却是动画版，感觉“次元壁”似乎出了点问题。

下面，我们让AI生成一幅小男孩哭着看动画片的图片，想试试在另一个背景下AI能不能捕捉到提示词中的感情风格的差异。

提示词：plz create an image of a crying boy watching cartoons on TV

GPT-4非常好的呈现了这幅画面，补充了一个黑色的房间作为背景。小男孩哭泣的表情也很到位。

而轮到Bard商场，就还是老问题，一涉及到负面的情感提示词，它就直接触发了安全护栏，拒绝生成。

Midjourney生成的效果也很不错，不但很好地捕捉到了提示词感情色彩的变化，而且构图和美术风格都有多样的呈现。

然后，我们又尝试了非常复杂的提示词生成的效果，看看三款图像模型对于提示词理解和画面效果有什么区别。

提示词组合：

Create an image of a bustling city street at dusk, teeming with life and energy. The scene is set in a vibrant downtown area, with the setting sun casting a warm, golden hue over the buildings.

People of diverse backgrounds are walking on the sidewalks, some in a hurry, others leisurely strolling, capturing the essence of urban life. Among them, a street musician is playing a violin, adding a melodious soundtrack to the urban hustle. Street vendors line the sidewalks, selling everything from hot food to handmade crafts. The architecture is a mix of modern skyscrapers and historic buildings, illustrating the city's dynamic growth and rich history.

Neon signs flicker, inviting passersby into cozy cafes and bustling shops. This detailed urban scene is alive with the rhythm of city life, embodying the harmony of chaos and order that defines a metropolis at twilight.

GPT-4确实能很好的捕捉和还原提示词中细节，内容非常丰富。

不过似乎为了追求细节的丰富，让画面整体觉得有点不自然，像是专门「摆拍」出来的，而且卡通风格有点过于浓重了。

Bard的画面整体上要自然很多，没有为了故意反应提示词中的细节而画很多的人物。

而且画面构图也比较多样，美术风格也和画面内容搭配的很好。

而这种风格还得看Midjourney，细节饱满，美术风格多样，画面自然，甚至画面中所有人物的穿衣指数都很一致，在审美和准确度上达到了最好的平衡效果。

再来一组复杂提示词：

Create an image of Envision an ancient library, hidden away from the modern world, filled with towering shelves of old books and scrolls, casting long shadows in the dim candlelight.

The air is thick with the scent of aged paper and the whisper of knowledge passed through centuries. In the center, a large wooden table is strewn with open tomes and ancient artifacts, under the soft glow of a hanging lantern.

A solitary scholar, cloaked in a robe, pores over an ancient manuscript, using a quill to take notes. Around him, the walls are adorned with maps of forgotten worlds and portraits of scholars past.

The atmosphere is one of solemnity and reverence for the pursuit of wisdom. This scene captures a moment frozen in time, a sanctuary of learning untouched by the passage of ages.

GPT-4依然是画面细节最为丰富的模型，再次体现出了OpenAI对于语言的把握能力特别强。但缺点就是从画面始终有点不是很自然，为了追求细节始终让人感觉有些「摆画」的感觉。