00:13Let me show you something.00:1

00:13
Let me show you something.
00:17
(Video) Girl: Okay, that's a cat sitting in a bed. The boy is petting the elephant. Those are people that are going on an airplane. That's a big airplane.
00:32
Fei-Fei Li: This is a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees. Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task. So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.
01:23
Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool. Photos and videos are becoming an integral part of global life. They're being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind.
02:42
"Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just like to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding. In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.
03:37
So for 15 years now, starting from my Ph.D. at Caltech and then leading Stanford's Vision Lab, I've been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It's part of the general field of artificial intelligence. So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them.
04:27
The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let's say cats, and designing a model that learns from these training images. How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We'd tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine. But what about this cat? (Laughter) It's all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that's just one object.
05:43
So about eight years ago, a very simple and profound observation changed my thinking. No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples. So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences in both quantity and quality.
06:43
Once we know this, we knew we needed to collect a data set that has far more images than we have ever had before, perhaps thousands of times more, and together with Professor Kai

0/5000

จาก: -

เป็น: -

ผลลัพธ์ (ไทย) 1: [สำเนา]

คัดลอก!

00:13Let me show you something.00:17(Video) Girl: Okay, that's a cat sitting in a bed. The boy is petting the elephant. Those are people that are going on an airplane. That's a big airplane.00:32Fei-Fei Li: This is a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees. Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task. So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.01:23Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool. Photos and videos are becoming an integral part of global life. They're being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind.02:42"Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just like to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding. In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.03:37
So for 15 years now, starting from my Ph.D. at Caltech and then leading Stanford's Vision Lab, I've been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It's part of the general field of artificial intelligence. So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them.
04:27
The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let's say cats, and designing a model that learns from these training images. How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We'd tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine. But what about this cat? (Laughter) It's all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that's just one object.
05:43
So about eight years ago, a very simple and profound observation changed my thinking. No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples. So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences in both quantity and quality.
06:43
Once we know this, we knew we needed to collect a data set that has far more images than we have ever had before, perhaps thousands of times more, and together with Professor Kai

การแปล กรุณารอสักครู่..

ผลลัพธ์ (ไทย) 2:[สำเนา]

คัดลอก!

00:13
ผมขอแสดงบางสิ่งบางอย่าง.
00:17
(วิดีโอ) หญิงสาว: โอเคที่แมวนั่งอยู่ในเตียง เด็กที่ถูกลูบคลำช้าง เหล่านี้คือคนที่จะไปบนเครื่องบิน นั่นเป็นเครื่องบินขนาดใหญ่.
00:32
Fei Fei-Li: นี้เป็นเด็กสามปีอธิบายสิ่งที่เธอเห็นในซีรีส์ของภาพถ่าย เธอยังคงมีจำนวนมากที่จะเรียนรู้เกี่ยวกับโลกนี้ แต่เธอมีอยู่แล้วผู้เชี่ยวชาญที่งานหนึ่งที่สำคัญมากที่จะทำให้ความรู้สึกของสิ่งที่เธอเห็น สังคมของเรามีมากขึ้นเทคโนโลยีขั้นสูงกว่าที่เคย เราส่งคนไปดวงจันทร์ที่เราทำโทรศัพท์มือถือที่พูดคุยกับเราหรือปรับแต่งสถานีวิทยุที่สามารถเล่นเพลงเดียวที่เราชอบ แต่เครื่องที่ทันสมัยที่สุดและคอมพิวเตอร์ของเรายังคงต่อสู้ที่งานนี้ ดังนั้นฉันมาที่นี่ในวันนี้เพื่อให้รายงานความคืบหน้าความก้าวหน้าล่าสุดในการวิจัยของเราในคอมพิวเตอร์วิสัยทัศน์ซึ่งเป็นหนึ่งในชายแดนมากที่สุดและเทคโนโลยีการปฏิวัติอาจเกิดขึ้นในวิทยาการคอมพิวเตอร์.
01:23
ใช่เราได้เป็นต้นแบบรถยนต์ที่สามารถขับรถด้วยตัวเอง แต่ไม่มีวิสัยทัศน์ของสมาร์ทที่พวกเขาไม่สามารถจริงๆบอกความแตกต่างระหว่างถุงกระดาษยู่ยี่บนท้องถนนซึ่งสามารถทำงานได้มากกว่าและร็อคที่มีขนาดที่ควรหลีกเลี่ยง เราได้ทำให้กล้องล้านพิกเซลนิยาย แต่เรายังไม่ได้ส่งสายตาคนตาบอด ลูกกระจ๊อกสามารถบินเหนือแผ่นดินใหญ่ แต่ไม่ได้มีเทคโนโลยีการมองเห็นมากพอที่จะช่วยให้เราเพื่อติดตามการเปลี่ยนแปลงของป่าฝนที่ กล้องรักษาความปลอดภัยที่มีทุกที่ แต่พวกเขาไม่ได้แจ้งเตือนเราเมื่อเด็กจมน้ำในสระว่ายน้ำ ภาพถ่ายและวิดีโอจะกลายเป็นส่วนหนึ่งของชีวิตทั่วโลก พวกเขากำลังถูกสร้างขึ้นที่ก้าวที่ไกลเกินกว่าสิ่งใด ๆ ของมนุษย์หรือทีมงานของมนุษย์ได้หวังว่าจะดูและคุณและฉันมีส่วนร่วมในที่ที่ TED นี้ แต่ซอฟต์แวร์ที่ทันสมัยที่สุดของเรายังคงดิ้นรนที่เข้าใจและจัดการเนื้อหามหาศาลนี้ ดังนั้นในคำอื่น ๆ ที่รวมกันเป็นสังคมเราเป็นอย่างมากคนตาบอดเพราะเครื่องที่ฉลาดของเรายังคงเป็นคนตาบอด.
02:42
"ทำไมเป็นแบบนี้อย่างหนักเพื่อ?" คุณอาจถาม กล้องสามารถถ่ายภาพเช่นนี้โดยการแปลงไฟเป็นอาร์เรย์สองมิติของตัวเลขที่รู้จักกันเป็นพิกเซล แต่เหล่านี้เป็นเพียงตัวเลขไม่มีชีวิตชีวา พวกเขาไม่ได้พกความหมายในตัวเอง เพียงแค่ต้องการจะได้ยินไม่ได้เป็นเช่นเดียวกับที่จะฟังการถ่ายภาพไม่ได้เช่นเดียวกับที่จะเห็นและโดยเห็นเราจริงๆหมายถึงความเข้าใจ ในความเป็นจริงก็เอาธรรมชาติ 540,000,000 ปีของการทำงานที่จะทำงานนี้และมากของความพยายามที่ไปในการพัฒนาการประมวลผลอุปกรณ์ภาพของสมองของเราไม่ได้ตาตัวเอง วิสัยทัศน์ดังนั้นเริ่มต้นด้วยตา แต่อย่างแท้จริงจะเกิดขึ้นในสมอง.
03:37
ดังนั้นสำหรับ 15 ปีตอนนี้เริ่มต้นจากปริญญาเอกของฉัน ที่คาลเทคแล้วชั้นนำวิสัยทัศน์ของสแตนฟอแล็บ, ฉันได้ทำงานกับพี่เลี้ยงของฉันทำงานร่วมกันและนักเรียนในการสอนคอมพิวเตอร์ที่จะเห็น สาขาการวิจัยของเราที่เรียกว่าคอมพิวเตอร์วิสัยทัศน์และการเรียนรู้เครื่อง มันเป็นส่วนหนึ่งของสนามทั่วไปของปัญญาประดิษฐ์ ดังนั้นในท้ายที่สุดเราต้องการที่จะสอนให้เครื่องที่จะเห็นเช่นเดียวกับที่เราทำ: การตั้งชื่อวัตถุระบุคนอนุมานเรขาคณิต 3 มิติของสิ่งที่เข้าใจความสัมพันธ์ระหว่างอารมณ์การกระทำและความตั้งใจ คุณและฉันร่วมกันสานเรื่องราวทั้งหมดของคนสถานที่และสิ่งขณะที่เราวางสายตาของเรากับพวกเขา.
04:27
ก้าวแรกสู่เป้าหมายนี้คือการสอนคอมพิวเตอร์ที่จะมองเห็นวัตถุที่สร้างบล็อกของโลกที่มองเห็น ในแง่ที่ง่ายที่สุดคิดกระบวนการเรียนการสอนนี้เป็นที่แสดงภาพการฝึกอบรมคอมพิวเตอร์บางส่วนของวัตถุโดยเฉพาะอย่างยิ่งสมมติว่าแมวและการออกแบบรูปแบบที่เรียนรู้จากการฝึกอบรมภาพเหล่านี้ วิธีการที่ยากนี้สามารถ? หลังจากที่ทุกแมวเป็นเพียงคอลเลกชันของรูปทรงและสีและนี่คือสิ่งที่เราทำในวันแรกของการสร้างแบบจำลองวัตถุ เราจะบอกขั้นตอนวิธีคอมพิวเตอร์ในภาษาทางคณิตศาสตร์ที่แมวมีใบหน้ากลมร่างกายอ้วนสองหูแหลมและหางยาวและที่ดูดีทั้งหมด แต่สิ่งที่เกี่ยวกับแมวนี้หรือไม่? (หัวเราะ) มันเป็นขด ตอนนี้คุณต้องเพิ่มรูปร่างและมุมมองอีกรูปแบบวัตถุ แต่ถ้าแมวที่ซ่อนอยู่? สิ่งที่เกี่ยวกับแมวโง่เหล่านี้หรือไม่ ตอนนี้คุณจะได้รับจุดของฉัน แม้สิ่งที่ง่ายเป็นสัตว์เลี้ยงในครัวเรือนสามารถนำเสนอเป็นจำนวนอนันต์ของการเปลี่ยนแปลงรูปแบบวัตถุและนั่นเป็นเพียงวัตถุหนึ่ง.
05:43
ดังนั้นประมาณแปดปีที่ผ่านมาสังเกตได้ง่ายมากและลึกซึ้งเปลี่ยนความคิดของฉัน ไม่มีใครบอกเด็กวิธีการดูโดยเฉพาะอย่างยิ่งในช่วงต้นปี พวกเขาเรียนรู้ผ่านประสบการณ์จริงของโลกและตัวอย่าง หากคุณพิจารณาสายตาของเด็กที่เป็นคู่ของกล้องชีวภาพพวกเขาใช้เวลาหนึ่งภาพเกี่ยวกับทุก 200 มิลลิวินาทีเวลาเฉลี่ยการเคลื่อนไหวตาทำ ดังนั้นโดยอายุสามเด็กจะได้เห็นหลายร้อยล้านภาพของโลกแห่งความจริง นั่นเป็นจำนวนมากตัวอย่างการฝึกอบรม ดังนั้นแทนที่จะมุ่งเน้น แต่เพียงผู้เดียวในขั้นตอนวิธีการที่ดีกว่าและดีกว่าความเข้าใจของฉันคือการให้ขั้นตอนวิธีชนิดของข้อมูลการฝึกอบรมที่เด็กได้รับผ่านประสบการณ์ทั้งในปริมาณและคุณภาพ.
06:43
เมื่อเรารู้เรื่องนี้เรารู้ว่าเราจำเป็นต้องใช้ในการเก็บรวบรวม ข้อมูลชุดที่มีภาพที่ไกลมากขึ้นกว่าที่เราเคยมีมาก่อนอาจจะเป็นพันครั้งมากขึ้นและร่วมกับศาสตราจารย์ไก่

การแปล กรุณารอสักครู่..

ผลลัพธ์ (ไทย) 3:[สำเนา]

คัดลอก!

การแปล กรุณารอสักครู่..

ภาษาอื่น ๆ

การสนับสนุนเครื่องมือแปลภาษา: กรีก, กันนาดา, กาลิเชียน, คลิงออน, คอร์สิกา, คาซัค, คาตาลัน, คินยารวันดา, คีร์กิซ, คุชราต, จอร์เจีย, จีน, จีนดั้งเดิม, ชวา, ชิเชวา, ซามัว, ซีบัวโน, ซุนดา, ซูลู, ญี่ปุ่น, ดัตช์, ตรวจหาภาษา, ตุรกี, ทมิฬ, ทาจิก, ทาทาร์, นอร์เวย์, บอสเนีย, บัลแกเรีย, บาสก์, ปัญจาป, ฝรั่งเศส, พาชตู, ฟริเชียน, ฟินแลนด์, ฟิลิปปินส์, ภาษาอินโดนีเซี, มองโกเลีย, มัลทีส, มาซีโดเนีย, มาราฐี, มาลากาซี, มาลายาลัม, มาเลย์, ม้ง, ยิดดิช, ยูเครน, รัสเซีย, ละติน, ลักเซมเบิร์ก, ลัตเวีย, ลาว, ลิทัวเนีย, สวาฮิลี, สวีเดน, สิงหล, สินธี, สเปน, สโลวัก, สโลวีเนีย, อังกฤษ, อัมฮาริก, อาร์เซอร์ไบจัน, อาร์เมเนีย, อาหรับ, อิกโบ, อิตาลี, อุยกูร์, อุสเบกิสถาน, อูรดู, ฮังการี, ฮัวซา, ฮาวาย, ฮินดี, ฮีบรู, เกลิกสกอต, เกาหลี, เขมร, เคิร์ด, เช็ก, เซอร์เบียน, เซโซโท, เดนมาร์ก, เตลูกู, เติร์กเมน, เนปาล, เบงกอล, เบลารุส, เปอร์เซีย, เมารี, เมียนมา (พม่า), เยอรมัน, เวลส์, เวียดนาม, เอสเปอแรนโต, เอสโทเนีย, เฮติครีโอล, แอฟริกา, แอลเบเนีย, โคซา, โครเอเชีย, โชนา, โซมาลี, โปรตุเกส, โปแลนด์, โยรูบา, โรมาเนีย, โอเดีย (โอริยา), ไทย, ไอซ์แลนด์, ไอร์แลนด์, การแปลภาษา.