When a dataset outgrows the storage capacity of a single physical machine, it must be partitioned across several machines. Filesystems that manage storage across a network of machines are called distributed filesystems. HDFS is designed for storing very large files with a write-once, read-many-times access pattern, running on clusters of commodity hardware. HDFS is not a good fit for low-latency data access, for workloads with many small files, or for modifications at arbitrary offsets in a file.

Files in HDFS are broken into block-sized chunks, which are stored as independent units. The default block size is 64 MB (the Hadoop 1.x default; later releases default to 128 MB).
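As a rough illustration of the chunking, the sketch below computes the block boundaries for a file of a given size. It is plain arithmetic, not HDFS code: the function name split_into_blocks and the 200 MB example file are assumptions made for this example only.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size quoted in the text


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return a list of (offset, length) pairs, one per block.

    Hypothetical helper for illustration; it only mirrors the arithmetic
    of cutting a file of file_size bytes into block-sized chunks.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        # The final block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
print(len(blocks))                     # 4
print(blocks[-1][1] // (1024 * 1024))  # 8, the last block holds only 8 MB
```

Three full 64 MB blocks plus one 8 MB remainder cover the 200 MB file. Each (offset, length) pair stands for one independently stored unit.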

An HDFS cluster has two types of node operating in a master-worker pattern: a NameNode (the master) and a number of DataNodes (workers). The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. The namenode also knows the datanodes on which all the blocks for a given file are located. Datanodes are the workhorses of the filesystem. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of blocks that they are storing.
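The bookkeeping described above can be pictured as two lookup tables: one from file path to an ordered list of blocks, and one from block to the datanodes holding it. The following is a minimal in-memory sketch under that assumption. ToyNameNode, its method names, and the block and datanode identifiers are all hypothetical and are not part of the Hadoop API.

```python
from collections import defaultdict


class ToyNameNode:
    """Hypothetical model of the namenode's metadata; not the Hadoop API."""

    def __init__(self):
        self.file_blocks = {}                    # path -> ordered list of block ids
        self.block_locations = defaultdict(set)  # block id -> datanodes holding it

    def add_file(self, path, block_ids):
        """Record which blocks make up a file, in order."""
        self.file_blocks[path] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """Record the blocks a datanode reports, replacing its previous report."""
        for holders in self.block_locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return [(block_id, sorted datanode names)] for a file, in block order."""
        return [(b, sorted(self.block_locations[b])) for b in self.file_blocks[path]]


nn = ToyNameNode()
nn.add_file("/logs/a.log", ["blk_1", "blk_2"])
nn.block_report("dn1", ["blk_1"])
nn.block_report("dn2", ["blk_1", "blk_2"])
print(nn.locate("/logs/a.log"))
# [('blk_1', ['dn1', 'dn2']), ('blk_2', ['dn2'])]
```

In this sketch the block locations are rebuilt entirely from the periodic reports, so a datanode that stops reporting a block simply drops out of that block's holder set.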

MapReduce

MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster. The framework is inspired by the map and reduce functions commonly used in functional programming.

In the “Map” step, the master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. Each worker node processes its sub-problem and passes the answer back to the master node. In the “Reduce” step, the master node collects the answers to all the sub-problems and combines them to form the output, which is the answer to the problem it was originally trying to solve.
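The two steps can be sketched on a single machine with the classic word-count problem. This is an illustration of the programming model only, not the Hadoop API: the function names are assumptions, and the shuffle helper stands in for the grouping by key that a real framework performs between the map and reduce steps.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (assumed here; done by the framework)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is an independent sub-problem handed to map_fn.
    intermediate = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["the quick fox", "the lazy dog", "The end"]))
# {'the': 3, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

Because map_fn looks at one line at a time and reduce_fn at one key at a time, both steps could run on different worker nodes in parallel.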

Two types of nodes control this job execution process: a JobTracker and a number of TaskTrackers. The jobtracker coordinates all the jobs run on the system by scheduling tasks to run on tasktrackers. Tasktrackers run tasks and send progress reports to the jobtracker, which keeps a record of the overall progress of each job. If a task fails, the jobtracker can reschedule it on a different tasktracker.
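The rescheduling behaviour can be sketched as a small simulation. Everything here is an assumption made for illustration: run_job, the run_task callback, the round-robin starting order, the max_attempts limit, and the tracker names are hypothetical and do not reflect how Hadoop's scheduler is actually implemented.

```python
def run_job(tasks, trackers, run_task, max_attempts=3):
    """Assign each task to a tracker, retrying failures on a different tracker.

    run_task(tracker, task) is a hypothetical callback that returns True on
    success. Returns {task: tracker that completed it}.
    """
    completed = {}
    for i, task in enumerate(tasks):
        # Start each task on a different tracker to spread the load.
        start = i % len(trackers)
        order = trackers[start:] + trackers[:start]
        for tracker in order[:max_attempts]:
            if run_task(tracker, task):
                completed[task] = tracker
                break
        else:
            raise RuntimeError(f"task {task!r} failed on every tracker tried")
    return completed


# Hypothetical failure model: tracker "tt1" always fails.
def flaky(tracker, task):
    return tracker != "tt1"


print(run_job(["map-0", "map-1", "reduce-0"], ["tt1", "tt2", "tt3"], flaky))
# {'map-0': 'tt2', 'map-1': 'tt2', 'reduce-0': 'tt3'}
```

The task map-0 is first assigned to the failing tracker tt1 and then completes on tt2, which mirrors the rescheduling described above.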
