The clustering model most closely related to statistics is based on distribution models. A cluster can then be defined as the set of objects that most likely belong to the same distribution. A convenient property of this approach is that it closely resembles the way artificial data sets are generated: by sampling random objects from a distribution.
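The generative view can be illustrated with a minimal sketch that draws an artificial data set from a mixture of Gaussian distributions. The function name `sample_mixture` and its parameters are hypothetical, chosen only for illustration, and the Gaussian form of the components is an assumption of the example.

```python
import numpy as np

def sample_mixture(n, weights, means, covs, seed=0):
    """Draw n points from a Gaussian mixture (illustrative helper).

    Each point first picks a component according to `weights`, then is
    sampled from that component's Gaussian distribution.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Component index of each point: the "true" cluster label.
    labels = rng.choice(len(weights), size=n, p=weights)
    dim = len(means[0])
    points = np.empty((n, dim))
    for k in range(len(weights)):
        idx = labels == k
        points[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return points, labels

# Two well-separated components in two dimensions.
points, labels = sample_mixture(
    500,
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [5.0, 5.0]],
    covs=[np.eye(2), np.eye(2)],
)
```

A distribution-based clustering algorithm tries to invert this process: given only `points`, it estimates the component parameters and, from them, the labels.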

Although the theoretical foundation of these methods is sound, they suffer from a key problem, overfitting, unless the model complexity is constrained. A more complex model will usually explain the data better, which makes choosing an appropriate model complexity inherently difficult.
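One common way to constrain model complexity, not named in the text above and shown here only as an assumption, is to penalize the fitted log-likelihood by the number of free parameters, as the Bayesian information criterion (BIC) does. The sketch below counts the free parameters of a Gaussian mixture with full covariance matrices; the function names and the example log-likelihood values are hypothetical.

```python
import math

def gmm_num_params(k, d):
    """Free parameters of a k-component Gaussian mixture in d dimensions
    with full covariance matrices (illustrative helper)."""
    mixing_weights = k - 1              # weights sum to one
    mean_params = k * d
    cov_params = k * d * (d + 1) // 2   # symmetric d x d matrix per component
    return mixing_weights + mean_params + cov_params

def bic(log_likelihood, k, d, n):
    """Bayesian information criterion; lower values are preferred."""
    return gmm_num_params(k, d) * math.log(n) - 2.0 * log_likelihood

# Made-up log-likelihoods: the 3-component model fits slightly better,
# but the penalty for its extra parameters outweighs the gain.
simple_model = bic(-1000.0, k=2, d=2, n=400)
complex_model = bic(-995.0, k=3, d=2, n=400)
```

With these illustrative numbers the simpler model has the lower (better) score, even though the more complex model explains the data slightly better.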

One prominent method is the Gaussian mixture model, fitted with the expectation-maximization algorithm. The data set is usually modelled with a fixed number of Gaussian distributions (fixed to avoid overfitting) that are initialized randomly and whose parameters are iteratively optimized to fit the data better. The procedure converges to a local optimum, so multiple runs may produce different results. To obtain a hard clustering, each object is typically assigned to the Gaussian distribution it most likely belongs to; for soft clusterings, this step is not necessary.
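The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the names `fit_gmm` and `log_gaussian`, the small regularization term `reg` added to each covariance matrix, and the stopping tolerance `tol` are all choices made for this example.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of each row of X under a Gaussian N(mean, cov)."""
    d = X.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (X - mean).T)     # shape (d, n)
    maha = np.sum(sol ** 2, axis=0)               # squared Mahalanobis distances
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def fit_gmm(X, k, n_iter=200, tol=1e-6, seed=0, reg=1e-6):
    """Fit a k-component Gaussian mixture with EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization: k distinct data points serve as initial means.
    means = X[rng.choice(n, size=k, replace=False)]
    init_cov = np.cov(X.T).reshape(d, d) + reg * np.eye(d)
    covs = np.array([init_cov] * k)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities, i.e. soft assignments of points to components.
        log_p = np.stack(
            [np.log(weights[j]) + log_gaussian(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and covariances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        # Log-likelihood of the parameters used in this E-step.
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs, resp, ll

# Because EM only finds a local optimum, run it from several random
# initializations and keep the run with the highest log-likelihood.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(6.0, 1.0, (200, 2))])
weights, means, covs, resp, ll = max(
    (fit_gmm(X, 2, seed=s) for s in range(5)), key=lambda run: run[4]
)
hard_labels = resp.argmax(axis=1)   # hard clustering; resp itself is the soft one
```

The matrix `resp` is the soft clustering; taking the most likely component per row, as in `hard_labels`, yields the hard clustering.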

Distribution-based clustering produces complex models for clusters that can capture correlation and dependence between attributes. However, these algorithms put an extra burden on the user: for many real data sets there may be no concisely defined mathematical model, and assuming Gaussian distributions, for example, is a rather strong assumption about the data.
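How a distribution model captures correlation can be seen in a small sketch, assuming a single two-dimensional Gaussian component with strongly correlated attributes. The helper name `gaussian_log_density` and the covariance values are hypothetical. Two points at the same Euclidean distance from the mean receive very different densities, depending on whether they lie along or against the direction of correlation.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log density of a single point x under N(mean, cov) (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    diff = x - mean
    maha = diff @ np.linalg.solve(cov, diff)   # squared Mahalanobis distance
    _, log_det = np.linalg.slogdet(cov)
    return -0.5 * (mean.size * np.log(2.0 * np.pi) + log_det + maha)

mean = np.zeros(2)
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])                   # attributes correlated at 0.9

along = gaussian_log_density([1.0, 1.0], mean, cov)     # follows the correlation
against = gaussian_log_density([1.0, -1.0], mean, cov)  # contradicts it
```

Here `along` is much larger than `against`, so the model treats the first point as a far more plausible member of the cluster. A purely distance-based criterion would rank the two points equally.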
