5. Supervised Discretization Methods
Supervised discretization methods make use of the class label when partitioning the continuous features. Among the supervised discretization methods there are simple ones such as entropy-based discretization and interval merging and splitting using χ² analysis [10].
5.1. Entropy Based Discretization Method
One of the supervised discretization methods, introduced by Fayyad and Irani, is called entropy-based discretization. An entropy-based method uses the class information entropy of candidate partitions to select boundaries for discretization. Class information entropy is a measure of purity: it measures the amount of information that would be needed to specify the class to which an instance belongs. The method starts with one large interval containing all known values of a feature and recursively partitions this interval into smaller subintervals until some stopping criterion is met, for example the Minimum Description Length (MDL) principle or an optimal number of intervals, thus creating multiple intervals for the feature.

In information theory, the entropy of a given set S, i.e. the expected information needed to classify a data instance in S, is calculated as

Info(S) = - Σi pi log2(pi)

where pi is the probability of class i, estimated as Ci/|S|, Ci being the total number of data instances in S that belong to class i. A log function to the base 2 is used because the information is encoded in bits.

The entropy value is bounded from below by 0, reached when the model has no uncertainty at all, i.e. all data instances in S belong to a single class i, so that pi = 1 and pj = 0 for all j ≠ i. It is bounded from above by log2 m, where m is the number of classes in S, reached when the data instances are uniformly distributed across the m classes, i.e. pi = 1/m for all i.

Based on this entropy measure, J. Ross Quinlan developed a decision-tree induction algorithm called Iterative Dichotomiser 3 (ID3), which selects the best split point at each node. ID3 employs a greedy search to find potential split points within the existing range of continuous values using the following formula:
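As an aside, the entropy measure Info(S) defined above, together with its lower and upper bounds, can be checked with a short sketch (a minimal Python example; the function name and the toy label sets are illustrative, not part of the original method description):

```python
import math
from collections import Counter

def info(labels):
    """Class information entropy: Info(S) = -sum_i p_i * log2(p_i),
    where p_i is estimated as C_i / |S| from the class counts."""
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Lower bound: a pure set (all instances in one class) has entropy 0.
print(info(["a", "a", "a"]))        # 0.0

# Upper bound: m = 2 classes, uniformly distributed -> log2(2) = 1 bit.
print(info(["a", "a", "b", "b"]))   # 1.0
```

A mixed but unbalanced set falls strictly between the two bounds, e.g. `info(["a", "b", "b", "b"])` is about 0.811 bits.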