5. Supervised Discretization Methods
Supervised discretization methods use the class label when partitioning continuous features. The simpler supervised methods include entropy-based discretization and interval merging and splitting using χ2 analysis [10].
5.1. Entropy-Based Discretization Method
One supervised discretization method, introduced by Fayyad and Irani, is entropy-based discretization. It uses the class information entropy of candidate partitions to select discretization boundaries. Class information entropy is a measure of purity: it quantifies the amount of information needed to specify which class an instance belongs to. The method starts with a single interval containing all known values of a feature and recursively partitions it into smaller subintervals until a stopping criterion is met, for example the Minimum Description Length (MDL) principle or a desired number of intervals. The result is a set of intervals for the feature.

In information theory, the entropy of a set $S$, that is, the expected information needed to classify a data instance in $S$, is calculated as

$$Info(S) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

where $m$ is the number of classes in $S$ and $p_i$ is the probability of class $i$, estimated as $C_i/|S|$, with $C_i$ the number of data instances in $S$ that belong to class $i$. The logarithm is taken to base 2 because information is encoded in bits. The entropy is bounded below by 0, reached when there is no uncertainty at all: every instance in $S$ belongs to a single class $i$, so $p_i = 1$ and $p_j = 0$ for all $j \neq i$. It is bounded above by $\log_2 m$, reached when the instances are uniformly distributed across the $m$ classes, so that $p_i = 1/m$ for all $i$. For example, a set containing two equally frequent classes has $Info(S) = \log_2 2 = 1$ bit.

Based on this entropy measure, J. Ross Quinlan developed the Iterative Dichotomiser 3 (ID3) algorithm to select the best split points when inducing decision trees. ID3 performs a greedy search for split points within the existing range of a continuous feature, scoring each candidate split point $T$ by the class information entropy of the partition it induces:

$$Info(S, T) = \frac{|S_1|}{|S|} Info(S_1) + \frac{|S_2|}{|S|} Info(S_2) = -\frac{|S_1|}{|S|} \sum_{j=1}^{m} p_{j,left} \log_2(p_{j,left}) - \frac{|S_2|}{|S|} \sum_{j=1}^{m} p_{j,right} \log_2(p_{j,right})$$

where $S_1$ and $S_2$ are the parts of $S$ to the left and right of $T$.
In the equation, $p_{j,left}$ and $p_{j,right}$ are the proportions of instances to the left and to the right of the candidate split point $T$ that belong to class $j$. The split point with the lowest entropy is chosen to divide the range into two intervals, and binary splitting continues recursively on each part until a stopping criterion is satisfied. Fayyad and Irani propose a stopping criterion for this recursive procedure based on the minimum description length principle (MDLP): splitting stops when

$$InfoGain(S, T) = Info(S) - Info(S, T) \le \delta$$

where $T$ is a candidate interval boundary that splits $S$ into $S_1$ (left) and $S_2$ (right), and

$$\delta = \frac{\log_2(n-1) + \log_2(3^m - 2) - \left[m \, Info(S) - m_1 \, Info(S_1) - m_2 \, Info(S_2)\right]}{n}$$

Here $m$ is the number of classes in $S$, $m_i$ is the number of classes in $S_i$, and $n$ is the total number of data instances in $S$.
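The recursive procedure above can be sketched in Python as follows. This is a minimal illustration of entropy-based discretization with the MDLP stopping rule as stated in this section, not ID3 itself and not a reference implementation. It assumes numeric feature values and hashable class labels, and it assumes the candidate boundary $T$ is the midpoint between two adjacent distinct values. All function names (`info`, `split_info`, `best_split`, `mdlp_accepts`, `entropy_mdlp_cuts`) are hypothetical.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import List, Optional, Tuple


def info(labels: Sequence[Hashable]) -> float:
    """Info(S) = -sum_i p_i * log2(p_i), with p_i estimated as C_i / |S|."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_info(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Info(S, T): size-weighted entropy of S1 (left of T) and S2 (right of T)."""
    n = len(left) + len(right)
    return len(left) / n * info(left) + len(right) / n * info(right)


def best_split(pairs: List[Tuple[float, Hashable]]) -> Optional[int]:
    """Greedy scan: index i minimising Info(S, T) for T between pairs[i-1] and pairs[i]."""
    labels = [y for _, y in pairs]
    best_i, best_e = None, math.inf
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a boundary can only fall between distinct feature values
        e = split_info(labels[:i], labels[i:])
        if e < best_e:
            best_i, best_e = i, e
    return best_i


def mdlp_accepts(labels: List[Hashable], left: List[Hashable], right: List[Hashable]) -> bool:
    """Keep the cut only if InfoGain(S, T) > delta; otherwise splitting stops."""
    n = len(labels)
    gain = info(labels) - split_info(left, right)
    m, m1, m2 = (len(set(s)) for s in (labels, left, right))
    delta = (math.log2(n - 1) + math.log2(3 ** m - 2)
             - (m * info(labels) - m1 * info(left) - m2 * info(right))) / n
    return gain > delta


def entropy_mdlp_cuts(values: Sequence[float], labels: Sequence[Hashable]) -> List[float]:
    """Recursively bisect the sorted feature; return the accepted cut points in order."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts: List[float] = []

    def recurse(part: List[Tuple[float, Hashable]]) -> None:
        i = best_split(part)
        if i is None:
            return  # fewer than two distinct values: nothing to split
        ys = [y for _, y in part]
        if not mdlp_accepts(ys, ys[:i], ys[i:]):
            return  # MDLP stopping criterion reached
        cuts.append((part[i - 1][0] + part[i][0]) / 2)  # assumption: midpoint as T
        recurse(part[:i])
        recurse(part[i:])

    recurse(pairs)
    return sorted(cuts)


if __name__ == "__main__":
    print(entropy_mdlp_cuts(range(10), ["a"] * 5 + ["b"] * 5))                   # [4.5]
    print(entropy_mdlp_cuts(range(30), ["a"] * 10 + ["b"] * 10 + ["c"] * 10))    # [9.5, 19.5]
    print(entropy_mdlp_cuts([1, 2, 3, 4], ["a", "b", "a", "b"]))                 # []
```

In the last call the labels alternate, so no candidate gains enough information to pay for its description length, and the feature is left as a single interval. For readability, the sketch rescans every midpoint between distinct values at each level. Restricting the scan to class-boundary points is a known optimization and is not shown here.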

0/5000

จาก: -

เป็น: -

ผลลัพธ์ (ไทย) 1: [สำเนา]

คัดลอก!
