3.3. Data Annotation
After defining the coding schema, we randomly sample a subset of 5000 tweets and manually annotate them by theme. During this initial annotation, we notice that most tweets fall into the "others" category, while some categories contain only a very small number of tweets. To ensure that we have enough tweets to build a classification model for the predefined categories, more tweets from each category need to be added to the sampling sets used for subsequent model training and validation. We therefore develop an automatic program that uses a simple text-matching approach to assign the remaining tweets to themes: a tweet is attributed to a category if it contains associated keywords defined in Table 1. We then review the tweets in each of these initial categories, except for the "others" category, annotate those whose true categories we are confident of, and add them to our sampling sets. To reduce the effect of duplicated tweets on the classifier, all retweets are discarded. In the end, 8807 tweets are used to train and test the multi-label classifier presented in the following section.
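
The keyword-matching and retweet-filtering steps described above can be sketched as follows. This is only an illustrative sketch, not the program used in the study: the category names and keywords are hypothetical placeholders (the actual keywords are those defined in Table 1), whole-word matching is an assumption about what "simple text match" means, and a retweet is assumed to be recognizable by a leading "RT @user" prefix.

```python
import re

# Hypothetical placeholders; the real categories and keywords come from Table 1.
CATEGORY_KEYWORDS = {
    "category_a": ["alpha", "keyword one"],
    "category_b": ["beta"],
}


def is_retweet(text):
    """Assumption: a retweet starts with the conventional 'RT @user' prefix."""
    return re.match(r"\s*RT\s+@\w+", text) is not None


def match_categories(text, keywords=CATEGORY_KEYWORDS):
    """Return every category whose keywords occur in the tweet (multi-label).

    Matching is case-insensitive and on whole words, which is an assumption.
    Tweets that match no keyword fall back to the 'others' category.
    """
    lowered = text.lower()
    matched = []
    for category, words in keywords.items():
        if any(re.search(r"\b" + re.escape(w.lower()) + r"\b", lowered) for w in words):
            matched.append(category)
    return matched or ["others"]


def build_candidates(tweets):
    """Group non-retweet tweets by matched category, skipping 'others'.

    The result is a pool of candidates for manual review, not final labels.
    """
    candidates = {}
    for text in tweets:
        if is_retweet(text):
            continue  # retweets are discarded to limit duplicated content
        for category in match_categories(text):
            if category != "others":
                candidates.setdefault(category, []).append(text)
    return candidates
```

Under these assumptions, the output of `build_candidates` would only be a candidate pool: as the text describes, each candidate is still reviewed manually, and only tweets whose true category is confidently known are added to the sampling sets.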
