State-of-the-art speaker recognition systems tend to use only
short-term spectral features as voice information. Spectral
parameters take into account some aspects of the acoustic level
of the signal, like spectral magnitudes, formant frequencies, etc.,
and they are highly related to the physical traits of the speaker.
However, humans tend to use several linguistic levels, such as
lexicon, prosody or phonetics, to recognise others by voice.
These levels of information are more related to learned habits or
style, and they are mainly manifested in the dialect, sociolect or
idiolect of the speaker.
Since these linguistic levels play an important role in the
human recognition process, much effort has been devoted to
adding this kind of information to automatic speaker recognition
systems. The work in [1] showed that idiolectal information provides
good recognition performance given a sufficient amount of data, and
more recent works [2-4] have demonstrated that prosody helps
to improve voice spectrum based recognition systems, supplying
complementary information not captured in the traditional
acoustic systems. Moreover, some of these parameters are more
robust than spectral features to common problems such as noise,
transmission channel, speech level or the distance between
the speaker and the microphone.
There are probably many more characteristics that may
provide complementary information and could be of great
value for speaker recognition. This work focuses on the use of
jitter and shimmer for a speaker verification system. Jitter and
shimmer are acoustic characteristics of voice signals, and they
are quantified as the cycle-to-cycle variations of fundamental
frequency and waveform amplitude, respectively. Both features
have been widely used to detect voice pathologies (see, e.g., [5,
6]). They are commonly measured over long sustained vowels,
and values of jitter and shimmer above a certain threshold are
considered to be related to pathological voices, which are
usually perceived by humans as breathy, rough or hoarse.
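As an illustration, the cycle-to-cycle definitions above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the relative measures assumed here (mean absolute difference between consecutive cycles, divided by the mean value) are the common definitions, and the function names and input values are hypothetical.

```python
def relative_jitter(periods):
    """Relative jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def relative_shimmer(amplitudes):
    """Relative shimmer: the same cycle-to-cycle measure applied to
    per-cycle peak amplitudes instead of periods."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# A perfectly periodic voice gives zero jitter; small cycle-to-cycle
# perturbations of the pitch period give small positive values.
steady = [0.010] * 5                             # pitch periods (s)
perturbed = [0.010, 0.0102, 0.0099, 0.0101, 0.010]
```

Under these definitions, both measures are dimensionless ratios, which is why they are often reported as percentages in the voice-pathology literature.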
In [7] it was reported that significant differences can occur in
jitter and shimmer measurements between different speaking
styles, especially for shimmer. Nevertheless, prosody is also
highly dependent on the emotion of the speaker, and prosodic
features are useful in automatic recognition systems even when
no emotional state is distinguished.
The aim of this work is to improve a prosodic and voice
spectral verification system by introducing new features based
on jitter and shimmer measurements. The experiments have
been done over the Switchboard-I conversational speech
database. Fusion of the different features has been performed at
the score level, using z-score normalization and the matcher
weighting fusion method.
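A minimal sketch of this score-level fusion follows. It assumes, as is common for the matcher weighting method, that each subsystem's weight is inversely proportional to its equal error rate; the function names, the EER-based weighting, and the statistics are illustrative, not taken from the paper.

```python
def z_norm(scores, mean, std):
    """Z-score normalization: map raw matcher scores to zero mean and
    unit variance, using statistics estimated on development data."""
    return [(s - mean) / std for s in scores]

def matcher_weights(eers):
    """Matcher weighting: weights inversely proportional to each
    matcher's equal error rate, normalized to sum to one."""
    inv = [1.0 / e for e in eers]
    total = sum(inv)
    return [w / total for w in inv]

def fuse(score_lists, weights):
    """Fused score per trial: weighted sum of the (already
    normalized) scores from each matcher."""
    return [sum(w * s for w, s in zip(weights, trial))
            for trial in zip(*score_lists)]
```

For example, a matcher with half the error rate of another receives twice its weight, so the better-performing subsystem dominates the fused score.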
This paper is organised as follows. In the next section, an
overview of the features used in this work is presented,
including a description of jitter and shimmer measurements. The
experimental setup and verification experiments are shown in
section 3. Finally, conclusions of the experiments are given in
section 4.