each set of 100 trials, there have been zero to two trials
with misses. The overall fraction of trials with misses
was 0.0038. We repeated the experiment with δ = 0.01,
i.e., so that the miss probability of any given frequent
set is at most 0.01. This experiment gave misses in
fraction 0.041 of all the trials. In both cases the fraction
of trials with misses was about four times δ.
The actual amount of reduction in the database
activity depends very much on the storage structures.
For instance, if the database has 10 million rows, a disk
block contains on average 100 rows, and the sample
size is 20,000, then the sampling phase could read up
to 20% of the database. For the design and analysis of
sampling methods see, e.g., [OR89]. The related problem
of sampling for query estimation is considered in
more detail in [HS92]. An alternative to randomly
drawing each row separately is, of course, to draw
whole blocks of rows into the sample. Depending on how
randomly the rows have been assigned to the blocks,
this method can give very good or very bad results.
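
The 20% figure above follows from simple block arithmetic.
The sketch below is illustrative only: the function names are
hypothetical, and the expected-value estimate assumes rows are
drawn uniformly with replacement and that all blocks hold the
same number of rows, neither of which is stated in the text.

```python
import math


def worst_case_blocks(num_rows, rows_per_block, sample_size):
    """Return (blocks touched in the worst case, total blocks).

    Worst case: every sampled row lives in a different disk block.
    """
    total_blocks = math.ceil(num_rows / rows_per_block)
    return min(sample_size, total_blocks), total_blocks


def expected_blocks(num_rows, rows_per_block, sample_size):
    """Expected number of distinct blocks touched.

    Assumption (not from the text): rows are drawn uniformly with
    replacement and every block holds the same number of rows.
    """
    total_blocks = math.ceil(num_rows / rows_per_block)
    p_untouched = (1.0 - 1.0 / total_blocks) ** sample_size
    return total_blocks * (1.0 - p_untouched)


if __name__ == "__main__":
    # Figures from the text: 10 million rows, 100 rows per block,
    # and a sample of 20,000 rows.
    touched, total = worst_case_blocks(10_000_000, 100, 20_000)
    print(f"worst case: {touched} of {total} blocks ({touched / total:.0%})")
    avg = expected_blocks(10_000_000, 100, 20_000)
    print(f"expected:   {avg:.0f} of {total} blocks")
```

Under these assumptions the expected number of blocks read is
somewhat below the worst case, because some sampled rows share
a block. Drawing whole blocks instead would read far fewer
blocks for the same number of rows, at the risk described
above when rows are not randomly assigned to blocks.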

The reduction in database activity is achieved at the
cost of considering some attribute sets that the level-wise
algorithm does not generate and check. Table 5
shows the average number of sets considered for data
set T10.I4.D100K with different sample sizes, and
the number of candidate sets of the level-wise algorithm.
The largest absolute overhead occurs with
low thresholds, where the number of itemsets considered
has grown from 318,588 by 64,694 in the worst
case. This growth is not significant for the total execution
time since the itemsets are handled entirely in
main memory. The relative overhead is larger with
higher thresholds, but since the absolute overheads are
very small the effect is negligible. Table 5 indicates
that larger samples cause less overhead (with equally
good results), but that for sample sizes from 20,000 to
80,000 the difference in the overhead is not significant.
To obtain a better picture of the relation of δ and
the experimental number of trials with misses, we conducted
the following test. We took 100 samples (for
each frequency threshold and sample size) and determined
the lowered frequency threshold that would have
given misses in one out of the hundred trials. Figure 2
presents these results (as points), together with lines
showing the lowered thresholds with δ = 0.01 or 0.001,
i.e., the thresholds corresponding to miss probabilities
of 0.01 and 0.001 for a given frequent set. The
frequency thresholds that would give misses in fraction
0.01 of cases approximate surprisingly closely the
thresholds for δ = 0.01. Experiments with a larger
scale of sample sizes give comparable results. There
are two explanations for the similarity of the values.
One reason is that there are not necessarily many
potential misses, i.e., not many frequent sets with
frequency relatively close to the threshold. Another
reason that contributes to the similarity is that the sets
are not independent.
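
To make the relation between δ and a lowered threshold
concrete, the following sketch computes a lowered threshold
in two ways. This excerpt does not say how the lowered
thresholds were computed, so both functions are assumptions
for illustration: one uses a Hoeffding-style bound
exp(-2 n ε²) ≤ δ, the other uses exact binomial tail
probabilities for a set whose true frequency equals the
original threshold. All function and parameter names are
hypothetical.

```python
import math


def hoeffding_lowered_threshold(min_fr, sample_size, delta):
    """Conservative lowered threshold from exp(-2 * n * eps**2) <= delta."""
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * sample_size))
    return min_fr - eps


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k < 0:
        return 0.0
    total = 0.0
    for i in range(min(k, n) + 1):
        # Evaluate each term in log space to avoid overflow for large n.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def binomial_lowered_threshold(min_fr, sample_size, delta):
    """Largest lowered threshold with miss probability at most delta.

    The worst case is a set whose true frequency is exactly min_fr.
    It is missed when its sample count X falls below the threshold
    count c, i.e. when X <= c - 1.
    """
    c = 0
    # Raise c while the next threshold (c + 1) would still be safe.
    while c < sample_size and binomial_cdf(c, sample_size, min_fr) <= delta:
        c += 1
    return c / sample_size


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from the text.
    for delta in (0.01, 0.001):
        h = hoeffding_lowered_threshold(0.0025, 20_000, delta)
        b = binomial_lowered_threshold(0.0025, 20_000, delta)
        print(f"delta={delta}: hoeffding={h:.5f} binomial={b:.5f}")
```

For small frequency thresholds the Hoeffding-style bound can
be too loose to be useful (it may even fall below zero),
whereas the binomial computation stays between zero and the
original threshold. In both variants a smaller δ gives a
lower threshold, and δ bounds the miss probability of one
given frequent set, not of a whole trial.
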
In the case of a possible failure, Algorithm 2 iteratively
generates all new candidates and makes another
pass over the database. In our experiments the number
of frequent sets missed (when any were missed) was
one or two for δ = 0.001, and one to 16 for δ = 0.01.
The number of candidates checked on the second pass
was very small compared to the total number of itemsets
checked.