1. Create an initial set of keywords relevant to security defects.
2. Mine the code review repositories and populate a database.
3. Search the database using the initial set of keywords and build a CSV file (the corpus). Each entry in the CSV file is a code review comment containing at least one of the predetermined keywords.
4. Because many of the comments contain code snippets, apply identifier-splitting rules to the corpus (e.g., isBufferFull becomes "is Buffer Full" and read_string becomes "read string").
5. Clean the corpus: remove whitespace, punctuation, and numbers, and convert all words to lowercase. Then create a list of tokens for each document (i.e., each row in the CSV file) in the corpus.
6. Apply the Porter stemming algorithm to find the stem of each token (e.g., buffer, buffered, and buffering all become buffer).
7. Create a document-term matrix from the corpus.
8. Determine the words that co-occur frequently with each of the predetermined keywords.
9. Manually inspect all of the frequently co-occurring words to determine which of them should be added to the predetermined keyword list. The last row of Table 5 lists the keywords we added after the text mining.
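Step 3 (keyword search and CSV corpus construction) can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling. The source does not describe the database schema, so the sketch assumes the mined comments are already available as (comment_id, text) pairs, for example rows fetched from that database. It also assumes that a keyword counts as present only when it appears as a whole word, ignoring case. The function name `build_corpus` and the CSV column names are hypothetical.

```python
import csv
import io
import re
from typing import Iterable, List, TextIO, Tuple


def build_corpus(comments: Iterable[Tuple[str, str]],
                 keywords: List[str],
                 out: TextIO) -> int:
    """Write every comment containing at least one keyword to `out` as CSV.

    Hypothetical helper: `comments` yields (comment_id, text) pairs, e.g.
    rows fetched from the review database. Returns the number of data rows.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Whole-word, case-insensitive matching (an assumption): "race" matches
    # "Race condition" but not "trace".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    writer = csv.writer(out)
    writer.writerow(["comment_id", "text"])  # header row
    written = 0
    for comment_id, text in comments:
        if pattern.search(text):
            writer.writerow([comment_id, text])
            written += 1
    return written


if __name__ == "__main__":
    sample = [
        ("1", "Possible buffer overflow here"),
        ("2", "Fix typo in docstring"),
        ("3", "Race condition when two threads update the cache"),
    ]
    buf = io.StringIO()
    print(build_corpus(sample, ["overflow", "race"], buf))  # prints 2
    print(buf.getvalue())
```

Whole-word matching trades recall for precision. A plain substring search would also catch inflected forms such as "buffered", but it would pull in unrelated words such as "trace" for "race".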
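The identifier-splitting rules in step 4 are not spelled out in the source. The sketch below assumes two common ones: underscores separate words (snake_case), and a change from lowercase to uppercase starts a new word (camelCase). The second alternative in the regular expression also keeps acronyms together, so "HTTPServer" becomes "HTTP Server". The function name `split_identifiers` is hypothetical. Because the camelCase rule depends on letter case, splitting has to run before the lowercasing in step 5.

```python
import re

# Zero-width positions where a new word starts inside an identifier:
#   lowercase letter or digit followed by uppercase:  isBuffer   -> is|Buffer
#   uppercase followed by uppercase + lowercase:      HTTPServer -> HTTP|Server
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def split_identifiers(text: str) -> str:
    """Split camelCase and snake_case identifiers in `text` into separate words."""
    # snake_case: treat underscores as word separators
    text = text.replace("_", " ")
    # camelCase / PascalCase: insert a space at each case boundary
    return _CAMEL_BOUNDARY.sub(" ", text)


print(split_identifiers("isBufferFull"))  # is Buffer Full
print(split_identifiers("read_string"))   # read string
print(split_identifiers("HTTPServer"))    # HTTP Server
```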
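Steps 5 and 6 (cleaning, tokenizing, and stemming one document) might look like the following sketch. It makes several assumptions not stated in the source:

- Punctuation and digits are replaced with spaces rather than deleted, so "buffer.size" yields two tokens.
- Only ASCII letters are kept, which suits English-language comments.
- Stemming is delegated to a caller-supplied `stem` function so that the block runs with the standard library alone.

With NLTK installed, `PorterStemmer().stem` from `nltk.stem` is one existing Porter implementation that fits this parameter, e.g. `clean_and_tokenize(doc, stem=PorterStemmer().stem)`. The name `clean_and_tokenize` is hypothetical.

```python
import re
from typing import Callable, List


def clean_and_tokenize(document: str,
                       stem: Callable[[str], str] = lambda w: w) -> List[str]:
    """Lowercase a document, drop punctuation/digits/extra whitespace, tokenize, stem."""
    text = document.lower()
    # Replace every character that is not an ASCII letter or whitespace
    # (punctuation, digits, symbols) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # str.split() with no argument splits on runs of whitespace and drops
    # leading/trailing whitespace, which removes the extra spaces.
    return [stem(token) for token in text.split()]


print(clean_and_tokenize("Fix the  Buffer overflow, 2 times!"))
# ['fix', 'the', 'buffer', 'overflow', 'times']
```

Whatever stemmer is used, the predetermined keywords should pass through the same function before step 8. Otherwise, for example, a keyword "overflowing" would never match the stored token "overflow".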
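Steps 7 and 8 can be illustrated with a small dense document-term matrix and a document-level co-occurrence count. The source does not say how co-occurrence was measured or what counted as "frequent". This sketch therefore assumes that two terms co-occur when they appear in the same document (CSV row), and it simply ranks terms by that count, leaving the cut-off to the manual inspection in step 9. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def document_term_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a dense document-term matrix from tokenized documents.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of times
    vocabulary[j] occurs in document i.
    """
    vocabulary = sorted({token for doc in docs for token in doc})
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for token in doc:
            row[column[token]] += 1
        matrix.append(row)
    return vocabulary, matrix


def co_occurring_terms(docs: List[List[str]],
                       keywords: List[str],
                       top_n: int = 10) -> Dict[str, List[Tuple[str, int]]]:
    """For each keyword, rank other terms by the number of documents containing both."""
    vocabulary, matrix = document_term_matrix(docs)
    column = {term: j for j, term in enumerate(vocabulary)}
    result = {}
    for keyword in keywords:
        counts: Counter = Counter()
        k = column.get(keyword)
        if k is not None:  # keywords absent from the corpus get an empty list
            for row in matrix:
                if row[k] == 0:
                    continue  # keyword does not occur in this document
                for j, freq in enumerate(row):
                    if j != k and freq > 0:
                        counts[vocabulary[j]] += 1
        result[keyword] = counts.most_common(top_n)
    return result


docs = [
    ["buffer", "overflow", "check"],
    ["buffer", "overflow", "fix"],
    ["buffer", "leak"],
    ["typo", "fix"],
]
print(co_occurring_terms(docs, ["buffer"])["buffer"][0])  # ('overflow', 2)
```

A dense list-of-lists matrix keeps the example readable but grows as documents × vocabulary. For a real review corpus, a sparse representation such as the one returned by scikit-learn's `CountVectorizer` would be the more practical choice.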
