by Diptin Dahal
Posted on December 9, 2019 at 3:00 PM
Search Engine for the Ricetta App
The “.json” data file (the recipe dataset) was loaded first, and all stop words and all non-alphanumeric characters were removed from it. Each remaining word was then individually lemmatized. After that, TF and IDF were calculated for every word.
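Before the TF and IDF formulas below, here is a minimal sketch of the cleaning step just described. It is not the app's actual code: the "title" field name, the tiny stop-word list, and the preprocess helper are all assumptions made for illustration. Lemmatization is left as a pluggable function that defaults to doing nothing, so the sketch runs without any third-party library.

```python
import json
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "with", "to", "for"}

def preprocess(text, lemmatize=lambda word: word):
    """Lowercase, strip non-alphanumeric characters, drop stop words, lemmatize."""
    # Replace every run of characters that are not letters or digits with a space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # The default lemmatize is an identity stand-in (assumption); pass a real one if available.
    return [lemmatize(word) for word in cleaned.split() if word not in STOP_WORDS]

# Stand-in for the contents of the ".json" file; the "title" field is an assumption.
raw = '[{"title": "Pasta with the Tomatoes & 2 cloves of garlic!"}]'
recipes = json.loads(raw)

tokens = preprocess(recipes[0]["title"])
print(tokens)
```

A real lemmatizer can be dropped in through the lemmatize argument, for example preprocess(text, lemmatize=my_lemmatizer), where my_lemmatizer is any function that maps a word to its base form.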
TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document).
IDF(t) = log_e(Total number of documents / Number of documents containing term t).
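The two formulas above can be sketched in a few lines of Python. This is an illustration under assumptions, not the app's actual code: the function name compute_tfidf and the three toy documents are made up, and each document is assumed to be an already cleaned list of words.

```python
import math
from collections import Counter

def compute_tfidf(docs):
    """Return one {term: tf-idf} dictionary per document.

    docs is a list of token lists (assumed to be already cleaned and lemmatized).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    tfidf = []
    for tokens in docs:
        counts = Counter(tokens)
        total_terms = len(tokens)
        weights = {}
        for term, count in counts.items():
            tf = count / total_terms                  # TF(t) as defined above
            idf = math.log(n_docs / doc_freq[term])   # natural log, as in IDF(t)
            weights[term] = tf * idf
        tfidf.append(weights)
    return tfidf

# Toy documents, invented for this example.
docs = [
    ["tomato", "pasta", "garlic"],
    ["garlic", "bread"],
    ["tomato", "soup", "tomato"],
]
scores = compute_tfidf(docs)
print(scores[1])
```

Note that with this definition a term that appears in every document gets an IDF of log_e(1) = 0, so it contributes nothing to the score.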
The TF x IDF values were then calculated, and the resulting tf-idf values for the dataset were saved as a pickled file. The web app loads this pickled file, evaluates each user-provided search term individually against the stored tf-idf values, and computes a final score for every item in the data. Results are displayed in descending order of final score: the item with the highest score comes first, then the second highest, and so on.
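The lookup and ranking step could look roughly like the sketch below. Several things here are assumptions: the post does not say how the per-term values are combined into the final score, so the sketch simply sums them; the tf-idf numbers are invented; items that match no search term are dropped; and pickling is done in memory with pickle.dumps and pickle.loads rather than through a file, to keep the example self-contained.

```python
import pickle

# Invented tf-idf values for three recipes (index = recipe position in the dataset).
tfidf = [
    {"tomato": 0.14, "pasta": 0.37, "garlic": 0.14},
    {"garlic": 0.20, "bread": 0.55},
    {"tomato": 0.27, "soup": 0.37},
]

# Offline step: serialize the values. The app would write these bytes to a file.
blob = pickle.dumps(tfidf)

# Web app step: load the values back before answering queries.
loaded = pickle.loads(blob)

def rank(query_terms, tfidf):
    """Return recipe indices ordered by descending final score."""
    scored = []
    for index, weights in enumerate(tfidf):
        # Assumption: the final score is the sum of the tf-idf of each search term.
        score = sum(weights.get(term, 0.0) for term in query_terms)
        if score > 0:  # assumption: skip recipes that match no search term
            scored.append((score, index))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for score, index in scored]

results = rank(["tomato", "garlic"], loaded)
print(results)
```

Only unpickle files that the app itself produced, since loading a pickle from an untrusted source can execute arbitrary code.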