function get_tfidf($word, $count) { if ($count < 1000) $count = 21000 - $count * 18; $tf = log($count); $tf = pow($tf, 5) * log(strlen($word)); $tf = log($tf); $idf = log(5000000000/$count); //if ($tf > 13) $idf *= 1.4; return array($...
...学[4, 5] Standalone character: 的[6] Short-phrase fragment: 知识[7, 8] Word: 从中/n (IDF = 5.25) Word: 学到/v (IDF = 5.28) Word: 大学/n (IDF = 4.23) Word: 的/uj (IDF = 0.00) Word: 知识/n (IDF = 4.57) The segmentation of "从中学到" came out wrong. How should the weights be adjusted?
...e (cur) { printf("Word: %.*s/%s (IDF = %4.2f)\n", cur->len, text+cur->off, cur->attr, cur->idf); printf("length: %d\n", cur->len); cur = cur->next; } scws_free_result(res); ...
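The `%.*s` in that printf call is what lets the word be printed straight out of the original text by offset and length, without copying it or adding a terminating NUL. A small sketch of just that formatting step follows; `struct token` and `format_token` are hypothetical stand-ins that model only the fields used in the snippet, not the library's own result type.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a result node; only the fields used by
   the printf call in the snippet are modelled here. */
struct token {
    int off;          /* byte offset into the text */
    int len;          /* byte length of the word   */
    const char *attr; /* part-of-speech tag        */
    double idf;
};

/* Formats one token the way the snippet prints it. "%.*s" consumes an
   int precision (len) first and then the pointer, so at most len bytes
   are read starting at text + off. */
int format_token(char *buf, size_t size, const char *text,
                 const struct token *t)
{
    /* A negative precision is treated as if it were omitted, which
       would read up to the next NUL instead of stopping at len. */
    assert(t->len >= 0);
    return snprintf(buf, size, "Word: %.*s/%s (IDF = %4.2f)",
                    t->len, text + t->off, t->attr, t->idf);
}
```

Note that the precision argument has to be an `int`; passing a `size_t` or other wider type there is undefined behaviour.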
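Thanks for the reply, H. It does look like the problem you described. I used to think scws was written by some expert abroad, and only recently found out it came from a top developer here in China. Speaking as one of the long-suffering programmers, you are my Andy Lau.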
There is no way to do it at the moment. As far as I can see, the only option is to scrape http://www.ftphp.com/scws/demo/get_tfidf.php. I hope it can be shared.
...ult(s)) { while (cur != NULL) { printf("Word: %.*s/%s (IDF = %4.2f)\n", cur->len, text+cur->off, cur->attr, cur->idf); cur = cur->next; } scws_free_result(res); } scws_free(s); } [/php] The output is as follows: $ ./test Word: Hello/en (IDF = 4.02) ...
...|中|国|人|Array ( [0] => Array ( [word] => 我 [off] => 0 [len] => 3 [idf] => 0 [attr] => un ) [1] => Array ( [word] => 是 [off] => 3 [len] => 3 [idf] => 0 [attr] => un ) [2] => Array ( [word] => 一 [off] => 6 [len] => 3 [idf] => 0 [attr] => un ) [3] => Array ( [word] => 个 [off] => 9 [len] =>...
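The `off` values of 0, 3, 6, 9 with `len` 3 in that dump are consistent with byte offsets into UTF-8 text, where each of these Chinese characters takes three bytes. Assuming that is the case, the sketch below walks a UTF-8 string and records the same kind of offset and length pairs; `utf8_seq_len` and `utf8_offsets` are hypothetical helper names and are not part of scws.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte c.
   Stray continuation bytes and invalid leads are reported as 1 so a
   scan always moves forward. */
static int utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical helper: stores the byte offset and byte length of each
   character of a NUL-terminated UTF-8 string, up to max entries, and
   returns how many characters were recorded. */
size_t utf8_offsets(const char *text, size_t *off, int *len, size_t max)
{
    size_t i = 0, n = 0;
    assert(text != NULL);

    while (text[i] != '\0' && n < max) {
        int k = utf8_seq_len((unsigned char)text[i]);
        int j = 1;

        /* Stop early if the string ends in the middle of a sequence. */
        while (j < k && text[i + j] != '\0')
            j++;

        off[n] = i;
        len[n] = j;
        n++;
        i += (size_t)j;
    }
    return n;
}
```

Run over the first four characters of the dump (我, 是, 一, 个), this yields offsets 0, 3, 6, 9 and a length of 3 for each, matching the `[off]` and `[len]` entries shown.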