What is a bigram?

马涌河畔

Yesterday, while reading the gensim LDA Model documentation, I came across this passage:

We find bigrams in the documents. Bigrams are sets of two adjacent words. Using bigrams we can get phrases like “machine_learning” in our output (spaces are replaced with underscores); without bigrams we would only get “machine” and “learning”.

Note that in the code below, we find bigrams and then add them to the original data, because we would like to keep the words “machine” and “learning” as well as the bigram “machine_learning”.

Does "bigram" mean a phrase made up of two words?
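To make the quoted passage concrete, here is a minimal sketch of "find bigrams and add them to the original data". It is not gensim's implementation: the helper `add_bigrams` and its `min_count` parameter are assumptions for illustration, and it keeps a pair only if it occurs often enough in the corpus, whereas gensim's phrase detection uses a scoring function rather than a bare count.

```python
from collections import Counter


def add_bigrams(docs, min_count=2):
    """Append frequent adjacent-word pairs, joined by '_', to each document.

    Hypothetical helper: a simplified stand-in for phrase detection.
    """
    # Count every adjacent word pair across the whole corpus.
    pair_counts = Counter(
        pair for doc in docs for pair in zip(doc, doc[1:])
    )
    result = []
    for doc in docs:
        frequent = [
            f"{a}_{b}"
            for a, b in zip(doc, doc[1:])
            if pair_counts[(a, b)] >= min_count
        ]
        # Keep the original words and add the bigram tokens after them.
        result.append(doc + frequent)
    return result


docs = [
    ["machine", "learning", "is", "fun"],
    ["i", "study", "machine", "learning"],
]
augmented = add_bigrams(docs, min_count=2)
print(augmented[0])
# ['machine', 'learning', 'is', 'fun', 'machine_learning']
```

Both "machine" and "learning" stay in the document, and "machine_learning" is added as an extra token, which is the behaviour the passage describes.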

内容分析应用 · Posted on 2021-6-21 09:02:20

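unigram: splits a sentence into single tokens, one word at a time.
bigram: walks the sentence from start to end and takes every two adjacent tokens as one unit. For Chinese text split by character that means two characters; in the gensim passage above it means two adjacent words.
trigram: the same, with every three adjacent tokens as one unit.
An n-gram model is the general case: a language model built on sequences of n adjacent tokens.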
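The definitions above can be sketched as a sliding window over a token list. The function name `ngrams` is an assumption for illustration, not part of gensim; the same code works whether the tokens are words or single characters.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, in order, as tuples."""
    # A sequence shorter than n yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = "we find bigrams in the documents".split()
print(ngrams(words, 1))  # unigrams: one tuple per word
print(ngrams(words, 2))  # bigrams: ('we', 'find'), ('find', 'bigrams'), ...
print(ngrams(words, 3))  # trigrams

# Character-level bigrams, as in the Chinese wording of the reply.
chars = list("机器学习")
print(ngrams(chars, 2))
# [('机', '器'), ('器', '学'), ('学', '习')]
```

Note that the windows overlap: a sentence of 6 words gives 5 bigrams, not 3.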


1 reply to this post · Last reply on 2021-6-21 09:02
