
Posts

Showing posts from March, 2013

OpenNLP thread safety issue

From the OpenNLP documentation, e.g. for the Tokenizer: "A tokenizer instance is not thread safe. For each thread one tokenizer must be instantiated which can share one TokenizerModel instance to safe memory." The point is: (1) Create a single instance of each model (e.g., a TokenizerModel or POSModel: POSModel pModel = new POSModel(new FileInputStream(posModelFile));) and share it across all threads. (2) In each thread, create that thread's own tokenizers or POS taggers from the shared model (e.g., POSTaggerME posTagger = new POSTaggerME(pModel);). Creating them is lightweight, but a given tokenizer or tagger instance must not be used by more than one thread.
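The sharing pattern can be sketched with the JDK alone. In this minimal sketch, `Model` and `Tagger` are hypothetical stand-ins (not OpenNLP classes) for an immutable model such as POSModel and a non-thread-safe worker such as POSTaggerME; the tag lookup is invented purely for illustration. One model is loaded once, and every task builds its own tagger from it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedModelDemo {

    // Hypothetical stand-in for an immutable model (plays the role of POSModel).
    static final class Model {
        private final Map<String, String> tags;

        Model(Map<String, String> tags) {
            this.tags = Map.copyOf(tags); // defensive, unmodifiable copy
        }

        String tagFor(String token) {
            return tags.getOrDefault(token, "UNK");
        }
    }

    // Hypothetical stand-in for a cheap worker that is NOT thread safe
    // (plays the role of POSTaggerME): it keeps mutable per-instance state.
    static final class Tagger {
        private final Model model;
        private final List<String> buffer = new ArrayList<>();

        Tagger(Model model) {
            this.model = model;
        }

        List<String> tag(String[] tokens) {
            buffer.clear();
            for (String token : tokens) {
                buffer.add(model.tagFor(token));
            }
            return new ArrayList<>(buffer);
        }
    }

    // Tags each sentence on a thread pool. The model is shared by all tasks;
    // each task creates its own Tagger, so no Tagger crosses threads.
    static List<List<String>> tagInParallel(Model model, List<String[]> sentences, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String[] sentence : sentences) {
                Callable<List<String>> task = () -> new Tagger(model).tag(sentence);
                futures.add(pool.submit(task));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // results stay in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Model model = new Model(Map.of("dogs", "NNS", "bark", "VBP"));
        List<String[]> sentences = List.of(
                new String[] {"dogs", "bark"},
                new String[] {"cats", "bark"});
        System.out.println(tagInParallel(model, sentences, 2));
    }
}
```

With OpenNLP itself, the same shape would presumably apply: load the model once at startup, then construct the tokenizer or tagger inside each task or thread (a ThreadLocal is another common way to hold one instance per thread).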

Japanese NLP resources

http://nlp.cs.nyu.edu/ene/

Japanese opinion lexicon for sentiment analysis
http://www.cl.ecei.tohoku.ac.jp/index.php?%E5%85%AC%E9%96%8B%E8%B3%87%E6%BA%90%2F%E6%97%A5%E6%9C%AC%E8%AA%9E%E8%A9%95%E4%BE%A1%E6%A5%B5%E6%80%A7%E8%BE%9E%E6%9B%B8

Japanese NLP tools
http://gurenlinguistics.blogspot.com/2011/11/list-of-japanese-nlp-tools.html

NLP tools collection

Original post: http://www.phontron.com/nlptools.php

This is a list of NLP tools for various purposes.

Dependency Parser
CaboCha: A tool for Japanese dependency structure analysis based on cascaded chunking.
KNP: A Japanese dependency parser that also includes some form of predicate-argument analysis.
MaltParser: A parser based on the shift-reduce method.
MSTParser: A tool for dependency parsing based on maximum spanning trees.

Finite State Models
Kyfd: A decoder for text-processing systems built using weighted finite state transducers.
OpenFST: A library implementing many operations over weighted finite state transducers (WFSTs) to allow for easy building of finite-state models.

General NLP Libraries
NLTK: A general library for NLP written in Python.
OpenNLP: A library written in Java that implements many different NLP tools.
Stanford CoreNLP: A library including many of the NLP tools developed at Stanford.

Language Modeling
IRSTLM: A...

A comparison of five open-source licenses (BSD, Apache, GPL, LGPL, MIT)

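http://www.awflasher.com/blog/archives/939

When giants such as Adobe, Microsoft, and Sun start to show a liking for "open source", the era of open source is about to arrive!

Originally from: sinoprise.com/read.php?tid-662-page-e-fpage-1.html (unfortunately this link no longer opens). I have left the text essentially unchanged and only tidied up the layout.

Reference: http://www.fsf.org/licensing/licenses/

Many open-source licenses exist today, and the Open Source Initiative (OSI) has currently approved 58 of them ( http://www.opensource.org/licenses/alphabetical ). The common open-source licenses such as BSD, GPL, LGPL, and MIT are all OSI-approved. If you want to open-source your own code, it is best to pick one of these approved licenses.

Here we look at the four most commonly used open-source licenses and where each applies, as a reference for developers and vendors who are preparing to open-source their code or to use open-source products.

BSD license ( original BSD license, FreeBSD license, Original BSD license )

The BSD license gives users a great deal of freedom. Users can basically "do as they please": they may freely use and modify the source code, and may redistribute the modified code as either open-source or proprietary software.

That freedom has a precondition. When you distribute code that uses the BSD license, or build your own product on top of BSD-licensed code, you must meet three conditions: (1) If the redistributed product includes source code, the source code must carry the BSD license from the original code. (2) If you redistribute only a binary library or program, the BSD license from the original code must be included in the documentation and copyright notice of that library or program. (3) You may not use the names of the open-source code's authors or organizations, or the name of the original product, for marketing.

The BSD license encourages code sharing but requires respect for the authors' copyright. Because it allows users to modify and redistribute the code, and also allows commercial software built with or on BSD code to be released and sold, it is a license that is very friendly to commercial integration. Many companies prefer the BSD license when choosing open-source products, because it gives them full control over the third-party code and lets them modify it or build on it when necessary.

Apache Licence 2.0( Apache Licen...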

Spring tutorials

Handling form input
http://www.tutorialspoint.com/spring/spring_mvc_form_handling_example.htm

General tutorials
http://krams915.blogspot.com/p/tutorials.html
http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html
http://viralpatel.net/blogs/tutorial-spring-3-mvc-introduction-spring-mvc-framework/