It downloads text data and builds combinations of the following document types (short name - MIME subtype) and languages:
html - "html"
pdf - "pdf"
odt - "vnd.oasis.opendocument.text"
docx - "vnd.openxmlformats-officedocument.wordprocessingml.document"
doc - "msword"
xlsx - "vnd.openxmlformats-officedocument.spreadsheetml.sheet"
xls - "vnd.ms-excel"
ppt - "vnd.ms-powerpoint"
Languages (ISO 639-1 codes): bg, es, cs, da, de, et, el, en, fr, it, lv, lt, hu, mt, nl, pl, pt, ro, sk, sl, fi, sv
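To make the type and language lists concrete, here is a minimal sketch that maps each short type name to its full media type and enumerates every (type, language) pair. It assumes the downloader covers all pairs; the class and method names are hypothetical and not part of DocumentProvider. The "text/" and "application/" prefixes are the standard IANA registrations for these subtypes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the library described in this post.
public class TypeLangCombinations {

    // Short type names mapped to full IANA media types (subtypes from the list above).
    static final Map<String, String> MEDIA_TYPES = new LinkedHashMap<>();
    static {
        MEDIA_TYPES.put("html", "text/html");
        MEDIA_TYPES.put("pdf", "application/pdf");
        MEDIA_TYPES.put("odt", "application/vnd.oasis.opendocument.text");
        MEDIA_TYPES.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        MEDIA_TYPES.put("doc", "application/msword");
        MEDIA_TYPES.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
        MEDIA_TYPES.put("xls", "application/vnd.ms-excel");
        MEDIA_TYPES.put("ppt", "application/vnd.ms-powerpoint");
    }

    static final List<String> LANGUAGES = List.of(
        "bg", "es", "cs", "da", "de", "et", "el", "en", "fr", "it", "lv",
        "lt", "hu", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "fi", "sv");

    // Builds every (type, language) pair, in list order.
    static List<String[]> combinations() {
        List<String[]> result = new ArrayList<>();
        for (String type : MEDIA_TYPES.keySet()) {
            for (String lang : LANGUAGES) {
                result.add(new String[] {type, lang});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> combos = combinations();
        // 8 types x 22 languages
        System.out.println(combos.size() + " combinations");
        String[] first = combos.get(0);
        System.out.println(first[0] + " -> " + MEDIA_TYPES.get(first[0]) + ", " + first[1]);
    }
}
```

With 8 types and 22 languages this yields 176 pairs, each of which could be passed as the `type` and `lang` arguments of the API call.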
You just need to call one of DocumentProvider's API methods:
DocumentProvider.getDocByTypeAndLang(type, lang);
It returns one or more objects representing a document, each with the following fields:
long id;
long size;
long checksum;
String type;
String sample;
File sampleFile;
String url;
String state;
File file;
MediaType mediaType;
int wordCount;
String content;
List<String> words;
List<String> sampleWords;
int sampleWordCount;
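The post does not say how the derived fields are computed, so the following is only a sketch of one plausible relationship between them, under loudly stated assumptions: `words` is the content split on whitespace, `wordCount` is its length, `sample` is a prefix of the content, `size` is the UTF-8 byte length, and `checksum` is a CRC32. The class `DocumentSketch` is hypothetical, models only a subset of the fields, and is not the library's real document class.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical, simplified stand-in for the document object listed above.
// Tokenization, sampling and checksum algorithm are assumptions.
public class DocumentSketch {
    String type;
    String url;
    String content;
    String sample;
    long size;
    long checksum;
    List<String> words;
    int wordCount;
    List<String> sampleWords;
    int sampleWordCount;

    DocumentSketch(String type, String url, String content, int sampleLength) {
        this.type = type;
        this.url = url;
        this.content = content;
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        this.size = bytes.length;              // assumed: byte length of the UTF-8 text
        CRC32 crc = new CRC32();               // assumed checksum algorithm
        crc.update(bytes);
        this.checksum = crc.getValue();
        this.words = tokenize(content);
        this.wordCount = words.size();
        // Assumed sampling rule: the first sampleLength characters of the content.
        this.sample = content.substring(0, Math.min(sampleLength, content.length()));
        this.sampleWords = tokenize(sample);
        this.sampleWordCount = sampleWords.size();
    }

    // Splits on runs of whitespace; blank input yields an empty list.
    static List<String> tokenize(String text) {
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) {
        // example.com is a placeholder URL, not a real data source.
        DocumentSketch doc = new DocumentSketch(
            "pdf", "http://example.com/a.pdf", "Hello wide world of documents", 10);
        System.out.println(doc.wordCount + " words, sample \"" + doc.sample
            + "\" has " + doc.sampleWordCount + " words");
    }
}
```

Running `main` prints `5 words, sample "Hello wide" has 2 words`. Keeping `wordCount` and `sampleWordCount` derived from the word lists, rather than set independently, is one way to keep the fields consistent with each other.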