Adn622+kecanduan+genjotan+anaku+sendiri+miu+shiramine+indo18+verified

This guide implements a “Keyword‑Lookup” feature that scans a data source (database rows, log files, scraped pages, etc.) for the exact set of terms listed below:

adn622
kecanduan
genjotan
anaku
sendiri
miu
shiramine
indo18
verified

The goal is to detect any record that contains one or more of these tokens, flag it, and (optionally) return the matched context.
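As a minimal sketch of the "flag it and return the matched context" idea, the following uses only Python's standard `re` module. The function name `find_with_context` and the `width` parameter are illustrative assumptions, not part of any existing API.

```python
from __future__ import annotations

import re

# The term list from above; in practice it would come from configuration.
KEYWORDS = ["adn622", "kecanduan", "genjotan", "anaku", "sendiri",
            "miu", "shiramine", "indo18", "verified"]

# One case-insensitive pattern with word boundaries around the alternation.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.I)


def find_with_context(text: str, width: int = 20) -> list[dict]:
    """Return one entry per match: the keyword, its offset, and nearby text."""
    results = []
    for m in PATTERN.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        results.append({
            "keyword": m.group(0).lower(),
            "offset": m.start(),
            "context": text[start:end],
        })
    return results


if __name__ == "__main__":
    for hit in find_with_context("The account adn622 is now Verified.", width=10):
        print(hit)
```

A record counts as flagged whenever the returned list is non-empty; the `context` slice is what a reviewer would see next to the match.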


+-------------------+       +-------------------+       +-------------------+
|   Input Source    |  -->  |   Index/Storage   |  -->  |   Search Engine   |
| (DB, files, API)  |       |  (Elasticsearch,  |       |  (query builder   |
|                   |       |   SQLite, …)      |       |   + ranking)      |
+-------------------+       +-------------------+       +-------------------+
                                    |
                                    v
                         +-------------------+
                         |   Result Formatter|
                         +-------------------+
                                    |
                                    v
                         +-------------------+
                         |   API / UI Layer  |
                         +-------------------+
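The stages in the diagram can be sketched end to end with the standard library alone. This is an illustration under stated assumptions: it uses an in-memory SQLite table as the storage stage, a plain regex scan in place of a ranked search engine, and hypothetical helper names (`build_store`, `search`, `format_results`).

```python
import json
import re
import sqlite3

KEYWORDS = ["adn622", "miu", "verified"]  # shortened list for the demo
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.I)


def build_store(rows):
    """Index/Storage stage: load (id, content) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO records (id, content) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def search(conn):
    """Search stage: scan every stored row and yield (id, matched keywords)."""
    query = "SELECT id, content FROM records ORDER BY id"
    for rec_id, content in conn.execute(query):
        matches = sorted({m.group(0).lower() for m in PATTERN.finditer(content)})
        if matches:
            yield rec_id, matches


def format_results(hits):
    """Result Formatter stage: serialize hits as JSON for the API/UI layer."""
    return json.dumps([{"id": i, "matches": m} for i, m in hits], indent=2)


if __name__ == "__main__":
    store = build_store([
        (1, "adn622 posted a selfie..."),
        (2, "This is a generic post without matches."),
    ])
    print(format_results(search(store)))
```

A full-table scan like this is only reasonable for small data sets; the Elasticsearch variant exists for the larger ones.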

Store the list in a configuration file (YAML/JSON) or a database table so you can add/remove terms without code changes.

# keywords.yaml
keywords:
  - adn622
  - kecanduan
  - genjotan
  - anaku
  - sendiri
  - miu
  - shiramine
  - indo18
  - verified

Load it at start‑up:

import yaml
with open('keywords.yaml') as f:
    KEYWORDS = yaml.safe_load(f)['keywords']
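If the list is kept in JSON rather than YAML, a loader might look like the sketch below. The file name `keywords.json` and the helper `load_keywords` are assumptions made for illustration; the normalization step (strip, lower-case, de-duplicate) is an optional extra, not something the configuration format requires.

```python
import json
import tempfile
from pathlib import Path


def load_keywords(path):
    """Load the keyword list from a JSON file shaped like {"keywords": [...]}.

    Entries are stripped and lower-cased, blanks are dropped, and duplicates
    are removed while keeping the original order.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    seen = set()
    keywords = []
    for raw in data["keywords"]:
        term = str(raw).strip().lower()
        if term and term not in seen:
            seen.add(term)
            keywords.append(term)
    return keywords


if __name__ == "__main__":
    # Write a throwaway config file so the example runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "keywords.json"
        cfg.write_text(
            json.dumps({"keywords": ["adn622", "Miu", " verified ", "miu"]}),
            encoding="utf-8",
        )
        print(load_keywords(cfg))
```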

| Concern | Mitigation |
|---------|------------|
| Sensitive data exposure | Store only what you need for matching (e.g., hash or redact personal identifiers before indexing). |
| Performance attacks (very large payloads) | Impose request size limits, rate‑limit the endpoint, and/or process data in streaming mode. |
| False positives | Use word boundaries (\b) in regex, or the match_phrase query in ES to avoid matching substrings inside unrelated words. |
| Logging | Avoid logging raw user‑submitted text unless you have a clear retention policy. |
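Two of these mitigations are easy to show in a few lines: word boundaries against false positives, and streaming with a size cap against oversized payloads. The sketch below is illustrative only; `scan_stream` and its `max_bytes` default are hypothetical names and values.

```python
import re
from typing import Iterable, Iterator

KEYWORDS = ["adn622", "miu", "verified"]  # shortened list for the demo

# With \b, "miu" matches only as a whole word; without it, the pattern also
# matches inside longer words such as "premium".
BOUNDED = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.I)
UNBOUNDED = re.compile("|".join(map(re.escape, KEYWORDS)), re.I)


def scan_stream(lines: Iterable[str], max_bytes: int = 1_000_000) -> Iterator[tuple]:
    """Yield (line_number, matches) for each matching line, one line at a time.

    Raises ValueError once more than max_bytes of UTF-8 input has been read.
    """
    seen = 0
    for number, line in enumerate(lines, start=1):
        seen += len(line.encode("utf-8"))
        if seen > max_bytes:
            raise ValueError("payload exceeds max_bytes")
        matches = sorted({m.group(0).lower() for m in BOUNDED.finditer(line)})
        if matches:
            yield number, matches


if __name__ == "__main__":
    print(bool(UNBOUNDED.search("premium plan")))  # substring hit inside "premium"
    print(bool(BOUNDED.search("premium plan")))    # no whole-word hit
    print(list(scan_stream(["premium plan", "miu is Verified"])))
```

Because `scan_stream` accepts any iterable of strings, an open file object can be passed directly, so the whole payload never has to sit in memory.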


# -------------------------------------------------
# keyword_lookup.py
# -------------------------------------------------
import json
import re
import sys
from pathlib import Path

import yaml

# 1️⃣ Load keywords
with Path('keywords.yaml').open() as f:
    KEYWORDS = yaml.safe_load(f)['keywords']

# 2️⃣ Compile one case-insensitive regex with word boundaries
PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, KEYWORDS)) + r')\b', re.I)


def find_matches(text: str) -> set:
    """Return a set of matched keywords (case‑insensitive)."""
    return {m.group(0).lower() for m in PATTERN.finditer(text)}


# 3️⃣ Simple CLI demo
if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage: python keyword_lookup.py \"<text to scan>\"")
        sys.exit(1)
    input_text = sys.argv[1]
    matches = find_matches(input_text)
    print(json.dumps({
        "input": input_text,
        "matches": sorted(matches),
        "has_match": bool(matches)
    }, ensure_ascii=False, indent=2))

Run:

$ python keyword_lookup.py "adn622 and miu are verified"
{
  "input": "adn622 and miu are verified",
  "matches": [
    "adn622",
    "miu",
    "verified"
  ],
  "has_match": true
}

Sample pytest snippet for the regex approach:

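from keyword_lookup import find_matches as find_matches_regex  # assumes the regex matcher from keyword_lookup.py

def test_regex_matches():
    txt = "The user adn622 posted a verified video about miu."
    assert find_matches_regex(txt) == {"adn622", "verified", "miu"}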

Elasticsearch Example

PUT keywords_demo
{
  "mappings": {
    "properties": {
      "content": { "type": "text", "analyzer": "standard" }
    }
  }
}

POST _bulk
{ "index": { "_index": "keywords_demo", "_id": "1" } }
{ "content": "adn622 posted a selfie..." }
{ "index": { "_index": "keywords_demo", "_id": "2" } }
{ "content": "This is a generic post without matches." }

GET keywords_demo/_search
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "content": "adn622" } },
        { "match_phrase": { "content": "kecanduan" } },
        { "match_phrase": { "content": "genjotan" } },
        { "match_phrase": { "content": "anaku" } },
        { "match_phrase": { "content": "sendiri" } },
        { "match_phrase": { "content": "miu" } },
        { "match_phrase": { "content": "shiramine" } },
        { "match_phrase": { "content": "indo18" } },
        { "match_phrase": { "content": "verified" } }
      ],
      "minimum_should_match": 1
    }
  },
  "highlight": {
    "fields": { "content": {} }
  }
}

The response contains each hit with a highlight snippet showing the matched term(s).

Pros: Near‑real‑time search across millions of documents, built‑in scoring, pagination, and powerful analytics.
Cons: Requires an external service (Elasticsearch, OpenSearch, Solr, etc.) and some operational overhead.
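Writing one `match_phrase` clause per keyword by hand gets out of sync with the configured list quickly. A small helper can generate the same request body from the list instead. This sketch only builds the JSON body and does not send it; the function name `build_keyword_query` and its `field` parameter are assumptions, and no particular client library is implied.

```python
import json


def build_keyword_query(keywords, field="content"):
    """Build a bool/should query body with one match_phrase clause per keyword."""
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: kw}} for kw in keywords],
                "minimum_should_match": 1,
            }
        },
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    body = build_keyword_query(["adn622", "miu", "verified"])
    print(json.dumps(body, indent=2))
```

The printed body can be pasted after `GET keywords_demo/_search` or passed to whichever HTTP or search client the project already uses.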


Below is a minimal Flask‑style endpoint (Python) that returns JSON results.

from flask import Flask, request, jsonify

# Assumes the regex matcher defined in keyword_lookup.py
from keyword_lookup import find_matches as find_matches_regex

app = Flask(__name__)


@app.route('/search', methods=['GET'])
def search():
    # Expected query param: ?q=some+text
    q = request.args.get('q', '')
    matches = find_matches_regex(q)          # or find_matches(q, KEYWORDS)
    return jsonify({
        "query": q,
        "matched_keywords": sorted(matches),  # sorted for a stable order
        "has_match": bool(matches)
    })


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Response example


{
  "query": "adn622 and miu are verified",
  "matched_keywords": ["adn622", "miu", "verified"],
  "has_match": true
}

You can easily extend this to: