Stories from the Data: Vidio Plays EDA
Dataset: 1% sample of play events (~107K rows, Feb 1–16 2020)
Goal: Understand user viewing behavior, content patterns, and determine which platform earns the keenest viewers.
0. Setup & Data Loading
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Slide theme colors
VIDIO_PINK = '#E6175C'
VIDIO_LIGHT = '#FF3D7F'
VIDIO_DARK = '#B30E46'
VIDIO_BLUSH = '#FFB8CE'
VIDIO_BG = '#FDF0F0'
# Build a custom palette from the slide gradient
vidio_palette = [VIDIO_PINK, VIDIO_LIGHT, VIDIO_DARK, VIDIO_BLUSH, '#D4145A', '#FF6699', '#8C0B3A']
vidio_cmap = mcolors.LinearSegmentedColormap.from_list('vidio', [VIDIO_BLUSH, VIDIO_PINK, VIDIO_DARK])
sns.set_theme(style='whitegrid')
sns.set_palette(vidio_palette)
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['figure.dpi'] = 100
plt.rcParams['figure.facecolor'] = VIDIO_BG
plt.rcParams['axes.facecolor'] = '#FFFFFF'
plt.rcParams['axes.edgecolor'] = '#DDDDDD'
plt.rcParams['text.color'] = '#2A1215'
plt.rcParams['axes.labelcolor'] = '#2A1215'
plt.rcParams['xtick.color'] = '#555555'
plt.rcParams['ytick.color'] = '#555555'
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 140)
df = pd.read_csv('data/plays_1.csv.gz', on_bad_lines='skip', low_memory=False)
# Parse datetime columns
df['play_time'] = pd.to_datetime(df['play_time'], errors='coerce')
df['end_time'] = pd.to_datetime(df['end_time'], errors='coerce')
print(f"Shape: {df.shape[0]:,} rows x {df.shape[1]} columns")
print(f"Date range: {df['play_time'].min()} → {df['play_time'].max()}")
df.head(3)
Shape: 106,811 rows x 41 columns
Date range: 2020-02-01 09:26:40+00:00 → 2020-02-16 16:59:36+00:00
| | hash_content_id | hash_play_id | hash_visit_id | hash_watcher_id | hash_film_id | hash_event_id | is_login | playback_location | platform | play_time | end_time | referrer | average_bitrate | bitrate_range | total_bytes | buffer_duration | referrer_group | completed | utm_source | utm_medium | utm_campaign | player_name | has_ad | flash_version | os_name | os_version | browser_name | browser_version | app_name | autoplay | is_premium | app_version | city | play_duration | content_type | stream_type | title | category_name | film_title | season_name | genre_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0096dafb1049ee942e8e7cbad5abf4a46dc92e3995caac... | 0162f124f5bd61592d9bca6aaa3b1b6097a00b62ef33c9... | 27dfffe2bb74f8767caeb64ac4c92e9eaf4b11a28ef7e1... | e7c2138fd9d4047356066a01de46fc22ee9955a6fd6679... | NaN | c81bd64a212d3b5cb227499a6b4a3dc607748ae5133e48... | False | embed | web-mobile | 2020-02-02 10:21:20+00:00 | 2020-02-02 10:21:20+00:00 | https://m.liputan6.com/bola/read/4169085/marc-... | 300000.0 | 200-500 | 0 | 0.0 | liputan6 | False | NaN | NaN | NaN | videojs | True | 0,0,0 | Android | 9 | Chrome Mobile WebView | 62.0.3202 | vidio | False | False | NaN | NaN | 0 | vod | NaN | Kiprah Eks Pemain Juventus yang Dikaitkan deng... | Sports | NaN | NaN | NaN |
| 1 | 01ddda2f8667719625bd0afa58537fd95a166c93233b8b... | 5f6c722782c919a9cb6254e94086462a6c34794dc741b6... | 90b2b454e70a2c13cf9e8ca987d7c4f914c7d819532c5d... | d265ea2b8df584c51de3b2c6aebb6a0415384a18d1a148... | NaN | 90a514e451e400c4c32ac42815920a66c500c1c2ea347d... | False | embed | web-mobile | 2020-02-02 12:17:33+00:00 | 2020-02-02 12:17:53+00:00 | https://m.bola.net/italia/lautaro-martinez-abs... | 600.0 | 0-200 | 750 | 0.0 | bolanet | False | NaN | NaN | NaN | videojs | True | 0,0,0 | Android | 9 | Chrome Mobile | 79.0.3945 | vidio | False | False | NaN | NaN | 19 | vod | NaN | Pindah ke Inter Milan, Christian Eriksen Jadi ... | Sports | NaN | NaN | NaN |
| 2 | 01ddda2f8667719625bd0afa58537fd95a166c93233b8b... | 66c32aa6aa6a63f7d85d1bd1b8b6b295477c88794888d8... | f79cab25855841a83e6aa01eccbf3d167491a868977b77... | d2585a4b468b7cdbebc053ffc5b49cebd58d6e738bccdd... | NaN | c278001db02102ece22d428b9235332856cba4754b63a0... | False | embed | web-mobile | 2020-02-02 01:26:42+00:00 | 2020-02-02 01:27:07+00:00 | https://m.bola.net/italia/lautaro-martinez-abs... | 300000.0 | 200-500 | 900000 | 0.0 | bolanet | False | NaN | NaN | NaN | videojs | True | 0,0,0 | Android | 9 | Samsung Internet | 10.2 | vidio | False | False | NaN | NaN | 24 | vod | NaN | Pindah ke Inter Milan, Christian Eriksen Jadi ... | Sports | NaN | NaN | NaN |
# Missing values overview
missing = df.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
missing_pct = (missing / len(df) * 100).round(1)
pd.DataFrame({'missing_count': missing, 'missing_pct': missing_pct})
| | missing_count | missing_pct |
|---|---|---|
| city | 106811 | 100.0 |
| utm_campaign | 106773 | 100.0 |
| utm_source | 106724 | 99.9 |
| utm_medium | 106724 | 99.9 |
| genre_name | 101586 | 95.1 |
| film_title | 100995 | 94.6 |
| season_name | 100993 | 94.6 |
| hash_film_id | 100993 | 94.6 |
| app_version | 72129 | 67.5 |
| stream_type | 67874 | 63.5 |
| autoplay | 38960 | 36.5 |
| category_name | 38937 | 36.5 |
| completed | 38937 | 36.5 |
| browser_version | 36038 | 33.7 |
| flash_version | 34850 | 32.6 |
| os_name | 33725 | 31.6 |
| browser_name | 33666 | 31.5 |
| referrer | 5761 | 5.4 |
| average_bitrate | 4946 | 4.6 |
| player_name | 3267 | 3.1 |
| os_version | 405 | 0.4 |
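The list of columns to drop can also be derived from the missing-value shares instead of being typed by hand. A minimal sketch, assuming a cutoff of 99% missing; the cutoff, the `NULL_CUTOFF` name, and the toy frame are illustrative and not part of the original analysis:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the plays data (values are made up for illustration)
toy = pd.DataFrame({
    'platform': ['web-mobile', 'app-android', 'web-desktop', 'tv-android'],
    'city': [np.nan, np.nan, np.nan, np.nan],            # entirely missing
    'os_name': ['Android', np.nan, 'Windows', np.nan],   # half missing
})

NULL_CUTOFF = 0.99  # assumed threshold: drop columns with >= 99% missing values

# Share of missing values per column, then the columns at or above the cutoff
null_share = toy.isnull().mean()
drop_cols = null_share[null_share >= NULL_CUTOFF].index.tolist()
cleaned = toy.drop(columns=drop_cols)

print(drop_cols)
print(cleaned.columns.tolist())
```

Applied to the table above, a 99% cutoff would pick out city, utm_campaign, utm_source, and utm_medium, while keeping genre_name (95.1% missing).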
1. Data Cleaning & Feature Engineering
# Drop columns that are ~100% null
drop_cols = ['city', 'utm_source', 'utm_medium', 'utm_campaign']
df = df.drop(columns=drop_cols)
# Fix boolean columns that were read as object due to bad lines
for col in ['completed', 'is_login', 'has_ad', 'autoplay', 'is_premium']:
    if df[col].dtype == 'object':
        df[col] = df[col].map({'True': True, 'False': False, True: True, False: False})
# Feature engineering
df['hour'] = df['play_time'].dt.hour
df['day_of_week'] = df['play_time'].dt.day_name()
df['date'] = df['play_time'].dt.date
df['is_engaged'] = df['play_duration'] >= 60 # watched at least 1 minute
print(f"Shape after cleaning: {df.shape}")
print(f"\nNew columns: hour, day_of_week, date, is_engaged")
print(f"Engaged plays (>=60s): {df['is_engaged'].sum():,} ({df['is_engaged'].mean()*100:.1f}%)")
Shape after cleaning: (106811, 41)

New columns: hour, day_of_week, date, is_engaged
Engaged plays (>=60s): 44,933 (42.1%)
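The 60-second cutoff behind `is_engaged` is a judgment call, so it is worth checking how sensitive the engaged share is to it. A small sketch on made-up durations; the `durations` series and the candidate thresholds are assumptions for illustration, and in the notebook `df['play_duration']` would take their place:

```python
import pandas as pd

# Made-up play durations in seconds (not taken from the dataset)
durations = pd.Series([0, 5, 19, 24, 45, 60, 90, 120, 600, 1800])

# Share of plays at or above each candidate threshold
thresholds = [30, 60, 120]
engaged_share = {t: float((durations >= t).mean()) for t in thresholds}

for t, share in engaged_share.items():
    print(f"Engaged plays (>={t}s): {share * 100:.1f}%")
```

If the platform ordering stays the same across reasonable thresholds, the choice of 60 seconds matters less for the conclusions.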
2. Overall Traffic Patterns
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Daily play volume
daily = df.groupby('date').size()
axes[0].plot(daily.index, daily.values, marker='o', linewidth=2, color=VIDIO_PINK, markerfacecolor=VIDIO_DARK)
axes[0].set_title('Daily Play Volume')
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Number of Plays')
axes[0].tick_params(axis='x', rotation=45)
# Hourly distribution
hourly = df.groupby('hour').size()
axes[1].bar(hourly.index, hourly.values, color=VIDIO_LIGHT, edgecolor=VIDIO_PINK, linewidth=0.5)
axes[1].set_title('Plays by Hour of Day (UTC)')
axes[1].set_xlabel('Hour')
axes[1].set_ylabel('Number of Plays')
# Day of week
dow_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow = df['day_of_week'].value_counts().reindex(dow_order)
axes[2].bar(range(7), dow.values, color=VIDIO_PINK, edgecolor=VIDIO_DARK, linewidth=0.5)
axes[2].set_xticks(range(7))
axes[2].set_xticklabels(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
axes[2].set_title('Plays by Day of Week')
axes[2].set_ylabel('Number of Plays')
plt.tight_layout()
plt.show()
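The hourly chart is in UTC. If most viewers are in western Indonesia (an assumption; the `city` column is empty, so the data cannot confirm it), local hours run seven hours ahead (WIB, UTC+7). A sketch of the conversion, using two `play_time` values from the preview rows:

```python
import pandas as pd

# Two play_time values from the df.head(3) preview, parsed as UTC
play_time = pd.to_datetime(
    pd.Series(['2020-02-02 10:21:20+00:00', '2020-02-02 01:26:42+00:00']),
    utc=True,
)

# 'Asia/Jakarta' is the IANA zone for WIB (UTC+7); choosing it is an assumption
local_time = play_time.dt.tz_convert('Asia/Jakarta')
hour_local = local_time.dt.hour

print(hour_local.tolist())  # [17, 8]
```

Grouping on a local-hour column instead of `hour` would shift the bars so that peaks line up with viewers' evenings rather than UTC clock time.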
3. Content Analysis
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Content type distribution
ct = df['content_type'].value_counts()
axes[0].bar(ct.index, ct.values, color=[VIDIO_PINK, VIDIO_LIGHT, VIDIO_BLUSH])
for i, (val, cnt) in enumerate(ct.items()):
    axes[0].text(i, cnt + 500, f'{cnt/len(df)*100:.1f}%', ha='center', fontweight='bold', color=VIDIO_DARK)
axes[0].set_title('Content Type Distribution')
axes[0].set_ylabel('Number of Plays')
# Top 10 titles
top_titles = df['title'].value_counts().head(10)
axes[1].barh(top_titles.index[::-1], top_titles.values[::-1], color=VIDIO_PINK)
axes[1].set_title('Top 10 Titles by Play Count')
axes[1].set_xlabel('Number of Plays')
# Top categories
cat = df['category_name'].value_counts().head(10)
axes[2].barh(cat.index[::-1], cat.values[::-1], color=VIDIO_LIGHT)
axes[2].set_title('Top 10 Categories')
axes[2].set_xlabel('Number of Plays')
plt.tight_layout()
plt.show()
# Premium vs Free
premium = df['is_premium'].value_counts()
print("Premium vs Free content:")
print(f" Free: {premium[False]:>7,} plays ({premium[False]/len(df)*100:.1f}%)")
print(f" Premium: {premium[True]:>7,} plays ({premium[True]/len(df)*100:.1f}%)")
Premium vs Free content:
 Free: 105,886 plays (99.1%)
 Premium: 925 plays (0.9%)
4. User & Session Analysis
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Login vs Anonymous
login = df['is_login'].value_counts()
axes[0].bar(['Anonymous', 'Logged In'], [login[False], login[True]],
            color=[VIDIO_BLUSH, VIDIO_PINK])
for i, cnt in enumerate([login[False], login[True]]):
    axes[0].text(i, cnt + 500, f'{cnt/len(df)*100:.1f}%', ha='center', fontweight='bold', color=VIDIO_DARK)
axes[0].set_title('Login vs Anonymous Plays')
axes[0].set_ylabel('Number of Plays')
# Playback location
loc = df['playback_location'].value_counts()
axes[1].bar(loc.index, loc.values, color=[VIDIO_PINK, VIDIO_LIGHT])
for i, (val, cnt) in enumerate(loc.items()):
    axes[1].text(i, cnt + 500, f'{cnt/len(df)*100:.1f}%', ha='center', fontweight='bold', color=VIDIO_DARK)
axes[1].set_title('Playback Location')
axes[1].set_ylabel('Number of Plays')
# Top referrer groups
ref = df['referrer_group'].value_counts().head(8)
axes[2].barh(ref.index[::-1], ref.values[::-1], color=VIDIO_PINK)
axes[2].set_title('Top Referrer Groups')
axes[2].set_xlabel('Number of Plays')
plt.tight_layout()
plt.show()
5. Platform Deep-Dive
Central question: Which platform do users keenly use to watch Vidio?
We define "keen" viewership not just by volume, but by engagement quality — how long users watch, whether they complete content, and how committed they are (login, repeat usage).
# Platform overview — play volume
platform_counts = df['platform'].value_counts()
fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(platform_counts.index, platform_counts.values, color=vidio_palette[:len(platform_counts)])
for bar, cnt in zip(bars, platform_counts.values):
    ax.text(bar.get_x() + bar.get_width()/2, cnt + 500,
            f'{cnt:,}\n({cnt/len(df)*100:.1f}%)', ha='center', fontweight='bold', color=VIDIO_DARK)
ax.set_title('Play Volume by Platform')
ax.set_ylabel('Number of Plays')
plt.tight_layout()
plt.show()
# Build comprehensive platform metrics table
platform_metrics = df.groupby('platform').agg(
    total_plays=('hash_play_id', 'count'),
    unique_watchers=('hash_watcher_id', 'nunique'),
    median_duration=('play_duration', 'median'),
    mean_duration=('play_duration', 'mean'),
    engagement_rate=('is_engaged', 'mean'),
    login_rate=('is_login', 'mean'),
    avg_bitrate=('average_bitrate', 'mean'),
    has_ad_rate=('has_ad', 'mean'),
)
# Ensure numeric types and round
platform_metrics = platform_metrics.apply(pd.to_numeric, errors='coerce').round(3)
# Plays per unique watcher (repeat usage)
platform_metrics['plays_per_watcher'] = (
    platform_metrics['total_plays'] / platform_metrics['unique_watchers']
).round(2)
# Completion rate (VOD only — livestreaming has no 'completed')
vod = df[df['content_type'] == 'vod']
completion = vod.groupby('platform')['completed'].apply(lambda x: x.astype(float).mean()).round(3)
platform_metrics['completion_rate_vod'] = completion
platform_metrics
| platform | total_plays | unique_watchers | median_duration | mean_duration | engagement_rate | login_rate | avg_bitrate | has_ad_rate | plays_per_watcher | completion_rate_vod |
|---|---|---|---|---|---|---|---|---|---|---|
| app-android | 25955 | 24083 | 120.0 | 884.198 | 0.744 | 0.763 | 300000.000 | 0.650 | 1.08 | 0.323 |
| app-ios | 1781 | 1673 | 60.0 | 566.040 | 0.526 | 0.755 | 122684.309 | 0.536 | 1.06 | 0.271 |
| tv-android | 5930 | 4989 | 45.0 | 1129.359 | 0.471 | 0.253 | 300000.000 | 0.000 | 1.19 | 0.211 |
| tv-tizen | 1031 | 922 | 60.0 | 903.986 | 0.506 | 0.638 | 300000.000 | 0.000 | 1.12 | 0.220 |
| tv-webos | 153 | 153 | 0.0 | 0.163 | 0.000 | 0.007 | 300000.000 | 0.000 | 1.00 | 0.000 |
| web-desktop | 10233 | 9939 | 32.0 | 594.904 | 0.415 | 0.055 | 48141.791 | 0.697 | 1.03 | 0.256 |
| web-mobile | 61728 | 61248 | 26.0 | 125.819 | 0.277 | 0.011 | 174329.385 | 0.926 | 1.01 | 0.237 |
# Visualize key engagement metrics side by side
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
platforms = platform_metrics.index.tolist()
colors = vidio_palette[:len(platforms)]
# Median play duration
axes[0, 0].bar(platforms, platform_metrics['median_duration'], color=colors)
axes[0, 0].set_title('Median Play Duration (seconds)')
axes[0, 0].tick_params(axis='x', rotation=20)
for i, v in enumerate(platform_metrics['median_duration']):
    axes[0, 0].text(i, v + 1, f'{v:.0f}s', ha='center', fontweight='bold', color=VIDIO_DARK)
# Engagement rate
axes[0, 1].bar(platforms, platform_metrics['engagement_rate'] * 100, color=colors)
axes[0, 1].set_title('Engagement Rate (% plays >= 60s)')
axes[0, 1].tick_params(axis='x', rotation=20)
for i, v in enumerate(platform_metrics['engagement_rate'] * 100):
    axes[0, 1].text(i, v + 0.5, f'{v:.1f}%', ha='center', fontweight='bold', color=VIDIO_DARK)
# Login rate
axes[0, 2].bar(platforms, platform_metrics['login_rate'] * 100, color=colors)
axes[0, 2].set_title('Login Rate (%)')
axes[0, 2].tick_params(axis='x', rotation=20)
for i, v in enumerate(platform_metrics['login_rate'] * 100):
    axes[0, 2].text(i, v + 0.5, f'{v:.1f}%', ha='center', fontweight='bold', color=VIDIO_DARK)
# Plays per watcher
axes[1, 0].bar(platforms, platform_metrics['plays_per_watcher'], color=colors)
axes[1, 0].set_title('Plays per Unique Watcher')
axes[1, 0].tick_params(axis='x', rotation=20)
for i, v in enumerate(platform_metrics['plays_per_watcher']):
    axes[1, 0].text(i, v + 0.02, f'{v:.2f}', ha='center', fontweight='bold', color=VIDIO_DARK)
# VOD Completion rate
axes[1, 1].bar(platforms, platform_metrics['completion_rate_vod'].fillna(0) * 100, color=colors)
axes[1, 1].set_title('VOD Completion Rate (%)')
axes[1, 1].tick_params(axis='x', rotation=20)
for i, v in enumerate(platform_metrics['completion_rate_vod'].fillna(0) * 100):
    axes[1, 1].text(i, v + 0.5, f'{v:.1f}%', ha='center', fontweight='bold', color=VIDIO_DARK)
# Average bitrate
axes[1, 2].bar(platforms, platform_metrics['avg_bitrate'] / 1000, color=colors)
axes[1, 2].set_title('Average Bitrate (kbps)')
axes[1, 2].tick_params(axis='x', rotation=20)
for i, v in enumerate(platform_metrics['avg_bitrate'] / 1000):
    axes[1, 2].text(i, v + 1, f'{v:.0f}k', ha='center', fontweight='bold', color=VIDIO_DARK)
plt.suptitle('Platform Engagement Metrics Comparison', fontsize=14, fontweight='bold', y=1.02, color=VIDIO_DARK)
plt.tight_layout()
plt.show()
# Play duration distribution by platform (box plot, capped at 600s for readability)
fig, ax = plt.subplots(figsize=(12, 5))
plot_df = df[df['play_duration'] <= 600]
sns.boxplot(data=plot_df, x='platform', y='play_duration', palette=vidio_palette, ax=ax)
ax.set_title('Play Duration Distribution by Platform (capped at 600s)')
ax.set_ylabel('Play Duration (seconds)')
ax.set_xlabel('Platform')
plt.tight_layout()
plt.show()
# Content type preference by platform
ct_platform = pd.crosstab(df['platform'], df['content_type'], normalize='index') * 100
ct_platform.plot(kind='bar', stacked=True, figsize=(10, 5),
                 color=[VIDIO_BLUSH, VIDIO_PINK, VIDIO_DARK])
plt.title('Content Type Preference by Platform')
plt.ylabel('Percentage (%)')
plt.xlabel('Platform')
plt.legend(title='Content Type', bbox_to_anchor=(1.05, 1))
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()
ct_platform.round(1)
| platform | catchup | livestreaming | vod |
|---|---|---|---|
| app-android | 0.7 | 69.8 | 29.5 |
| app-ios | 1.2 | 76.0 | 22.8 |
| tv-android | 0.0 | 95.3 | 4.7 |
| tv-tizen | 0.0 | 91.2 | 8.8 |
| tv-webos | 0.0 | 99.3 | 0.7 |
| web-desktop | 0.7 | 45.7 | 53.6 |
| web-mobile | 0.2 | 13.0 | 86.8 |
# Normalized ranking across engagement dimensions
rank_cols = ['median_duration', 'engagement_rate', 'login_rate',
             'plays_per_watcher', 'completion_rate_vod', 'avg_bitrate']
rank_df = platform_metrics[rank_cols].rank(ascending=True)
rank_df['composite_score'] = rank_df.mean(axis=1).round(2)
rank_df = rank_df.sort_values('composite_score', ascending=False)
print("Platform Engagement Ranking (higher = better)")
print("="*60)
rank_df
Platform Engagement Ranking (higher = better)
============================================================
| platform | median_duration | engagement_rate | login_rate | plays_per_watcher | completion_rate_vod | avg_bitrate | composite_score |
|---|---|---|---|---|---|---|---|
| app-android | 7.0 | 7.0 | 7.0 | 5.0 | 7.0 | 5.5 | 6.42 |
| tv-tizen | 5.5 | 5.0 | 5.0 | 6.0 | 3.0 | 5.5 | 5.00 |
| app-ios | 5.5 | 6.0 | 6.0 | 4.0 | 6.0 | 2.0 | 4.92 |
| tv-android | 4.0 | 4.0 | 4.0 | 7.0 | 2.0 | 5.5 | 4.42 |
| web-desktop | 3.0 | 3.0 | 3.0 | 3.0 | 5.0 | 1.0 | 3.00 |
| web-mobile | 2.0 | 2.0 | 2.0 | 2.0 | 4.0 | 3.0 | 2.50 |
| tv-webos | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.5 | 1.75 |
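The composite score is the mean of a platform's six per-metric ranks, and tied values share the average of the rank positions they occupy. The four platforms with an average bitrate of 300,000 split ranks 4 to 7, which is where the 5.5 entries come from. A small check that reproduces those numbers from values copied out of the tables above (only two platforms are shown for the composite score):

```python
import pandas as pd

# Average bitrate per platform, copied from the platform metrics table
avg_bitrate = pd.Series({
    'app-android': 300000.000, 'app-ios': 122684.309, 'tv-android': 300000.000,
    'tv-tizen': 300000.000, 'tv-webos': 300000.000,
    'web-desktop': 48141.791, 'web-mobile': 174329.385,
})

# The default rank method is 'average': tied values get the mean of their positions
bitrate_rank = avg_bitrate.rank(ascending=True)

# Per-metric ranks for two platforms, copied from the ranking table
ranks = pd.DataFrame(
    {'app-android': [7.0, 7.0, 7.0, 5.0, 7.0, 5.5],
     'tv-webos': [1.0, 1.0, 1.0, 1.0, 1.0, 5.5]},
    index=['median_duration', 'engagement_rate', 'login_rate',
           'plays_per_watcher', 'completion_rate_vod', 'avg_bitrate'],
)
composite = ranks.mean(axis=0).round(2)

print(bitrate_rank['app-android'])  # 5.5
print(composite.to_dict())
```

Because every metric carries equal weight, the tied bitrate column lifts tv-webos to 1.75 even though it ranks last on the other five metrics.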
# Heatmap of normalized metrics
fig, ax = plt.subplots(figsize=(10, 4))
norm_df = platform_metrics[rank_cols].apply(lambda x: (x - x.min()) / (x.max() - x.min()))
norm_df.columns = ['Median Duration', 'Engagement Rate', 'Login Rate',
                   'Plays/Watcher', 'VOD Completion', 'Avg Bitrate']
sns.heatmap(norm_df, annot=True, cmap=vidio_cmap, fmt='.2f',
            linewidths=1, ax=ax, vmin=0, vmax=1)
ax.set_title('Platform Engagement Heatmap (0 = lowest, 1 = highest)', fontsize=13)
plt.tight_layout()
plt.show()
6. Quality of Experience
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Bitrate distribution by platform
sns.boxplot(data=df, x='platform', y='average_bitrate', palette=vidio_palette, ax=axes[0])
axes[0].set_title('Bitrate Distribution by Platform')
axes[0].set_ylabel('Average Bitrate')
axes[0].tick_params(axis='x', rotation=20)
# Buffer duration vs play duration (sampled for performance)
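# Cap the sample size at the number of buffered rows, not the full frame,
# so sample() cannot be asked for more rows than exist
buffered = df[df['buffer_duration'] > 0]
sample = buffered.sample(min(5000, len(buffered)), random_state=42)
axes[1].scatter(sample['buffer_duration'].clip(upper=100),
                sample['play_duration'].clip(upper=600),
                alpha=0.3, s=10, color=VIDIO_PINK)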
axes[1].set_title('Buffer Duration vs Play Duration')
axes[1].set_xlabel('Buffer Duration (capped at 100s)')
axes[1].set_ylabel('Play Duration (capped at 600s)')
# Ad impact on play duration
ad_dur = df.groupby('has_ad')['play_duration'].median()
axes[2].bar(['No Ad', 'Has Ad'], [ad_dur[False], ad_dur[True]],
            color=[VIDIO_BLUSH, VIDIO_PINK])
for i, v in enumerate([ad_dur[False], ad_dur[True]]):
    axes[2].text(i, v + 1, f'{v:.0f}s', ha='center', fontweight='bold', color=VIDIO_DARK)
axes[2].set_title('Median Play Duration: Ad vs No Ad')
axes[2].set_ylabel('Median Play Duration (s)')
plt.tight_layout()
plt.show()
7. Key Insights & Conclusions
Insight 1: Web-mobile dominates in volume but not engagement
Web-mobile accounts for ~58% of all plays, making it the largest traffic source. However, high volume does not by itself indicate keen viewership: web-mobile has a 26s median play duration and a 1.1% login rate, and many of its plays originate from embedded players on partner sites (liputan6, kapanlagi, merdeka), which points to casual, short-duration viewing.
Insight 2: Livestreaming drives a significant portion of traffic
~36% of all plays are livestreaming content, primarily TV channels (SCTV, Indosiar, RCTI, TRANS TV). These are the most-played titles by far, indicating strong demand for live TV streaming.
Insight 3: Most users watch without logging in
77% of plays come from anonymous (non-logged-in) users. Premium content is only 0.9% of plays. This suggests a predominantly ad-supported, casual viewing audience — with an opportunity to convert engaged viewers into registered/premium users.
Insight 4: Referrer traffic is key
Embedded playback (50.8%) slightly edges out direct plays. Top referrer groups — kapanlagi, merdeka, liputan6 — are major Indonesian media portals. These partnerships drive substantial traffic to Vidio's content.
Insight 5: The keen viewers live on app-android
"Keen" viewership should be measured by engagement quality, not just volume. A platform where users watch longer, complete more content, log in, and return for more plays indicates truly keen viewers.
We evaluated all 7 platforms across 6 engagement dimensions:
| Metric | What it measures | Why it matters |
|---|---|---|
| Median Duration | How long users actually watch | Longer = more invested |
| Engagement Rate | % of plays >= 60 seconds | Filters out bounces |
| Login Rate | % of plays by logged-in users | Shows platform commitment |
| Plays per Watcher | Repeat usage intensity | Keen users come back |
| VOD Completion Rate | % of VOD plays watched to the end | Shows content value |
| Average Bitrate | Streaming quality | Higher = better experience |
Result: app-android is the platform users most keenly use to watch Vidio.
Evidence from the composite engagement ranking:
| Rank | Platform | Composite Score | Key Stats |
|---|---|---|---|
| 1 | app-android | 6.42 | Median duration 120s, 74.4% engagement, 76.3% login rate, 32.3% VOD completion |
| 2 | tv-tizen | 5.00 | 60s median, 50.6% engagement, 63.8% login |
| 3 | app-ios | 4.92 | 60s median, 52.6% engagement, 75.5% login |
| 4 | tv-android | 4.42 | 45s median, highest plays/watcher (1.19) |
| 5 | web-desktop | 3.00 | 32s median, low login (5.5%) |
| 6 | web-mobile | 2.50 | 26s median, only 27.7% engagement, 1.1% login |
| 7 | tv-webos | 1.75 | Too few plays (153) for meaningful comparison |
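Ranks hide how close some platforms are, and every rate here is estimated from a 1% sample. A rough way to see the uncertainty is a Wilson score interval at 95% confidence, sketched below for the engagement rates of app-ios and tv-tizen, which sit next to each other in the ranking. The helper `wilson_interval` is illustrative, and the engaged-play counts are back-calculated from the rounded rates in the platform metrics table, so they are approximate:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# platform -> (approximate engaged plays, total plays), from the metrics table
samples = {
    'app-ios': (round(0.526 * 1781), 1781),
    'tv-tizen': (round(0.506 * 1031), 1031),
}

intervals = {name: wilson_interval(k, n) for name, (k, n) in samples.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low * 100:.1f}% to {high * 100:.1f}%")
```

With these inputs the two intervals overlap, so the engagement-rate ordering of app-ios and tv-tizen is not firmly established by this sample. The calculation treats plays as independent, which is only approximately true when one watcher contributes several plays.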
Why app-android users are the keenest viewers:
- Longest watch sessions: 120s median duration, about 4.6x the web-mobile median (26s)
- Highest engagement rate: 74.4% of plays last at least 60 seconds (vs 27.7% on web-mobile)
- Highest login rate: 76.3% of app-android plays come from logged-in users, showing account commitment
- Best VOD completion: 32.3% of VOD plays are watched to completion
- Intentional usage: installing an app requires deliberate effort, unlike clicking an embedded web player
- Top-tier bitrate: 300 kbps average, tied with the three TV platforms for the highest value, which may reflect higher-quality streams
In contrast, web-mobile has the highest volume (58% of all plays) but the lowest engagement among platforms with meaningful volume (only tv-webos, with 153 plays, scores lower). Most of this traffic comes from casual embedded plays on partner news sites (kapanlagi, merdeka, liputan6), where users may not be intentionally seeking out Vidio content.